Build Alpha Factors for Stock Portfolios

Core idea

A trading edge is a persistent inefficiency you can name

A trading "factor" is a measurable characteristic of an asset that systematically predicts return — independent of, or partly orthogonal to, the broad market. The classical examples are the Fama-French three: market excess return, size (small minus big), value (high book-to-market minus low). Modern quant practice has added dozens more: momentum, quality, volatility, profitability, low-beta. The factor zoo is real, and most factors don't survive out-of-sample. But the few that do are what generate risk-adjusted return.

This topic teaches the operational toolkit for factor work: how to find latent factors (PCA), how to measure exposure to a known factor (OLS linear regression), how to hedge out unwanted factor exposure (short the benchmark), how to assess predictive power (Spearman rank correlation between factor and forward returns), and how to rank a universe by a custom factor in production (Zipline Reloaded's Pipeline API). Each technique maps to a Python idiom (sklearn.decomposition.PCA, statsmodels.api.OLS, scipy.stats.spearmanr, zipline.pipeline.CustomFactor), but the deeper subject is the workflow they compose into.

Five recipes, one funnel

The five recipes are not five independent topics — they are five stages of one funnel:

  1. PCA identifies how many independent risk factors the universe has, regardless of whether they have economic names.
  2. OLS regression measures the sensitivity (beta) of a portfolio to a named factor like the market.
  3. Fama-French regression repeats the OLS exercise against the canonical academic factors (SMB, HML).
  4. Custom factors (Parkinson volatility) test whether a proposed factor has predictive power, via the Spearman rank correlation between the factor and forward returns.
  5. The Pipeline API turns a factor that survived steps 1–4 into a portfolio-building rule: rank the universe by factor, long the top decile, short the bottom decile.

Skipping a stage is what produces bad strategies. A factor that survives steps 1–4 deserves a Pipeline; a factor that doesn't, doesn't.

Why it matters

Diversification only works against the factors you've identified

A naive portfolio of "eight different stocks" looks diversified until you discover all eight are gold miners, all eight are loaded on the same factor, and they all draw down together when gold sells off. PCA is the mechanical answer to "what am I really holding?" It decomposes the covariance matrix into orthogonal components; the first principal component is, in most equity portfolios, just the market; the next few are typically sectors or styles. If 70% of your portfolio's variance is explained by the first two components, you have less diversification than you think.

A factor that doesn't survive the rank-correlation test should not get capital

The cookbook's framing is direct: before you build a strategy around a factor, you must verify that stocks with high factor values tend to have high forward returns (or low ones; direction matters, but the test is the same). The right tool is the Spearman rank correlation between the factor and the forward return. A correlation of about 0.04 in magnitude with a p-value of 0.02 (the cookbook's Parkinson-volatility result) is statistically significant but economically weak; a correlation of 0.15 is the kind of signal that can survive transaction costs. Without this gating step, quants build elaborate strategies on top of factors that don't predict anything.

Mental model

The factor research funnel

PCA: the orthogonal-factor view

PCA on a returns matrix produces a set of orthogonal vectors (principal components), each a linear combination of the original assets, ordered by how much variance they explain. The economic interpretation is loose but useful:

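  • PC1 is almost always the market: every stock loads on it with the same sign because every stock moves with the index. (The sign of a component is arbitrary, so the loadings may come out all positive or all negative.)
  • PC2 is typically the dominant style or sector contrast (growth vs value, defensive vs cyclical).
  • PC3+ become harder to name and explain progressively less variance; the early ones can still reflect real sources of risk, while the later ones are increasingly noise.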

The factor exposures (pca.components_) tell you how much each asset loads on each component. Plotting the loadings on PC1 vs PC2 reveals clusters — a sector or factor pull on each pole. The cookbook's mining-vs-healthcare example produces a textbook PC2 contrast: positive loadings for the gold miners, negative for healthcare, suggesting PC2 is "commodity-like vs. defensive-growth."
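The decomposition described above can be sketched with plain NumPy. This is a minimal illustration on synthetic returns, not the cookbook's code: the six assets, the factor volatilities, and the seed are all assumptions, and an eigen-decomposition of the covariance matrix stands in for sklearn.decomposition.PCA (whose explained_variance_ratio_ and components_ play the same roles as explained and loadings here).

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_assets = 500, 6

# Synthetic daily returns (assumed values): a market factor shared by all
# assets plus a "style" factor that lifts the first three assets and
# drags the last three.
market = rng.normal(0.0, 0.010, size=(n_days, 1))
style = rng.normal(0.0, 0.006, size=(n_days, 1))
style_sign = np.array([1, 1, 1, -1, -1, -1])
noise = rng.normal(0.0, 0.003, size=(n_days, n_assets))
returns = market + style * style_sign + noise

# PCA as an eigen-decomposition of the covariance matrix of returns.
cov = np.cov(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
order = np.argsort(eigvals)[::-1]             # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()           # share of variance per component
loadings = eigvecs.T                          # one row per component

print("explained variance ratio:", np.round(explained, 3))
print("PC1 loadings:", np.round(loadings[0], 2))
print("PC2 loadings:", np.round(loadings[1], 2))
```

On data built this way, PC1 loads on every asset with one sign (the market) and PC2 splits the two groups with opposite signs, which is the kind of contrast the mining-vs-healthcare example shows.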

OLS for beta and the hedging identity

The factor-model identity:

R_portfolio = α + β × R_benchmark + ε

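is fitted by sm.OLS(Y, sm.add_constant(X)).fit(), with statsmodels.api imported as sm, Y the portfolio returns, and X the benchmark returns. The slope gives β; the intercept gives α; the residual ε is the part the model can't explain. To hedge the benchmark exposure, short β dollars of the benchmark for every dollar of portfolio. The new portfolio's return is α + β × R_benchmark + ε − β × R_benchmark = α + ε: the systematic factor is gone, and the alpha (and the residual noise) remain. Because β is an estimate, the cancellation is exact only in the fitting sample; out of sample, some market exposure returns as beta drifts.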

This is the foundation of long/short equity. You take the long exposure you want and short enough of the benchmark to cancel out the market beta.
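A small numerical check of the hedging identity may help. The sketch below uses synthetic returns with an assumed alpha and beta, and fits the regression with np.linalg.lstsq as a stand-in for statsmodels OLS (the point estimates are the same; statsmodels additionally reports standard errors and p-values).

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 1000

# Synthetic daily returns with a known alpha and beta (illustrative values).
true_alpha, true_beta = 0.0002, 1.3
r_bench = rng.normal(0.0004, 0.010, size=n_days)
eps = rng.normal(0.0, 0.004, size=n_days)
r_port = true_alpha + true_beta * r_bench + eps

# OLS with an intercept: the design matrix is [1, R_benchmark].
X = np.column_stack([np.ones(n_days), r_bench])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, r_port, rcond=None)

# Hedge: short beta_hat units of benchmark per unit of portfolio.
r_hedged = r_port - beta_hat * r_bench

print(f"alpha_hat={alpha_hat:.5f}  beta_hat={beta_hat:.3f}")
print(f"corr(hedged, benchmark)={np.corrcoef(r_hedged, r_bench)[0, 1]:.2e}")
```

In sample, the hedged series is uncorrelated with the benchmark by construction and its mean equals the fitted intercept, which is the "α + ε" remainder in the identity.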

The information coefficient — Spearman rank correlation

The "information coefficient" (IC) of a factor is the Spearman rank correlation between today's factor value and tomorrow's (or T-day forward) return. A factor with IC = 0.05 is weak but tradeable if costs are low. A factor with IC = 0.20 is exceptional. A factor with IC = 0.00 is noise.

The IC uses Spearman (rank) rather than Pearson (linear) correlation because financial data has heavy tails: a single outlier can dominate a Pearson correlation, whereas ranks depend only on ordering, not magnitude. That robustness is why rank correlation is the standard quant practice.
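To make the outlier argument concrete, here is a minimal sketch on synthetic data. Spearman's rho is computed as the Pearson correlation of the ranks, which matches the statistic from scipy.stats.spearmanr (that function also returns a p-value, omitted here). The factor strength, the sample size, and the injected outlier are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 500

# Synthetic cross-section: forward return is weakly related to the factor.
factor = pd.Series(rng.normal(size=n))
fwd_ret = pd.Series(0.1 * factor.to_numpy() + rng.normal(size=n))

def spearman_ic(x: pd.Series, y: pd.Series) -> float:
    """Spearman rho computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(x.rank(), y.rank())[0, 1])

def pearson(x: pd.Series, y: pd.Series) -> float:
    """Ordinary linear (Pearson) correlation."""
    return float(np.corrcoef(x, y)[0, 1])

ic_clean, pearson_clean = spearman_ic(factor, fwd_ret), pearson(factor, fwd_ret)

# Inject one extreme observation: a huge factor value with a huge negative return.
factor_out, fwd_out = factor.copy(), fwd_ret.copy()
factor_out.iloc[0], fwd_out.iloc[0] = 50.0, -50.0

ic_out, pearson_out = spearman_ic(factor_out, fwd_out), pearson(factor_out, fwd_out)

print(f"clean:   spearman={ic_clean:+.3f}  pearson={pearson_clean:+.3f}")
print(f"outlier: spearman={ic_out:+.3f}  pearson={pearson_out:+.3f}")
```

A single bad tick flips the Pearson estimate to strongly negative, while the rank-based IC barely moves.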

The Pipeline API — research to production

A Zipline Pipeline is a declarative specification: here is the factor (a CustomFactor subclass with inputs, window_length, and compute()); here are the columns to output (raw factor, top decile, bottom decile, rank); here is the screen (only assets in the top 100 by dollar volume). The engine computes this efficiently across thousands of securities, day by day. The output is a MultiIndex DataFrame indexed by (date, asset), directly consumable by Alphalens for performance analysis or by a Zipline algorithm for backtesting.

The mental shift is from imperative ("loop over each ticker each day, compute the factor") to declarative ("here is the factor's data dependencies, the engine will figure it out"). The latter scales; the former does not.
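Running a real Pipeline needs Zipline and an ingested data bundle, so the sketch below only emulates the shape of its output in pandas: a (date, asset) MultiIndex frame with a liquidity screen, a cross-sectional rank, and top and bottom decile flags. It is not the Zipline API; the tickers, the random factor values, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
dates = pd.bdate_range("2024-01-02", periods=5)
assets = [f"A{i:03d}" for i in range(200)]        # hypothetical tickers
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])

# Synthetic inputs: one factor value and one dollar-volume figure per row.
panel = pd.DataFrame(
    {
        "factor": rng.normal(size=len(index)),
        "dollar_volume": rng.lognormal(mean=15, sigma=1, size=len(index)),
    },
    index=index,
)

# Screen: keep the 100 most liquid assets on each date.
liquid = panel.groupby(level="date")["dollar_volume"].rank(ascending=False) <= 100
screened = panel.loc[liquid, ["factor"]].copy()

# Columns: cross-sectional rank plus top/bottom decile flags, per date.
by_date = screened.groupby(level="date")["factor"]
pct = by_date.rank(pct=True)
screened["rank"] = by_date.rank()
screened["longs"] = pct > 0.9
screened["shorts"] = pct <= 0.1

print(screened.head())
print(screened.groupby(level="date")[["longs", "shorts"]].sum())
```

Every operation here is a whole-column, per-date transformation rather than a loop over tickers, which is the same declarative habit the Pipeline engine enforces at scale.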

Practical application

A complete factor study, step by step

For a quant proposing a new factor (call it X), the discipline is:

  1. Construct X for every stock in the universe over a multi-year history. Store as a MultiIndex DataFrame (symbol, date).
  2. Sanity check — plot the distribution of X. If it's bimodal or wildly heavy-tailed, you may need to winsorize or rank-transform.
  3. Compute forward returns at multiple horizons (1d, 5d, 21d, 63d) — does the factor predict next-day moves or next-quarter moves?
  4. Spearman IC between X today and forward returns at each horizon. Plot the IC distribution by month — is it stable, or does it disappear in some regimes?
  5. Quantile sort: bucket assets by X into quintiles or deciles each day. Plot the average forward return per quintile. A monotone pattern (Q1 < Q2 < ... < Q5, or the reverse for a factor with negative IC) is what you want; a flat or non-monotone pattern is a kill signal.
  6. Construct a long/short portfolio: long Q5, short Q1, equal-weight, rebalanced periodically. Compute the gross Sharpe ratio.
  7. Subtract transaction costs — at realistic levels (10 bps for liquid US equities, more for smaller markets) — and compute net Sharpe. If net Sharpe is still positive after a 5-year out-of-sample window, you have a candidate factor.
  8. Build a Pipeline that produces the factor and ranks; wire it into a Zipline backtest.

This is the routine that turns "I have an idea" into "I have a tradable signal," and it composes directly out of the topic's primitives.
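Steps 5 through 7 of the list above can be sketched in a few lines of pandas. The panel is synthetic, with a small positive factor effect built in, and the cost model (10 bps per unit of turnover, 20% of each leg replaced per day) is a hypothetical assumption, so the Sharpe ratios it prints say nothing about any real factor.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n_days, n_assets = 250, 100

# Synthetic panel: the factor has a small positive relation to forward returns.
factor = rng.normal(size=(n_days, n_assets))
fwd_ret = 0.001 * factor + rng.normal(0.0, 0.02, size=(n_days, n_assets))

df = pd.DataFrame({
    "day": np.repeat(np.arange(n_days), n_assets),
    "factor": factor.ravel(),
    "fwd_ret": fwd_ret.ravel(),
})

# Step 5: bucket into quintiles within each day (1 = lowest factor, 5 = highest).
df["q"] = df.groupby("day")["factor"].transform(
    lambda s: pd.qcut(s, 5, labels=False) + 1
)
mean_by_q = df.groupby("q")["fwd_ret"].mean()

# Step 6: equal-weight long Q5, short Q1, one return per day.
daily = df.groupby(["day", "q"])["fwd_ret"].mean().unstack("q")
long_short = daily[5] - daily[1]

# Step 7: subtract an assumed daily cost (hypothetical: 10 bps per unit of
# turnover, 20% of each of the two legs replaced per day).
cost_per_day = 0.0010 * 0.20 * 2
net = long_short - cost_per_day

def sharpe(r: pd.Series) -> float:
    """Annualized Sharpe ratio of a daily return series (zero risk-free rate)."""
    return float(r.mean() / r.std() * np.sqrt(252))

print(mean_by_q.round(5))
print(f"gross Sharpe={sharpe(long_short):.2f}  net Sharpe={sharpe(net):.2f}")
```

Bucketing within each day, rather than across the pooled sample, keeps the sort cross-sectional, so a market-wide move on one day cannot masquerade as a factor effect.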

Parkinson volatility — a worked custom factor

The topic uses Parkinson volatility as its custom-factor example because it illustrates two things at once: a better estimator (uses high and low, not just close) and the workflow (compute → forward returns → Spearman test). The formula:

σ²_Parkinson = (1 / (4 ln 2)) × [ln(H/L)]²

then averaged over a rolling 14-day window, square-rooted, annualized by √252, and finally normalized (subtract mean, divide by standard deviation) so it's directly comparable across stocks.

The cookbook finds a small but statistically significant negative IC for Parkinson volatility vs. one-day forward returns — consistent with the well-documented "low-volatility anomaly" (calmer stocks outperform riskier ones on a risk-adjusted basis, in defiance of the textbook risk-return trade-off).
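The Parkinson recipe described above can be written as a short pandas function. This is a sketch on synthetic high and low prices for one hypothetical stock, not the cookbook's code; in particular, the normalization here uses the stock's own history for the mean and standard deviation, which is an assumption, since the text does not say which sample they are taken over.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n_days = 300
dates = pd.bdate_range("2023-01-02", periods=n_days)

# Synthetic high/low prices for one hypothetical stock.
close = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=n_days)))
intraday_range = np.abs(rng.normal(0.0, 0.015, size=n_days)) + 1e-4
bars = pd.DataFrame(
    {"high": close * np.exp(intraday_range / 2),
     "low": close * np.exp(-intraday_range / 2)},
    index=dates,
)

def parkinson_vol(high: pd.Series, low: pd.Series, window: int = 14) -> pd.Series:
    """Annualized Parkinson volatility over a rolling window."""
    # Per-day variance estimate: [ln(H/L)]^2 / (4 ln 2).
    daily_var = np.log(high / low) ** 2 / (4.0 * np.log(2.0))
    # Average over the window, take the square root, annualize by sqrt(252).
    return np.sqrt(daily_var.rolling(window).mean()) * np.sqrt(252)

vol = parkinson_vol(bars["high"], bars["low"])

# Normalization: subtract the mean, divide by the standard deviation.
vol_z = (vol - vol.mean()) / vol.std()

print(vol.dropna().tail(3).round(4))
print(vol_z.dropna().tail(3).round(3))
```

The first window − 1 values are NaN because the rolling mean needs a full window before it produces an estimate.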

Example

Consider a quant proposing the "earnings quality" factor: a metric combining net income, free cash flow, and accruals, normalized into a single score per stock per quarter. The quant wants to know: does this factor predict the next-quarter return?

The complete study, using this topic's tools:

  1. Build a MultiIndex DataFrame indexed by (symbol, fiscal_quarter) with one column per input (net income, FCF, accruals) for every S&P 500 constituent over fifteen years.
  2. Compute the score as a function of the inputs, then standardize each quarter cross-sectionally (z-score within the quarter so different macro environments don't dominate).
  3. Compute next-quarter return for each (symbol, quarter) via df.groupby(level="symbol")["close"].shift(-1) / df["close"] - 1, so that the shift never crosses from one symbol into the next.
  4. Compute Spearman IC per quarter. Plot the time series of quarterly ICs.
  5. Sort into quintiles each quarter, hold equal-weighted, compute the long-Q5–short-Q1 return series.
  6. Run an OLS of the long/short return against the Fama-French three factors. A statistically significant positive intercept after factor controls is the headline result — your factor produces alpha that the canonical factors don't explain.
  7. Build a CustomFactor in Zipline that computes the score from the same inputs. Wire it into a Pipeline that screens for top-100 liquidity, ranks by the score, and emits long/short flags for the top/bottom 50.

If the alpha survives steps 4-6 and the Pipeline produces sensible rankings in step 7, the quant has a candidate ready for the backtesting topics that follow.
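Steps 2 through 4 of this study are mostly groupby mechanics, sketched below on a synthetic panel. The tickers, quarter labels, and column names are hypothetical, and the raw score is pure noise, so the quarterly ICs it prints should scatter around zero rather than show any signal. The IC is again computed as the Pearson correlation of within-quarter ranks.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
symbols = [f"S{i:03d}" for i in range(50)]                      # hypothetical tickers
quarters = [f"{y}Q{q}" for y in range(2015, 2020) for q in range(1, 5)]
index = pd.MultiIndex.from_product([symbols, quarters],
                                   names=["symbol", "fiscal_quarter"])

# Synthetic inputs: a raw score and a quarter-end close price per symbol.
df = pd.DataFrame({"raw_score": rng.normal(size=len(index))}, index=index)
log_ret = pd.Series(rng.normal(0.02, 0.10, size=len(index)), index=index)
df["close"] = 50 * np.exp(log_ret.groupby(level="symbol").cumsum())

# Step 2: z-score the raw score within each quarter.
by_q = df.groupby(level="fiscal_quarter")["raw_score"]
df["score"] = (df["raw_score"] - by_q.transform("mean")) / by_q.transform("std")

# Step 3: next-quarter return, shifting within each symbol so the last
# quarter of one symbol never borrows the first price of the next.
next_close = df.groupby(level="symbol")["close"].shift(-1)
df["fwd_ret"] = next_close / df["close"] - 1

# Step 4: Spearman IC per quarter (Pearson correlation of within-quarter ranks).
valid = df.dropna(subset=["fwd_ret"])
ranks = valid.groupby(level="fiscal_quarter")[["score", "fwd_ret"]].rank()
ic_by_quarter = ranks.groupby(level="fiscal_quarter").apply(
    lambda g: g["score"].corr(g["fwd_ret"])
)

print(ic_by_quarter.round(3))
```

The final quarter of each symbol has no forward return and drops out, so twenty quarters of data yield nineteen IC observations.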
