Assess Backtest Risk and Performance Metrics with Pyfolio
Core idea
A trading strategy is a probability distribution dressed as an equity curve, and any single number you pull off the curve can mislead you. The Sharpe ratio measures return per unit of volatility but says nothing about the depth or duration of drawdowns. Max drawdown captures the worst peak-to-trough loss but ignores risk-adjusted return. Either one viewed alone can lead to the wrong decision. Pyfolio Reloaded, which sits on top of empyrical-reloaded and consumes Zipline backtest output, exists to make the composite view cheap to produce: return analytics, drawdown analytics, rolling-risk analytics, exposure analytics, and trade-level analytics, all from one backtest pickle and a benchmark.
Author's framing: No single risk or performance metric tells the entire story. The composite view across multiple metrics is what reveals how a strategy actually behaves under different market regimes.
Why it matters
Strategy performance shifts by regime
A strategy that earns a 1.8 Sharpe over a five-year backtest may have a 2.4 Sharpe in the 2017 bull period and a 0.3 Sharpe through the 2018 selloff. Full-period summary statistics hide this. Pyfolio's rolling-window plots (plot_rolling_volatility, plot_rolling_sharpe) and per-period breakdowns (plot_monthly_returns_heatmap, plot_annual_returns) surface the regime sensitivity that single numbers conceal. The live_start_date argument splits the metrics into an in-sample backtest period and an out-of-sample live period, and that comparison is the most honest measure of whether a strategy still works.
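To make the regime point concrete, here is a minimal sketch of a rolling Sharpe calculation in plain pandas. It is not Pyfolio's implementation: the synthetic two-regime return series, the 126-day window, the zero risk-free rate, and the 252-day annualization factor are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # assumed annualization factor


def annualized_sharpe(returns: pd.Series) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return returns.mean() / returns.std() * np.sqrt(TRADING_DAYS)


def rolling_sharpe(returns: pd.Series, window: int = 126) -> pd.Series:
    """Annualized Sharpe ratio over a trailing window of daily returns."""
    rolling = returns.rolling(window)
    return rolling.mean() / rolling.std() * np.sqrt(TRADING_DAYS)


# Synthetic daily returns: one calm, drifting-up year, then one choppy year.
calm = [0.003, -0.001] * 126    # mean +0.10% per day, small swings
stress = [0.020, -0.021] * 126  # mean -0.05% per day, large swings
index = pd.bdate_range("2017-01-02", periods=len(calm) + len(stress))
returns = pd.Series(calm + stress, index=index)

full_period = annualized_sharpe(returns)
by_regime = rolling_sharpe(returns)

print(f"full-period Sharpe:                   {full_period:.2f}")
print(f"rolling Sharpe at end of calm year:   {by_regime.iloc[251]:.2f}")
print(f"rolling Sharpe at end of stress year: {by_regime.iloc[-1]:.2f}")
```

The full-period figure lands between two very different regime readings, which is exactly what a single summary number cannot show.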
Drawdown is the survival metric, not the performance one
Annual return tells you what you might earn. Max drawdown tells you what you might quit at. A 50% drawdown — even on a strategy that ultimately doubles — is psychologically unsurvivable for most operators and structurally unsurvivable for leveraged ones. plot_drawdown_periods, plot_drawdown_underwater, and show_worst_drawdown_periods together produce the answer to "how bad does it get, how long does it stay bad, and how long until it recovers?"
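The three drawdown questions can be answered with very little arithmetic. The sketch below uses only the standard library and a hypothetical eight-period equity curve; the function names are illustrative and are not Pyfolio's API. It also shows why deep drawdowns are so hard to survive: a loss of d requires a gain of d / (1 - d) to get back to the prior peak.

```python
from itertools import accumulate


def underwater(equity):
    """Fractional distance below the running peak at each step (0.0 = at a peak)."""
    running_peak = accumulate(equity, max)
    return [value / peak - 1.0 for value, peak in zip(equity, running_peak)]


def required_recovery_gain(drawdown):
    """Gain needed to return to the prior peak after losing `drawdown` (0.5 = 50%)."""
    return drawdown / (1.0 - drawdown)


# Hypothetical equity curve, one value per period.
equity = [100, 120, 90, 60, 75, 110, 120, 130]

curve = underwater(equity)
max_drawdown = min(curve)                    # how bad does it get?
trough = curve.index(max_drawdown)
peak = max(i for i in range(trough + 1) if curve[i] == 0.0)
recovery = next((i for i in range(trough, len(curve)) if curve[i] == 0.0), None)

periods_to_trough = trough - peak            # how long does it keep falling?
periods_to_recover = None if recovery is None else recovery - trough

print(f"max drawdown: {max_drawdown:.0%}")
print(f"periods from peak to trough: {periods_to_trough}")
print(f"periods from trough to recovery: {periods_to_recover}")
print(f"gain needed to recover: {required_recovery_gain(-max_drawdown):.0%}")
```

In this toy curve the 50% loss needs a 100% gain to undo, so a strategy that "ultimately doubles" from the trough has only returned to where it started.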
Exposure and sector concentration explain returns
Two strategies can have identical equity curves and radically different risk profiles if one is 200% net long and the other is market-neutral. plot_gross_leverage, plot_exposures, plot_holdings, show_and_plot_top_positions, and plot_sector_allocations decompose the returns into what was held — the portfolio composition that produced them. Sector mapping (built from OpenBB Platform screener data) lets you ask whether your "stock-picking" alpha was secretly a sector bet.
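As a rough illustration of what the exposure plots summarize, the sketch below computes gross leverage, net exposure, and sector allocation from a single snapshot of dollar positions. The dictionary layout (symbols plus a "cash" entry, with negative cash standing for borrowed money), the function name, and the two sample books are assumptions for this example, not Pyfolio's data structures.

```python
from collections import defaultdict


def exposure_summary(positions, sector_map):
    """Summarize one snapshot of dollar positions.

    positions: {symbol: dollar value}, with cash stored under the key "cash".
    sector_map: {symbol: sector}; symbols missing from it count as "Unknown".
    """
    holdings = {s: v for s, v in positions.items() if s != "cash"}
    equity = sum(positions.values())  # net liquidation value
    long_value = sum(v for v in holdings.values() if v > 0)
    short_value = sum(v for v in holdings.values() if v < 0)

    by_sector = defaultdict(float)
    for symbol, value in holdings.items():
        by_sector[sector_map.get(symbol, "Unknown")] += value / equity

    return {
        "gross_leverage": (long_value - short_value) / equity,
        "net_exposure": (long_value + short_value) / equity,
        "sector_allocation": dict(by_sector),
    }


sector_map = {"AAPL": "Technology", "MSFT": "Technology", "XOM": "Energy"}

# Two hypothetical books with the same equity and the same gross leverage.
levered_long = {"AAPL": 120_000, "MSFT": 80_000, "cash": -100_000}
market_neutral = {"AAPL": 100_000, "XOM": -100_000, "cash": 100_000}

for name, book in [("levered long", levered_long), ("market neutral", market_neutral)]:
    print(name, exposure_summary(book, sector_map))
```

Both books run at 2x gross leverage, but one is 200% net long and entirely in Technology while the other has zero net exposure, so the same headline leverage hides very different risks.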
Trade-level analysis reveals the real distribution
Round-trip extraction — pairing each opening transaction with its closing counterpart — turns the time-series of P&L into a distribution of individual bets. extract_round_trips, print_round_trip_stats, and plot_round_trip_lifetimes answer questions strategy-level metrics cannot: What's the win rate? The profit factor? The average holding period by sector? Is the strategy actually 100 mediocre trades or 5 lucky ones?
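Once round trips have been extracted, the headline trade statistics are simple to compute. The following sketch works on a plain list of per-trade P&L values; the function name, the "top-N share" measure, and the sample numbers are assumptions for illustration and are not what print_round_trip_stats reports verbatim.

```python
def round_trip_summary(pnls, top_n=5):
    """Win rate, profit factor, and the share of total P&L from the top_n trades."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)
    total = sum(pnls)
    top_trades = sum(sorted(pnls, reverse=True)[:top_n])
    return {
        "win_rate": len(wins) / len(pnls),
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "top_n_share": top_trades / total if total else float("nan"),
    }


# Hypothetical book of 100 round trips: 5 large wins, 45 small wins, 50 small losses.
pnls = [500.0] * 5 + [10.0] * 45 + [-20.0] * 50

summary = round_trip_summary(pnls)
print(f"win rate:      {summary['win_rate']:.0%}")
print(f"profit factor: {summary['profit_factor']:.2f}")
print(f"top-5 share:   {summary['top_n_share']:.0%}")
```

A top-5 share above 100% means the other 95 trades lost money in aggregate, which is the "5 lucky ones" case that a healthy-looking profit factor can hide.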
Key takeaways
Mental model
Practical application
The Pyfolio workflow has a fixed shape; only the inputs change between strategies.
- Prepare the triplet. Load the Zipline pickle with pd.read_pickle. Call pf.utils.extract_rets_pos_txn_from_zipline(perf) to get returns, positions, and transactions. Replace the Equity objects in the transactions DataFrame's symbol column with their string representations using .apply(lambda s: s.symbol).
- Acquire the benchmark and sector map. Pull SPY (or another benchmark) historical prices via OpenBB, compute percent changes, localize to UTC, and align to the returns index. Query OpenBB (obb.equity.profile) for the position symbols to build a {symbol: sector} dictionary, marking missing sectors as "Unknown".
- Run the return analytics. Call plot_rolling_returns, show_perf_stats (with live_start_date), plot_monthly_returns_heatmap, plot_annual_returns, plot_returns, and plot_return_quantiles. Pass live_start_date consistently so the backtest-vs-live split is honest.
- Run the drawdown and rolling-risk analytics. Call plot_drawdown_periods(top=10), plot_drawdown_underwater, and show_worst_drawdown_periods. Layer on plot_rolling_volatility and plot_rolling_sharpe to see regime shifts. Use extract_interesting_date_ranges to overlay known stress windows.
- Run the exposure analytics. Call plot_holdings, plot_long_short_holdings, plot_gross_leverage, plot_exposures, show_and_plot_top_positions, and plot_sector_allocations (which consumes the sector map).
- Run the trade-level analytics. Extract round trips from transactions, then call print_round_trip_stats and plot_round_trip_lifetimes. Re-run with sector-grouped round trips to see which sectors produced the bets.
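The workflow above can be sketched roughly as follows. The two helpers are plain pandas and run on their own; their names, the sample prices, and the sector lookup dictionary are hypothetical, and no OpenBB call is made. The run_core_analytics function shows only a subset of the Pyfolio calls named in the list, assumes pyfolio-reloaded is installed and a Zipline pickle exists at perf_path, and has argument orders written from memory of the library, so check them against the installed version.

```python
import pandas as pd


def align_benchmark(prices: pd.Series, returns_index: pd.DatetimeIndex) -> pd.Series:
    """Turn benchmark closing prices into returns aligned to the strategy's index."""
    benchmark = prices.pct_change(fill_method=None)
    if benchmark.index.tz is None:
        benchmark = benchmark.tz_localize("UTC")
    return benchmark.reindex(returns_index).fillna(0.0)


def build_sector_map(symbols, sector_lookup):
    """Map every symbol to a sector, marking missing or empty ones as 'Unknown'."""
    return {symbol: sector_lookup.get(symbol) or "Unknown" for symbol in symbols}


def run_core_analytics(perf_path, benchmark_rets, live_start_date):
    """Sketch of part of the workflow; requires pyfolio-reloaded and a Zipline pickle."""
    import pyfolio as pf  # imported lazily so the helpers above work without it

    perf = pd.read_pickle(perf_path)
    returns, positions, transactions = pf.utils.extract_rets_pos_txn_from_zipline(perf)
    # Replace Zipline Equity objects with plain ticker strings.
    transactions["symbol"] = transactions["symbol"].apply(lambda s: s.symbol)

    pf.plotting.show_perf_stats(returns, benchmark_rets, live_start_date=live_start_date)
    pf.plotting.plot_drawdown_periods(returns, top=10)
    pf.plotting.plot_rolling_sharpe(returns)
    pf.plotting.plot_gross_leverage(returns, positions)

    round_trips = pf.round_trips.extract_round_trips(
        transactions[["amount", "price", "symbol"]]
    )
    pf.round_trips.print_round_trip_stats(round_trips)
    return returns, positions, round_trips


# Stand-in inputs: in practice the prices and sectors would come from OpenBB.
strategy_index = pd.date_range("2020-01-02", periods=3, tz="UTC")
spy_prices = pd.Series(
    [100.0, 110.0, 99.0, 99.0], index=pd.date_range("2020-01-01", periods=4)
)
benchmark_rets = align_benchmark(spy_prices, strategy_index)
sector_map = build_sector_map(
    ["AAPL", "XOM", "ZZZZ"], {"AAPL": "Technology", "XOM": "Energy"}
)
print(benchmark_rets)
print(sector_map)
```

Keeping the data preparation in small pure functions makes it easy to check the benchmark alignment and sector map before any plots are drawn.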
Example
Imagine a long-short equity strategy that earns 14% annualized with a Sharpe of 1.2 over a five-year backtest. The numbers are decent. Then you run the Pyfolio tearsheet.
The rolling Sharpe shows the strategy earned 80% of its alpha in 2017 and was flat-to-negative for the eighteen months that followed. The plot_drawdown_underwater plot reveals a 22% drawdown that took fourteen months to recover. The exposure plot shows gross leverage drifting from 1.5x at the start to 2.3x by year three, so the strategy is taking on more exposure to produce its returns and return per unit of risk is deteriorating. The sector allocation plot shows 38% of the book concentrated in Technology by month 60. Round-trip stats show a 47% win rate but a 1.8 profit factor, which implies the average win is roughly twice the average loss (1.8 × 0.53 / 0.47 ≈ 2.0). The edge comes from the size of the winners, not their frequency, and it breaks down if the large wins stop arriving.
The strategy isn't necessarily bad. But the composite view shows it's a Tech-momentum bet that worked in one regime and whose leverage has crept up over time. That leads to a very different decision than "1.2 Sharpe, ship it." The composite analysis turned a number into an explanation.
Related lessons
Related concepts
- Tearsheet
- Drawdown
- Sharpe Ratio
- Rolling Metrics
- Round-Trip Analysis
- Risk Management