Data-Driven Research

I Tested Markov Chains on 11 Crypto Assets. Here's Every Result.

May 21, 2026·20 min read

Markov chains are everywhere in crypto content right now. Threads with transition matrices. Posts about "hidden regime detection." Reels with arrows between colorful boxes claiming to predict the next market move. Most of it is AI-generated, none of it is verified, and the engagement is through the roof.

That's the problem with crypto content in 2026. Everything sounds like quant finance. Nothing gets tested. Someone prompts ChatGPT for a Markov chain explainer, wraps it in a carousel, and suddenly it's gospel.

I didn't post about it. I tested it. On 11 assets. Over 2 years. With real data from Kraken. Using both observable Markov chains and quant-grade Hidden Markov Models with BIC model selection, out-of-sample validation, and walk-forward backtesting.

The results don't match the content.

> Disclosure: This study was produced by Anny, an AI-powered portfolio intelligence platform. The quantitative analysis was executed by Anny's Python engine. The article was generated with AI assistance and reviewed by an automated editorial panel. Anny builds trend-following tools — this study found that Markov chains don't improve them. All numerical claims are computed results from the stated Kraken dataset and can be independently reproduced.

The Claim vs. The Test

The version going around sounds like this: "We don't predict price. We predict which box the market is in and where that box historically leads." The idea is that crypto markets move through discrete states — strong down, mild down, neutral, mild up, strong up — and that these transitions are non-random. If the market was in "strong down" yesterday, there's a statistically significant probability it stays there or moves to a specific next state.

It sounds rigorous. It has matrices. It feels like math. It's the kind of thing AI generates really well and nobody bothers to verify.

So I built it. Discretized daily returns into 5 states for each of the top 11 crypto assets (BTC, ETH, SOL, BNB, XRP, ADA, DOGE, AVAX, DOT, LINK, SUI), counted every single transition over 2 years of daily data, and ran a chi-squared independence test on each matrix.

The 5x5 Grid: Near-Random for Almost Every Asset

Here's BTC's actual transition matrix — what happens after each type of day:

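| From \ To | Strong Down | Mild Down | Neutral | Mild Up | Strong Up |
|---|---|---|---|---|---|
| Strong Down | 16% | 29% | 18% | 19% | 17% |
| Mild Down | 18% | 16% | 21% | 25% | 21% |
| Neutral | 15% | 21% | 28% | 22% | 15% |
| Mild Up | 13% | 27% | 28% | 17% | 16% |
| Strong Up | 15% | 22% | 26% | 26% | 12% |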

Source: 720 daily BTC/USDT close prices from Kraken (2-year lookback from study date, May 2026). Returns discretized into 5 states: strong down (<-2%), mild down (-2% to -0.5%), neutral (-0.5% to +0.5%), mild up (+0.5% to +2%), strong up (>+2%).
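A minimal sketch of the discretization and counting steps, assuming simple close-to-close returns and that each exact boundary value falls into the less extreme bucket (the source gives the ranges but not how boundary values are assigned). The function names are hypothetical, not the study's code.

```python
# Hypothetical helpers; thresholds are the ones stated in the source note above.
STATE_NAMES = ["strong down", "mild down", "neutral", "mild up", "strong up"]


def return_state(r):
    """Map a simple daily return (0.012 means +1.2%) to a state index 0..4."""
    if r < -0.02:
        return 0  # strong down
    if r < -0.005:
        return 1  # mild down (the -2% boundary lands here)
    if r <= 0.005:
        return 2  # neutral (the -0.5% and +0.5% boundaries land here)
    if r <= 0.02:
        return 3  # mild up (the +2% boundary lands here)
    return 4      # strong up


def transition_counts(closes):
    """Count today -> tomorrow state transitions from a list of daily closes."""
    returns = [b / a - 1 for a, b in zip(closes, closes[1:])]
    states = [return_state(r) for r in returns]
    counts = [[0] * 5 for _ in range(5)]
    for today, tomorrow in zip(states, states[1:]):
        counts[today][tomorrow] += 1
    return counts


def row_probabilities(counts):
    """Normalize each row so entry [i][j] reads as P(tomorrow = j | today = i)."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs
```

With 720 closes this yields 719 returns and 718 counted transitions; each row of a table like BTC's above is one row of `row_probabilities(...)` expressed as percentages.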

BTC 5x5 transition heatmap — near-uniform probabilities across all states

Look at that first row. After a strong down day, there's a 16% chance of another strong down... and a 17% chance of strong up. That's noise, not signal.

The chi-squared independence test agrees: p = 0.2247, not significant. There is no statistical evidence that tomorrow's state depends on today's; the transitions are indistinguishable from random.

ETH is worse: p = 0.7662. DOT: p = 0.5839. LINK: p = 0.7219. BNB: p = 0.0983. AVAX: p = 0.2593.
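For readers who want to check numbers like these, here is a dependency-free sketch of a Pearson chi-squared independence test on a transition-count matrix. It is not the study's code. The p-value uses the closed-form chi-squared survival function, which is exact only for even degrees of freedom; a full 5x5 table has (5 - 1) x (5 - 1) = 16, so it applies here. With SciPy installed, `scipy.stats.chi2_contingency` returns the same statistic and p-value for tables like this (its continuity correction only applies at one degree of freedom).

```python
import math


def chi2_independence(table):
    """Pearson chi-squared test of independence on an r x c table of counts.

    Rows and columns that are entirely zero are dropped first, since they have
    no expected counts. Returns (statistic, degrees_of_freedom, p_value).
    """
    rows = [row for row in table if sum(row) > 0]
    keep = [j for j in range(len(rows[0])) if sum(row[j] for row in rows) > 0]
    rows = [[row[j] for j in keep] for row in rows]
    row_tot = [sum(row) for row in rows]
    col_tot = [sum(row[j] for row in rows) for j in range(len(keep))]
    n = sum(row_tot)

    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected

    dof = (len(rows) - 1) * (len(keep) - 1)
    if dof == 0 or dof % 2:
        raise ValueError("this closed-form p-value needs a positive, even dof")
    # Chi-squared survival function for dof = 2m: exp(-x/2) * sum_{i<m} (x/2)^i / i!
    half = stat / 2.0
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(dof // 2))
    return stat, dof, p_value
```

Feeding it a 5x5 matrix of transition counts gives a statistic with 16 degrees of freedom and a p-value of the kind listed above.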

Out of 11 assets, only two show statistically significant transition structure at the uncorrected p < 0.05 threshold:

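| Asset | Chi-squared p-value | Significant (p < 0.05)? | Survives Bonferroni (p < 0.0045)? |
|---|---|---|---|
| BTC | 0.2247 | No | No |
| ETH | 0.7662 | No | No |
| SOL | 0.0791 | No | No |
| BNB | 0.0983 | No | No |
| XRP | 0.0196 | Yes | No |
| ADA | 0.0994 | No | No |
| DOGE | 0.0376 | Yes | No |
| AVAX | 0.2593 | No | No |
| DOT | 0.5839 | No | No |
| LINK | 0.7219 | No | No |
| SUI | 0.0635 | No | No |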

Source: Chi-squared independence test on 5x5 transition matrices per asset. With 11 independent tests, the Bonferroni-corrected significance threshold is p < 0.0045 (0.05/11). At the corrected threshold, zero assets show significant transition structure. Even at the lenient uncorrected threshold, only XRP (p=0.0196) and DOGE (p=0.0376) pass — the two most meme-driven, sentiment-dominated assets on the list.
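The Bonferroni arithmetic is simple enough to verify directly. The sketch below plugs in the p-values from the table above; `bonferroni` is a hypothetical helper name.

```python
# p-values copied from the table above
P_VALUES = {
    "BTC": 0.2247, "ETH": 0.7662, "SOL": 0.0791, "BNB": 0.0983,
    "XRP": 0.0196, "ADA": 0.0994, "DOGE": 0.0376, "AVAX": 0.2593,
    "DOT": 0.5839, "LINK": 0.7219, "SUI": 0.0635,
}


def bonferroni(p_values, alpha=0.05):
    """Return the corrected threshold plus the assets passing before/after correction."""
    threshold = alpha / len(p_values)
    uncorrected = sorted(a for a, p in p_values.items() if p < alpha)
    corrected = sorted(a for a, p in p_values.items() if p < threshold)
    return threshold, uncorrected, corrected


threshold, uncorrected, corrected = bonferroni(P_VALUES)
print(f"{threshold:.4f}", uncorrected, corrected)  # 0.0045 ['DOGE', 'XRP'] []
```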

Chi-squared p-values for all 11 assets — only XRP and DOGE below 0.05, none below Bonferroni threshold

After correcting for multiple testing, the observable Markov chain has no statistically significant predictive power on any major crypto asset.

"But What About Hidden Markov Models?"

Fair pushback. The version circulating online uses observable states (return buckets), which is the simplest possible implementation. Real quants use Hidden Markov Models — the market has hidden states that generate the returns we observe, and you can infer those states probabilistically.

I built that too. Gaussian HMM with:

  • Vol-normalized log returns (z-scored by 20-day rolling volatility)
  • BIC model selection testing k=2 through k=6 states
  • Baum-Welch EM fitting with 5 random restarts per k
  • Forward algorithm only (no look-ahead bias — I never use future data)
  • Out-of-sample 70/30 train/test validation
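The forward algorithm is what keeps the regime estimate free of look-ahead: each day's state probabilities are computed from data up to that day only. Below is a minimal pure-Python sketch for a Gaussian HMM with one feature per day (an assumption made for brevity), taking already-fitted parameters (start probabilities, transition matrix, means, variances), for example from Baum-Welch. It is illustrative, not the study's engine; a production fit would more typically use a library such as hmmlearn.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def forward_filter(obs, start, trans, means, variances):
    """Forward algorithm: filtered P(state on day t | obs up to day t), plus log-likelihood.

    Only past and current observations enter each day's estimate, unlike the
    forward-backward smoother, which would let later data revise earlier states.
    """
    k = len(start)
    filtered, loglik = [], 0.0
    for t, x in enumerate(obs):
        if t == 0:
            prior = list(start)
        else:
            prev = filtered[-1]
            # Predict: carry yesterday's posterior through the transition matrix.
            prior = [sum(prev[i] * trans[i][j] for i in range(k)) for j in range(k)]
        # Update: weight each state by how well it explains today's observation,
        # working in log space and shifting by the max to avoid underflow.
        log_w = [math.log(prior[j]) + log_gauss(x, means[j], variances[j])
                 if prior[j] > 0 else -math.inf
                 for j in range(k)]
        top = max(log_w)
        w = [math.exp(v - top) for v in log_w]
        total = sum(w)
        loglik += top + math.log(total)
        filtered.append([v / total for v in w])
    return filtered, loglik
```

A 2-state model shaped like BTC's would pass `trans = [[0.99, 0.01], [0.048, 0.952]]` (bearish row first); the state variances are not reported in the source, so any values used there are your own.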

BIC-Selected State Counts

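| Asset | Optimal States | BIC | Log-Likelihood |
|---|---|---|---|
| BTC | 2 | -3173.7 | 1622.9 |
| ETH | 2 | -2861.5 | 1466.8 |
| SOL | 3 | -2754.5 | 1442.7 |
| BNB | 2 | -1502.2 | 783.7 |
| XRP | 3 | -2710.3 | 1420.6 |
| ADA | 4 | -2535.7 | 1369.4 |
| DOGE | 3 | -2520.7 | 1325.9 |
| AVAX | 3 | -2622.4 | 1376.7 |
| DOT | 2 | -2538.1 | 1305.1 |
| LINK | 4 | -2712.9 | 1458.0 |
| SUI | 3 | -2569.5 | 1350.3 |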

Source: Bayesian Information Criterion over Gaussian HMM fits (k=2..6, 5 random restarts each). Lower BIC = better model. BNB shows notably lower log-likelihood (783.7 vs 1300+) due to fewer data points from Kraken's thinner BNB/USDT order book — see data limitations below.
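For reference, BIC = -2 log L + p ln n, where p counts free parameters and n is the number of observations. The sketch below assumes a Gaussian HMM with diagonal covariance and d features per day; the source does not spell out its parameter count, so d is left as an argument. For each k, the log-likelihood passed in would be the best of the random restarts.

```python
import math


def n_hmm_params(k, d=1):
    """Free parameters of a k-state Gaussian HMM with d features, diagonal covariance."""
    start = k - 1          # initial distribution (must sum to 1)
    trans = k * (k - 1)    # each transition row must sum to 1
    means = k * d
    variances = k * d      # diagonal covariance: one variance per state per feature
    return start + trans + means + variances


def bic(loglik, k, n_obs, d=1):
    """Bayesian Information Criterion: lower is better."""
    return -2.0 * loglik + n_hmm_params(k, d) * math.log(n_obs)


def select_k(logliks, n_obs, d=1):
    """Pick the state count with the lowest BIC from {k: best log-likelihood}."""
    return min(logliks, key=lambda k: bic(logliks[k], k, n_obs, d))
```

Each extra state must buy enough likelihood to pay for its added parameters at ln(n) apiece, which is why BIC tends to settle on small k.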

The BIC selected 2 states for BTC (bearish/bullish), 2 for ETH, 3 for SOL, and between 2 and 4 for the remaining assets. Here's what BTC's hidden regime looks like:

| State | Mean Z-Return | Persistence | Expected Duration |
|---|---|---|---|
| Bearish | -0.014 | 99.0% | 103.8 days |
| Bullish | +0.090 | 95.2% | 20.7 days |

That 99% persistence on the bearish state means once BTC is classified as bearish, the model expects it to stay bearish for over 100 days. The bullish state lasts about 3 weeks on average. These are real, distinct regimes.

All 11 Assets: HMM Regime Profiles

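| Asset | States | Bearish Duration | Bullish Duration | Bearish Persistence | Bullish Persistence |
|---|---|---|---|---|---|
| BTC | 2 | 103.8d | 20.7d | 99.0% | 95.2% |
| ETH | 2 | 71.7d | 20.0d | 98.6% | 95.0% |
| SOL | 3 | 20.5d | 21.9d | 95.1% | 95.4% |
| BNB | 2 | 22.4d | 149.3d | 95.5% | 99.3% |
| XRP | 3 | 31.6d | 26.1d | 96.8% | 96.2% |
| ADA | 4 | 29.2d | 10.7d | 96.6% | 90.7% |
| DOGE | 3 | 18.5d | 14.2d | 94.6% | 92.9% |
| AVAX | 3 | 77.5d | 25.0d | 98.7% | 96.0% |
| DOT | 2 | 77.6d | 28.7d | 98.7% | 96.5% |
| LINK | 4 | 39.5d | 22.9d | 97.5% | 95.6% |
| SUI | 3 | 31.0d | 13.8d | 96.8% | 92.7% |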

Source: Gaussian HMM with Baum-Welch EM, vol-normalized features. Duration = 1/(1 - persistence). "Bearish" = lowest mean z-return state; "Bullish" = highest. For 3- and 4-state models, only the extreme states are shown.
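The duration formula follows from treating each day in a state as a stay-or-leave draw with probability p of staying, so the number of days D spent in the state is geometric:

```latex
\begin{aligned}
P(D = d) &= p^{\,d-1}(1 - p), \qquad d = 1, 2, \dots \\
\mathbb{E}[D] &= \sum_{d=1}^{\infty} d\, p^{\,d-1}(1 - p) = \frac{1}{1 - p}
\end{aligned}
```

As a check, 1/(1 - 0.990) = 100 days, slightly below the 103.8 days shown for BTC's bearish state. The tabulated durations appear to use unrounded persistences: p ≈ 0.99037 gives 103.8 days and still rounds to 99.0%.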

The model finds something real. The question is: can you actually use it?

The Backtest: Testing Utility, Not Signal

Here's where I went further than the content. I didn't test Markov chains as a standalone trading signal — that would be a strawman. Nobody serious uses a regime label alone to enter a trade.

Instead, I tested its utility: does adding HMM regime detection to an existing trend-following system make that system better? Does it improve entries, reduce drawdowns, or increase confidence in a way that translates to better risk-adjusted returns?

I tested 5 strategies on BTC, ETH, and SOL over 720 days:

  1. Buy & Hold — baseline
  2. Trend Only — dual-EMA (12/26) crossover, buy on Accumulate signal, sell on Distribute
  3. Trend + Wait Exit — same as above but also exit on Wait zone (ambiguous trend)
  4. HMM Regime Filtered — only take trend signals when HMM says "bullish"
  5. Confidence Scaled — scale position size by HMM confidence (0-100%)
I also ran confidence-gated variants at 50%, 60%, 70%, 80%, and 90% thresholds.

The question wasn't "can Markov chains trade?" It was: can they make an existing strategy better?
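The source names the Accumulate / Wait / Distribute zones and, in the methodology notes, a dual-EMA (12/26) trend signal with a 0.5% Wait band, but not the exact formula. The sketch below is one plausible reading, stated as an assumption: the zone comes from the fast EMA's percentage gap over the slow EMA, with the Wait zone inside ±0.5%. `trend_zones` and `positions_from_zones` are hypothetical names.

```python
def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded at the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def trend_zones(closes, fast=12, slow=26, band=0.005):
    """Label each day Accumulate / Wait / Distribute from the fast-vs-slow EMA gap (assumed rule)."""
    zones = []
    for f, s in zip(ema(closes, fast), ema(closes, slow)):
        spread = f / s - 1
        if spread > band:
            zones.append("Accumulate")
        elif spread < -band:
            zones.append("Distribute")
        else:
            zones.append("Wait")
    return zones


def positions_from_zones(zones, exit_on_wait=False):
    """1 = long, 0 = flat. Enter on Accumulate; exit on Distribute (or also on Wait)."""
    pos, holding = [], False
    for z in zones:
        if z == "Accumulate":
            holding = True
        elif z == "Distribute" or (exit_on_wait and z == "Wait"):
            holding = False
        pos.append(1 if holding else 0)
    return pos
```

Under this reading, `positions_from_zones(zones)` corresponds to Trend Only (hold through Wait) and `positions_from_zones(zones, exit_on_wait=True)` to Trend + Wait Exit.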

BTC Results (720 days)

| Strategy | Return | Sharpe | Max Drawdown | Trades | Exposure |
|---|---|---|---|---|---|
| Buy & Hold | +14.8% | 0.382 | -49.5% | 0 | 100% |
| Trend Only | -15.2% | -0.147 | -40.1% | 11 | 53% |
| Trend + Wait Exit | +0.9% | 0.147 | -29.2% | 15 | 44% |
| Regime Filtered | -11.3% | n/a* | -11.3% | 1 | 2% |
| Confidence Scaled | -10.0% | -0.617 | -18.9% | 11 | 53% |

Source: Walk-forward backtest, 720 daily bars, BTC/USDT Kraken. Gross returns (no transaction costs, no slippage). Sharpe = annualized return / annualized volatility, 0% benchmark rate. *Regime Filtered Sharpe omitted: a Sharpe ratio on 1 trade (12 days of exposure) is statistically meaningless.

Read that regime-filtered row again. One trade in 720 days. The HMM classified BTC as bullish for only 92 out of 720 days. And the one trade it took lost 11.3%.

The confidence-gated versions (50%, 60%, 70%, 80%) all produced the identical result: 1 trade, -11.3%. At the 90% threshold, zero trades: the model never reached that confidence during a trend signal.

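BTC Confidence Gating: Every Threshold Tested

| Threshold | Return | Sharpe | Trades |
|---|---|---|---|
| No gate (trend only) | -15.2% | -0.147 | 11 |
| Regime filter (any confidence) | -11.3% | -0.974 | 1 |
| 50% confidence gate | -11.3% | -0.974 | 1 |
| 60% confidence gate | -11.3% | -0.974 | 1 |
| 70% confidence gate | -11.3% | -0.974 | 1 |
| 80% confidence gate | -11.3% | -0.974 | 1 |
| 90% confidence gate | 0.0% | 0.000 | 0 |

Source: Same walk-forward backtest with confidence gate applied. The HMM's filtered probability never exceeded 90% during an active trend signal, resulting in zero trades at that threshold. Sharpe values on the single-trade rows are shown for completeness only; as noted above, a Sharpe ratio on one trade is not statistically meaningful.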

BTC confidence gating: flat line from 50% to 80%, identical results at every threshold in that range

The data is unambiguous: confidence gating adds nothing. The regime filter is already so restrictive that raising the bar from 50% to 80% doesn't change a single trade.
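A sketch of how the regime filter, the confidence gate, and confidence scaling could sit on top of trend positions, assuming "confidence" means the forward-filtered probability of the bullish state (the source does not define it precisely). One property worth noticing: in a 2-state model such as BTC's, "bullish is the most likely state" is the same condition as P(bullish) > 50%, so a 50% gate is essentially the plain regime filter by construction. The function names are hypothetical.

```python
def gated_positions(trend_pos, filtered, bull, threshold=None):
    """Keep a trend position only when the HMM agrees; otherwise go flat.

    threshold=None: plain regime filter (the bullish state must be the most likely).
    threshold=x:    confidence gate (filtered P(bullish) must be at least x).
    """
    out = []
    for pos, probs in zip(trend_pos, filtered):
        if threshold is None:
            agrees = max(range(len(probs)), key=probs.__getitem__) == bull
        else:
            agrees = probs[bull] >= threshold
        out.append(pos if agrees else 0)
    return out


def confidence_scaled(trend_pos, filtered, bull):
    """Size each trend position by the filtered probability of the bullish state."""
    return [pos * probs[bull] for pos, probs in zip(trend_pos, filtered)]
```

Because scaling never zeroes out a position, it keeps the trend strategy's trade count and exposure, consistent with the identical 11 trades and 53% exposure for Trend Only and Confidence Scaled in the BTC table.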

ETH Results (720 days)

| Strategy | Return | Sharpe | Max Drawdown | Trades | Exposure |
|---|---|---|---|---|---|
| Buy & Hold | -43.5% | -0.064 | -63.2% | 0 | 100% |
| Trend Only | +5.8% | 0.277 | -51.8% | 10 | 41% |
| Trend + Wait Exit | +16.5% | 0.390 | -44.8% | 11 | 37% |
| Regime Filtered | -17.9% | -0.739 | -17.9% | 2 | 7% |
| Confidence Scaled | -7.8% | -0.173 | -29.1% | 10 | 41% |

Source: Walk-forward backtest, 720 daily bars, ETH/USDT Kraken. Gross returns.

ETH was the only asset where trend-following beat buy-and-hold on a risk-adjusted basis (ETH was in a sustained drawdown, and the trend signal correctly kept you out). But adding the HMM regime filter destroyed the edge: returns went from +5.8% (trend only) and +16.5% (Wait exit) to -17.9%.

SOL Results (720 days)

| Strategy | Return | Sharpe | Max Drawdown | Trades | Exposure |
|---|---|---|---|---|---|
| Buy & Hold | -48.1% | -0.014 | -70.3% | 0 | 100% |
| Trend Only | -35.6% | -0.238 | -53.0% | 10 | 42% |
| Trend + Wait Exit | -32.1% | -0.215 | -50.6% | 13 | 37% |
| Regime Filtered | -30.6% | -0.428 | -39.8% | 4 | 17% |
| Confidence Scaled | -32.4% | -0.456 | -43.8% | 10 | 42% |

Source: Walk-forward backtest, 720 daily bars, SOL/USDT Kraken. Gross returns.

Everything lost money. The regime filter lost slightly less in absolute terms, but only because it barely traded (4 trades, 17% exposure). That's not an edge; it's sitting in cash during a drawdown and calling it a strategy.

Strategy returns comparison: BTC, ETH, SOL across all 5 strategy variants

Out-of-Sample Validation

I also ran the HMM as a standalone strategy (long in bullish state, flat otherwise) on all 11 assets with a 70/30 train/test split (~504 training days, ~216 test days per asset).

Important caveat: With only ~216 test days, these results have wide confidence intervals. They are directional evidence, not statistically definitive proof. The purpose is to check whether the HMM adds any signal at all, not to claim precise Sharpe ratios.

| Asset | HMM Strategy Sharpe | Buy & Hold Sharpe | HMM Switched States? | Beats B&H? |
|---|---|---|---|---|
| BTC | -1.016 | -1.016 | No (stayed bullish) | No |
| ETH | -1.274 | -1.274 | No (stayed bullish) | No |
| SOL | -1.672 | -1.672 | No (stayed bullish) | No |
| BNB | -0.542 | -1.903 | Yes | Yes |
| XRP | -1.151 | -1.151 | No (stayed bullish) | No |
| ADA | -1.813 | -1.813 | No (stayed bullish) | No |
| DOGE | -0.183 | -1.166 | Yes | Yes |
| AVAX | -1.494 | -1.494 | No (stayed bullish) | No |
| DOT | 1.114 | -1.245 | Yes | Yes |
| LINK | -1.214 | -1.214 | No (stayed bullish) | No |
| SUI | -1.170 | -1.170 | No (stayed bullish) | No |

Source: 70/30 train/test split on 2-year daily data per asset. Strategy: long during bullish HMM state, flat otherwise. "HMM Switched States?" column explains why 8 assets show identical Sharpe to Buy & Hold.

The "HMM Switched States?" column is the key to reading this table. For 8 of 11 assets, the HMM never changed its mind during the test period: it classified the regime as "bullish" on day 1 and stayed there for all ~216 test days, producing returns identical to buy-and-hold. The model's extreme persistence (roughly 91-99% self-transition probability across the fitted states) means it almost never flips, and when it doesn't flip, it provides zero information.

For BNB, DOGE, and DOT, the model happened to switch states at useful moments. But 3 out of 11 is consistent with random chance: even if each of those three switches were a coin flip, going 3-for-3 would happen 12.5% of the time (0.5^3), well short of conventional significance. This is not a systematic edge.
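The binomial arithmetic behind a "consistent with chance" judgment is easy to run. A sketch, assuming for illustration that each state switch independently has a 50% chance of beating buy-and-hold; `binom_tail` is a hypothetical helper.

```python
from math import comb


def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(binom_tail(3, 3))  # 0.125: three switches, three wins
```

Under the same coin-flip assumption, a perfect record would need at least five switches before it dipped below the 5% line (0.5^5 ≈ 3.1%).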

Out-of-sample HMM vs Buy & Hold: 8 of 11 identical, 3 "wins" consistent with chance

The Regime Stability Problem Nobody Talks About

Here's something the content never mentions: HMM regimes drift.

I ran the model on rolling 180-day windows (60-day step) and measured how much each state's mean return changed across windows. A stable model would show low variance; a drifting model means yesterday's "bullish" state has different characteristics than last quarter's "bullish" state.

| Asset | States | State Variance (min-max across states) | Stability Assessment |
|---|---|---|---|
| BTC | 2 | 0.15-0.22 | Moderate |
| ETH | 2 | 0.05-0.14 | Good |
| SOL | 3 | 0.17-0.33 | Moderate |
| BNB | 2 | 0.02-0.05 | Good |
| XRP | 3 | 0.05-0.42 | Mixed |
| ADA | 4 | 0.13-0.84 | Unstable |
| DOGE | 3 | 0.12-0.79 | Unstable |
| AVAX | 3 | 0.05-0.87 | Unstable |
| DOT | 2 | 0.06-0.21 | Good |
| LINK | 4 | 0.18-0.77 | Unstable |
| SUI | 3 | 0.20-0.64 | Unstable |

Source: Rolling 180-day HMM fits, 60-day step, 9 windows per asset (4 for BNB due to data availability). Variance measured across each state's mean z-return over all windows. Values above 0.5 indicate the model discovers fundamentally different "regimes" depending on the training window. State label matching across windows uses mean z-return ordering (lowest = bearish, highest = bullish), not label identity.
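A sketch of the stability metric as described: sort each window's state means so states are matched by rank (lowest = bearish, highest = bullish), then take the variance of each rank across windows. Whether the study used sample or population variance isn't stated; this assumes the sample variance from `statistics.variance`. `state_mean_variance` is a hypothetical name, and the per-window means would come from your own rolling fits.

```python
import statistics


def state_mean_variance(window_means):
    """Variance of each state's mean z-return across rolling windows.

    window_means holds one list of fitted state means per window, in whatever
    order the fit produced them. States are matched across windows by rank
    (lowest mean = bearish, highest = bullish), not by their fitted labels.
    """
    ranked = [sorted(means) for means in window_means]
    k = len(ranked[0])
    return [statistics.variance([w[i] for w in ranked]) for i in range(k)]
```

The minimum and maximum of the returned list give a "min-max across states" range like the ones in the table.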

ADA, DOGE, AVAX, LINK, and SUI have at least one state with variance above 0.6, meaning the regime labels are unreliable across different time windows. The model finds different "regimes" depending on when you start looking. You can't build a system on states that don't persist.

Regime stability: 5 of 11 assets exceed the 0.5 instability threshold

What Actually Worked (And It Had Nothing To Do With Markov Chains)

The one consistent finding across all three backtested assets: exiting on the Wait zone (trend ambiguity) instead of waiting for a full trend reversal improved the Sharpe ratio.

| Asset | Trend Only Sharpe | Wait Exit Sharpe | Improvement | Max DD Change |
|---|---|---|---|---|
| BTC | -0.147 | +0.147 | +0.294 | -40.1% to -29.2% |
| ETH | 0.277 | 0.390 | +0.113 | -51.8% to -44.8% |
| SOL | -0.238 | -0.215 | +0.023 | -53.0% to -50.6% |

Source: Comparison of trend-only vs. Wait-exit strategies from the same walk-forward backtests above. Both are gross returns. The improvement is consistent in direction across all three assets, though not statistically tested for significance given the small sample (3 assets, ~355 OOS days each).

This is a dual-EMA exit refinement. It has nothing to do with Markov chains, HMMs, or hidden states. It's just: "don't wait for the trend to fully Distribute; get out when the signal enters the Wait zone."

The only real improvement in this study came from a tighter exit rule on a simple trend indicator. No matrices. No hidden states. No AI-generated thread required.

What Volatility Clustering Actually Tells You

The one Markov-related property that IS real and measurable: volatility clusters. Based on 20-day realized volatility discretized into quintiles, extreme volatility persists at rates between 89% and 94% across all 11 assets. Very low volatility persists between 83% and 93%.
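A minimal sketch of the volatility-persistence measurement, assuming realized volatility is the trailing 20-day sample standard deviation of daily returns and that quintile cut points are taken over the whole sample (fine for a descriptive statistic, but not something you could trade on in real time). The function names are hypothetical.

```python
import statistics
from bisect import bisect_right


def rolling_vol(returns, window=20):
    """Trailing sample standard deviation over each `window`-day block of returns."""
    return [statistics.stdev(returns[t - window:t]) for t in range(window, len(returns) + 1)]


def quintile_persistence(vols):
    """P(same volatility quintile tomorrow | today's quintile), quintile 0 (lowest) to 4 (highest)."""
    cuts = statistics.quantiles(vols, n=5)  # four in-sample cut points
    q = [bisect_right(cuts, v) for v in vols]
    stay, seen = [0] * 5, [0] * 5
    for today, tomorrow in zip(q, q[1:]):
        seen[today] += 1
        if today == tomorrow:
            stay[today] += 1
    return [s / n if n else float("nan") for s, n in zip(stay, seen)]
```

Under these assumptions, entries 4 and 0 of `quintile_persistence(rolling_vol(returns))` correspond to the "Extreme" and "Very Low" columns below.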

| Asset | Extreme Vol Persistence | Very Low Vol Persistence |
|---|---|---|
| BTC | 91% | 92% |
| ETH | 93% | 88% |
| SOL | 94% | 83% |
| BNB | 93% | 93% |
| XRP | 93% | 91% |
| ADA | 94% | 88% |
| DOGE | 89% | 87% |
| AVAX | 91% | 89% |
| DOT | 94% | 88% |
| LINK | 89% | 89% |
| SUI | 89% | 91% |

Source: 20-day realized volatility quintile transition matrices computed from 2-year daily data per asset. Persistence = self-transition probability (probability of remaining in the same volatility quintile the next day). This is a well-documented market property first described by Mandelbrot in his 1963 study of cotton price variations and later formalized in GARCH models (Bollerslev, 1986).

Volatility persistence: extreme and very-low vol quintiles across all 11 assets

Volatility clustering is real, measurable, and one of the strongest statistical properties in financial data. But you don't need Markov chains to trade it: GARCH, realized volatility, and simpler lookback approaches all capture the same effect. And the content going viral right now isn't about volatility clustering anyway.

Cross-Asset Contagion: Does BTC Lead Alts?

One more claim worth testing: does BTC's current state predict where alts go next day?

| Alt | Same-Day Correlation | Beta to BTC | BTC State Predicts Alt? (p < 0.05, uncorrected) | p-value |
|---|---|---|---|---|
| ETH | 0.822 | 1.25 | No | 0.5171 |
| SOL | 0.788 | 1.38 | No | 0.3299 |
| DOGE | 0.781 | 1.54 | No | 0.2851 |
| AVAX | 0.741 | 1.42 | No | 0.7062 |
| LINK | 0.726 | 1.41 | No | 0.2864 |
| SUI | 0.701 | 1.64 | Yes | 0.0145 |
| ADA | 0.698 | 1.51 | No | 0.2874 |
| DOT | 0.660 | 1.24 | No | 0.2039 |
| XRP | 0.637 | 1.18 | No | 0.2122 |
| BNB | 0.006 | 0.01 | No | 0.5697 |

Source: Chi-squared test of BTC's Markov return state vs. alt's next-day return state. Correlation = Pearson on daily returns. Beta = OLS slope of alt returns on BTC returns. Both computed on same-day pairs. With 10 tests, Bonferroni-corrected threshold is p < 0.005.
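The distinction between the same-day statistics and the next-day test is easiest to see in code. In this sketch (hypothetical helper names), `pearson` and `ols_beta` pair BTC and alt returns from the same day, while `lagged_state_table` pairs BTC's state on day t with the alt's state on day t + 1; that lagged table is what a chi-squared independence test would then be run on.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_beta(alt, btc):
    """OLS slope of alt returns on BTC returns, both from the same day."""
    mb, ma = sum(btc) / len(btc), sum(alt) / len(alt)
    cov = sum((b - mb) * (a - ma) for b, a in zip(btc, alt))
    return cov / sum((b - mb) ** 2 for b in btc)


def lagged_state_table(btc_states, alt_states, k=5):
    """Counts of (BTC state on day t, alt state on day t + 1): the next-day question."""
    table = [[0] * k for _ in range(k)]
    for today, tomorrow in zip(btc_states[:-1], alt_states[1:]):
        table[today][tomorrow] += 1
    return table
```

A high same-day correlation says nothing about this lagged table: the two can move together every day while tomorrow's alt state stays independent of today's BTC state.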

Only SUI shows significance at the uncorrected threshold (p = 0.0145), and it doesn't survive Bonferroni correction (needs p < 0.005). No alt has a statistically robust next-day dependence on BTC's current return state.

Data limitation (BNB): The near-zero correlation (0.006) and beta (0.01) for BNB are an artifact of Kraken's thin BNB/USDT liquidity. On Binance's native pair, the correlation would be materially higher. This row reflects a data source limitation, not a market structure insight; I include it for transparency rather than silently dropping the outlier.

The correlations are high for most alts (0.64-0.82 same-day), but BTC's current state has no predictive power over where alts go tomorrow. Same-day correlation is not next-day prediction, a distinction the content consistently blurs.

The Verdict

I spent two weeks building a quant-grade implementation. Observable Markov chains, Gaussian HMMs with BIC selection, walk-forward backtests, out-of-sample validation, rolling window stability analysis. 11 assets. 720+ days. Five strategy variants plus confidence gating at five thresholds.

Here's what I found:

  1. Observable Markov chain transitions are statistically indistinguishable from random across all 11 major crypto assets after Bonferroni correction. The chi-squared test gives no support to the viral claim.
  2. HMM regimes are real but don't improve existing systems. The model correctly identifies bullish and bearish states, but it identifies them too late: by the time it's confident, the move has already happened. As a utility layer on top of trend-following, it subtracts value rather than adding it.
  3. HMM as a confidence filter destroys risk-adjusted returns. Adding regime confidence to trend signals reduced BTC from 11 trades to 1, ETH from 10 to 2. A filter that eliminates almost every trade isn't adding confidence; it's adding paralysis.
  4. Confidence gating adds zero value at any threshold from 50% to 90%. From 50% to 80% the results are identical because the regime filter is already binary in practice: either the HMM agrees with your trend or it doesn't. At 90% the filter takes no trades at all.
  5. Regime labels drift over time. Five of eleven assets show state variances above 0.6 across rolling windows, meaning the model finds different regimes depending on when you look.
  6. The only improvement came from a simpler exit rule (Wait zone exit on a dual-EMA) that has nothing to do with Markov chains.

The next time you see a thread with a colorful transition matrix claiming to predict where crypto goes next, ask yourself: did they test it? Did they run a chi-squared test on those transitions? Did they backtest it out-of-sample? Did they check how many trades their regime filter actually takes?

In a world where AI can generate a convincing quant thread in 30 seconds, the only thing that matters is whether someone ran the numbers. I did.

The code, methodology, and raw data behind this study are available on request. If you want to replicate or extend it, reach out.


Full Methodology and Limitations

Data: 2 years of daily close prices for BTC, ETH, SOL, BNB, XRP, ADA, DOGE, AVAX, DOT, LINK, SUI from Kraken via the CCXT library (approximately June 2024 - May 2026, exact dates vary by asset listing). Buy-and-hold returns over this window: BTC +14.8%, ETH -43.5%, SOL -48.1%. The window captured both rally and drawdown phases.

Observable Markov Chain: 5 return states (strong down <-2%, mild down -2% to -0.5%, neutral -0.5% to +0.5%, mild up +0.5% to +2%, strong up >+2%). Direct sequential counting, zero look-ahead bias. Chi-squared independence test on each 5x5 matrix. Bonferroni correction applied for 11 simultaneous tests.

HMM: Gaussian Hidden Markov Model with diagonal covariance, Baum-Welch EM fitting, 5 random restarts per k. Features: vol-normalized log returns (z-scored by 20-day rolling volatility). BIC model selection (k=2..6). Forward algorithm only for state inference (no smoothing, no future data).
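A sketch of the feature, under assumptions the source leaves open: each log return is divided by the sample standard deviation of the previous 20 log returns (excluding the current day, so no same-day information enters the scale), and no mean is subtracted. `vol_normalized_log_returns` is a hypothetical name.

```python
import math
import statistics


def vol_normalized_log_returns(closes, window=20):
    """Log return on day t divided by the volatility of the `window` returns before it."""
    r = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    features = []
    for t in range(window, len(r)):
        sd = statistics.stdev(r[t - window:t])  # trailing window only, excludes day t
        features.append(r[t] / sd if sd > 0 else 0.0)  # flat window: no scale, emit 0
    return features
```

The first `window` returns are consumed as warm-up, so the feature series is shorter than the price series by `window + 1` days.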

Backtests: Walk-forward HMM regime detection (365-day training window, 60-day retrain interval). Trend signal via dual-EMA (12/26) with 0.5% threshold for Wait zone. Gross returns: no transaction costs, no slippage, no leverage.
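The walk-forward schedule can be made concrete. Assuming a rolling (not expanding) 365-day training window refit every 60 days, a 720-day series leaves exactly 355 out-of-sample days, matching the figure in the walk-forward limitation note below. `walk_forward_windows` is a hypothetical helper.

```python
def walk_forward_windows(n_days, train=365, step=60):
    """Yield (train_start, train_end, test_end) day indices; ranges are end-exclusive.

    Fit on days [train_start, train_end), then trade days [train_end, test_end)
    out-of-sample with those frozen parameters before refitting.
    """
    start = train
    while start < n_days:
        yield start - train, start, min(start + step, n_days)
        start += step


windows = list(walk_forward_windows(720))
print(windows[0], windows[-1], sum(end - s for _, s, end in windows))
# (0, 365, 425) (300, 665, 720) 355
```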

Sharpe ratio: Annualized mean return / annualized volatility. Benchmark rate assumed 0%, a common simplification in crypto backtests; the realistic alternative to a flat crypto position is stablecoin yield (typically 3-5% APY), not sovereign bonds. Using a 4% benchmark rate would lower each strategy's Sharpe ratio by 4% divided by that strategy's annualized volatility (approximately 0.15-0.20 here), but would not change the relative rankings or conclusions.
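A sketch of the performance metrics as described: arithmetic daily mean over daily standard deviation, annualized with the square root of 365 (an assumption; crypto trades every day, but the source does not state the factor), plus max drawdown on the compounded equity curve. The `rf_annual` argument shows where a benchmark rate enters: subtracting rf/365 from every day lowers the Sharpe ratio by exactly rf divided by annualized volatility. Returning 0.0 for a zero-volatility series (for example, a strategy that never trades) is a convention chosen here. The function names are hypothetical.

```python
import math
import statistics


def strategy_returns(positions, asset_returns):
    """Gross daily returns: the position held after day t's close earns day t + 1's return."""
    return [p * r for p, r in zip(positions[:-1], asset_returns[1:])]


def sharpe(daily_returns, periods_per_year=365, rf_annual=0.0):
    """Annualized (mean daily excess return / daily volatility), using arithmetic means."""
    excess = [r - rf_annual / periods_per_year for r in daily_returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        return 0.0  # convention: no volatility (e.g. never in the market) -> 0
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(daily_returns):
    """Worst peak-to-trough fall of the compounded equity curve, as a negative fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst
```

Because the mean is arithmetic while the reported return compounds, a volatile asset can post a large cumulative loss with a Sharpe ratio close to zero.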

Out-of-sample: 70/30 train/test split (~504 train days, ~216 test days). Strategy: long in bullish HMM state, flat otherwise. This is a limited test window; 216 days provides directional evidence but not high statistical confidence.

Walk-forward limitation: 365-day training on 720-day total data yields only ~355 days of true out-of-sample walk-forward performance. This is a constraint of the dataset size, not the methodology. Results should be interpreted as directional rather than definitive. A longer dataset (5+ years) would provide more robust walk-forward validation.

Rolling stability: 180-day windows, 60-day step. Measured variance of each state's mean z-return across all windows. State matching across windows uses mean z-return ordering, not label identity.

Data limitation (BNB): Kraken's BNB/USDT pair has materially lower liquidity than Binance's native BNB/USDT, producing anomalous correlation and beta values. BNB results should be interpreted with this caveat. All other assets have adequate Kraken liquidity for daily-resolution analysis.

All code and raw data available on request.


This is not financial advice. Anny is not a registered investment adviser. Past performance is not indicative of future results.