Research Architecture · 2025
HRM-RDT: Hierarchical Recurrent-Depth Transformer
A fusion of OpenMythos recurrent depth and HRM hierarchical convergence —
slow abstraction and fast computation, looped inside a stable scaffold.
Input → Prelude → [ Recurrent Block: H ↔ L (×K loops) ] → Coda → Output
How HRM Works
HRM has four learnable components: an input network that encodes tokens into vectors,
a low-level L module for fast detailed computation, a high-level H module
for abstract reasoning, and an output network that converts the final H hidden state
to output predictions.
The critical insight is the timing: the H module advances only after the L module has completed
multiple computational steps and reached a local equilibrium, at which point L is reset to begin a new
phase guided by the H module's updated state. This is called hierarchical
convergence: L runs fast until it settles, then H takes one slow step, then L resets
and runs again.
The L module functions like a standard RNN, but its hidden state updates are conditioned
not just on its own previous state, but also on the H module's current hidden state —
which changes much more slowly.
H module: Slow, abstract reasoning. Takes one step per full L-convergence cycle. Its hidden state z_H guides the next L inner cycle.
L module: Fast, detailed computation. Loops until local equilibrium, then resets. Conditioned on H's current hidden state at each step.
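The timing described above can be sketched in a few lines of plain Python. This is only an illustration of the schedule, not the HRM implementation: `l_step` and `h_step` are hypothetical toy updates built from `tanh`, the fixed `t_inner` count stands in for "until local equilibrium", and re-initializing `z_l` to zeros is one assumed reading of "reset".

```python
import math

def l_step(z_l, z_h, x):
    """Toy L update: depends on its own state, the slow H state, and the input."""
    return [math.tanh(0.5 * l + 0.3 * h + 0.2 * xi) for l, h, xi in zip(z_l, z_h, x)]

def h_step(z_h, z_l):
    """Toy H update: one slow step that reads L's final state for the cycle."""
    return [math.tanh(0.6 * h + 0.4 * l) for h, l in zip(z_h, z_l)]

def hrm_forward(x, n_cycles=3, t_inner=4):
    dim = len(x)
    z_h = [0.0] * dim
    schedule = []  # records which module ran, in order
    for _ in range(n_cycles):
        z_l = [0.0] * dim          # assumed reset at the start of each cycle
        for _ in range(t_inner):   # fast inner loop; z_h is held fixed
            z_l = l_step(z_l, z_h, x)
            schedule.append("L")
        z_h = h_step(z_h, z_l)     # one slow step per inner cycle
        schedule.append("H")
    return z_h, schedule

z_h, schedule = hrm_forward([0.5, -0.2])
print("".join(schedule))  # LLLLHLLLLHLLLLH
```

The printed schedule makes the two time scales visible: four fast L steps for every single H step.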
How OpenMythos Works
The full data flow: Input token IDs → Embedding → Prelude (standard transformer blocks,
run once) → Recurrent Block (one TransformerBlock looped T times, with the
update rule h_{t+1} = A·h_t + B·e + Transformer(h_t, e)) → Coda
(standard transformer blocks, run once) → RMSNorm → LM head → Output logits.
The key is that e — the Prelude's output — is re-injected at every single loop step.
Without this re-injection, the hidden state would drift away from the original input signal
across deep loops. Learned matrices A and B govern how much of the
previous hidden state and the encoded input carry forward at each step.
To prevent residual explosion, OpenMythos constrains the spectral radius of A
to be less than 1 by construction, guaranteeing that the linear recurrence stays stable regardless of learning rate or gradient noise.
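A minimal sketch of the update rule with a stability constraint built in follows. It makes several assumptions that are not taken from OpenMythos: A and B are diagonal (so the spectral radius is simply the largest diagonal entry), each entry of A is parameterized as `exp(-softplus(theta))`, the loop starts from the Prelude output, and `toy_block` is a bounded stand-in for `Transformer(h_t, e)`.

```python
import math

def softplus(x):
    return math.log1p(math.exp(x))

def stable_diag(theta):
    """Map unconstrained parameters to diagonal entries of A that lie in (0, 1)."""
    return [math.exp(-softplus(t)) for t in theta]

def toy_block(h, e):
    """Stand-in for Transformer(h_t, e); each element is bounded in [-1, 1]."""
    return [math.tanh(hi + ei) for hi, ei in zip(h, e)]

def recurrent_block(e, theta, b, n_loops):
    a = stable_diag(theta)
    h = list(e)  # assumption: the loop starts from the Prelude output
    for _ in range(n_loops):
        f = toy_block(h, e)
        # h_{t+1} = A*h_t + B*e + Transformer(h_t, e), element-wise for diagonal A, B
        h = [ai * hi + bi * ei + fi for ai, hi, bi, ei, fi in zip(a, h, b, e, f)]
    return h, a

e = [0.8, -1.5, 0.1]
theta = [-3.0, 0.0, 4.0]   # moderate real values; none can push an entry of A to 1 or above
b = [0.5, 0.5, 0.5]
h, a = recurrent_block(e, theta, b, n_loops=200)
print(all(0.0 < ai < 1.0 for ai in a))  # True
```

Because every entry of A is below 1 and the nonlinear term is bounded, each coordinate of `h` stays within `(|b*e| + 1) / (1 - a)` no matter how many loops run. With a real TransformerBlock the nonlinear term is not bounded this way, so the constraint on A controls only the linear part.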
How the Hybrid Works
The fusion is conceptually clean: OpenMythos's single TransformerBlock inside the
Recurrent Block is replaced by HRM's H+L pair. Instead of one block looping
T times, L loops T_L times per outer step, then
H takes one slow step, and the whole pair loops T_outer times,
with e re-injected at every inner L step for linear time-invariant (LTI) stability.
The H module's hidden state z_H is what guides L's next inner cycle, exactly as in the original HRM.
The OpenMythos stability guarantee (spectral radius < 1) applies to the outer recurrence,
keeping the entire nested structure bounded across arbitrary depth.
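The nested loop can be sketched as below. This is one possible reading of the description, not a settled design: `e` enters every inner L step as an input, while the contractive term `a*z_h + b*e` (scalar `a` in (0, 1), standing in for a matrix A with spectral radius below 1) is applied to the outer H update. `l_block`, `h_block`, and all coefficients are hypothetical.

```python
import math

def l_block(z_l, z_h, e):
    """Toy L update; e enters at every inner step (the re-injection)."""
    return [math.tanh(0.5 * l + 0.3 * h + 0.2 * x) for l, h, x in zip(z_l, z_h, e)]

def h_block(z_h, z_l):
    """Toy H update; each element is bounded in [-1, 1]."""
    return [math.tanh(0.6 * h + 0.4 * l) for h, l in zip(z_h, z_l)]

def hybrid_recurrent_block(e, t_outer=3, t_l=4, a=0.9, b=0.5):
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie in (0, 1) for the outer recurrence to contract")
    dim = len(e)
    z_h = [0.0] * dim
    z_l = [0.0] * dim
    injections = 0  # how many times e was fed into an L step
    for _ in range(t_outer):
        for _ in range(t_l):
            z_l = l_block(z_l, z_h, e)   # fast steps, z_h held fixed
            injections += 1
        slow = h_block(z_h, z_l)         # one slow step per outer iteration
        z_h = [a * h + b * x + s for h, x, s in zip(z_h, e, slow)]
    return z_h, injections

z_h, injections = hybrid_recurrent_block([0.8, -1.5])
print(injections)  # 12
```

With the defaults, `e` is injected `t_outer * t_l = 12` times, and each coordinate of `z_h` stays within `(|b*e| + 1) / (1 - a)` for any `t_outer`.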
Key Design Choices
| Choice | Rationale |
|---|---|
| RMSNorm between loop iterations | Training stability per HRM paper |
| AdamW optimizer | Keeps weights bounded across deep recurrence |
| Cross-attention for H↔L coupling | H guides L without hard dependency |
| Fixed max_loop_iters + optional early exit | Avoids infinite loops; enables adaptive depth |
| Small model (<100M params) | Feasible to train on 1,000 samples |
| e re-injected at every inner L step | LTI stability; prevents input signal drift |
| Spectral radius of A < 1 | Guarantees outer recurrence stability by construction |
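The "fixed max_loop_iters + optional early exit" row can be illustrated with a small loop. The exit criterion used here (stop once the state changes by less than `tol`) is an assumption chosen for simplicity; the table does not say which halting rule the project uses. `step` is a hypothetical contractive update standing in for one recurrent-block iteration.

```python
import math

def step(h, e):
    """Toy contractive update standing in for one recurrent-block iteration."""
    return [0.5 * hi + 0.5 * math.tanh(ei) for hi, ei in zip(h, e)]

def run_with_early_exit(e, max_loop_iters=16, tol=1e-4):
    """Loop at most max_loop_iters times; stop early once the state settles.

    Pass tol=None to disable early exit and always run the full budget.
    """
    h = [0.0] * len(e)
    for i in range(1, max_loop_iters + 1):
        h_next = step(h, e)
        delta = max(abs(x - y) for x, y in zip(h_next, h))
        h = h_next
        if tol is not None and delta < tol:
            return h, i          # adaptive depth: fewer loops were enough
    return h, max_loop_iters     # hard cap reached

h_early, used_early = run_with_early_exit([1.0, -2.0])
h_full, used_full = run_with_early_exit([1.0, -2.0], tol=None)
print(used_early < used_full, used_full)  # True 16
```

The hard cap bounds compute per input, while the tolerance lets easy inputs stop sooner.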
Dataset & Status
Dataset
| Property | Value |
|---|---|
| Size | 1,000 samples |
| Domain | Human-based reasoning |
| Format | JSON — { input, output, reasoning_trace } |
| Split | 800 train / 100 val / 100 test |
| Compute | Lightning.ai · PyTorch + HuggingFace |
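A minimal sketch of the record format and the 800 / 100 / 100 split, using only the Python standard library. The records below are generated placeholders, not real data, and `make_placeholder_records`, `split_records`, and the fixed seed are hypothetical choices for illustration.

```python
import json
import random

def make_placeholder_records(n=1000):
    """Build n dummy records with the three fields named in the table."""
    return [
        {"input": f"question {i}", "output": f"answer {i}", "reasoning_trace": f"steps for {i}"}
        for i in range(n)
    ]

def split_records(records, n_train=800, n_val=100, seed=0):
    """Shuffle a copy deterministically, then slice into train / val / test."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Round-trip through JSON to mirror loading the dataset from a file.
records = json.loads(json.dumps(make_placeholder_records()))
train, val, test = split_records(records)
print(len(train), len(val), len(test))  # 800 100 100
```

Fixing the seed keeps the split reproducible across runs, which matters when the validation and test sets hold only 100 samples each.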
Project Milestones
- Architecture designed
- Prelude + Coda implementation
- HRM H + L module implementation
- Recurrent Block integration
- Training loop
- Evaluation on 1K dataset