Research Architecture · 2025

HRM-RDT: Hierarchical Recurrent-Depth Transformer

A fusion of OpenMythos recurrent depth and HRM hierarchical convergence —
slow abstraction and fast computation, looped inside a stable scaffold.

Input → Prelude → [ Recurrent Block: H ↔ L (×K loops) ] → Coda → Output

How HRM Works

HRM has four learnable components: an input network that encodes tokens into vectors, a low-level L module for fast detailed computation, a high-level H module for abstract reasoning, and an output network that converts the final H hidden state to output predictions.

The critical insight is the timing: the H module advances only after the L module has completed multiple computational steps and reached a local equilibrium, at which point L is reset to begin a new phase guided by the H module's updated state. This is called hierarchical convergence: L runs fast until it settles, then H takes one slow step, then L resets and runs again.

The L module functions like a standard RNN, but its hidden state updates are conditioned not just on its own previous state, but also on the H module's current hidden state — which changes much more slowly.
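The timing described above can be sketched as two nested loops. This is a toy illustration, not the HRM reference code: the function name `hrm_schedule`, the random weights, the `tanh` updates, and the fixed step count `t_low` (used in place of a real equilibrium check) are all assumptions. The reset is modelled as restoring L's initial state, which is one reading of the description.

```python
import numpy as np

def hrm_schedule(x, n_cycles=3, t_low=4, seed=0):
    """Toy HRM timing: t_low fast L steps for every single slow H step."""
    rng = np.random.default_rng(seed)
    dim = x.shape[0]
    # Hypothetical random weights standing in for the learned L and H modules.
    w_l = rng.normal(scale=0.3, size=(dim, dim))
    w_h = rng.normal(scale=0.3, size=(dim, dim))
    z_h = np.zeros(dim)
    l_updates = 0
    h_updates = 0
    for _ in range(n_cycles):
        z_l = np.zeros(dim)  # L is reset at the start of each phase (assumption)
        for _ in range(t_low):
            # L sees its own previous state, H's current state, and the input.
            z_l = np.tanh(w_l @ z_l + z_h + x)
            l_updates += 1
        # H takes one slow step, using the state L settled into.
        z_h = np.tanh(w_h @ z_h + z_l)
        h_updates += 1
    return z_h, l_updates, h_updates

z_h, l_updates, h_updates = hrm_schedule(np.ones(8))
```

With the defaults, L is updated four times for each H update, and `z_h` stays fixed throughout every inner loop.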

Module — High-Level H Module

Slow, abstract reasoning. Takes one step per full L-convergence cycle. Its hidden state z_H guides the next inner L cycle.

Module — Low-Level L Module

Fast, detailed computation. Loops until local equilibrium, then resets. Conditioned on H's current hidden state at each step.

✦ OPENMYTHOS ✦

How OpenMythos Works

The full data flow: Input token IDs → Embedding → Prelude (standard transformer blocks, run once) → Recurrent Block (one TransformerBlock looped T times, with update rule h_{t+1} = A·h_t + B·e + Transformer(h_t, e)) → Coda (standard transformer blocks, run once) → RMSNorm → LM head → Output logits.

The key is that e — the Prelude's output — is re-injected at every single loop step. Without this re-injection, the hidden state would drift away from the original input signal across deep loops. Learned matrices A and B govern how much of the previous hidden state and the encoded input carry forward at each step.
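A minimal sketch of the update rule and of why re-injection matters. It assumes diagonal A and B (passed as scalars or vectors) and uses a small `tanh` function as a hypothetical stand-in for the transformer block; none of these names come from the OpenMythos code.

```python
import numpy as np

def block(h, e):
    """Hypothetical stand-in for Transformer(h, e); not a real transformer."""
    return 0.1 * np.tanh(h + e)

def recurrent_depth(e, a_diag, b_diag, n_loops, reinject=True):
    """Iterate h_{t+1} = A*h_t + B*e + block(h_t, e) with diagonal A and B.

    With reinject=False, e only seeds the initial state and is never fed
    in again, to show what happens to the input signal.
    """
    h = np.zeros_like(e) if reinject else e.copy()
    inp = e if reinject else np.zeros_like(e)
    for _ in range(n_loops):
        h = a_diag * h + b_diag * inp + block(h, inp)
    return h

e = np.ones(4)
with_injection = recurrent_depth(e, 0.5, 1.0, 50)
without_injection = recurrent_depth(e, 0.5, 1.0, 50, reinject=False)
```

In this toy setting the re-injected run settles at a state determined by `e`, while the run without re-injection decays toward zero and loses the input entirely.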

To prevent residual explosion, OpenMythos constrains the spectral radius of A to be less than 1 by construction, so the linear part of the recurrence stays stable regardless of learning rate or gradient noise.
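How OpenMythos parameterizes A is not described here. The sketch below shows one common way to get a spectral radius below 1 by construction, assuming A is diagonal: unconstrained parameters are squashed through a sigmoid and scaled by a margin. `stable_diag`, `spectral_radius`, and `RHO_MAX` are hypothetical names.

```python
import numpy as np

RHO_MAX = 0.99  # hypothetical safety margin, strictly below 1

def stable_diag(raw):
    """Squash unconstrained parameters into [0, RHO_MAX]."""
    sigmoid = 0.5 * (1.0 + np.tanh(0.5 * raw))  # numerically stable logistic
    return RHO_MAX * sigmoid

def spectral_radius(matrix):
    """Largest eigenvalue magnitude of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(matrix))))

# Whatever values the optimizer pushes `raw` to, the diagonal stays below 1.
raw = np.random.default_rng(0).normal(scale=5.0, size=16)
A = np.diag(stable_diag(raw))
rho = spectral_radius(A)
```

Because the bound comes from the parameterization rather than from a penalty term, no gradient step can move A outside the stable region. It covers only the linear term A·h_t, not the nonlinear transformer term.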

✦ THE FUSION ✦

How the Hybrid Works

The fusion is conceptually clean: OpenMythos's single TransformerBlock inside the Recurrent Block gets replaced by HRM's H+L pair. Instead of one block looping T times, L loops T_L times per outer step, then H takes one slow step, and this whole cycle repeats T_outer times, with e re-injected at every inner L step for LTI (linear time-invariant) stability.

The H module's hidden state z_H guides L's next inner cycle, exactly as in the original HRM. The OpenMythos stability constraint (spectral radius of A < 1) applies to the outer recurrence and is intended to keep the nested structure bounded as depth grows.
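One possible reading of the nested structure, as a toy sketch. The scalars `a` and `b` stand in for A and B, random matrices with `tanh` stand in for the H and L modules, and the exact way the H step combines with the linear carry is an assumption, since the text does not pin it down.

```python
import numpy as np

def hybrid_block(e, t_outer=3, t_l=4, a=0.5, b=1.0, seed=0):
    """Toy HRM-RDT recurrent block: inner L loop, then one stabilized H step."""
    rng = np.random.default_rng(seed)
    dim = e.shape[0]
    # Hypothetical weights standing in for the learned L and H modules.
    w_l = rng.normal(scale=0.2, size=(dim, dim))
    w_h = rng.normal(scale=0.2, size=(dim, dim))
    z_h = np.zeros(dim)
    trace = []
    for _ in range(t_outer):
        z_l = np.zeros(dim)  # L resets at the start of each outer step
        for _ in range(t_l):
            # e is re-injected at every inner L step; z_h guides L.
            z_l = np.tanh(w_l @ z_l + z_h + e)
        # Outer recurrence: contractive carry (|a| < 1), input term, H step.
        z_h = a * z_h + b * e + np.tanh(w_h @ z_l + z_h)
        trace.append(float(np.linalg.norm(z_h)))
    return z_h, trace

z_h, trace = hybrid_block(np.ones(6), t_outer=50)
```

In this sketch each component of `z_h` is bounded by `(b * max|e| + 1) / (1 - a)` for any number of outer loops, because the `tanh` term contributes at most 1 per step and the carry shrinks by `a`. A real H module without a bounded output would need its own normalization to get a similar guarantee.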

✦ DESIGN CHOICES ✦

Key Design Choices

Choice	Rationale
`RMSNorm` between loop iterations	Training stability per HRM paper
`AdamW` optimizer	Weight decay keeps weights bounded across deep recurrence
Cross-attention for H↔L coupling	H guides L without hard dependency
Fixed `max_loop_iters` + optional early exit	Avoids infinite loops; enables adaptive depth
Small model (<100M params)	Feasible to train on 1,000 samples
`e` re-injected at every inner L step	LTI stability; prevents input signal drift
Spectral radius of A < 1	Guarantees outer recurrence stability by construction
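The `max_loop_iters` cap with optional early exit can be sketched as below. The exit criterion (the change in hidden state falling under a tolerance `tol`) is an assumption chosen for illustration; the table does not specify one, and a learned halting signal would be another option.

```python
import numpy as np

def run_with_early_exit(step, h0, max_loop_iters=16, tol=1e-4):
    """Apply `step` repeatedly, stopping at the cap or once the state settles.

    Returns the final state and the number of iterations actually used.
    """
    h = h0
    for i in range(1, max_loop_iters + 1):
        h_next = step(h)
        if np.linalg.norm(h_next - h) < tol:
            return h_next, i  # early exit: state has stopped changing
        h = h_next
    return h, max_loop_iters  # cap reached without settling

# Toy contractive step with fixed point 2.0.
toy_step = lambda h: 0.5 * h + 1.0
h_final, used = run_with_early_exit(toy_step, np.zeros(1))
h_capped, used_capped = run_with_early_exit(toy_step, np.zeros(1), max_loop_iters=5)
```

The cap guarantees termination even if the state never settles, while the early exit lets easy inputs use fewer loops than hard ones.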

✦ DATASET ✦

Dataset & Status

Dataset

`Size`	1,000 samples
`Domain`	Human-based reasoning
`Format`	JSON — `{ input, output, reasoning_trace }`
`Split`	800 train / 100 val / 100 test
`Compute`	Lightning.ai · PyTorch + HuggingFace
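A minimal sketch of the record format and the 800 / 100 / 100 split, using only the standard library. The field values, the shuffling seed, and the function name `split_dataset` are placeholders; only the three keys and the split sizes come from the table above.

```python
import json
import random

def split_dataset(records, n_train=800, n_val=100, n_test=100, seed=0):
    """Shuffle records deterministically and cut them into train/val/test."""
    if len(records) != n_train + n_val + n_test:
        raise ValueError("record count does not match the requested split")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Placeholder records with the three documented keys.
records = [
    {"input": f"question {i}", "output": f"answer {i}", "reasoning_trace": f"steps {i}"}
    for i in range(1000)
]
train, val, test = split_dataset(records)

# Each record round-trips through JSON unchanged.
restored = json.loads(json.dumps(train[0]))
```

Fixing the seed keeps the three subsets identical across runs, which matters when the test set has only 100 samples.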

Project Milestones

✓ Architecture designed
Prelude + Coda implementation
HRM H + L module implementation
Recurrent Block integration
Training loop
Evaluation on 1K dataset

✦ REFERENCES ✦

References

github.com/kyegomez/OpenMythos
arxiv.org/abs/2506.21734 (HRM paper)
github.com/sapientinc/HRM