Interactive reference for open-weights language model architectures. Every model has a full transformer diagram, live tensor shape tracking, and a KV cache memory calculator that responds to context length in real time. Architecture specs verified against the original paper, HuggingFace config.json, and a secondary reference for every model.
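The calculator's inputs map directly onto fields in a model's config.json. As a rough sketch of the underlying formula (parameter names follow the usual HuggingFace conventions; the example values are illustrative, not taken from the site), the KV cache grows linearly with context length:

```python
# Minimal sketch of the KV cache memory formula a calculator like this is based on.
# num_layers, num_kv_heads, head_dim mirror common config.json fields; the example
# config below is an assumption for illustration only.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Keys and values are each [batch, kv_heads, context, head_dim] per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * context_len * batch_size * bytes_per_elem

# Example: a Llama-3-8B-like config (32 layers, 8 KV heads, head_dim 128) at 8k context
print(kv_cache_bytes(32, 8, 128, 8192) / 2**30, "GiB")  # ~1.0 GiB in fp16
```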
Every modern open-weights LLM (GPT, Llama, Mistral, Qwen, Gemma, DeepSeek) shares the same decoder-only transformer skeleton: stacked blocks of normalization + attention + feedforward, wired with residual connections. Models differ in the choices inside each slot: the attention variant (MHA vs GQA vs MLA), the positional encoding (learned vs RoPE), the normalization (LayerNorm vs RMSNorm), the activation (GELU vs SwiGLU), the feedforward (dense vs MoE), and the scale of each dimension. Pick a model below to see its exact wiring.
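As a minimal sketch of that shared skeleton (the stand-in modules below are placeholders, not any specific model's layers), the pre-norm residual wiring looks like this in PyTorch:

```python
# Shared pre-norm decoder block skeleton. The norm, attention, and feedforward
# modules are the "slots" each model fills differently; the wiring stays the same.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, norm1: nn.Module, attention: nn.Module,
                 norm2: nn.Module, feedforward: nn.Module):
        super().__init__()
        self.norm1, self.attention = norm1, attention
        self.norm2, self.feedforward = norm2, feedforward

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection around the normalized attention slot ...
        x = x + self.attention(self.norm1(x))
        # ... and around the normalized feedforward slot.
        x = x + self.feedforward(self.norm2(x))
        return x

# Stacking identical blocks gives the whole decoder trunk. Linear layers stand in
# for attention and the MLP here just to show that shapes are preserved.
d_model = 64
block = DecoderBlock(nn.LayerNorm(d_model), nn.Linear(d_model, d_model),
                     nn.LayerNorm(d_model), nn.Linear(d_model, d_model))
h = torch.randn(1, 16, d_model)   # [batch, seq, d_model]
print(block(h).shape)             # torch.Size([1, 16, 64]), same as the input
```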
Auto-generated diff, radar chart, KV cache scaling, and “when to pick which” decision matrix.
MHA, GQA, MQA, MLA, sliding window, DeepSeek sparse attention. Live KV cache and FLOPs math with a toggle and sliders.
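As a hedged illustration of why the attention variant dominates KV cache size (the head counts below are hypothetical, chosen only to contrast the variants), the per-token cache scales with the number of KV heads; MLA is left out because it caches a compressed latent rather than full K/V heads:

```python
# Per-token KV cache size under different attention variants, assuming the same
# number of layers and head_dim. Values are illustrative, not taken from the site.
def kv_per_token_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

layers, heads, head_dim = 32, 32, 128
variants = {
    "MHA (kv_heads = heads)": heads,  # every query head has its own K/V
    "GQA (kv_heads = 8)":     8,      # query heads share K/V within groups
    "MQA (kv_heads = 1)":     1,      # a single K/V head serves all queries
}
for name, kv_heads in variants.items():
    kib = kv_per_token_bytes(layers, kv_heads, head_dim) / 1024
    print(f"{name:26s} {kib:6.0f} KiB per token")  # 512, 128, 16 KiB respectively
```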