Show HN: LLM-Infra-Lab – A minimal, reproducible lab for LLM systems

github.com

1 point by Sai-HN an hour ago

I built this because most LLM infra repos are either huge (impossible to learn from) or too toy-like to reflect how real inference systems behave. There was no middle ground for people who want to understand KV caching, batching, routing, sharding, or scaling without needing a cluster or reading through 5k-line abstractions.

LLM-Infra-Lab provides small, readable, reproducible demos of real infra primitives (vLLM-style KV cache mock, batching simulator, minimal router/workers, JAX pmap model, etc.), all runnable on CPU/Colab. It’s meant as a learning & experimentation lab for anyone who wants to see how LLM systems actually work under the hood.
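To give a flavor of the kind of primitive the demos cover, here is a minimal sketch of a per-sequence KV cache for decode-time attention. This is not the lab's actual API; all class and method names here are illustrative, and it only shows the core idea: each decode step appends one token's keys/values instead of recomputing the whole prefix.

```python
# Sketch of a KV-cache primitive (illustrative names, not the repo's API).
import numpy as np

class KVCache:
    """Per-sequence key/value cache: append one token's K/V per decode step."""

    def __init__(self, n_layers, n_heads, head_dim):
        self.n_layers = n_layers
        self.n_heads = n_heads
        self.head_dim = head_dim
        # cache[seq_id][layer] -> list of (k, v) arrays, one entry per token
        self.cache = {}

    def append(self, seq_id, layer, k, v):
        # Each decode step adds exactly one token's K/V per layer.
        assert k.shape == (self.n_heads, self.head_dim)
        self.cache.setdefault(seq_id, [[] for _ in range(self.n_layers)])
        self.cache[seq_id][layer].append((k, v))

    def get(self, seq_id, layer):
        """Return stacked (T, n_heads, head_dim) K and V for attention over the prefix."""
        pairs = self.cache[seq_id][layer]
        ks = np.stack([k for k, _ in pairs])
        vs = np.stack([v for _, v in pairs])
        return ks, vs

    def seq_len(self, seq_id):
        return len(self.cache.get(seq_id, [[]])[0])

# Toy decode loop: 3 steps, appending one token's K/V per layer each step.
cache = KVCache(n_layers=2, n_heads=4, head_dim=8)
rng = np.random.default_rng(0)
for step in range(3):
    for layer in range(2):
        cache.append("req-1", layer, rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))

ks, vs = cache.get("req-1", layer=0)
print(ks.shape)  # (3, 4, 8): 3 cached tokens, 4 heads, head_dim 8
```

The point of a demo like this is that it makes the memory-vs-recompute trade-off visible in ~40 lines, which is the scale the repo aims for.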

Happy to answer questions or add modules people request.