Layered Prefill [Code]

2025.06 - 2025.10

Layered Prefill is an open-source LLM serving framework that shifts the scheduling axis from tokens to layers, removing redundant MoE weight reloads while keeping decoding stall-free. It is built on top of vLLM; a more detailed explanation is available in this publication.