Large language model (LLM) applications often reuse previously processed context, such as chat history and documents, which introduces significant redundant computation. Existing LLM serving systems ...