Context Window
The context window is the maximum number of tokens a model can process in a single request, counting both the input (prompt, documents, chat history) and the generated output. When long documents, code repositories, or chat logs exceed it, you need to truncate, summarize, or use retrieval-augmented generation (RAG) to keep the important context in view.
Larger windows generally increase compute cost and latency. Claude models were early to offer very long windows, while ChatGPT and Gemini limits vary by model and plan. Because models tend to attend most reliably to material at the start and end of the window, strong prompt design places the most important context first and last.
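A minimal sketch of that placement idea: when text will not fit, keep the head and tail and drop the middle. The character-per-token ratio and the `fit_to_window` helper are illustrative assumptions, not any provider's API; real systems should count tokens with the model's own tokenizer.

```python
def rough_token_count(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # Assumption for illustration; use the model's tokenizer in practice.
    return len(text) // 4

def fit_to_window(text: str, max_tokens: int) -> str:
    """Keep the beginning and end of a long text, dropping the middle,
    since the start and end of the window are attended to most reliably."""
    if rough_token_count(text) <= max_tokens:
        return text
    budget_chars = max_tokens * 4
    head = text[: budget_chars // 2]
    tail = text[-(budget_chars - budget_chars // 2):]
    return head + "\n[...truncated...]\n" + tail
```

This "keep head and tail" strategy is only one option; summarizing the dropped middle usually preserves more meaning at the same token cost.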
Key characteristics
- Determines how much text, code, instructions, and history the model can use in a response.
- Affects cost, latency, and how you structure long workflows or documents.
- Makes summarization, chunking, and retrieval important when material exceeds the window.
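The chunking mentioned above can be sketched as a simple overlapping split, so sentences near a boundary appear in two chunks and are not lost to either. The chunk size and overlap values are illustrative assumptions; production pipelines typically chunk by tokens or semantic boundaries rather than raw characters.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    # Split text into overlapping character chunks for summarization or
    # retrieval. chunk_size and overlap are example values, not a standard.
    step = chunk_size - overlap
    return [text[i : i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk can then be embedded for retrieval or summarized independently, with the summaries concatenated to fit the window.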