Most confusion about large language models evaporates once you stop picturing the context window as memory and start picturing it as a desk. Everything you want the model to consider has to fit on the desk at once.

The model does not remember your last message. It re-reads the entire desk every single turn.

What this changes

If the desk is finite and re-read each turn, then retrieval, summarisation, and ordering are not optimisations — they are the whole game. Put the most relevant material closest to the question.

ai