RAG pairs two stages: retrieve relevant context from a knowledge source (typically a vector store), then generate an answer with the LLM using that context. The vectors come from running your documents and the user's query through an embedding model, which turns text into dense numeric representations so that semantically similar items sit near each other in vector space.
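A minimal sketch of the two stages, using a toy bag-of-words "embedding" and cosine similarity in place of a real embedding model and vector store (all function names and documents here are illustrative, not any particular library's API):

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Stage 1: rank documents by similarity to the query embedding.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def generate(query: str, context: list[str]) -> str:
    # Stage 2: in a real system this prompt would be sent to an LLM;
    # here we only show how retrieved context is stitched into it.
    return f"Context: {' | '.join(context)}\nQuestion: {query}"

docs = [
    "The capital of France is Paris.",
    "Photosynthesis converts light into chemical energy.",
]
query = "What is the capital of France?"
prompt = generate(query, retrieve(query, docs))
print(prompt)
```

A production pipeline swaps `embed` for a real embedding model, `retrieve` for an approximate-nearest-neighbor search over precomputed document vectors, and `generate` for an LLM call; the two-stage shape stays the same.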
