Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is a system design pattern that improves an LLM’s answers by:

1. Retrieving relevant information from an external knowledge source, and then
2. Augmenting the LLM prompt with that retrieved context before generating the final response.

RAG helps an LLM look things up first, then answer using evidence.
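
The retrieve-then-augment flow can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `DOCUMENTS` list stands in for an external knowledge source, and `retrieve` ranks by simple word overlap rather than the embedding or keyword search a real system would likely use. The sketch stops at building the prompt; sending it to an LLM is left out.

```python
import re

# Hypothetical in-memory knowledge source (illustration only);
# a real system would typically query a search index or vector store.
DOCUMENTS = [
    "The refund window for online orders is 30 days from delivery.",
    "Support is available by email on weekdays from 9am to 5pm.",
    "Gift cards cannot be exchanged for cash.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=1):
    """Return the k documents that share the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Augment the user's question with the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


question = "How many days is the refund window?"
prompt = build_prompt(question, DOCUMENTS)
print(prompt)
```

The model itself is untouched in this sketch; only the text of the prompt changes, and the retrieved sentence about refunds is what lets the model answer from evidence.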

RAG is commonly used when:

RAG does not change the model weights.
It changes what the model sees at inference time by adding retrieved context.