RAG

RAG (Retrieval-Augmented Generation): we add a layer in front of the LLM that enriches the user's query by automatically adding relevant data retrieved from a database.

It consists of two processes:

  • fill the database
  • use the database at query time

Fill the database:

  • prepare the data in some format (for example, a folder with files in md or json format)
  • a script converts the data into database records. Each record has a search key (a vector computed from a piece of text) and the text that will be added to the query. The key text and the added text can be the same, or they can differ slightly (for example, a file is split into paragraphs, each paragraph becomes a search key, and the entire file is added to the query to provide complete context)
  • the script usually calls a specialized model (an embedding model, usually not the same one that will process the final query), which converts text into a vector (an array of numbers). The vector is saved in the database. As a rule, a vector database is used: it can measure how close one vector is to another. Since the numbers encode concepts (not explicitly known to us, but formed during the model's initial training), in practice such search works quite well.
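
A minimal sketch of the fill step, under loud assumptions: the embed function below is a toy hashed bag-of-words, not a real embedding model, and a plain Python list stands in for the vector database. The names embed, build_index and the sample files are hypothetical. It only illustrates the "paragraph is the search key, whole file is the added text" layout described above.

```python
import hashlib
import math

DIM = 1024  # toy vector size (assumption); real embedding models define their own


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        word = word.strip(".,;:!?()\"'")
        if not word:
            continue
        # Map each word to a fixed bucket so the same word always lands in the same place.
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def build_index(files: dict[str, str]) -> list[dict]:
    """Each paragraph becomes a search key; the whole file is the text to add to the query."""
    records = []
    for name, content in files.items():
        for paragraph in content.split("\n\n"):
            paragraph = paragraph.strip()
            if paragraph:
                records.append(
                    {"source": name, "vector": embed(paragraph), "text": content}
                )
    return records


# Hypothetical sample data standing in for a folder of md files.
files = {
    "cats.md": "Cats sleep most of the day.\n\nCats purr when they are content.",
    "rust.md": "Rust has no garbage collector.\n\nOwnership rules are checked at compile time.",
}
index = build_index(files)
print(len(index), "records stored")
```

In a real setup, embed would be replaced by a call to an actual embedding model and the list by a vector database; the record layout (vector plus text) stays the same.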

Use:

  • send the original query to the same model that was used to create the vectors in the database (it must be the same one, otherwise the vectors are not comparable). Get a vector.
  • send the vector to the database and ask it to return the N most similar records. Take the text from each record and remove duplicates (several search keys may point to the same text).
  • add the texts to the query according to some template.
  • send the improved query to the target LLM.
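
The query-time steps above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: embed is a toy hashed bag-of-words standing in for the embedding model, a Python list stands in for the vector database, and the prompt template is made up. The names embed, similarity, retrieve and build_prompt are hypothetical. The last step (sending the prompt to the target LLM) is left out because it depends on the provider.

```python
import hashlib
import math

DIM = 1024  # toy vector size (assumption)


def embed(text: str) -> list[float]:
    """Toy stand-in for the embedding model; must match the one used to fill the database."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        word = word.strip(".,;:!?()\"'")
        if word:
            bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % DIM
            vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def similarity(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: str, records: list[dict], n: int = 2) -> list[str]:
    """Return the texts of the n most similar records, without duplicates."""
    query_vec = embed(query)
    ranked = sorted(records, key=lambda r: similarity(query_vec, r["vector"]), reverse=True)
    texts = []
    for record in ranked[:n]:
        if record["text"] not in texts:  # several keys may point to the same text
            texts.append(record["text"])
    return texts


def build_prompt(query: str, texts: list[str]) -> str:
    """Hypothetical template: retrieved context first, then the original question."""
    context = "\n---\n".join(texts)
    return f"Use the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}"


# Tiny in-memory "database": paragraph vectors as keys, whole documents as text.
cats = "Cats sleep most of the day.\n\nCats purr when they are content."
rust = "Rust has no garbage collector.\n\nOwnership rules are checked at compile time."
records = [
    {"vector": embed(paragraph), "text": doc}
    for doc in (cats, rust)
    for paragraph in doc.split("\n\n")
]

query = "Why do cats purr?"
prompt = build_prompt(query, retrieve(query, records, n=2))
print(prompt)  # this string is what would be sent to the target LLM
```

Deduplication matters here because both paragraphs of the same file can match the query, and the file should be added to the prompt only once.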

In general, the approach can be considered outdated: in most cases it's better to use MCP (Model Context Protocol), although RAG can still work for some specific scenarios.