LLM: RAG
Retrieval-Augmented Generation (RAG) is a technique that equips an LLM with the ability to draw on up-to-date external knowledge. There are several variants, but most use a vector database to store the external knowledge the LLM retrieves from.
Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases.
This is good for knowledge-intensive tasks, and it allows continuous knowledge updates and the integration of domain-specific information. RAG retrieves relevant document chunks from an external knowledge base through semantic similarity calculation, which effectively reduces the generation of factually incorrect content.
In-Context Learning (ICL) abilities first appeared with GPT-3, enabling RAG to handle more complex, knowledge-intensive tasks at inference time. Previously, knowledge retrieved by RAG had instead been used to fine-tune the model. Later, RAG enhancements began to incorporate LLM fine-tuning techniques more closely.
As the name indicates, there are several techniques used in each of the “Retrieval”, “Generation”, and “Augmentation” stages.
There are three main paradigms of RAG:
- Naive RAG
- Advanced RAG
- Modular RAG
1 Naive RAG
The Naive RAG follows a traditional process that includes indexing, retrieval, and generation, and is also characterized as a “Retrieve-Read” framework (a minimal code sketch follows the list below).
- Indexing: starts with the cleaning and extraction of raw data in diverse formats like PDF, HTML, Word, and Markdown, which is then converted into a uniform plain-text format. Due to the context limitation of language models, the text needs to be segmented into smaller, digestible chunks. Each chunk is then encoded into a vector representation using an embedding model and stored in a vector database.
- Retrieval: upon receiving a query from the user, the RAG system employs the same encoding model used during the indexing phase to transform the query into a vector representation. It then computes similarity scores between the query vector and the vectors of the chunks within the indexed corpus, and returns the Top-K chunks that show the greatest similarity to the query. These chunks are then used as the expanded context in the prompt.
- Generation: the query and the retrieved chunks are synthesized into a prompt and sent to the LLM to generate a response.
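The sketch below walks through this Retrieve-Read pipeline end to end. It is a minimal illustration under stated assumptions, not a reference implementation from the source: the `sentence-transformers` model name is one common public choice, and `llm_generate` is a hypothetical stand-in for any LLM API call.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Embedding model: one common public choice, not mandated by the source.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# --- Indexing: split cleaned text into chunks and embed each chunk ---
def build_index(documents, chunk_size=200):
    chunks = []
    for doc in documents:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    # normalize_embeddings=True lets a dot product act as cosine similarity.
    vectors = embedder.encode(chunks, normalize_embeddings=True)
    return chunks, np.asarray(vectors)

# --- Retrieval: embed the query with the SAME model, return Top-K chunks ---
def retrieve(query, chunks, vectors, k=3):
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = vectors @ q
    top_k = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top_k]

# --- Generation: synthesize query and chunks into a prompt for the LLM ---
def answer(query, chunks, vectors, llm_generate):
    context = "\n\n".join(retrieve(query, chunks, vectors))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm_generate(prompt)  # llm_generate: hypothetical LLM API wrapper
```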
As we can see, there are several limitations of the Naive RAG:
- Retrieval: the retrieval phase struggles with precision and recall, leading to the selection of misaligned or irrelevant chunks and the omission of crucial information.
- Generation: when the retrieved chunks are irrelevant to the query, the LLM may produce content not supported by the context.
- Augmentation Hurdles: integrating the retrieved information with the task at hand can be challenging, sometimes resulting in disjointed or incoherent outputs. The process may also encounter redundancy when similar information is retrieved from multiple sources, leading to repetitive responses.
2 Advanced RAG
Advanced RAG introduces specific improvements to overcome the limitations of the Naive RAG:
- Retrieval: it employs pre-retrieval and post-retrieval strategies, and also incorporates optimization methods.
- Indexing: it refines indexing techniques through a sliding-window approach, fine-grained segmentation, and the incorporation of metadata (see the chunking sketch below).
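A minimal sketch of the sliding-window idea, assuming simple whitespace tokenization: overlapping windows preserve context that a hard split at chunk boundaries would cut off.

```python
# Sliding-window chunking: each window overlaps the previous one by
# (window - stride) words, so sentences near boundaries appear in
# more than one chunk and are less likely to be lost at retrieval time.
def sliding_window_chunks(text, window=200, stride=100):
    words = text.split()
    chunks = []
    for start in range(0, len(words), stride):
        chunk = " ".join(words[start:start + window])
        if chunk:
            chunks.append(chunk)
        if start + window >= len(words):
            break
    return chunks
```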
2.1 Pre-Retrieval Process
The primary focus is on optimizing the indexing structure and the original query.
- Optimizing Indexing: enhances the quality of the content being indexed:
- Enhancing data granularity
- Optimizing index structures
- Adding metadata
- Alignment optimization
- Mixed retrieval
- Optimizing Query: makes the user’s original question clearer and more suitable for the retrieval task (see the sketch after this list), through:
- Query rewriting
- Query Transformation
- Query Expansion
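A hedged sketch of LLM-based query rewriting and expansion; `llm_generate` is again a hypothetical stand-in for any chat-completion call, and the prompt wording is illustrative only.

```python
# Query rewriting: ask the LLM to restate the question in a form
# better suited to semantic search.
def rewrite_query(original_query, llm_generate):
    prompt = (
        "Rewrite the following question so it is clearer and better "
        "suited for semantic search. Return only the rewritten query.\n\n"
        f"Question: {original_query}"
    )
    return llm_generate(prompt).strip()

# Query expansion: generate several alternative phrasings to broaden
# retrieval coverage; each variant is retrieved against separately.
def expand_query(original_query, llm_generate, n=3):
    prompt = (
        f"Generate {n} alternative phrasings of the question below, "
        "one per line, to broaden retrieval coverage.\n\n"
        f"Question: {original_query}"
    )
    return [q.strip() for q in llm_generate(prompt).splitlines() if q.strip()]
```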
2.2 Post-Retrieval Process
Once relevant context is retrieved, it is crucial to integrate it effectively with the query. The main methods in this stage are:
- Re-ranking chunks: re-ranks the retrieved information to relocate the most relevant content to the edges of the prompt (see the sketch after this list).
- Context compression: concentrates on selecting the essential information, emphasizing critical sections, and shortening the context to be processed.
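One common way to implement re-ranking is with a cross-encoder that scores each (query, chunk) pair jointly, which is more precise than the bi-encoder similarity used at retrieval time. This is a sketch under that assumption; the model name is a public MS MARCO cross-encoder, not one mandated by the source.

```python
# Post-retrieval re-ranking with a cross-encoder.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, chunks, keep=3):
    # Score each (query, chunk) pair jointly; higher means more relevant.
    scores = reranker.predict([(query, c) for c in chunks])
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in order[:keep]]
```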
3 Modular RAG
Modular RAG advances the two paradigms above by incorporating diverse strategies for improving its components, such as:
- Adding a search module for similarity searches
- Refining the retriever through fine-tuning
Examples of this flexibility include:
- Restructured RAG modules
- Rearranged RAG pipelines
Modular RAG supports both sequential processing and integrated end-to-end training across its components.
3.1 New Modules
3.1.1 Search Module
The search module adapts to specific scenarios, enabling direct searches across various sources like search engines, databases, and knowledge graphs, using LLM-generated code and query languages.
RAG-Fusion addresses search limitations by employing a multi-query strategy that expands user queries into diverse perspectives, utilizing parallel vector searches and intelligent re-ranking to uncover both explicit and transformative knowledge.
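A hedged sketch of the RAG-Fusion idea, reusing the `expand_query` and `retrieve` helpers from the earlier sketches and fusing the per-query rankings with Reciprocal Rank Fusion (RRF); the RRF constant of 60 is a conventional default, not a value from the source.

```python
# Multi-query retrieval fused with Reciprocal Rank Fusion (RRF).
from collections import defaultdict

def rag_fusion(query, chunks, vectors, llm_generate, k=3, rrf_k=60):
    queries = [query] + expand_query(query, llm_generate)
    fused = defaultdict(float)
    for q in queries:
        # Parallelizable in practice; sequential here for clarity.
        for rank, chunk in enumerate(retrieve(q, chunks, vectors, k=k)):
            fused[chunk] += 1.0 / (rrf_k + rank + 1)
    # Chunks ranked highly across many query variants float to the top.
    return sorted(fused, key=fused.get, reverse=True)[:k]
```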
3.1.2 Memory Module
The Memory module leverages the LLM’s memory to guide retrieval, creating an unbounded memory pool that aligns the text more closely with the data distribution through iterative self-enhancement.
3.1.3 Routing Module
The Routing module navigates through diverse data sources, selecting the optimal pathway for a query, whether it involves summarization, database searches, or merging different information streams.
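A toy sketch of what such a router might look like: the LLM itself picks the data source. The source names and prompt are assumptions for illustration, not structure defined in the source.

```python
# LLM-as-router: classify the query into one of several data sources.
def route(query, llm_generate):
    prompt = (
        "Choose the best source for this query: "
        "'vector_db', 'sql_db', or 'knowledge_graph'. "
        "Reply with the source name only.\n\n"
        f"Query: {query}"
    )
    choice = llm_generate(prompt).strip().lower()
    # Fall back to the vector DB if the LLM returns something unexpected.
    return choice if choice in {"vector_db", "sql_db", "knowledge_graph"} else "vector_db"
```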
3.1.4 Predict Module
The predict module aims to reduce redundancy and noise by generating context directly through the LLM, ensuring relevance and accuracy.
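A minimal sketch of this generate-then-read idea, again with the hypothetical `llm_generate`: the LLM writes its own focused context passage rather than relying on retrieved chunks that may carry noise.

```python
# Generate-then-read: the LLM produces the context itself, which is then
# fed back as grounding for the final answer.
def predict_context(query, llm_generate):
    prompt = (
        "Write a short, factual background passage that would help "
        f"answer the question below.\n\nQuestion: {query}"
    )
    return llm_generate(prompt)
```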
3.1.5 Task Adapter Module
The adapter module tailors RAG to various downstream tasks, automating prompt retrieval for zero-shot inputs and creating task-specific retrievers through few-shot query generation.
3.2 New Patterns
Modular RAG offers remarkable adaptability by allowing module substitution or reconfiguration to address specific challenges. It further expands flexibility by integrating new modules or adjusting the interaction flow among existing ones, enhancing its applicability across different tasks.