IncarnaMind is a tool that lets users interact with personal documents (PDF, TXT) through multiple large language models (LLMs), including GPT-3.5, GPT-4 Turbo, Claude, and open-source LLMs such as Llama2.
The project addresses common challenges in document retrieval, including handling multiple documents, balancing precision and semantic retrieval, and robustness across different LLMs.
Advantages of IncarnaMind:
Multiple document support:
- Cross-document queries: IncarnaMind supports multi-hop queries that span several documents at once, rather than querying only one document at a time. This is useful for complex information retrieval across multiple files, giving users a more comprehensive and integrated picture of their data.
- Adapts to complex scenarios: Traditional tools typically handle only a single document; IncarnaMind removes that limitation and is well suited to complex scenarios involving several documents.
Adaptive Chunking Technology:
- Sliding window chunking: Instead of the fixed chunk sizes used in traditional Retrieval-Augmented Generation (RAG), the size and position of the retrieval window are adjusted dynamically according to the complexity of the document content and the needs of the user's query. This balances broad contextual information against fine detail (a rough sketch follows this list).
- Improved information parsing: Compared with fixed chunk sizes, this adaptive technique lets the system parse and understand complex documents more effectively, improving retrieval quality.
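The post does not publish the actual chunking algorithm, so the following is only a minimal illustrative sketch of the general idea, not IncarnaMind's code. The helper name `adaptive_window_chunks` and its heuristics (window size tied to sentence length, a small overlap between consecutive windows) are assumptions for illustration.

```python
# Rough sketch of adaptive sliding-window chunking (illustrative only).
# The window slides over sentences with overlap; its size adapts to sentence
# length as a crude proxy for content density.
import re

def adaptive_window_chunks(text, min_sent=3, max_sent=8, overlap=1):
    """Yield overlapping chunks of min_sent..max_sent sentences each."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    i = 0
    while i < len(sentences):
        lookahead = sentences[i:i + max_sent]
        # Heuristic: long, dense sentences -> smaller window; short ones -> larger window.
        avg_len = sum(len(s) for s in lookahead) / max(1, len(lookahead))
        size = min_sent if avg_len > 200 else max_sent
        yield " ".join(sentences[i:i + size])
        if i + size >= len(sentences):
            break
        i += size - overlap  # slide forward, keeping `overlap` sentences of context

# Example:
# for chunk in adaptive_window_chunks(open("paper.txt").read()):
#     print(chunk[:80], "...")
```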
Ensemble Retriever:
- Multi-strategy retrieval: By combining multiple retrieval strategies, the ensemble retriever can sift both coarse-grained and fine-grained information from the user's documents.
- Reducing factual hallucinations: By drawing on diverse retrieval methods, the ensemble retriever helps reduce the "fact hallucination" problem common in large language models, keeping the provided content accurate and relevant (a fusion sketch follows this list).
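The post does not spell out how the retrieval strategies are fused, so here is a generic, self-contained sketch of the idea: two simple rankers (a keyword ranker and a fuzzy-similarity stand-in for an embedding retriever) merged with reciprocal rank fusion. All function names, weights, and the fusion constant are assumptions, not IncarnaMind's implementation.

```python
# Minimal ensemble-retrieval sketch (illustrative): merge rankings from two
# retrieval strategies with reciprocal rank fusion (RRF).
from collections import defaultdict
from difflib import SequenceMatcher

def keyword_rank(query, docs):
    """Rank documents by simple keyword overlap with the query (coarse, lexical)."""
    terms = set(query.lower().split())
    scores = {i: len(terms & set(d.lower().split())) for i, d in enumerate(docs)}
    return sorted(scores, key=scores.get, reverse=True)

def fuzzy_rank(query, docs):
    """Stand-in for an embedding-based retriever, using character-level similarity."""
    return sorted(range(len(docs)),
                  key=lambda i: SequenceMatcher(None, query.lower(), docs[i].lower()).ratio(),
                  reverse=True)

def ensemble_retrieve(query, docs, k=3, c=60):
    """Fuse the two rankings with RRF and return the top-k document indices."""
    fused = defaultdict(float)
    for ranking in (keyword_rank(query, docs), fuzzy_rank(query, docs)):
        for rank, idx in enumerate(ranking):
            fused[idx] += 1.0 / (c + rank + 1)
    return sorted(fused, key=fused.get, reverse=True)[:k]

docs = ["Attention is all you need.", "GPT-4 technical report.", "A study of fine-tuning."]
print(ensemble_retrieve("What does the GPT report say?", docs, k=2))
```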
Broad model compatibility
- It supports a variety of large language models, including OpenAI's GPT series, Anthropic's Claude, and open-source models such as Llama2. This broad compatibility lets it run with different models and hardware environments, offering greater flexibility and choice (a model-switching sketch follows this list).
- Optimized performance: IncarnaMind is tuned in particular for the Llama2-70b-chat model, which performs well on reasoning and safety but also demands more GPU resources.
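The post does not show how a backend is selected, so the wrapper below is only a generic sketch of switching between providers, assuming the official `openai` and `anthropic` Python SDKs with API keys set in the environment. The function name, default model names, and overall interface are illustrative assumptions, not IncarnaMind's actual API.

```python
# Illustrative model-switching wrapper (not IncarnaMind's actual interface).
# Assumes the `openai` and `anthropic` SDKs are installed and that
# OPENAI_API_KEY / ANTHROPIC_API_KEY are set.
from openai import OpenAI
import anthropic

def ask(question: str, backend: str = "openai", model: str = "gpt-4-turbo") -> str:
    if backend == "openai":
        client = OpenAI()
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": question}],
        )
        return resp.choices[0].message.content
    if backend == "anthropic":
        client = anthropic.Anthropic()
        resp = client.messages.create(
            model=model,
            max_tokens=1024,
            messages=[{"role": "user", "content": question}],
        )
        return resp.content[0].text
    raise ValueError(f"unknown backend: {backend}")

# ask("Summarize the paper.", backend="anthropic", model="claude-3-5-sonnet-latest")
```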
Customization and local operation
- IncarnaMind lets users run local quantized models (such as GGUF builds of Llama2), which improves data privacy and removes the dependence on external APIs and cloud resources (a loading sketch follows below).
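Running a local quantized model typically means loading a GGUF file with a runtime such as `llama-cpp-python`. The snippet below is a generic sketch of that pattern rather than IncarnaMind's configuration; the model path and parameter values are placeholders.

```python
# Generic sketch: load a quantized Llama2 GGUF model locally with llama-cpp-python.
# The model path and settings are placeholders, not IncarnaMind defaults.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-70b-chat.Q4_K_M.gguf",  # local quantized weights
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the attached report in two sentences."}]
)
print(out["choices"][0]["message"]["content"])
```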
Solving common retrieval challenges
- IncarnaMind addresses several pain points of traditional document-retrieval tools, such as the limitations of fixed chunking and the trade-off between precision and semantic understanding, improving the accuracy and practicality of document queries.
Technical methods
IncarnaMind’s technical approach can be broken down into the following key steps and components:
Multi-document retrieval and question-answering process
- User input: Users ask questions in the chat box, such as "What is the difference between this paper and the GPT paper?" The system first records the query and, based on its content, determines which documents to search.
- First ensemble retriever: An initial retriever fetches fragments of the relevant documents, searching across all of the preloaded documents for content related to the user's question.
Sliding Window Chunking
- Adaptive chunking: Before the second ensemble retriever runs, the system applies sliding window chunking to the fragments returned by the initial retrieval. The window size and position are adjusted dynamically according to the complexity and context of the content, so that subsequent retrieval can locate relevant information more precisely.
- Objective: This chunking step balances fine-grained detail against semantic context, so the system can answer simple questions as well as complex ones that span multiple documents and contexts.
Secondary search and answer generation
- Second ensemble retriever: After sliding window chunking, the system performs a finer secondary search, extracting the most relevant details from the re-chunked fragments based on the user's question.
- Final answer generation: Combining the results of the initial and secondary searches, the system generates a final answer. In the architecture diagram, for example, the system pulls information from several related documents and produces an answer that includes comparisons and summaries (an end-to-end sketch follows this list).
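Putting these steps together, the two-stage flow can be sketched as a handful of plain functions. This reuses the illustrative `ensemble_retrieve` and `adaptive_window_chunks` helpers from the earlier sketches; the function names, prompt format, and `llm` callable are all assumptions rather than IncarnaMind's implementation.

```python
# End-to-end sketch of the two-stage retrieval flow described above (illustrative).
# Relies on ensemble_retrieve() and adaptive_window_chunks() from the earlier sketches.

def first_retrieval(question, documents, k=10):
    """Coarse pass: pull candidate fragments from all ingested documents."""
    return ensemble_retrieve(question, documents, k=k)

def rechunk(fragments):
    """Re-segment the candidates with sliding-window chunking."""
    return [c for frag in fragments for c in adaptive_window_chunks(frag)]

def second_retrieval(question, chunks, k=4):
    """Fine pass: pick the most relevant re-chunked passages."""
    return ensemble_retrieve(question, chunks, k=k)

def answer(question, documents, llm):
    """Run both retrieval stages, then ask the LLM to answer from the context."""
    candidates = [documents[i] for i in first_retrieval(question, documents)]
    chunks = rechunk(candidates)
    context = "\n\n".join(chunks[i] for i in second_retrieval(question, chunks))
    prompt = f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"
    return llm(prompt)   # any callable LLM backend, e.g. the `ask` wrapper above
```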
Document indexing and positioning
- Document indexing: During retrieval, the system records the index location of each fragment so that generated answers can accurately reference the corresponding passages in the original documents, keeping answers accurate and relevant (a small example follows).
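One simple way to realize this kind of source tracking is to keep every chunk paired with its document name and character offsets so that an answer can cite exactly where its text came from. The record structure below is an assumption for illustration, not IncarnaMind's schema.

```python
# Illustrative chunk record with source metadata for citation.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    doc_name: str   # which document the chunk came from
    start: int      # character offset of the chunk in the original document
    end: int

def index_document(doc_name, text, size=800, overlap=100):
    """Split a document into chunks while remembering where each one lives."""
    chunks = []
    pos = 0
    while pos < len(text):
        end = min(pos + size, len(text))
        chunks.append(Chunk(text[pos:end], doc_name, pos, end))
        if end == len(text):
            break
        pos = end - overlap  # overlap keeps context across chunk boundaries
    return chunks

# A generated answer can then cite, e.g., f"{chunk.doc_name} [{chunk.start}:{chunk.end}]".
```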
Try it online: https://www.incarnamind.com/
- Author: KCGOD
- URL: https://kcgod.com/incarnamind-chat-with-multiple-documents-simultaneously