A recent research paper found that an open-source AI system using retrieval augmentation can outperform proprietary chatbot models such as OpenAI’s GPT-3.5.
The paper, published on Oct. 4 by Nvidia researchers, compares different techniques for handling long context in large language models (LLMs) — the technology behind today’s conversational AI. One method is simply extending the context window, allowing the LLM to directly “read” more tokens of text as input and keep them all in mind when producing its output. The other approach uses retrieval to provide the LLM with only the most relevant context from a large database.
Their best approach combines both techniques: a 70-billion-parameter open-source LLaMA model with an extended 32,000-token context window, further augmented by retrieving relevant passages from a corpus. The retriever supplies context on demand, rather than requiring the LLM to process the entire corpus at once, making the system more efficient.
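The retrieval-augmented setup can be sketched in a few lines. This is a minimal illustration, not the paper’s implementation: the researchers use trained dense retrievers, whereas here a toy word-overlap score stands in for embedding similarity so the example stays self-contained, and the prompt-building step is a generic pattern rather than their exact format.

```python
def score(query, passage):
    """Toy relevance score: fraction of query words found in the passage.
    (A real system would use dense embedding similarity instead.)"""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q)

def retrieve(query, corpus, k=2):
    """Return the top-k passages from the corpus, most relevant first."""
    return sorted(corpus, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query, corpus, k=2):
    """Prepend only the retrieved passages, not the whole corpus,
    so the LLM's context window stays small."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Photosynthesis converts sunlight into chemical energy.",
    "Paris is the capital of France.",
]
print(build_prompt("Where is the Eiffel Tower?", corpus, k=2))
```

The key point the sketch captures is that the LLM only ever sees the top-k passages, so the corpus can grow without the prompt growing with it.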
