Insights
- Generative AI's ability to deliver business value depends on ease of use, trust, and ethics. The problem is that appropriate information is often inaccessible, poorly defined, or irrelevant to the query in question.
- Effective information retrieval is crucial for a responsible AI strategy because it drives efficiencies, enhances productivity, and creates new opportunities.
- Retrieval augmented generation (RAG) is the gold standard for information retrieval. However, most approaches might not align with the specific needs of an enterprise.
- To address this, firms should fine-tune an embedding model with enterprise-specific data and integrate it with RAG.
- InfosysEM is one such solution: it builds embedding models through fine-tuning and is a step toward responsible AI.
AI, especially generative AI, is here to stay, but its ability to deliver significant business value depends on ease of use, trust, ethics, and data governance.
Firms must trust that their AI systems work correctly, query the right data, generate appropriate information, and can explain their reasoning clearly when doubts arise.
The problem is that appropriate information is often inaccessible, poorly defined, and irrelevant to the query in question. Unless generative AI can quickly provide precise and reliable information to users, firms will miss out on the full potential of the technology, which is estimated to deliver up to a 40% increase in productivity across regions, according to upcoming research from the Infosys Knowledge Institute.
Effective information retrieval is crucial for a responsible AI strategy because it drives efficiencies, enhances productivity, and creates new opportunities.
One way to tackle the information retrieval challenge is to build a custom generative model from the organization's data. However, this requires substantial data and is prohibitively expensive, making it impractical for many businesses. Custom generative models also hallucinate in the same way as big public generative tools such as ChatGPT. These drawbacks can lead to inaccurate insights and poor decision-making, reducing the overall reliability and effectiveness of such models.
The power of RAG, embeddings, and fine-tuning
Retrieval augmented generation (RAG) is a better solution for information retrieval and is widely regarded as the gold standard. RAG reduces reliance on static, public datasets while offering greater agility and flexibility in AI applications.
However, RAG is often used with pretrained embedding models (typically trained on large, generic bodies of text data), which, while effective, might not fully align with the specific needs of an enterprise.
To address this, firms should fine-tune an embedding model with enterprise-specific data and integrate it with RAG.
Embedding models are algorithms that convert text into numerical representations known as embeddings. These embeddings capture the semantic relationships and contextual meaning embedded within words and documents, allowing machines to understand and compare the meanings of words based on their usage in context. For instance, embeddings for phrases like "artificial intelligence" and "machine learning" would be positioned closely together in the embedding space, indicating their semantic similarity.
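As a toy illustration of this idea, the sketch below compares hand-picked four-dimensional vectors with cosine similarity. The vectors are invented for illustration only; real embedding models produce vectors with hundreds of dimensions learned from data.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented 4-dimensional embeddings; real models use hundreds of dimensions.
embeddings = {
    "artificial intelligence": np.array([0.9, 0.8, 0.1, 0.0]),
    "machine learning":        np.array([0.85, 0.75, 0.2, 0.05]),
    "coffee beans":            np.array([0.05, 0.1, 0.9, 0.8]),
}

sim_related = cosine_similarity(embeddings["artificial intelligence"],
                                embeddings["machine learning"])
sim_unrelated = cosine_similarity(embeddings["artificial intelligence"],
                                  embeddings["coffee beans"])
# Semantically related phrases sit closer together in the embedding space.
print(sim_related > sim_unrelated)
```

Because similarity is computed over directions in the vector space, related phrases score close to 1.0 while unrelated ones score near 0.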
Embedding models can be considered a subset or a component of a regular large language model (LLM). Regular LLMs internally generate embeddings as part of their text processing pipeline, but they also contain other components that process those embeddings further and generate responses. Embedding models lack these additional LLM-specific capabilities and are limited to generating embeddings. As a result, embedding models are much smaller and simpler than LLMs and require fewer computational resources to run, one reason firms will incorporate this innovation into their AI strategies in the future.
Fine-tuning involves taking a pretrained embedding model and adapting it with enterprise-specific data.
Fine-tuning an embedding model and integrating it with RAG enhances the relevance and accuracy of search results by combining the strengths of both: the fine-tuned model encodes enterprise-specific meaning, while RAG grounds generation in retrieved facts. This fine-tuning approach is an innovation we are exploring at Infosys and is probably the most effective way of enriching generative AI with novel, domain-specific, and accurate information, leading to better responsible AI outcomes.
Infosys tackles this issue for clients using InfosysEM, which builds embedding models through fine-tuning. The innovation, whose specs were released in May, is designed to improve information retrieval within an organization. The approach set out in this article can be followed to build a similar embedding model for specific client organizations.
A closer look at the RAG approach
RAG enhances the accuracy and reliability of generative AI models by fetching facts from external sources. It combines the strengths of two approaches: retrieval, which finds relevant information from large datasets, and generation, which formulates answers based on that retrieved information.
Rather than just querying generic information, RAG brings the industry domain or context specificity into tasks such as information retrieval or semantic search.
Figure 1 provides a high-level overview of the RAG architecture, illustrating how retrieval and generation components interact.
Figure 1. RAG architecture for information retrieval
Source: Infosys
RAG operates through a series of interconnected components that collectively enhance search results:
- Document encoder: Converts documents into embeddings (numerical representations that encode semantic context and relationships of data tokens) for storage.
- Vector database: Stores these document embeddings, enabling efficient similarity searches.
- Query encoder: Converts the user's query into an embedding.
- Embedding model: The core component that transforms both documents and queries into embeddings.
- Retriever: Uses the query embedding to search the vector database for the most relevant documents.
- Generator: Produces coherent and contextually relevant responses based on the retrieved documents.
The embedding model supports both the document encoder and query encoder in translating information into numerical formats. This ensures the RAG system retrieves the most relevant documents, which is vital for providing accurate and helpful responses to user queries.
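The interaction between these components can be sketched in a few lines of Python. Everything below is a stand-in invented for illustration: the bag-of-words encoder substitutes for a trained embedding model, a plain list substitutes for a vector database, and the generator is a stub where a real system would call an LLM.

```python
import numpy as np

# Toy stand-in for an embedding model: a bag-of-words encoder over a tiny
# fixed vocabulary. A real RAG system would call a trained embedding model.
VOCAB = ["invoice", "payment", "policy", "leave", "travel", "expense"]

def encode(text):
    """Query/document encoder: turn text into a normalized count vector."""
    words = text.lower().split()
    vec = np.array([words.count(w) for w in VOCAB], dtype=float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Document encoder + vector store: each document kept with its embedding.
documents = [
    "invoice payment terms and expense policy",
    "annual leave policy for employees",
    "travel expense reimbursement process",
]
store = [(doc, encode(doc)) for doc in documents]

def retrieve(query, k=2):
    """Retriever: rank stored documents by cosine similarity to the query."""
    q = encode(query)
    ranked = sorted(store, key=lambda item: float(np.dot(q, item[1])), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def generate(query, retrieved):
    """Generator stub: a real system would send this prompt to an LLM."""
    context = " | ".join(retrieved)
    return f"Answer '{query}' using context: {context}"

print(generate("travel expense rules", retrieve("travel expense rules")))
```

Even in this miniature form, the flow matches the architecture above: encode documents once, encode each query, retrieve the nearest documents, then generate a grounded response.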
Limitations of RAG techniques in enterprise settings
While RAG advances information retrieval, it faces challenges in enterprise environments. Models trained on general data often miss specific terminology and context unique to an enterprise. As a result, search results lack the precision and relevance needed in an organization. This underscores the need for embedding models tailored to an enterprise's specific data and requirements.
Customizing embedding models for enterprise environments involves several approaches, such as training a new embedding model from scratch, or extending pretraining of an existing embedding model. Training from scratch means building an embedding model specifically for that organization, which requires vast amounts of annotated data and computational resources. Extending pretraining by incorporating domain-specific data into existing models like BERT (a language model that helps computers understand the meaning of ambiguous language in text by using surrounding text to establish context) also demands significant resources and may not be viable for enterprises with limited datasets.
Fine-tuning for tailored information retrieval
Fine-tuning, which takes a pretrained embedding model and adapts it with enterprise-specific data, is likely the most practical and effective solution for enterprise environments, which is why InfosysEM was developed. This approach leverages transfer learning, where knowledge from a broader dataset is adapted to a smaller, domain-specific dataset. Fine-tuning enables efficient customization without the need for extensive resources, and it ensures that the embedding model captures the nuances of enterprise language and context, enhancing the accuracy and relevance of information retrieval tasks.
The fine-tuning process
The process of fine-tuning an embedding model involves several steps:
- Dataset preparation: Gather and preprocess enterprise-specific data, ensuring it reflects the linguistic characteristics and terminologies specific to the organization.
- Selection of a pretrained model: Choose a pretrained embedding model suitable for the enterprise's needs, such as BERT or a domain-specific variant.
- Fine-tuning: Adapt the pretrained model using the enterprise dataset through iterative training cycles. This step allows the model to learn domain-specific patterns while retaining the foundational knowledge from the original pretraining.
- Evaluation and validation: Assess the performance of the fine-tuned model on validation datasets to ensure it meets the desired criteria for information retrieval tasks.
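As a rough illustration of these steps, the sketch below adapts frozen "pretrained" embeddings with a small linear layer trained by gradient descent. The random vectors stand in for the output of a model such as BERT, and the term pairs are hypothetical; production fine-tuning would also use negative pairs and a contrastive loss to avoid collapsing the embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" embeddings for a few terms. Random vectors stand in
# here for the output of a real pretrained model such as BERT.
terms = ["PO", "purchase order", "GL", "general ledger"]
pretrained = {t: rng.normal(size=8) for t in terms}

# Hypothetical enterprise-specific positive pairs: abbreviations that
# should land close to their expansions after fine-tuning.
pairs = [("PO", "purchase order"), ("GL", "general ledger")]

def pair_distance(W, a, b):
    """Squared distance between two terms in the adapted embedding space."""
    d = W @ pretrained[a] - W @ pretrained[b]
    return float(d @ d)

# Fine-tune a linear adapter W on top of the frozen embeddings by
# gradient descent on the squared distance of each positive pair.
W = np.eye(8)
lr = 0.01
before = sum(pair_distance(W, a, b) for a, b in pairs)
for _ in range(200):  # iterative training cycles
    for a, b in pairs:
        diff = pretrained[a] - pretrained[b]
        grad = 2 * np.outer(W @ diff, diff)  # gradient of ||W(a - b)||^2
        W -= lr * grad
after = sum(pair_distance(W, a, b) for a, b in pairs)
print(after < before)  # positive pairs move closer in the adapted space
```

The evaluation step then checks distances like these on a held-out validation set rather than on the training pairs themselves.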
Figure 2 illustrates how these steps fit together and describes the processes involved in fine-tuning the embedding model using enterprise-specific datasets.
Figure 2. The fine-tuning process
Source: Infosys
Getting it right
As discussed, building enterprise-specific embedding models has many advantages. However, implementing a fine-tuned embedding model demands best practices in data handling, model selection, and model validation.
From work conducted by Infosys experts, getting information retrieval right through embedding models is as much art as science. Key considerations when building a domain-specific embedding model include:
- Assess data availability: Evaluate the availability and quality of enterprise-specific data before deciding on a fine-tuning strategy.
- Choose appropriate models: Select pretrained models that fit the enterprise's domain and information retrieval needs.
- Iterative improvement: Continuously refine and optimize the fine-tuned model based on ongoing feedback and performance metrics.
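One common performance metric for this kind of iterative refinement is recall@k: the fraction of queries whose relevant document appears in the top k retrieved results. The queries, document IDs, and relevance labels below are hypothetical.

```python
def recall_at_k(results, relevant, k):
    """Fraction of queries whose relevant document appears in the top-k results."""
    hits = sum(1 for q in relevant if relevant[q] in results[q][:k])
    return hits / len(relevant)

# Hypothetical retrieval runs: query -> ranked document IDs, plus the
# single relevant document for each query.
results = {
    "q1": ["d3", "d1", "d7"],
    "q2": ["d2", "d9", "d4"],
    "q3": ["d8", "d5", "d6"],
}
relevant = {"q1": "d1", "q2": "d2", "q3": "d6"}

print(recall_at_k(results, relevant, k=1))  # 1 of 3 queries hit at rank 1
print(recall_at_k(results, relevant, k=3))  # all 3 hit within the top 3
```

Tracking a metric like this before and after each fine-tuning cycle gives the feedback loop the bullet above calls for.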
The road to greater trust and reliability
In an Ipsos survey, only 37% of employees said AI will improve their jobs, while 52% expressed concerns about AI products and services.
Focusing on better information retrieval will enhance AI products and build trust among enterprise users, a key pillar of any AI strategy.
Information retrieval based on fine-tuned embedding models integrated with the RAG architecture is a positive step toward responsible AI, reducing some of the anxiety surrounding generative AI models. AI decisions will be comprehensible and controllable by humans, with consistent business outcomes: a true win-win.