Comparison of RAG Implementations on AWS: Bedrock Knowledge Bases vs. Amazon Kendra with Detailed Step-by-Step Implementations and Guidance
Introduction to RAG and Amazon Bedrock
Retrieval-Augmented Generation (RAG) is an AI framework designed to enhance the capabilities of Large Language Models (LLMs) by incorporating external knowledge sources into the response generation process. Unlike traditional fine-tuning, which requires extensive preprocessing and training on specific datasets, RAG dynamically retrieves relevant information from a knowledge base during query processing. This method allows LLMs to access up-to-date and context-specific information, thereby improving the accuracy and relevance of the generated responses.
Amazon Bedrock is a managed AI service from AWS that provides access to leading LLMs, including those from Anthropic and other AI companies. Bedrock supports both fine-tuning and RAG, enabling users to implement sophisticated AI solutions with minimal overhead. There are two primary approaches to implementing RAG with Bedrock: using Bedrock Knowledge Bases, or building a custom integration that pairs Bedrock LLMs with Amazon Kendra for retrieval.
Key Concepts and Terminology
1. Retrieval-Augmented Generation (RAG)
RAG combines two main components:
- Retriever: Responsible for fetching relevant documents or information from a knowledge base.
- Generator: An LLM that uses the retrieved information to generate contextually appropriate responses.
2. Vector Database
A vector database stores embeddings, which are vector representations of textual data. These embeddings are used to find the similarity between a user’s query and stored documents.
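For illustration, similarity between a query and stored documents is typically computed with a metric such as cosine similarity over their embeddings. The short sketch below uses NumPy and made-up placeholder vectors rather than real model output:
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 = same direction, ~0.0 = unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings; in practice these come from an embedding model
query_embedding = np.array([0.1, 0.3, 0.5])
doc_embedding = np.array([0.2, 0.25, 0.55])
print(cosine_similarity(query_embedding, doc_embedding))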
3. Amazon Kendra
Amazon Kendra is an intelligent search service powered by machine learning. It indexes documents from various data sources and allows users to search using natural language queries.
Overview of Architectures
Bedrock Knowledge Bases
- Fully Managed RAG Workflow: Automates data ingestion, retrieval, prompt augmentation, and citation generation.
- Data Storage: Uses Amazon S3 for data storage.
- Embedding Models: Utilizes models like Titan Embeddings G1 to generate vector embeddings.
Amazon Kendra-Based RAG
- Advanced Search Capabilities: Uses natural language processing to retrieve relevant documents.
- Data Source Integration: Connects to various data sources, including S3, to index and retrieve documents.
- Custom Workflow: Requires custom setup using AWS Lambda, LangChain, and other AWS services.
Approach 1: Implementing RAG with Bedrock Knowledge Bases
Detailed Step-by-Step Implementation
1. Data Preparation and Upload to S3
The effectiveness of your RAG system hinges on the quality and relevance of your data. This step is crucial for ensuring that the information your LLM draws upon is accurate and up-to-date.
Gather all relevant documents in text format. This can include product manuals, troubleshooting guides, FAQs, and any other text-based resources relevant to your use case.
Example: If you are creating a knowledge base for customer support, collect documents such as user manuals, common issue guides, and FAQs.
Data Gathering:
- Identify and collect all pertinent information related to your specific use case. This might involve:
- Scraping: Programmatically extracting information from websites (e.g., using libraries like Beautiful Soup or Scrapy); see the sketch after this list.
- API Integration: Pulling data from external sources via their APIs.
- Manual Curation: Consolidating documents from internal repositories.
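For the scraping option above, a minimal sketch using the requests and Beautiful Soup libraries might look like this (the URL and output file name are placeholders):
import requests
from bs4 import BeautifulSoup

# Placeholder URL; replace with the page you want to extract text from
url = 'https://example.com/product-docs'
response = requests.get(url, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, 'html.parser')

# Extract visible paragraph text and save it for later cleaning
paragraphs = [p.get_text(strip=True) for p in soup.find_all('p')]
with open('scraped_page.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(paragraphs))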
Data Cleaning:
- This is a critical step to ensure data quality. It involves tasks such as:
- Removing Duplicates: Eliminate redundant information.
- Correcting Errors: Fix typos, inconsistencies, and formatting issues.
- Standardization: Ensure consistent formatting and terminology across documents.
- Filtering: Remove irrelevant or outdated content.
Data Transformation:
- Convert data into a format suitable for S3 storage (e.g., plain text (.txt) or markdown (.md)). You might use libraries like Pandas or NLTK for text processing and conversion.
Code Sample (Python — Data Cleaning):
import pandas as pd
# Load data from CSV (replace with your actual data source)
df = pd.read_csv('data.csv')

# Drop duplicates and handle missing values
df.drop_duplicates(inplace=True)
df.fillna('', inplace=True)

# Save as a text file
df.to_csv('cleaned_data.txt', index=False, header=False)
Create S3 Bucket:
Create a new S3 bucket to store the documents that will be used by the knowledge base.
Steps:
- Navigate to the AWS Management Console.
- Open the S3 service.
- Click on “Create bucket.”
- Choose Region: Select the same AWS region where your Bedrock models reside for optimal performance.
- Enter a unique bucket name.
- Naming Convention: Adopt a clear and descriptive naming convention for easy identification.
- Permissions: Configure bucket permissions to ensure secure access to your data.
- Complete the remaining configurations and click “Create bucket.”
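If you prefer to script bucket creation rather than use the console, a minimal Boto3 sketch could look like this (the bucket name and region are placeholders):
import boto3

region = 'us-east-1'  # Use the same region as your Bedrock models
bucket_name = 'your_bucket_name'  # Placeholder; bucket names must be globally unique

s3_client = boto3.client('s3', region_name=region)

# us-east-1 does not accept a LocationConstraint; other regions require one
if region == 'us-east-1':
    s3_client.create_bucket(Bucket=bucket_name)
else:
    s3_client.create_bucket(
        Bucket=bucket_name,
        CreateBucketConfiguration={'LocationConstraint': region},
    )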
Upload Data:
Upload the prepared text files to the newly created S3 bucket so that they can be accessed by Bedrock.
You can use the AWS Management Console, AWS CLI, or SDKs (Boto3 for Python) to upload your files:
Steps:
- Open the S3 bucket you just created.
- Click on the “Upload” button.
- Drag and drop your text files or use the file selector to upload your documents.
- Click “Upload” to complete the process.
import boto3
import os

# Initialize the S3 client
s3_client = boto3.client('s3')

# Define the bucket name and the local directory holding your documents
bucket_name = 'your_bucket_name'
directory = 'path/to/your/documents'
documents = os.listdir(directory)

# Upload files to S3
for document in documents:
    file_path = os.path.join(directory, document)
    s3_client.upload_file(file_path, bucket_name, document)
    print(f'Uploaded {document} to S3 bucket {bucket_name}')
2. Creating a Bedrock Knowledge Base
This step involves accessing the Bedrock Console where you will set up and manage your knowledge base.
Steps:
- Open the AWS Management Console.
- Navigate to the Bedrock service.
- Select “Knowledge Base” from the Orchestration menu.
- Click on “Create knowledge base.”
- Name and Description: Choose a descriptive name and provide a concise summary of the knowledge base’s purpose.
IAM Role:
- Create a new role or select an existing one with the following permissions. The IAM role must have permissions to read from the S3 bucket and write to the vector database.
- bedrock:InvokeModel
- s3:GetObject, s3:ListBucket
- Permissions for your vector database (e.g., es:ESHttpPost, es:ESHttpPut for OpenSearch)
Data Source:
Specify the S3 bucket where you uploaded your data. The documents in the S3 bucket will be used as the data source for the knowledge base.
- Select an embedding model, such as Titan Embeddings G1, which will generate vector embeddings for the uploaded documents.
- Choose to create a new vector store or use an existing one, and configure field mappings.
Vector Database considerations:
- Managed by Bedrock: Easiest option, suitable for most use cases. Bedrock can create an Amazon OpenSearch Serverless vector store for you; Pinecone is also among the supported vector store options.
- Existing Vector Database: Provides more flexibility but requires additional setup and configuration.
- After reviewing the summary of your Knowledge Base settings, click “Create knowledge base” to finalize the setup.
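The same setup can be scripted with the bedrock-agent API. The sketch below is a rough outline rather than the exact console flow: it assumes you have already created the IAM role and an Amazon OpenSearch Serverless collection and vector index, and every ARN and name shown is a placeholder:
import boto3

bedrock_agent = boto3.client('bedrock-agent')

# All ARNs below are placeholders for resources you have already created
response = bedrock_agent.create_knowledge_base(
    name='customer-support-kb',
    roleArn='arn:aws:iam::123456789012:role/BedrockKnowledgeBaseRole',
    knowledgeBaseConfiguration={
        'type': 'VECTOR',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1'
        },
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'arn:aws:aoss:us-east-1:123456789012:collection/your-collection-id',
            'vectorIndexName': 'bedrock-kb-index',
            'fieldMapping': {
                'vectorField': 'embedding',
                'textField': 'text',
                'metadataField': 'metadata',
            },
        },
    },
)
knowledge_base_id = response['knowledgeBase']['knowledgeBaseId']

# Attach the S3 bucket as a data source for the knowledge base
bedrock_agent.create_data_source(
    knowledgeBaseId=knowledge_base_id,
    name='s3-documents',
    dataSourceConfiguration={
        'type': 'S3',
        's3Configuration': {'bucketArn': 'arn:aws:s3:::your_bucket_name'},
    },
)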
3. Sync Data Source and Model Access
In this step, confirm that the selected embedding model is accessible to your account, then sync the data source so Bedrock can ingest your documents.
Steps:
- Go to the Bedrock Console.
- Navigate to “Model Access” and verify that the selected embedding model is enabled for your account.
- If it is not, request access from the “Model Access” page.
Sync Data Source:
- Go to the Knowledge Base management page.
- Click on the “Sync” button to start the data synchronization process.
- Sync Data: Initiate the data sync process. Bedrock will read your data, generate embeddings, and store them in the vector database. This might take some time, depending on your dataset’s size.
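Syncing can also be triggered programmatically. The sketch below assumes the knowledge base and data source IDs from the previous step (shown here as placeholders) and polls the ingestion job until it finishes:
import time
import boto3

bedrock_agent = boto3.client('bedrock-agent')

# Placeholder IDs for the knowledge base and data source created earlier
kb_id = 'YOUR_KB_ID'
ds_id = 'YOUR_DATA_SOURCE_ID'

job = bedrock_agent.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)
job_id = job['ingestionJob']['ingestionJobId']

# Poll until ingestion (embedding generation and indexing) completes
while True:
    status = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=kb_id, dataSourceId=ds_id, ingestionJobId=job_id
    )['ingestionJob']['status']
    print('Ingestion status:', status)
    if status in ('COMPLETE', 'FAILED'):
        break
    time.sleep(30)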
4. Testing and Querying the Knowledge Base
(Figure: Knowledge Base testing interface in the Bedrock Console)
In this step, we set up and configure search strategies (default, hybrid, semantic) to optimize retrieval based on different use cases, and test the Knowledge Base by submitting queries and observing the retrieved information and generated responses.
Steps:
- Go to the Knowledge Base settings.
- Configure the search strategy that best fits your use case (default, hybrid, or semantic).
Search Strategy:
- Default: Bedrock’s automatic selection, suitable for general use.
- Hybrid: Combines semantic and keyword matching, offering a good balance.
- Semantic: Focuses on meaning, useful when dealing with ambiguous queries.
Run Tests:
- Formulate a variety of questions to test your RAG system thoroughly, then examine the retrieved information and generated responses to assess accuracy and relevance.
- Use the test interface in the Bedrock Console to submit sample queries.
- Observe the results and fine-tune the search strategy if necessary.
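Outside the console, the same retrieval step can be exercised with the bedrock-agent-runtime Retrieve API. In the sketch below the knowledge base ID and query are placeholders, and overrideSearchType switches between hybrid and semantic search:
import boto3

bedrock_runtime = boto3.client('bedrock-agent-runtime')

response = bedrock_runtime.retrieve(
    knowledgeBaseId='YOUR_KB_ID',  # Placeholder
    retrievalQuery={'text': 'How do I reset my device to factory settings?'},
    retrievalConfiguration={
        'vectorSearchConfiguration': {
            'numberOfResults': 3,
            'overrideSearchType': 'HYBRID',  # Or 'SEMANTIC'; omit for the default strategy
        }
    },
)

# Inspect the retrieved chunks and their relevance scores
for result in response['retrievalResults']:
    print(result['content']['text'][:200], result.get('score'))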
5. Generate Responses
In this step, we select an LLM model such as Anthropic Claude 2.1, or Amazon Titan LLM to generate responses based on the retrieved information.
Note: The Titan Embeddings G1 embedding model, combined with either an Amazon OpenSearch or Pinecone vector store, is fully compatible with both Anthropic Claude 2.1 and the Amazon Titan LLM on Amazon Bedrock.
Steps:
- LLM Selection: Choose the most appropriate LLM model for your use case. Consider factors like performance, cost, and the model’s specific strengths.
- From the Bedrock Console interface, submit queries to the LLM and evaluate the responses for relevance and accuracy.
- Query Refinement: Iterate on your queries based on the initial results. Adjust your search strategy or prompt template if necessary to improve response quality.
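Retrieval and generation can also be combined in a single programmatic call with the RetrieveAndGenerate API; the knowledge base ID, query, and model ARN below are placeholders:
import boto3

bedrock_runtime = boto3.client('bedrock-agent-runtime')

response = bedrock_runtime.retrieve_and_generate(
    input={'text': 'How do I reset my device to factory settings?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'YOUR_KB_ID',  # Placeholder
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2:1',
        },
    },
)

print(response['output']['text'])

# Each citation points back to the source documents used in the answer
for citation in response.get('citations', []):
    for ref in citation['retrievedReferences']:
        print(ref['location'])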
Additional Tips and Considerations
- Document Structure: Organize your documents for better retrieval. Use clear headings, sections, and formatting to make it easier for the RAG system to locate relevant information.
- Data Versioning: Keep track of your data updates and consider versioning to revert to previous versions if needed.
- Monitoring: Monitor your RAG system’s performance regularly to identify any issues or areas for improvement.
- Feedback Loop: Gather feedback from users to refine your queries, prompt templates, and overall system behavior.
Pros and Cons of Bedrock Knowledge Bases
Pros:
- Fully Managed Service: Reduces operational overhead by automating data ingestion, retrieval, prompt augmentation, and citation generation.
- Simplicity: Easy to set up and use, requiring minimal custom code and configuration.
- Scalability: Handles large volumes of data efficiently and scales automatically as the data grows.
Cons:
- Performance Latency: Querying the vector database in real-time can introduce latency in response generation.
- Limited Flexibility: Customizing the data ingestion process or integrating non-S3 data sources may be challenging.
Use Cases:
- Customer Support: Quickly set up a knowledge base to handle customer queries using existing support documentation.
- Internal Knowledge Management: Create a centralized knowledge base for employees to access organizational information and documentation.
- Education and Training: Provide students or trainees with access to a wealth of information, allowing them to query and receive relevant responses.
Approach 2: Implementing RAG with Amazon Kendra
Detailed Step-by-Step Implementation
1. Setup and Configuration
Install Required Libraries:
- Install the necessary libraries (openai, langchain, langchain_core, langchain_openai, langchain_community) to support the RAG implementation.
- Use pip to install these packages in your development environment.
!pip install openai --quiet
!pip install langchain --quiet
!pip install langchain_core --quiet
!pip install langchain_openai --quiet
!pip install langchain_community --quiet
Load Configuration:
- Load the configuration details from a JSON file, which includes AWS region names, Kendra index ID, and other credentials.
import json

def load_config(file_path="config.json"):
    with open(file_path, "r") as file:
        config = json.load(file)
    return config

config = load_config()
bedrock_region_name = config['BEDROCK_REGION_NAME']
kendra_region_name = config['KENDRA_REGION_NAME']
kendra_index_id = config['KENDRA_INDEX_ID']
2. Configure Amazon Kendra
Create Kendra Index:
- In the Kendra console, create a new index and configure it to connect to the S3 bucket where your documents are stored.
- Schedule regular syncs to keep the index updated with new or modified documents.
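These console steps can be scripted as well; the sketch below assumes existing IAM roles for the index and the S3 connector, and the names, ARNs, and sync schedule are placeholders:
import boto3

kendra = boto3.client('kendra', region_name='eu-west-1')

# Placeholder role ARN with permissions required by a Kendra index
index = kendra.create_index(
    Name='rag-documents-index',
    Edition='DEVELOPER_EDITION',
    RoleArn='arn:aws:iam::123456789012:role/KendraIndexRole',
)
index_id = index['Id']

# Attach the S3 bucket as a data source and sync it nightly
# (the cron expression format is an assumption; adjust to your needs)
kendra.create_data_source(
    IndexId=index_id,
    Name='s3-documents',
    Type='S3',
    RoleArn='arn:aws:iam::123456789012:role/KendraDataSourceRole',
    Configuration={'S3Configuration': {'BucketName': 'your_bucket_name'}},
    Schedule='cron(0 2 * * ? *)',
)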
3. Implement the Retriever
Setup Amazon Kendra Retriever:
- Use the AmazonKendraRetriever class from the LangChain library to interface with the Kendra index.
- Configure it to retrieve the top k relevant documents for a given query.
from langchain_community.retrievers import AmazonKendraRetriever

retriever = AmazonKendraRetriever(
    index_id=kendra_index_id,
    top_k=3,
    region_name=kendra_region_name
)
Perform Semantic Search:
- Test the retriever by performing a semantic search with sample queries.
relevant_docs = retriever.get_relevant_documents("What is Hiberus Xalok?")
print(relevant_docs)
4. Configure the Generator (LLM)
Select LLM Model:
- Choose an LLM, such as Anthropic Claude 2, and configure it using the Bedrock API.
from langchain_community.llms import Bedrock

llm = Bedrock(
    model_id="anthropic.claude-v2",
    region_name=bedrock_region_name,
    model_kwargs={
        "max_tokens_to_sample": 300,
        "temperature": 1,
        "top_k": 250,
        "top_p": 0.999,
        "anthropic_version": "bedrock-2023-05-31"
    },
)
5. Create a Prompt Template
Design Prompt Template:
- Use PromptTemplate from LangChain to create a template that includes instructions, context from the retrieved documents, and the user’s query.
from langchain_core.prompts import PromptTemplate

custom_template = """Act as an AI assistant named HiberusAI.
Use the following information to answer the question at the end. If you don't know the answer, respond with "I'm sorry, I don't have enough information". If the question is in Spanish, answer in Spanish. If the question is in English, answer in English. If the question is in French, answer in French.

{context}

{question}
"""

prompt = PromptTemplate(template=custom_template, input_variables=["context", "question"])
6. Combine Retriever and Generator
Chain Retriever and Generator:
- Use the RetrievalQA class to combine the retriever and the LLM, allowing the system to generate responses based on retrieved documents.
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)
Follow-Up Questions:
- Implement follow-up question handling using ConversationBufferMemory to maintain the context of the conversation.
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, output_key='answer')

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={"prompt": prompt},
)
7. Deploy with AWS Lambda
Create AWS Lambda Function:
- Develop and deploy a Lambda function to handle RAG queries. Ensure it includes all necessary dependencies and configurations.
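A minimal handler sketch is shown below; it assumes the RetrievalQA chain from step 6 is built at module load time in a hypothetical rag_chain module packaged with the function (for example in a Lambda layer or container image):
from rag_chain import qa_chain  # Hypothetical module containing the chain built in step 6

def lambda_handler(event, context):
    # The conversational variant of the chain would take a "question" key instead of "query"
    query = event.get('query', '')
    result = qa_chain({"query": query})

    # Return a plain dict; Lambda serializes it as the response payload
    return {
        'answer': result['result'],
        'sources': [doc.metadata.get('source') for doc in result.get('source_documents', [])],
    }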
Invoke Lambda Function:
- Write a function to invoke the Lambda function from your application, such as a Streamlit web app.
import json
import boto3

def invoke_lambda_function(payload, function_name):
    lambda_client = boto3.client('lambda', region_name='eu-west-1')
    try:
        # Use a synchronous invocation so the response payload can be read
        response = lambda_client.invoke(
            FunctionName=function_name,
            InvocationType='RequestResponse',
            Payload=json.dumps(payload)
        )
        if response['StatusCode'] == 200:
            response_payload = response['Payload'].read()
            return json.loads(response_payload)
        else:
            raise Exception("Error invoking Lambda function: StatusCode={}, FunctionError={}".format(response['StatusCode'], response.get('FunctionError')))
    except Exception as e:
        raise Exception("Error invoking Lambda function: {}".format(str(e)))

function_name = 'uc_genai_neddine-rag'
payload = {'query': "What is Hiberus Xalok?"}
response = invoke_lambda_function(payload, function_name)
print(response)
8. Develop a Chat UI with Streamlit
Build User Interface:
- Use Streamlit to create a user interface for interacting with the Lambda function.
- Implement features for response streaming, displaying source documents, and collecting user feedback.
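A minimal Streamlit sketch wiring the UI to the invoke_lambda_function helper from the previous step might look like this (the layout is illustrative only, and the response keys follow the handler sketch above):
import streamlit as st

# Assumes invoke_lambda_function from step 7 is defined or imported in this file
st.title("RAG Chat Assistant")

query = st.text_input("Ask a question:")
if st.button("Submit") and query:
    with st.spinner("Retrieving answer..."):
        response = invoke_lambda_function({'query': query}, 'uc_genai_neddine-rag')
    st.write(response.get('answer', response))
    for source in response.get('sources', []):
        st.caption(f"Source: {source}")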
Pros and Cons of the Amazon Kendra-Based RAG Architecture
Pros:
- Advanced Search Capabilities: Supports natural language querying and advanced document ranking.
- Unified Search: Can connect multiple data repositories to a single index.
- Scalability: Highly scalable and integrates seamlessly with other AWS services.
Cons:
- Cost: Relatively high cost compared to other solutions.
- Complexity: More complex setup and maintenance compared to fully managed solutions.
Use Cases:
- Enterprise Search: Ideal for organizations needing advanced search capabilities across multiple data sources.
- Legal and Regulatory Compliance: Useful for retrieving specific documents from vast repositories based on detailed search queries.
- Research and Development: Helps in retrieving relevant research papers, articles, and documents quickly and accurately.
Comparison, Analysis, and Best Use Cases
Analysis
Bedrock Knowledge Bases:
- Pros: Easy to set up, managed service, good for straightforward use cases.
- Cons: Limited flexibility, potential latency due to real-time queries.
Amazon Kendra-Based RAG:
- Pros: Advanced search capabilities, flexible integration, highly scalable.
- Cons: Higher cost, complex setup and maintenance.
Best Use Cases and Guidance
When to Use Bedrock Knowledge Bases:
- For quick and straightforward RAG implementations.
- When data is primarily stored in S3 and managed services are preferred.
- Suitable for customer support, internal knowledge management, and educational purposes.
When to Use Amazon Kendra-Based RAG:
- For enterprise-level applications requiring advanced search capabilities.
- When integrating multiple data sources and requiring precise document retrieval.
- Ideal for legal, compliance, research, and development scenarios.
Conclusion
Both Bedrock Knowledge Bases and Amazon Kendra offer robust solutions for implementing RAG on AWS, each with its own set of advantages and trade-offs. The choice between the two approaches depends on the specific needs of your organization, including complexity, cost, latency, accuracy, and flexibility. By carefully considering these factors and the outlined use cases, you can select the most appropriate architecture to enhance the capabilities of your LLM and meet your business objectives.