Introduction

A new feature has been announced on the watsonx.ai platform: it uses AutoAI to automate and accelerate the search for an optimized, production-quality retrieval-augmented generation (RAG) pattern based on the user's data and use case.

What this feature brings to users

This feature takes the complexity out of choosing which LLM, document chunking technique, and retrieval method work best for your RAG use case. For the full explanation, see this post by Armand Ruiz, VP of AI at IBM.

Sample implementation

Set up the environment
Before you use the sample code in this notebook, you must perform the following setup tasks:

Create a watsonx.ai Runtime Service instance (a free plan is offered and information about how to create the instance can be found here).
Install and import the required modules and dependencies
!pip install -U 'ibm-watsonx-ai[rag]>=1.2.4' | tail -n 1
!pip install -U "langchain_community>=0.3,<0.4" | tail -n 1
Defining the watsonx.ai credentials
This cell defines the credentials required to work with the watsonx.ai Runtime service.

Action: Provide the IBM Cloud user API key. For details, see documentation.

import getpass
from ibm_watsonx_ai import Credentials

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",
    api_key=getpass.getpass("Please enter your watsonx.ai api key (hit enter): "),
)
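Prompting with getpass works well in a notebook but not in an unattended run. The following is a minimal sketch of one alternative, assuming the key is exported in an environment variable. The variable name WATSONX_APIKEY and the helper resolve_api_key are hypothetical names chosen for this example; they are not part of the ibm_watsonx_ai SDK.

```python
import getpass
import os


def resolve_api_key(env=os.environ, var_name="WATSONX_APIKEY", prompt=getpass.getpass):
    """Return the API key from the environment, or prompt for it as a fallback.

    `var_name` is a hypothetical variable name chosen for this sketch.
    """
    key = env.get(var_name, "").strip()
    if key:
        return key
    # Fall back to an interactive prompt, as the notebook cell above does.
    return prompt("Please enter your watsonx.ai api key (hit enter): ")
```

The returned string could then be passed as the api_key argument of Credentials in place of the direct getpass call.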
Defining the project id
The foundation model requires a project id that provides the context for the call. We will try to obtain the id directly from the project in which this notebook runs. If this fails, you'll have to provide the project id.

import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")
Create an instance of APIClient with authentication details.

from ibm_watsonx_ai import APIClient

client = APIClient(credentials=credentials, project_id=project_id)

RAG Optimizer definition
Defining a connection to training data
Upload the training data to a Cloud Object Storage (COS) bucket and then define a connection to this file. This example uses the Base page of the ibm_watsonx_ai SDK documentation.

The code in the next cell uploads training data to the bucket.

import os
import requests

url = "https://ibm.github.io/watsonx-ai-python-sdk/base.html"

document_filename = "base.html"

# Download the page only if a local copy does not already exist.
if not os.path.isfile(document_filename):
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(document_filename, "w", encoding="utf-8") as file:
        file.write(response.text)

# Register the local file as a data asset in the project.
document_asset_details = client.data_assets.create(name=document_filename, file_path=document_filename)

document_asset_id = client.data_assets.get_id(document_asset_details)
document_asset_id
Creating data asset...
SUCCESS
'4f76e9c4-724e-45a2-8099-2d93f2746db3'
Define a connection to training data.

from ibm_watsonx_ai.helpers import DataConnection

input_data_references = [DataConnection(data_asset_id=document_asset_id)]
Defining a connection to test data
Upload a JSON file that will be used for benchmarking to COS and then define a connection to this file. This example uses content from the ibm_watsonx_ai SDK documentation.

benchmarking_data_IBM_page_content = [
    {
        "question": "How can you set or refresh user request headers using the APIClient class?",
        "correct_answer": "client.set_headers({'Authorization': 'Bearer <token>'})",
        "correct_answer_document_ids": [
            "base.html"
        ]
    },
    {
        "question": "How to initialise Credentials object with api_key",
        "correct_answer": "credentials = Credentials(url = 'https://us-south.ml.cloud.ibm.com', api_key = '***********')",
        "correct_answer_document_ids": [
            "base.html"
        ]
    }
]
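A malformed benchmark record is easier to catch before the file is uploaded than after an experiment has started. The sketch below checks each record for the three keys used in the example above. The list of required keys is inferred from those example records, not taken from a published schema, and validate_benchmark_records is a hypothetical helper.

```python
# Keys inferred from the example records in this article (an assumption, not an official schema).
REQUIRED_KEYS = ("question", "correct_answer", "correct_answer_document_ids")


def validate_benchmark_records(records):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    for index, record in enumerate(records):
        for key in REQUIRED_KEYS:
            if key not in record:
                problems.append(f"record {index}: missing key '{key}'")
        doc_ids = record.get("correct_answer_document_ids")
        # Only check the shape of the document ids when the key is present.
        if doc_ids is not None and (not isinstance(doc_ids, list) or not doc_ids):
            problems.append(f"record {index}: 'correct_answer_document_ids' must be a non-empty list")
    return problems


sample = [
    {
        "question": "How to initialise Credentials object with api_key",
        "correct_answer": "credentials = Credentials(url = 'https://us-south.ml.cloud.ibm.com', api_key = '***********')",
        "correct_answer_document_ids": ["base.html"],
    },
    {"question": "A record with no answer"},  # deliberately incomplete
]
print(validate_benchmark_records(sample))
```

Calling the helper on benchmarking_data_IBM_page_content before writing the JSON file would flag records that lack an answer or a document id.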
The code in the next cell uploads testing data to the bucket as a JSON file.

import json

test_filename = "benchmarking_data_Base.json"

if not os.path.isfile(test_filename):
    with open(test_filename, "w") as json_file:
        json.dump(benchmarking_data_IBM_page_content, json_file, indent=4)

test_asset_details = client.data_assets.create(name=test_filename, file_path=test_filename)

test_asset_id = client.data_assets.get_id(test_asset_details)
test_asset_id
Creating data asset...
SUCCESS
'84b59630-65a4-466d-b174-400928fb9634'
Define a connection to the testing data.

test_data_references = [DataConnection(data_asset_id=test_asset_id)]
RAG Optimizer configuration
Provide the input information for the AutoAI RAG optimizer:

name - experiment name
description - experiment description
max_number_of_rag_patterns - maximum number of RAG patterns to create
optimization_metrics - target optimization metrics
from ibm_watsonx_ai.experiment import AutoAI

experiment = AutoAI(credentials, project_id=project_id)

rag_optimizer = experiment.rag_optimizer(
    name='AutoAI RAG run - Base documentation',
    description="AutoAI RAG Optimizer on ibm_watsonx_ai Base documentation",
    foundation_models=["ibm/granite-13b-chat-v2"],
    embedding_models=["ibm/slate-125m-english-rtrvr"],
    retrieval_methods=["simple"],
    chunking=[
        {
            "chunk_size": 512,
            "chunk_overlap": 64,
            "method": "recursive"
        }
    ],
    max_number_of_rag_patterns=4,
    optimization_metrics=[AutoAI.RAGMetrics.ANSWER_CORRECTNESS]
)
Configuration parameters can be retrieved via get_params().

rag_optimizer.get_params()
{'name': 'AutoAI RAG run - Base documentation',
 'description': 'AutoAI RAG Optimizer on ibm_watsonx_ai Base documentation',
 'chunking': [{'chunk_size': 512, 'chunk_overlap': 64, 'method': 'recursive'}],
 'embedding_models': ['ibm/slate-125m-english-rtrvr'],
 'retrieval_methods': ['simple'],
 'foundation_models': ['ibm/granite-13b-chat-v2'],
 'max_number_of_rag_patterns': 4,
 'optimization_metrics': ['answer_correctness']}
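To build intuition for how chunk_size and chunk_overlap interact, here is a minimal sketch of a fixed-window splitter. It is not the "recursive" method configured above, and it is not code from the SDK. It also assumes both values are measured in characters, which the article does not state.

```python
def sliding_window_chunks(text, chunk_size=512, chunk_overlap=64):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each new window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


text = "x" * 1000
chunks = sliding_window_chunks(text)
print([len(chunk) for chunk in chunks])  # [512, 512, 104]
```

With these settings each window starts 448 characters after the previous one, so the last 64 characters of a chunk are repeated at the start of the next.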

RAG Experiment run
Call the run() method to trigger the AutoAI RAG experiment. You can run it in interactive mode (a synchronous job, background_mode=False) or in background mode (an asynchronous job, background_mode=True).

run_details = rag_optimizer.run(
    input_data_references=input_data_references,
    test_data_references=test_data_references,
    background_mode=False
)

##############################################

Running 'efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69'

##############################################


pending.................
running....
completed
Training of 'efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69' finished successfully.
You can use the get_run_status() method to monitor AutoAI RAG jobs in background mode.

rag_optimizer.get_run_status()
'completed'
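For a job started in background mode, the status check can be wrapped in a small polling loop. The sketch below takes any zero-argument callable that returns a status string. The states "pending", "running" and "completed" appear in the run log above; "failed" and "canceled" are assumed terminal states, and wait_for_run is a hypothetical helper rather than an SDK function.

```python
import time


def wait_for_run(get_status, poll_seconds=10.0, timeout_seconds=1800.0,
                 terminal_states=("completed", "failed", "canceled")):
    """Poll get_status() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still '{status}' after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
```

In the notebook this could be called as wait_for_run(rag_optimizer.get_run_status) after run(..., background_mode=True).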

Comparison and testing of RAG Patterns
You can list the trained patterns and information on evaluation metrics in the form of a Pandas DataFrame by calling the summary() method. You can use the DataFrame to compare all discovered patterns and select the one you like for further testing.

summary = rag_optimizer.summary()
summary
Pattern_Name | mean_answer_correctness | mean_faithfulness | mean_context_correctness | chunking.method | chunking.chunk_size | chunking.chunk_overlap | embeddings.model_id | vector_store.distance_metric | retrieval.method | retrieval.number_of_chunks | generation.model_id
Pattern4 | 0.7083 | 0.2317 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | cosine | simple | 3 | ibm/granite-13b-chat-v2
Pattern1 | 0.5833 | 0.2045 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | cosine | simple | 5 | ibm/granite-13b-chat-v2
Pattern2 | 0.5833 | 0.2372 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | euclidean | simple | 5 | ibm/granite-13b-chat-v2
Pattern3 | 0.5833 | 0.2117 | 1.0 | recursive | 512 | 64 | ibm/slate-125m-english-rtrvr | euclidean | simple | 3 | ibm/granite-13b-chat-v2
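Three of the four patterns tie on mean_answer_correctness, so a second metric is needed to order them. The sketch below ranks the rows from the table above using plain Python. Breaking ties by mean_faithfulness is a choice made for this example, not something the article says AutoAI does, and rank_patterns is a hypothetical helper.

```python
# Values copied from the summary table above.
patterns = [
    {"Pattern_Name": "Pattern4", "mean_answer_correctness": 0.7083, "mean_faithfulness": 0.2317},
    {"Pattern_Name": "Pattern1", "mean_answer_correctness": 0.5833, "mean_faithfulness": 0.2045},
    {"Pattern_Name": "Pattern2", "mean_answer_correctness": 0.5833, "mean_faithfulness": 0.2372},
    {"Pattern_Name": "Pattern3", "mean_answer_correctness": 0.5833, "mean_faithfulness": 0.2117},
]


def rank_patterns(rows, primary="mean_answer_correctness", tie_breaker="mean_faithfulness"):
    """Sort rows best-first by the primary metric, then by the tie-breaker."""
    return sorted(rows, key=lambda row: (row[primary], row[tie_breaker]), reverse=True)


ranked = rank_patterns(patterns)
print([row["Pattern_Name"] for row in ranked])  # ['Pattern4', 'Pattern2', 'Pattern3', 'Pattern1']
```

Because the summary is a Pandas DataFrame indexed by Pattern_Name, rows in this shape could likely be obtained with summary.reset_index().to_dict("records").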
Additionally, you can pass the scoring parameter to the summary method to sort the RAG patterns by that metric, starting with the best.

summary = rag_optimizer.summary(scoring="faithfulness")
rag_optimizer.get_run_details()
{'entity': {'completed_at': '2025-01-10T10:15:30.808Z',
  'hardware_spec': {'id': 'a6c4923b-b8e4-444c-9f43-8a7ec3020110', 'name': 'L'},
  'input_data_references': [{'location': {'href': '/v2/assets/4f76e9c4-724e-45a2-8099-2d93f2746db3?project_id=b9156b62-8f2a-4a40-8570-990fdd5d67cb',
     'id': '4f76e9c4-724e-45a2-8099-2d93f2746db3'},
    'type': 'data_asset'}],
  'message': {'level': 'info', 'text': 'AAR019I: AutoAI execution completed.'},
  'parameters': {'constraints': {'chunking': [{'chunk_overlap': 64,
      'chunk_size': 512,
      'method': 'recursive'}],
    'embedding_models': ['ibm/slate-125m-english-rtrvr'],
    'foundation_models': ['ibm/granite-13b-chat-v2'],
    'max_number_of_rag_patterns': 4,
    'retrieval_methods': ['simple']},
   'optimization': {'metrics': ['answer_correctness']},
   'output_logs': True},
  'results': [{'context': {'iteration': 1,
     'max_combinations': 4,
     'rag_pattern': {'composition_steps': ['chunking',
       'embeddings',
       'vector_store',
       'retrieval',
       'generation'],
      'duration_seconds': 16,
      'location': {'evaluation_results': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern1/evaluation_results.json',
       'indexing_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern1/indexing_inference_notebook.ipynb',
       'inference_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern1/indexing_inference_notebook.ipynb'},
      'name': 'Pattern1',
      'settings': {'chunking': {'chunk_overlap': 64,
        'chunk_size': 512,
        'method': 'recursive'},
       'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr',
        'truncate_input_tokens': 512,
        'truncate_strategy': 'left'},
       'generation': {'context_template_text': '[Document]\n{document}\n[End]',
        'model_id': 'ibm/granite-13b-chat-v2',
        'parameters': {'decoding_method': 'greedy',
         'max_new_tokens': 1000,
         'min_new_tokens': 1},
        'prompt_template_text': '<|system|>\nYou are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.\n<|user|>\nYou are a AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question.\nAnswer Length: detailed\n{reference_documents}\n{question} \n<|assistant|>'},
       'retrieval': {'method': 'simple', 'number_of_chunks': 5},
       'vector_store': {'datasource_type': 'chroma',
        'distance_metric': 'cosine',
        'index_name': 'autoai_rag_efb6f9ce_20250110101318',
        'operation': 'upsert',
        'schema': {'fields': [{'description': 'text field',
           'name': 'text',
           'role': 'text',
           'type': 'string'},
          {'description': 'document name field',
           'name': 'document_id',
           'role': 'document_name',
           'type': 'string'},
          {'description': 'chunk starting token position in the source document',
           'name': 'start_index',
           'role': 'start_index',
           'type': 'number'},
          {'description': 'chunk number per document',
           'name': 'sequence_number',
           'role': 'sequence_number',
           'type': 'number'},
          {'description': 'vector embeddings',
           'name': 'vector',
           'role': 'vector_embeddings',
           'type': 'array'}],
         'id': 'autoai_rag_1.0',
         'name': 'Document schema using open-source loaders',
         'type': 'struct'}}}},
     'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}},
    'metrics': {'test_data': [{'ci_high': 0.6667,
       'ci_low': 0.5,
       'mean': 0.5833,
       'metric_name': 'answer_correctness'},
      {'ci_high': 0.2541,
       'ci_low': 0.155,
       'mean': 0.2045,
       'metric_name': 'faithfulness'},
      {'mean': 1.0, 'metric_name': 'context_correctness'}]}},
   {'context': {'iteration': 2,
     'max_combinations': 4,
     'rag_pattern': {'composition_steps': ['chunking',
       'embeddings',
       'vector_store',
       'retrieval',
       'generation'],
      'duration_seconds': 13,
      'location': {'evaluation_results': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern2/evaluation_results.json',
       'indexing_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern2/indexing_inference_notebook.ipynb',
       'inference_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern2/indexing_inference_notebook.ipynb'},
      'name': 'Pattern2',
      'settings': {'chunking': {'chunk_overlap': 64,
        'chunk_size': 512,
        'method': 'recursive'},
       'embeddings': {'model_id': 'ibm/slate-125m-english-rtrvr',
        'truncate_input_tokens': 512,
        'truncate_strategy': 'left'},
       'generation': {'context_template_text': '[Document]\n{document}\n[End]',
        'model_id': 'ibm/granite-13b-chat-v2',
        'parameters': {'decoding_method': 'greedy',
         'max_new_tokens': 1000,
         'min_new_tokens': 1},
        'prompt_template_text': '<|system|>\nYou are Granite Chat, an AI language model developed by IBM. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.\n<|user|>\nYou are a AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is grounded in context and user query. Always make sure that your response is relevant to the question.\nAnswer Length: detailed\n{reference_documents}\n{question} \n<|assistant|>'},
       'retrieval': {'method': 'simple', 'number_of_chunks': 5},
       'vector_store': {'datasource_type': 'chroma',
        'distance_metric': 'euclidean',
        'index_name': 'autoai_rag_efb6f9ce_20250110101349',
        'operation': 'upsert',
        'schema': {'fields': [{'description': 'text field',
           'name': 'text',
           'role': 'text',
           'type': 'string'},
          {'description': 'document name field',
           'name': 'document_id',
           'role': 'document_name',
           'type': 'string'},
          {'description': 'chunk starting token position in the source document',
           'name': 'start_index',
           'role': 'start_index',
           'type': 'number'},
          {'description': 'chunk number per document',
           'name': 'sequence_number',
           'role': 'sequence_number',
           'type': 'number'},
          {'description': 'vector embeddings',
           'name': 'vector',
           'role': 'vector_embeddings',
           'type': 'array'}],
         'id': 'autoai_rag_1.0',
         'name': 'Document schema using open-source loaders',
         'type': 'struct'}}}},
     'software_spec': {'name': 'autoai-rag_rt24.1-py3.11'}},
    'metrics': {'test_data': [{'ci_high': 0.6667,
       'ci_low': 0.5,
       'mean': 0.5833,
       'metric_name': 'answer_correctness'},
      {'ci_high': 0.3194,
       'ci_low': 0.155,
       'mean': 0.2372,
       'metric_name': 'faithfulness'},
      {'mean': 1.0, 'metric_name': 'context_correctness'}]}},
   {'context': {'iteration': 3,
     'max_combinations': 4,
     'rag_pattern': {'composition_steps': ['chunking',
       'embeddings',
       'vector_store',
       'retrieval',
       'generation'],
      'duration_seconds': 25,
      'location': {'evaluation_results': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern3/evaluation_results.json',
       'indexing_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5d2d78ffcf69/Pattern3/indexing_inference_notebook.ipynb',
       'inference_notebook': 'default_autoai_rag_out/efb6f9ce-8057-4fc1-9d84-5
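The run details are deeply nested, so a small helper makes the per-pattern metrics easier to read. The sketch below relies only on the keys visible in the output above (entity, results, context, rag_pattern, name, metrics, test_data, metric_name, mean). The sample dictionary is a trimmed copy of the first result, and metrics_by_pattern is a hypothetical helper, not an SDK function.

```python
def metrics_by_pattern(run_details):
    """Map each pattern name to a {metric_name: mean} dict taken from the run details."""
    summary = {}
    for result in run_details.get("entity", {}).get("results", []):
        name = result["context"]["rag_pattern"]["name"]
        summary[name] = {
            metric["metric_name"]: metric["mean"]
            for metric in result["metrics"]["test_data"]
        }
    return summary


# Trimmed copy of the first result shown above, keeping only the fields the helper reads.
sample_details = {
    "entity": {
        "results": [
            {
                "context": {"rag_pattern": {"name": "Pattern1"}},
                "metrics": {
                    "test_data": [
                        {"ci_high": 0.6667, "ci_low": 0.5, "mean": 0.5833, "metric_name": "answer_correctness"},
                        {"ci_high": 0.2541, "ci_low": 0.155, "mean": 0.2045, "metric_name": "faithfulness"},
                        {"mean": 1.0, "metric_name": "context_correctness"},
                    ]
                },
            }
        ]
    }
}
print(metrics_by_pattern(sample_details))
```

In the notebook the same function could be applied to the dictionary returned by rag_optimizer.get_run_details().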

  
Author of article: Alain Airom
