Open In Colab Open on GitHub

LiteLLM is an open-source, locally run proxy server providing an OpenAI-compatible API. It supports various LLM providers, including IBM’s WatsonX, enabling seamless integration with tools like AG2.

Running LiteLLM with WatsonX requires the following installations:

  1. AG2 – A framework for building and orchestrating AI agents.
  2. LiteLLM – An OpenAI-compatible proxy for bridging non-compliant APIs.
  3. IBM WatsonX – LLM service requiring specific session token authentication.

Prerequisites

Before setting up, ensure Docker is installed. Refer to the Docker installation guide. Optionally, consider using Postman to easily test API requests.


Installing WatsonX

To set up WatsonX, follow these steps: 1. Access WatsonX: - Sign up for WatsonX.ai. - Create an API_KEY and PROJECT_ID.

  1. Validate WatsonX API Access:
    • Verify access using the following commands:

Tip: Verify access to watsonX APIs before installing LiteLLM.

Get Session Token:

curl -L "https://iam.cloud.ibm.com/identity/token?=null" 
-H "Content-Type: application/x-www-form-urlencoded" 
-d "grant_type=urn%3Aibm%3Aparams%3Aoauth%3Agrant-type%3Aapikey" 
-d "apikey=<API_KEY>"

Get list of LLMs:

curl -L "https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2024-09-16&project_id=1eeb4112-5f6e-4a81-9b61-8eac7f9653b4&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200" 
-H "Authorization: Bearer <SESSION TOKEN>"

Ask the LLM a question:

curl -L "https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2023-05-02" 
-H "Content-Type: application/json" 
-H "Accept: application/json" 
-H "Authorization: Bearer <SESSION TOKEN>" \
-d "{
  \"model_id\": \"google/flan-t5-xxl\",
  \"input\": \"What is the capital of Arkansas?:\",
  \"parameters\": {
    \"max_new_tokens\": 100,
    \"time_limit\": 1000
  },
  \"project_id\": \"<PROJECT_ID>"}"

With access to watsonX API’s validated you can install the python library from https://ibm.github.io/watsonx-ai-python-sdk/install.html


Installing LiteLLM

To install LiteLLM, follow these steps: 1. Download LiteLLM Docker Image:

docker pull ghcr.io/berriai/litellm:main-latest

OR

Install LiteLLM Python Library:

pip install 'litellm[proxy]'
  1. Create a LiteLLM Configuration File:

    • Save as litellm_config.yaml in a local directory.

    • Example content for WatsonX:

       model_list:
          - model_name: llama-3-8b
          litellm_params:
             # all params accepted by litellm.completion()
             model: watsonx/meta-llama/llama-3-8b-instruct
             api_key: "os.environ/WATSONX_API_KEY" 
             project_id: "os.environ/WX_PROJECT_ID"
      
      
      - model_name: "llama_3_2_90"
         litellm_params:
            model: watsonx/meta-llama/llama-3-2-90b-vision-instruct
            api_key: os.environ["WATSONX_APIKEY"] = "" # IBM cloud API key
            max_new_tokens: 4000
      
  2. Start LiteLLM Container:

    docker run -v <Directory>\litellm_config.yaml:/app/config.yaml -e WATSONX_API_KEY=<API_KEY> -e WATSONX_URL=https://us-south.ml.cloud.ibm.com/ml/v1/text/generation?version=2023-05-02 -e WX_PROJECT_ID=<PROJECT_ID> -p 4000:4000 ghcr.io/berriai/litellm:main-latest --config /app/config.yaml --detailed_debug
    

Installing AG2

AG2 simplifies orchestration and communication between agents. To install:

  1. Open a terminal with administrator rights.

  2. Run the following command:

    pip install ag2
    

Once installed, AG2 agents can leverage WatsonX APIs via LiteLLM.


phi1 = {
    "config_list": [
        {
            "model": "llama-3-8b",
            "base_url": "http://localhost:4000", #use http://0.0.0.0:4000 for Macs
            "api_key":"watsonx",
            "price" : [0,0]
        },
    ],
    "cache_seed": None,  # Disable caching.
}

phi2 = {
    "config_list": [
        {
            "model": "llama-3-8b",
            "base_url": "http://localhost:4000", #use http://0.0.0.0:4000 for Macs
            "api_key":"watsonx",
            "price" : [0,0]
        },
    ],
    "cache_seed": None,  # Disable caching.
}

from AG2 import ConversableAgent, AssistantAgent

jack = ConversableAgent(
    "Jack (Phi-2)",
    llm_config=phi2,
    system_message="Your name is Jack and you are a comedian in a two-person comedy show.",
)

emma = ConversableAgent(
    "Emma (Gemma)",
    llm_config=phi1, 
    system_message="Your name is Emma and you are a comedian in two-person comedy show.",
)

#AG2
chat_result = jack.initiate_chat(emma, message="Emma, tell me a joke.", max_turns=2)