
AG2 + Gemini Thinking Config Variants#


Author: Priyanshu Deshmukh

This notebook shows how to adjust Gemini thinking features in AG2:

  • thinking_budget (token budget for thinking)
  • thinking_level (“High” vs “Low”)
  • include_thoughts (whether to return thought summaries)

Reference: Gemini Thinking Guide

Install AG2 with Google Gemini support:

pip install ag2[gemini]
import os

from dotenv import load_dotenv

from autogen import ConversableAgent, LLMConfig

load_dotenv()

api_key = os.getenv("GOOGLE_GEMINI_API_KEY")
if not api_key:
    raise RuntimeError("GOOGLE_GEMINI_API_KEY is not set. Please set it in your environment or .env file.")

prompt = """You are playing the 20 question game. You know that what you are looking for
    is an aquatic mammal that doesn't live in the sea, is venomous and that's
    smaller than a cat. What could that be and how could you make sure?
    """

AG2 now supports Google Gemini’s ThinkingConfig#

ThinkingConfig has three options, which are configured through LLMConfig config-list parameters:

  • thinking_budget: the thinking budget in tokens. 0 is DISABLED, -1 is AUTOMATIC; the default values and allowed ranges are model dependent.
  • thinking_level: the level of thought tokens the model should generate.
  • include_thoughts: whether to include thoughts in the response. If true, thoughts are returned only if the model supports thoughts and thoughts are available.

# example configuration for ThinkingConfig Support
llm_config = LLMConfig(
    config_list={
        "model": "gemini-3-pro-preview",
        "api_type": "google",
        "api_key": api_key,
        # "thinking_budget": 1000,  # Use either thinking_budget or thinking_level, not both
        "thinking_level": "High",  # Use thinking_level with Gemini 3 Pro. thinking_budget is accepted for backwards compatibility, but using it with Gemini 3 Pro may result in suboptimal performance
        "include_thoughts": True,
    }
)

agent = ConversableAgent(name="agent", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2, user_input=True)

response.process()

thinking_budget#

The thinking_budget parameter, introduced with the Gemini 2.5 series, guides the model on the specific number of thinking tokens to use for reasoning.

Note: Use the thinking_level parameter with Gemini 3 Pro. While thinking_budget is accepted for backwards compatibility, using it with Gemini 3 Pro may result in suboptimal performance.

0 is DISABLED. -1 is AUTOMATIC. The default values and allowed ranges are model dependent; the ranges can be found here: thinking budget ranges

budget = 4096

llm_config = LLMConfig(
    config_list={
        "model": "gemini-2.5-flash",
        "api_type": "google",
        "api_key": api_key,
        "thinking_budget": budget,
    }
)

agent = ConversableAgent(name="agent", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2, user_input=True)

response.process()
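As a minimal sketch of the three budget modes described above, the helper below builds plain config entries mirroring the LLMConfig fields used in this notebook (the helper name is illustrative, and no API call is made):

```python
# Plain config entries illustrating the three thinking_budget modes:
# 0 disables thinking, -1 lets the model decide automatically, and a
# positive value sets an explicit token budget (allowed ranges are
# model dependent).
def gemini_config(model: str, budget: int) -> dict:
    return {
        "model": model,
        "api_type": "google",
        "thinking_budget": budget,
    }

disabled = gemini_config("gemini-2.5-flash", 0)     # thinking off
automatic = gemini_config("gemini-2.5-flash", -1)   # model decides
explicit = gemini_config("gemini-2.5-flash", 4096)  # fixed budget
```

Any of these dicts can be passed as the config_list entry when constructing LLMConfig, as in the example above.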

Vary thinking_level#

You can set thinking_level to “low” or “high” (the default) for gemini-3-pro-preview or gemini-3-flash-preview; gemini-3-flash-preview also supports “medium” and “minimal” (similar to no thinking). These settings indicate to the model how much thinking it is allowed to do. Since the thinking process stays dynamic, “high” doesn’t mean the model will always use a lot of tokens in its thinking phase, just that it’s allowed to.

level = "High"

llm_config = LLMConfig(
    config_list={
        "model": "gemini-3-flash-preview",
        "api_type": "google",
        "api_key": api_key,
        "thinking_level": level,
    }
)

agent = ConversableAgent(name="agent", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2, user_input=True)

response.process()

thinking_level is not supported by Gemini 2.5 Flash, so the following code will throw an error. It is, however, supported by Gemini 3 Flash preview.

level = "Low"
llm_config = LLMConfig(
    config_list={
        "model": "gemini-2.5-flash",  # Note: "gemini-3-flash-preview" does support thinking_level
        "api_type": "google",
        "api_key": api_key,
        "thinking_level": level,
    }
)

agent = ConversableAgent(name="agent-thoughts", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2, user_input=True)

# This will cause an exception as gemini-2.5-flash does not support thinking_level
response.process()
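To avoid this kind of error up front, you could check model support before building the config. A minimal sketch, assuming (as stated above) that thinking_level is supported by the Gemini 3 preview models but not by Gemini 2.5 Flash; the helper is hypothetical and not part of AG2 or the Gemini API:

```python
# Hypothetical guard reflecting the support matrix described in this
# notebook: Gemini 3 models accept thinking_level, Gemini 2.5 Flash
# does not. Not an official AG2 or Gemini capability check.
def supports_thinking_level(model: str) -> bool:
    return model.startswith("gemini-3")

config = {"model": "gemini-2.5-flash", "api_type": "google"}
if supports_thinking_level(config["model"]):
    config["thinking_level"] = "Low"
else:
    # Fall back to thinking_budget for models without thinking_level
    config["thinking_budget"] = 1024
```

The resulting dict can then be used as the config_list entry for LLMConfig without triggering the unsupported-parameter error.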

include_thoughts#

Set include_thoughts to True to receive thought summaries with the response; set it to False to omit them. Summaries of the model’s thinking reveal its internal problem-solving pathway, letting you check the model’s strategy and stay informed during complex tasks.

The agent’s reply message will contain the thoughts first and then the answer.

llm_config = LLMConfig(
    config_list={
        "model": "gemini-3-flash-preview",
        "api_type": "google",
        "api_key": api_key,
        "thinking_budget": 4096,
        "include_thoughts": True,
    }
)

agent = ConversableAgent(name="agent-thoughts", description="you are a helpful assistant", llm_config=llm_config)
response = agent.run(message=prompt, max_turns=2, user_input=True)

response.process()

Tips#

  • For long/complex tasks, use a higher thinking_budget.
  • thinking_level can be lowered for lighter reasoning.
  • Set include_thoughts=True when you want thought summaries; turn off to reduce output.
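The tips above can be sketched as a small helper that picks thinking settings by task complexity. The function name, the complexity labels, and the specific budget value are illustrative assumptions, not part of AG2:

```python
# Illustrative mapping from task complexity to Gemini thinking
# settings, following the tips above: a bigger thinking_budget for
# complex tasks, a lower thinking_level for lighter reasoning, and
# thought summaries only when requested.
def thinking_settings(complexity: str, want_thoughts: bool = False) -> dict:
    settings = {
        "simple": {"thinking_level": "Low"},
        "complex": {"thinking_budget": 8192},
    }[complexity]
    if want_thoughts:
        settings["include_thoughts"] = True
    return settings
```

These dicts can be merged into the config_list entries used throughout this notebook.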
