Notebooks
Agent Chat with Multimodal Models: DALLE and GPT-4V
Use Cases
- Use cases
- Reference Agents
- Notebooks
- All Notebooks
- Group Chat with Customized Speaker Selection Method
- RAG OpenAI Assistants in AutoGen
- Using RetrieveChat with Qdrant for Retrieve Augmented Code Generation and Question Answering
- Auto Generated Agent Chat: Function Inception
- Task Solving with Provided Tools as Functions (Asynchronous Function Calls)
- Using Guidance with AutoGen
- Solving Complex Tasks with A Sequence of Nested Chats
- Group Chat
- Solving Multiple Tasks in a Sequence of Async Chats
- Auto Generated Agent Chat: Task Solving with Provided Tools as Functions
- Conversational Chess using non-OpenAI clients
- RealtimeAgent with local websocket connection
- Web Scraping using Apify Tools
- DeepSeek: Adding Browsing Capabilities to AG2
- Interactive LLM Agent Dealing with Data Stream
- Generate Dalle Images With Conversable Agents
- Supercharging Web Crawling with Crawl4AI
- RealtimeAgent in a Swarm Orchestration
- Perform Research with Multi-Agent Group Chat
- Agent Tracking with AgentOps
- Translating Video audio using Whisper and GPT-3.5-turbo
- Automatically Build Multi-agent System from Agent Library
- Auto Generated Agent Chat: Collaborative Task Solving with Multiple Agents and Human Users
- Structured output
- CaptainAgent
- Group Chat with Coder and Visualization Critic
- Cross-Framework LLM Tool Integration with AG2
- Using FalkorGraphRagCapability with agents for GraphRAG Question & Answering
- Demonstrating the `AgentEval` framework using the task of solving math problems as an example
- RealtimeAgent in a Swarm Orchestration using WebRTC
- A Uniform interface to call different LLMs
- From Dad Jokes To Sad Jokes: Function Calling with GPTAssistantAgent
- Solving Complex Tasks with Nested Chats
- Usage tracking with AutoGen
- Agent with memory using Mem0
- Using RetrieveChat Powered by PGVector for Retrieve Augmented Code Generation and Question Answering
- Tools with Dependency Injection
- Solving Multiple Tasks in a Sequence of Chats with Different Conversable Agent Pairs
- WebSurferAgent
- Using RetrieveChat Powered by MongoDB Atlas for Retrieve Augmented Code Generation and Question Answering
- Assistants with Azure Cognitive Search and Azure Identity
- ReasoningAgent - Advanced LLM Reasoning with Multiple Search Strategies
- Agentic RAG workflow on tabular data from a PDF file
- RealtimeAgent in a Swarm Orchestration
- Making OpenAI Assistants Teachable
- Run a standalone AssistantAgent
- AutoBuild
- Solving Multiple Tasks in a Sequence of Chats
- Currency Calculator: Task Solving with Provided Tools as Functions
- Swarm Orchestration with AG2
- Use AutoGen in Databricks with DBRX
- Using a local Telemetry server to monitor a GraphRAG agent
- Auto Generated Agent Chat: Solving Tasks Requiring Web Info
- StateFlow: Build Workflows through State-Oriented Actions
- Groupchat with Llamaindex agents
- Using Neo4j's native GraphRAG SDK with AG2 agents for Question & Answering
- WebSurferAgent
- Agent Chat with Multimodal Models: LLaVA
- Group Chat with Retrieval Augmented Generation
- Runtime Logging with AutoGen
- SocietyOfMindAgent
- Agent Chat with Multimodal Models: DALLE and GPT-4V
- Agent Observability with OpenLIT
- Mitigating Prompt hacking with JSON Mode in Autogen
- Trip planning with a FalkorDB GraphRAG agent using a Swarm
- Language Agent Tree Search
- Auto Generated Agent Chat: Collaborative Task Solving with Coding and Planning Agent
- OptiGuide with Nested Chats in AutoGen
- Auto Generated Agent Chat: Task Solving with Langchain Provided Tools as Functions
- Writing a software application using function calls
- Auto Generated Agent Chat: GPTAssistant with Code Interpreter
- Adding Browsing Capabilities to AG2
- Agentchat MathChat
- Chatting with a teachable agent
- RealtimeAgent with gemini client
- Preprocessing Chat History with `TransformMessages`
- Chat with OpenAI Assistant using function call in AutoGen: OSS Insights for Advanced GitHub Data Analysis
- Small, Local Model (IBM Granite) Multi-Agent RAG
- Websockets: Streaming input and output using websockets
- RealtimeAgent in a Swarm Orchestration
- Task Solving with Code Generation, Execution and Debugging
- Agent Chat with Async Human Inputs
- Agent Chat with custom model loading
- Chat Context Dependency Injection
- DeepReaserchAgent
- Nested Chats for Tool Use in Conversational Chess
- Auto Generated Agent Chat: Group Chat with GPTAssistantAgent
- Cross-Framework LLM Tool for CaptainAgent
- Auto Generated Agent Chat: Teaching AI New Skills via Natural Language Interaction
- SQL Agent for Spider text-to-SQL benchmark
- Auto Generated Agent Chat: Task Solving with Code Generation, Execution, Debugging & Human Feedback
- OpenAI Assistants in AutoGen
- (Legacy) Implement Swarm-style orchestration with GroupChat
- Enhanced Swarm Orchestration with AG2
- Using RetrieveChat for Retrieve Augmented Code Generation and Question Answering
- Using Neo4j's graph database with AG2 agents for Question & Answering
- AgentOptimizer: An Agentic Way to Train Your LLM Agent
- Engaging with Multimodal Models: GPT-4V in AutoGen
- RealtimeAgent with WebRTC connection
- Agent with memory using Mem0
- FSM - User can input speaker transition constraints
- Config loader utility functions
- Community Gallery
Notebooks
Agent Chat with Multimodal Models: DALLE and GPT-4V
Multimodal agent chat with DALL-E and GPT-4v.
Requires: OpenAI V1.
Before everything starts, install AutoGen with the lmm
option
pip install "pyautogen[lmm]>=0.2.3"
import os
import re
from typing import Dict, List, Optional, Union
import PIL
import matplotlib.pyplot as plt
from PIL import Image
from diskcache import Cache
from openai import OpenAI
import autogen
from autogen import Agent, AssistantAgent, ConversableAgent, UserProxyAgent
from autogen.agentchat.contrib.img_utils import _to_pil, get_image_data, get_pil_image
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent
config_list_4v = autogen.config_list_from_json(
"OAI_CONFIG_LIST",
filter_dict={
"model": ["gpt-4-vision-preview"],
},
)
config_list_gpt4 = autogen.config_list_from_json(
"OAI_CONFIG_LIST",
filter_dict={
"model": ["gpt-4", "gpt-4-0314", "gpt4", "gpt-4-32k", "gpt-4-32k-0314", "gpt-4-32k-v0314"],
},
)
config_list_dalle = autogen.config_list_from_json(
"OAI_CONFIG_LIST",
filter_dict={
"model": ["dalle"],
},
)
gpt4_llm_config = {"config_list": config_list_gpt4, "cache_seed": 42}
The config_list_dalle
should be something like:
[
{
'model': 'dalle',
'api_key': 'Your API Key here',
'api_version': '2024-02-01'
}
]
Helper Functions
We first create a wrapper for DALLE call, make the
def dalle_call(client: OpenAI, model: str, prompt: str, size: str, quality: str, n: int) -> str:
"""Generate an image using OpenAI's DALL-E model and cache the result.
This function takes a prompt and other parameters to generate an image using OpenAI's DALL-E model.
It checks if the result is already cached; if so, it returns the cached image data. Otherwise,
it calls the DALL-E API to generate the image, stores the result in the cache, and then returns it.
Args:
client (OpenAI): The OpenAI client instance for making API calls.
model (str): The specific DALL-E model to use for image generation.
prompt (str): The text prompt based on which the image is generated.
size (str): The size specification of the image. TODO: This should allow specifying landscape, square, or portrait modes.
quality (str): The quality setting for the image generation.
n (int): The number of images to generate.
Returns:
str: The image data as a string, either retrieved from the cache or newly generated.
Note:
- The cache is stored in a directory named '.cache/'.
- The function uses a tuple of (model, prompt, size, quality, n) as the key for caching.
- The image data is obtained by making a secondary request to the URL provided by the DALL-E API response.
"""
# Function implementation...
cache = Cache(".cache/") # Create a cache directory
key = (model, prompt, size, quality, n)
if key in cache:
return cache[key]
# If not in cache, compute and store the result
response = client.images.generate(
model=model,
prompt=prompt,
size=size,
quality=quality,
n=n,
)
image_url = response.data[0].url
img_data = get_image_data(image_url)
cache[key] = img_data
return img_data
Here is a helper function to extract image from a DALLE agent. We will show the DALLE agent later.
def extract_img(agent: Agent) -> PIL.Image:
"""Extracts an image from the last message of an agent and converts it to a PIL image.
This function searches the last message sent by the given agent for an image tag,
extracts the image data, and then converts this data into a PIL (Python Imaging Library) image object.
Parameters:
agent (Agent): An instance of an agent from which the last message will be retrieved.
Returns:
PIL.Image: A PIL image object created from the extracted image data.
Note:
- The function assumes that the last message contains an <img> tag with image data.
- The image data is extracted using a regular expression that searches for <img> tags.
- It's important that the agent's last message contains properly formatted image data for successful extraction.
- The `_to_pil` function is used to convert the extracted image data into a PIL image.
- If no <img> tag is found, or if the image data is not correctly formatted, the function may raise an error.
"""
last_message = agent.last_message()["content"]
if isinstance(last_message, str):
img_data = re.findall("<img (.*)>", last_message)[0]
elif isinstance(last_message, list):
# The GPT-4V format, where the content is an array of data
assert isinstance(last_message[0], dict)
img_data = last_message[0]["image_url"]["url"]
pil_img = get_pil_image(img_data)
return pil_img
The DALLE Agent
class DALLEAgent(ConversableAgent):
def __init__(self, name, llm_config: dict, **kwargs):
super().__init__(name, llm_config=llm_config, **kwargs)
try:
config_list = llm_config["config_list"]
api_key = config_list[0]["api_key"]
except Exception as e:
print("Unable to fetch API Key, because", e)
api_key = os.getenv("OPENAI_API_KEY")
self._dalle_client = OpenAI(api_key=api_key)
self.register_reply([Agent, None], DALLEAgent.generate_dalle_reply)
def send(
self,
message: Union[Dict, str],
recipient: Agent,
request_reply: Optional[bool] = None,
silent: Optional[bool] = False,
):
# override and always "silent" the send out message;
# otherwise, the print log would be super long!
super().send(message, recipient, request_reply, silent=True)
def generate_dalle_reply(self, messages: Optional[List[Dict]], sender: "Agent", config):
"""Generate a reply using OpenAI DALLE call."""
client = self._dalle_client if config is None else config
if client is None:
return False, None
if messages is None:
messages = self._oai_messages[sender]
prompt = messages[-1]["content"]
# TODO: integrate with autogen.oai. For instance, with caching for the API call
img_data = dalle_call(
client=client,
model="dall-e-3",
prompt=prompt,
size="1024x1024", # TODO: the size should be flexible, deciding landscape, square, or portrait mode.
quality="standard",
n=1,
)
img_data = _to_pil(img_data) # Convert to PIL image
# Return the OpenAI message format
return True, {"content": [{"type": "image_url", "image_url": {"url": img_data}}]}
Simple Example: Call directly from User
dalle = DALLEAgent(name="Dalle", llm_config={"config_list": config_list_dalle})
user_proxy = UserProxyAgent(
name="User_proxy", system_message="A human admin.", human_input_mode="NEVER", max_consecutive_auto_reply=0
)
# Ask the question with an image
user_proxy.initiate_chat(
dalle,
message="""Create an image with black background, a happy robot is showing a sign with "I Love AutoGen".""",
)
img = extract_img(dalle)
plt.imshow(img)
plt.axis("off") # Turn off axis numbers
plt.show()
Example With Critics: Iterate several times to improve
class DalleCreator(AssistantAgent):
def __init__(self, n_iters=2, **kwargs):
"""Initializes a DalleCreator instance.
This agent facilitates the creation of visualizations through a collaborative effort among
its child agents: dalle and critics.
Parameters:
- n_iters (int, optional): The number of "improvement" iterations to run. Defaults to 2.
- **kwargs: keyword arguments for the parent AssistantAgent.
"""
super().__init__(**kwargs)
self.register_reply([Agent, None], reply_func=DalleCreator._reply_user, position=0)
self._n_iters = n_iters
def _reply_user(self, messages=None, sender=None, config=None):
if all((messages is None, sender is None)):
error_msg = f"Either {messages=} or {sender=} must be provided."
logger.error(error_msg) # noqa: F821
raise AssertionError(error_msg)
if messages is None:
messages = self._oai_messages[sender]
img_prompt = messages[-1]["content"]
# Define the agents
self.critics = MultimodalConversableAgent(
name="Critics",
system_message="""You need to improve the prompt of the figures you saw.
How to create a figure that is better in terms of color, shape, text (clarity), and other things.
Reply with the following format:
CRITICS: the image needs to improve...
PROMPT: here is the updated prompt!
""",
llm_config={"config_list": config_list_4v, "max_tokens": 1000},
human_input_mode="NEVER",
max_consecutive_auto_reply=3,
)
self.dalle = DALLEAgent(
name="Dalle", llm_config={"config_list": config_list_dalle}, max_consecutive_auto_reply=0
)
# Data flow begins
self.send(message=img_prompt, recipient=self.dalle, request_reply=True)
img = extract_img(self.dalle)
plt.imshow(img)
plt.axis("off") # Turn off axis numbers
plt.show()
print("Image PLOTTED")
for i in range(self._n_iters):
# Downsample the image s.t. GPT-4V can take
img = extract_img(self.dalle)
smaller_image = img.resize((128, 128), Image.Resampling.LANCZOS)
smaller_image.save("result.png")
self.msg_to_critics = f"""Here is the prompt: {img_prompt}.
Here is the figure <img result.png>.
Now, critic and create a prompt so that DALLE can give me a better image.
Show me both "CRITICS" and "PROMPT"!
"""
self.send(message=self.msg_to_critics, recipient=self.critics, request_reply=True)
feedback = self._oai_messages[self.critics][-1]["content"]
img_prompt = re.findall("PROMPT: (.*)", feedback)[0]
self.send(message=img_prompt, recipient=self.dalle, request_reply=True)
img = extract_img(self.dalle)
plt.imshow(img)
plt.axis("off") # Turn off axis numbers
plt.show()
print(f"Image {i} PLOTTED")
return True, "result.jpg"
creator = DalleCreator(
name="DALLE Creator!",
max_consecutive_auto_reply=0,
system_message="Help me coordinate generating image",
llm_config=gpt4_llm_config,
)
user_proxy = UserProxyAgent(name="User", human_input_mode="NEVER", max_consecutive_auto_reply=0)
user_proxy.initiate_chat(
creator, message="""Create an image with black background, a happy robot is showing a sign with "I Love AutoGen"."""
)