LiveAgent

LiveAgent is a full-duplex voice agent backed by a provider's realtime API. Unlike the turn-by-turn STT/TTS pipeline, it opens a single bidirectional session for the entire conversation — audio flows in and out continuously, with built-in voice activity detection and barge-in.

Quick start

A LiveAgent holds a RealtimeConfig and is opened via agent.run(), which yields a ConversationContext. Peers (player, recorder, observers) share that context so they all read from and write to the same event stream.

import asyncio

from autogen.beta.live import (
    LiveAgent,
    SoundDevicePlayer,
    SoundDeviceRecorder,
    openai,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=openai.RealTimeConfig(
        "gpt-realtime-2",
        output=openai.AudioOutput(voice="ballad", speed=1.2),
    ),
)

async def main() -> None:
    async with (
        agent.run() as context,
        SoundDevicePlayer(context=context),
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())

Note

The three context managers must share the same context so the recorder's RecordedAudioEvents reach the provider session and the provider's SynthesizedAudioEvents reach the player.

Watching the transcript

The realtime provider streams both audio and a text transcript. Subscribe to ModelMessageChunk to receive the assistant's transcript token-by-token.

import asyncio

from autogen.beta.events import ModelMessageChunk
from autogen.beta.live import (
    LiveAgent,
    OpenAIRealTimeConfig,
    SoundDevicePlayer,
    SoundDeviceRecorder,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=OpenAIRealTimeConfig("gpt-realtime-2"),
)

async def main() -> None:
    async with (
        agent.run() as context,
        SoundDevicePlayer(context=context),
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        with context.stream.where(ModelMessageChunk).join() as events:
            async for event in events:
                print(event)

if __name__ == "__main__":
    asyncio.run(main())

Tip

stream.where(EventType).join() gives you a context manager that yields an async iterator of just those events. It's the idiomatic way to consume a single event type from the live session without writing a subscriber.
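
where() also accepts a union of event types, so a single loop can interleave multiple streams. For example, to watch both the assistant's transcript and the input transcription in one pass (the same pattern the full Gemini example below uses; TranscriptionChunkEvent comes from autogen.beta.events, and input transcription is only emitted when the provider is configured to transcribe):

from autogen.beta.events import ModelMessageChunk, TranscriptionChunkEvent

with context.stream.where(ModelMessageChunk | TranscriptionChunkEvent).join() as events:
    async for event in events:
        print(event)  # chunks from either stream, in arrival order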

Text-only output

To keep the realtime session for its low-latency turn detection but disable audio output entirely, swap AudioOutput for TextOutput. The model returns raw text via ModelMessageChunk and never produces synthesized audio.

import asyncio

from autogen.beta.events import ModelMessageChunk
from autogen.beta.live import (
    LiveAgent,
    SoundDeviceRecorder,
    openai,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=openai.RealTimeConfig(
        "gpt-realtime-2",
        output=openai.TextOutput(),
    ),
)

async def main() -> None:
    async with (
        agent.run() as context,
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        with context.stream.where(ModelMessageChunk).join() as events:
            async for event in events:
                print(event)

if __name__ == "__main__":
    asyncio.run(main())

Tools in a realtime session

LiveAgent supports the same @agent.tool decorator as a regular Agent. Tool calls are routed through AG2's normal tool executor, and results are sent back to the provider's realtime session automatically.

import asyncio

from autogen.beta.live import (
    LiveAgent,
    OpenAIRealTimeConfig,
    SoundDevicePlayer,
    SoundDeviceRecorder,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=OpenAIRealTimeConfig("gpt-realtime-2"),
)

@agent.tool
async def sum_numbers(a: int, b: int) -> int:
    """You can use this tool to sum two numbers."""
    print(f"Summing {a} and {b}")
    return a + b

async def main() -> None:
    async with (
        agent.run() as context,
        SoundDevicePlayer(context=context),
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        await asyncio.Future()

if __name__ == "__main__":
    asyncio.run(main())

Providers

LiveAgent is provider-neutral: it accepts any RealtimeConfig. AG2 Beta ships with two implementations, OpenAI and Gemini.

from autogen.beta.live import openai

config = openai.RealTimeConfig(
    "gpt-realtime-2",
    output=openai.AudioOutput(voice="ballad", speed=1.2),
    input=openai.InputConfig(
        # semantic VAD with interruption is the default
        turn_detection={
            "type": "semantic_vad",
            "create_response": True,
            "interrupt_response": True,
        },
    ),
)
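
The turn_detection mapping mirrors the OpenAI Realtime API's session settings, so it can be tuned. As a sketch, assuming the mapping is forwarded to the provider unchanged, setting interrupt_response to False keeps semantic turn detection but disables barge-in, so the assistant finishes speaking even if the user talks over it:

config = openai.RealTimeConfig(
    "gpt-realtime-2",
    output=openai.AudioOutput(voice="ballad"),
    input=openai.InputConfig(
        turn_detection={
            "type": "semantic_vad",
            "create_response": True,
            "interrupt_response": False,  # assumption: forwarded as-is; disables barge-in
        },
    ),
)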

Available voices: alloy, ash, ballad, coral, echo, sage, shimmer, verse, marin, cedar.

from autogen.beta.live import gemini

config = gemini.RealTimeConfig(
    "gemini-3.1-flash-live-preview",
    output=gemini.AudioOutput(voice="Puck", language_code="en-US"),
    input=gemini.InputConfig(transcribe=True),
)

Available voices: Aoede, Charon, Fenrir, Kore, Leda, Orus, Puck, Zephyr.

Warning

Gemini Live's audio I/O is fixed by the API: 16 kHz mono PCM input, 24 kHz mono PCM output. Configure the recorder accordingly:

SoundDeviceRecorder(context=context, sample_rate=16000)

Full Gemini example, printing both the model's output chunks and the input transcription

import asyncio

from autogen.beta.events import ModelMessageChunk, TranscriptionChunkEvent
from autogen.beta.live import (
    LiveAgent,
    SoundDevicePlayer,
    SoundDeviceRecorder,
    gemini,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant. Always respond in English.",
    config=gemini.RealTimeConfig(
        "gemini-3.1-flash-live-preview",
        output=gemini.AudioOutput(voice="Puck", language_code="en-US"),
        input=gemini.InputConfig(transcribe=True),
    ),
)

async def main() -> None:
    async with (
        agent.run() as context,
        SoundDevicePlayer(context=context),
        # Gemini Live requires 16 kHz mono PCM input
        SoundDeviceRecorder(context=context, sample_rate=16000),
    ):
        print("Starting...")
        with context.stream.where(ModelMessageChunk | TranscriptionChunkEvent).join() as events:
            async for event in events:
                print(event)

if __name__ == "__main__":
    asyncio.run(main())

LiveAgent vs Agent

LiveAgent mirrors Agent's constructor surface — name, prompt, tools, middleware, observers, dependencies, variables, plugins, hitl_hook — so most agent-level concepts carry over. The differences:

Feature              Agent                              LiveAgent
Entry point          await agent.ask(input)             async with agent.run() as context
History              Returned via AgentReply            Lives on the session's stream
Turn detection       Application-driven (you call ask)  Provider-driven (VAD)
Structured output    Supported                          Not supported
tasks / run_subtask  Supported                          Not supported

If you need both — for example, a realtime voice front-end that hands off to a tasking agent — drive the handoff through a tool on the LiveAgent that delegates to a separate Agent using Agent.as_tool().
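
A minimal sketch of that handoff, assuming Agent is importable from autogen.beta and that the tools parameter accepts the result of as_tool() (the import path and Agent's model configuration are assumptions here, not confirmed by this page):

from autogen.beta import Agent  # assumed import path for the turn-based Agent
from autogen.beta.live import LiveAgent, OpenAIRealTimeConfig

# A regular turn-based agent that does the slow, multi-step work.
# (Agent's model configuration is omitted; fill it in per your setup.)
researcher = Agent(
    name="researcher",
    prompt="Answer research questions thoroughly but concisely.",
)

# The realtime voice front-end delegates to it through a tool.
voice_agent = LiveAgent(
    name="assistant",
    prompt="You are a voice assistant. Use the researcher tool for anything that needs research.",
    config=OpenAIRealTimeConfig("gpt-realtime-2"),
    tools=[researcher.as_tool()],  # Agent.as_tool() wraps the agent as a callable tool
)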

What's next

  • STT & TTS — the turn-by-turn alternative built from discrete transcription and synthesis steps.
  • Tools — tool authoring, middleware, and approval flows that all work inside a LiveAgent.