LiveAgent is a full-duplex voice agent backed by a provider's realtime API. Unlike the turn-by-turn STT/TTS pipeline, it opens a single bidirectional session for the entire conversation — audio flows in and out continuously, with built-in voice activity detection and barge-in.
A LiveAgent holds a RealtimeConfig and is opened via agent.run(), which yields a ConversationContext. Peers (player, recorder, observers) share that context so they all read from and write to the same event stream.
```python
import asyncio

from autogen.beta.live import (
    LiveAgent,
    SoundDevicePlayer,
    SoundDeviceRecorder,
    openai,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=openai.RealTimeConfig(
        "gpt-realtime-2",
        output=openai.AudioOutput(voice="ballad", speed=1.2),
    ),
)


async def main() -> None:
    async with (
        agent.run() as context,
        SoundDevicePlayer(context=context),
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        await asyncio.Future()  # run until cancelled


if __name__ == "__main__":
    asyncio.run(main())
```
Note
The three context managers must share the same context so the recorder's RecordedAudioEvents reach the provider session and the provider's SynthesizedAudioEvents reach the player.
```python
import asyncio

from autogen.beta.events import ModelMessageChunk
from autogen.beta.live import (
    LiveAgent,
    OpenAIRealTimeConfig,
    SoundDevicePlayer,
    SoundDeviceRecorder,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=OpenAIRealTimeConfig("gpt-realtime-2"),
)


async def main() -> None:
    async with (
        agent.run() as context,
        SoundDevicePlayer(context=context),
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        with context.stream.where(ModelMessageChunk).join() as events:
            async for event in events:
                print(event)


if __name__ == "__main__":
    asyncio.run(main())
```
Tip
stream.where(EventType).join() returns a context manager that yields an async iterator of the filtered events. It's the idiomatic way to consume a single event type from the live session without writing a subscriber.
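where() also accepts a union of event types, so a single join() can cover several kinds of events at once (the Gemini example later on this page uses this to print both model text and input transcriptions). A minimal helper sketch, assuming context is the ConversationContext yielded by agent.run():

```python
from autogen.beta.events import ModelMessageChunk, TranscriptionChunkEvent


async def print_text_and_transcripts(context) -> None:
    # One join() over a union of event types: yields whichever of the
    # two event kinds arrives next on the live session's stream.
    with context.stream.where(ModelMessageChunk | TranscriptionChunkEvent).join() as events:
        async for event in events:
            print(event)
```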
To keep the realtime session for its low-latency turn detection but disable audio output entirely, swap AudioOutput for TextOutput. The model then streams plain text as ModelMessageChunk events and never produces synthesized audio.
```python
import asyncio

from autogen.beta.events import ModelMessageChunk
from autogen.beta.live import (
    LiveAgent,
    SoundDeviceRecorder,
    openai,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=openai.RealTimeConfig(
        "gpt-realtime-2",
        output=openai.TextOutput(),
    ),
)


async def main() -> None:
    async with (
        agent.run() as context,
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        with context.stream.where(ModelMessageChunk).join() as events:
            async for event in events:
                print(event)


if __name__ == "__main__":
    asyncio.run(main())
```
LiveAgent supports the same @agent.tool decorator as a regular Agent. Tool calls are routed through AG2's normal tool executor, and results are sent back to the provider's realtime session automatically.
```python
import asyncio

from autogen.beta.live import (
    LiveAgent,
    OpenAIRealTimeConfig,
    SoundDevicePlayer,
    SoundDeviceRecorder,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=OpenAIRealTimeConfig("gpt-realtime-2"),
)


@agent.tool
async def sum_numbers(a: int, b: int) -> int:
    """You can use this tool to sum two numbers."""
    print(f"Summing {a} and {b}")
    return a + b


async def main() -> None:
    async with (
        agent.run() as context,
        SoundDevicePlayer(context=context),
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        await asyncio.Future()


if __name__ == "__main__":
    asyncio.run(main())
```
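Turn detection is driven by the provider. For OpenAI's realtime API, semantic VAD with interruption (barge-in) is the default; the config below spells those settings out explicitly via InputConfig: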
```python
from autogen.beta.live import openai

config = openai.RealTimeConfig(
    "gpt-realtime-2",
    output=openai.AudioOutput(voice="ballad", speed=1.2),
    input=openai.InputConfig(
        # semantic VAD with interruption is the default
        turn_detection={
            "type": "semantic_vad",
            "create_response": True,
            "interrupt_response": True,
        },
    ),
)
```
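Other providers follow the same pattern: swap in their config module. A Gemini Live session, for example, uses gemini.RealTimeConfig with its own voice, transcription, and sample-rate options: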
```python
import asyncio

from autogen.beta.events import ModelMessageChunk, TranscriptionChunkEvent
from autogen.beta.live import (
    LiveAgent,
    SoundDevicePlayer,
    SoundDeviceRecorder,
    gemini,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant. Always respond in English.",
    config=gemini.RealTimeConfig(
        "gemini-3.1-flash-live-preview",
        output=gemini.AudioOutput(voice="Puck", language_code="en-US"),
        input=gemini.InputConfig(transcribe=True),
    ),
)


async def main() -> None:
    async with (
        agent.run() as context,
        SoundDevicePlayer(context=context),
        # Gemini Live requires 16 kHz mono PCM input
        SoundDeviceRecorder(context=context, sample_rate=16000),
    ):
        print("Starting...")
        with context.stream.where(ModelMessageChunk | TranscriptionChunkEvent).join() as events:
            async for event in events:
                print(event)


if __name__ == "__main__":
    asyncio.run(main())
```
LiveAgent mirrors Agent's constructor surface — name, prompt, tools, middleware, observers, dependencies, variables, plugins, hitl_hook — so most agent-level concepts carry over. The differences:
| Feature | Agent | LiveAgent |
| --- | --- | --- |
| Entry point | `await agent.ask(input)` | `async with agent.run() as context` |
| History | Returned via `AgentReply` | Lives on the session's stream |
| Turn detection | Application-driven (you call `ask`) | Provider-driven (VAD) |
| Structured output | Supported | Not supported |
| `tasks` / `run_subtask` | Supported | Not supported |
If you need both — for example, a realtime voice front-end that hands off to a tasking agent — drive the handoff through a tool on the LiveAgent that delegates to a separate Agent using Agent.as_tool().
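A minimal sketch of that pattern, assuming Agent is importable from autogen.beta and that as_tool() takes no required arguments (both unverified here; the agent names and prompts are illustrative):

```python
import asyncio

from autogen.beta import Agent  # assumed import path
from autogen.beta.live import (
    LiveAgent,
    OpenAIRealTimeConfig,
    SoundDevicePlayer,
    SoundDeviceRecorder,
)

# A regular turn-based Agent that does the heavy lifting.
research_agent = Agent(
    name="researcher",
    prompt="Research the user's question and return a concise written answer.",
)

# The realtime voice front-end delegates to it through a tool.
voice_agent = LiveAgent(
    name="assistant",
    prompt="You are a voice assistant. Use your tool for research questions.",
    config=OpenAIRealTimeConfig("gpt-realtime-2"),
    tools=[research_agent.as_tool()],  # as_tool() signature assumed
)


async def main() -> None:
    async with (
        voice_agent.run() as context,
        SoundDevicePlayer(context=context),
        SoundDeviceRecorder(context=context),
    ):
        await asyncio.Future()  # run until cancelled


if __name__ == "__main__":
    asyncio.run(main())
```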