autogen.beta.live is the AG2 Beta module for building voice-enabled agents. It covers two complementary patterns: a turn-by-turn STT → Agent → TTS pipeline built on top of a regular Agent, and a full-duplex LiveAgent that streams audio to and from a provider's realtime API.
Use the STT → Agent → TTS pipeline when you already have a text agent and want to add a voice front-end, or when you need tool execution, middleware, structured output, or any other text-agent feature.
Use LiveAgent when you want barge-in, interruption, semantic VAD, or a phone-call-like UX.
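Barge-in means cutting the agent's speech short as soon as the user starts talking. The sketch below illustrates only the concept, using plain asyncio. It does not show how LiveAgent or any provider implements it, and `play`, `speak_with_barge_in`, and `demo` are hypothetical names invented for this example.

```python
import asyncio
from typing import Optional


async def play(chunks: list[str], played: list[str], chunk_seconds: float) -> None:
    """Stand-in for audio playback: 'plays' one chunk, then waits its duration."""
    for chunk in chunks:
        played.append(chunk)
        await asyncio.sleep(chunk_seconds)


async def speak_with_barge_in(
    chunks: list[str],
    user_spoke: asyncio.Event,
    chunk_seconds: float = 0.05,
) -> list[str]:
    """Play chunks until they run out or the user starts speaking."""
    played: list[str] = []
    playback = asyncio.create_task(play(chunks, played, chunk_seconds))
    barge_in = asyncio.create_task(user_spoke.wait())

    # Whichever task finishes first decides the outcome.
    await asyncio.wait({playback, barge_in}, return_when=asyncio.FIRST_COMPLETED)

    # Cancel whatever is still pending: playback on barge-in,
    # or the waiter when playback completed normally.
    for task in (playback, barge_in):
        task.cancel()
    await asyncio.gather(playback, barge_in, return_exceptions=True)
    return played


async def demo(interrupt_after: Optional[float], total: int = 20) -> list[str]:
    user_spoke = asyncio.Event()
    if interrupt_after is not None:
        # Simulate voice activity detection firing in the middle of the reply.
        asyncio.get_running_loop().call_later(interrupt_after, user_spoke.set)
    chunks = [f"chunk-{i}" for i in range(total)]
    return await speak_with_barge_in(chunks, user_spoke)
```

Calling `asyncio.run(demo(interrupt_after=0.12))` stops playback after the first few chunks, while `asyncio.run(demo(interrupt_after=None, total=3))` plays all three. A turn-by-turn pipeline has no equivalent of the `user_spoke` signal while a reply is playing, which is why this kind of UX points to the full-duplex pattern.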
Note: Both patterns share the same audio I/O primitives (SoundDevicePlayer and SoundDeviceRecorder) and the same event stream. You can mix observers (TTS, logging, persistence) across both.
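To make the idea of several observers sharing one event stream concrete, here is a minimal, library-independent sketch. `Event`, `EventStream`, `make_logger`, and `make_speaker` are hypothetical stand-ins written for this illustration; they are not the autogen.beta.live types, whose actual observer interface may look different.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


# Hypothetical stand-ins: these are NOT the autogen.beta.live types.
@dataclass
class Event:
    kind: str      # e.g. "transcript" or "reply"
    payload: str


Observer = Callable[[Event], Awaitable[None]]


class EventStream:
    """Delivers every published event to every subscribed observer, in order."""

    def __init__(self) -> None:
        self._observers: list[Observer] = []

    def subscribe(self, observer: Observer) -> None:
        self._observers.append(observer)

    async def publish(self, event: Event) -> None:
        for observer in self._observers:
            await observer(event)


def make_logger(log: list[str]) -> Observer:
    # Records every event, whatever its kind.
    async def observe(event: Event) -> None:
        log.append(f"{event.kind}: {event.payload}")
    return observe


def make_speaker(spoken: list[str]) -> Observer:
    # Reacts only to agent replies, as a TTS-style observer might.
    async def observe(event: Event) -> None:
        if event.kind == "reply":
            spoken.append(event.payload)
    return observe


async def demo() -> tuple[list[str], list[str]]:
    log: list[str] = []
    spoken: list[str] = []
    stream = EventStream()
    stream.subscribe(make_logger(log))
    stream.subscribe(make_speaker(spoken))
    await stream.publish(Event("transcript", "what time is it?"))
    await stream.publish(Event("reply", "It is noon."))
    return log, spoken
```

Because each observer sees the same events and decides for itself what to act on, the logging observer and the speech observer never need to know about each other. That independence is what lets observers be combined freely.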
A minimal STT → Agent → TTS pipeline:

```python
import asyncio

from autogen.beta import Agent, config
from autogen.beta.live import (
    OpenAITTSConfig,
    OpenAITranscriber,
    SoundDevicePlayer,
    SoundDeviceRecorder,
    TTSObserver,
)

# A regular text agent; the TTS observer turns its reply into speech.
agent = Agent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=config.OpenAIResponsesConfig(model="gpt-5", streaming=True),
    observers=[TTSObserver(config=OpenAITTSConfig(model="gpt-4o-mini-tts"))],
)


async def main() -> None:
    # Chain speech-to-text in front of the agent.
    pipeline = OpenAITranscriber("gpt-4o-mini-transcribe").pipe(agent)

    async with SoundDevicePlayer() as player:
        # Record one utterance, then run a single turn through the pipeline.
        voice = SoundDeviceRecorder().record(duration=3)
        reply = await pipeline.ask(voice, stream=player.stream)
        print(reply.body)


if __name__ == "__main__":
    asyncio.run(main())
```
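A minimal full-duplex LiveAgent:

```python
import asyncio

from autogen.beta.live import (
    LiveAgent,
    SoundDevicePlayer,
    SoundDeviceRecorder,
    openai,
)

agent = LiveAgent(
    name="assistant",
    prompt="You are a helpful voice assistant.",
    config=openai.RealTimeConfig(
        "gpt-realtime-2",
        output=openai.AudioOutput(voice="ballad", speed=1.2),
    ),
)


async def main() -> None:
    # Parenthesized context managers require Python 3.10+.
    async with (
        agent.run() as context,
        SoundDevicePlayer(context=context),
        SoundDeviceRecorder(context=context),
    ):
        print("Starting...")
        # This future never completes, so the session stays open
        # until the process is interrupted (for example with Ctrl+C).
        await asyncio.Future()


if __name__ == "__main__":
    asyncio.run(main())
```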
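The full-duplex example keeps the session open with `await asyncio.Future()`, which never completes on its own. If you want an explicit stop condition (a hang-up button, a timeout, or SIGTERM), one option is to wait on an `asyncio.Event` instead. The sketch below uses only the standard library. `fake_session` is a hypothetical stand-in for the live session's context managers, and `run_until_stopped` and `install_signal_handlers` are names invented for this example.

```python
import asyncio
import signal
from contextlib import asynccontextmanager


@asynccontextmanager
async def fake_session(log: list[str]):
    """Hypothetical stand-in for the live session's async context managers."""
    log.append("open")
    try:
        yield
    finally:
        # Runs on normal exit and on cancellation alike.
        log.append("closed")


async def run_until_stopped(stop: asyncio.Event, log: list[str]) -> None:
    async with fake_session(log):
        log.append("running")
        await stop.wait()  # takes the place of `await asyncio.Future()`


def install_signal_handlers(stop: asyncio.Event) -> None:
    """Set `stop` on SIGINT or SIGTERM where the event loop supports it."""
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            loop.add_signal_handler(sig, stop.set)
        except NotImplementedError:
            # Not available on Windows event loops; Ctrl+C then raises
            # KeyboardInterrupt as usual.
            pass


async def main() -> None:
    stop = asyncio.Event()
    install_signal_handlers(stop)
    log: list[str] = []
    await run_until_stopped(stop, log)
    print(log)
```

Running `asyncio.run(main())` blocks until the process receives SIGINT or SIGTERM, then leaves the `async with` block in an orderly way. Any other code that holds a reference to `stop` can end the session by calling `stop.set()`.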