This paper presents the design and implementation of a production-ready AI Voice Call Assistant platform for automated call management. The proposed system integrates real-time WebRTC audio transport via LiveKit, streaming Speech-to-Text (STT) using Deepgram Nova-2, a large language model reasoning layer powered by OpenAI GPT-4o through LangChain, and streaming Text-to-Speech (TTS) via ElevenLabs Multilingual v2. The backend is built on FastAPI with WebSocket-based pipeline orchestration, while the frontend dashboard is developed using Next.js 14 and Tailwind CSS. Session state is managed with Redis and persistent data is stored in PostgreSQL. All services are containerized using Docker and served through an NGINX reverse proxy. The system achieves low-latency end-to-end voice interaction, interrupt handling (barge-in), multi-turn context memory, function calling via LangChain tools, and agent configuration via a web dashboard. Experimental results demonstrate end-to-end response latency of 1.2-1.8 seconds, competitive with commercial voice AI platforms such as Vapi.ai and Retell AI, while remaining fully open-source and self-hostable.
Building similarity graph...
Analyzing shared references across papers
Loading...
Asiq Sikkander T N
Revathy D
SRM Dental College
Building similarity graph...
Analyzing shared references across papers
Loading...
N et al. (Fri,) studied this question.
synapsesocial.com/papers/69ccb79916edfba7beb899fe — DOI: https://doi.org/10.64388/irev9i9-1715544
Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context: