Real-time voice conversion service based on Seed-VC, providing WebSocket voice conversion with PCM and Opus audio format support
Features are continuously being updated. Stay tuned for our latest developments…
Fast-VC-Service aims to be a high-performance, real-time streaming voice conversion cloud service designed for production environments. Built on the Seed-VC model, it supports the WebSocket protocol and the PCM and Opus audio encoding formats.
Core Features
- Real-time Conversion: Low-latency streaming voice conversion based on Seed-VC
- WebSocket API: Supports PCM and Opus audio formats
- Performance Monitoring: Comprehensive real-time performance metrics and statistics
- High Concurrency: Multi-worker concurrent processing suitable for production environments
- Easy Deployment: Simple configuration and one-click startup
Quick Start
One-click Installation
```bash
# Clone the project
git clone --recursive https://github.com/Leroll/fast-vc-service.git
cd fast-vc-service

# Configure the environment
cp .env.example .env

# Install dependencies (Poetry recommended)
poetry install

# Start the service
fast-vc serve
```
Quick Testing
```bash
# WebSocket real-time voice conversion
python examples/websocket/ws_client.py \
    --source-wav-path "wavs/sources/low-pitched-male-24k.wav" \
    --encoding PCM
```
For a detailed installation and usage guide, refer to the Quick Start documentation.
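A streaming client typically sends the source audio in fixed-duration chunks. The WebSocket message format is not documented in this README, so the sketch below covers only the chunking step, using Python's standard `wave` module. It assumes, for example, a 16-bit mono WAV at 24 kHz (as the sample file name suggests) and the 500 ms chunk time used in the Performance table; `iter_pcm_chunks` is a hypothetical helper, not part of `ws_client.py`.

```python
import io
import wave
from typing import BinaryIO, Iterator, Union


def iter_pcm_chunks(wav_file: Union[str, BinaryIO], chunk_ms: int = 500) -> Iterator[bytes]:
    """Yield raw PCM frames from a WAV file in chunks of `chunk_ms` milliseconds.

    Hypothetical helper for illustration; it does not reproduce the
    project's client or its wire protocol.
    """
    with wave.open(wav_file, "rb") as wf:
        frames_per_chunk = wf.getframerate() * chunk_ms // 1000
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data  # the final chunk may be shorter than chunk_ms


if __name__ == "__main__":
    # Build 1.2 s of silent 24 kHz, 16-bit mono audio in memory.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(24000)
        wf.writeframes(b"\x00\x00" * 28800)
    buf.seek(0)
    print([len(chunk) for chunk in iter_pcm_chunks(buf, chunk_ms=500)])  # [24000, 24000, 9600]
```

At 24 kHz, each 500 ms chunk is 12,000 frames, or 24,000 bytes of 16-bit mono PCM.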
Performance
| GPU | Concurrency | Workers | Chunk Time (ms) | First Token Latency (ms) | End-to-End Latency (ms) | Avg Chunk Latency (ms) | Avg RTF | Median RTF | P95 RTF |
|---|---|---|---|---|---|---|---|---|---|
| 4090D | 1 | 6 | 500 | 136.0 | 143.0 | 105.0 | 0.21 | 0.22 | 0.24 |
| 4090D | 12 | 12 | 500 | 140.1 | 256.6 | 216.6 | 0.44 | 0.45 | 0.51 |
| 1080TI | 1 | 6 | 500 | 157.0 | 272.0 | 252.2 | 0.50 | 0.51 | 0.61 |
| 1080TI | 3 | 6 | 500 | 154.3 | 261.3 | 304.9 | 0.61 | 0.62 | 0.73 |
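- Time unit: milliseconds (ms)
- RTF: real-time factor, i.e. processing time divided by the duration of the audio processed; values below 1 mean faster than real time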
- View the detailed test reports:
  - Performance-Report_4090D
  - Performance-Report_1080ti
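In every row of the table, Avg RTF agrees to within 0.01 with the average chunk latency divided by the chunk time (for example, 105.0 ms / 500 ms = 0.21 on the 4090D at concurrency 1). The check below recomputes that ratio from the published numbers. Treating RTF as per-chunk latency over chunk duration is an interpretation of the values, not a definition quoted from the test reports; a shorter final chunk would explain small differences.

```python
# Rows copied from the Performance table:
# (GPU, concurrency, chunk time ms, avg chunk latency ms, reported avg RTF)
ROWS = [
    ("4090D", 1, 500, 105.0, 0.21),
    ("4090D", 12, 500, 216.6, 0.44),
    ("1080TI", 1, 500, 252.2, 0.50),
    ("1080TI", 3, 500, 304.9, 0.61),
]


def chunk_rtf(latency_ms: float, chunk_ms: float) -> float:
    """Real-time factor of one chunk: processing time / audio duration.

    A value below 1 means the chunk was processed faster than it plays back.
    """
    return latency_ms / chunk_ms


for gpu, conc, chunk_ms, latency_ms, reported in ROWS:
    estimate = chunk_rtf(latency_ms, chunk_ms)
    print(f"{gpu:>6} x{conc:<2} estimated={estimate:.3f} reported={reported:.2f}")
```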
Version Updates
2025-07-02 – v0.1.3: Added Process- and Instance-Level Concurrency Monitoring
- Added PIDs to log records for easier instance tracking
- Added instance-level concurrency monitoring for viewing concurrency in real time
- Optimized the performance analysis interface to reduce its impact on real-time performance
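Instance-level concurrency monitoring can be pictured as a per-process counter of in-flight sessions, logged together with the PID. The sketch below is a hypothetical illustration of that idea (`ConcurrencyTracker` and its methods are invented here), not the project's implementation; with multiple workers, each process keeps its own count.

```python
import os
import threading
from contextlib import contextmanager
from typing import Iterator


class ConcurrencyTracker:
    """Hypothetical per-process counter of in-flight voice conversion sessions."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest concurrency seen by this process

    @contextmanager
    def session(self) -> Iterator[None]:
        """Wrap one session; the count drops again even if the session raises."""
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        try:
            yield
        finally:
            with self._lock:
                self._active -= 1

    @property
    def active(self) -> int:
        return self._active

    def log_line(self) -> str:
        # Include the PID so lines from different worker processes can be told apart.
        return f"pid={os.getpid()} active_sessions={self._active}"
```

Usage: wrap each session handler in `with tracker.session(): ...` and emit `tracker.log_line()` periodically.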
2025-06-26 – v0.1.2: Persistent Storage Optimization
- Optimized the session persistent storage module with asynchronous processing
- Separated out the time-consuming timeline statistical analysis module to improve response speed
- Optimized the timeline recording mechanism to reduce storage overhead
2025-06-19 – v0.1.1: First Packet Performance Optimization
- Added the performance monitoring API endpoint `/tools/performance-report` for real-time performance metrics
- Enhanced timing logs for better analysis of performance bottlenecks
- Mitigated the delay caused by the model invocation on the first audio packet
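A minimal sketch for pulling the performance report from a running instance with the standard library. Only the path `/tools/performance-report` comes from the v0.1.1 notes; the HTTP method (GET), the base URL, and a JSON response body are assumptions, and both function names are hypothetical.

```python
import json
import urllib.request

REPORT_PATH = "/tools/performance-report"  # endpoint path from the v0.1.1 notes


def report_url(base_url: str) -> str:
    """Join a base URL (e.g. http://<host>:<port>) with the report path."""
    return base_url.rstrip("/") + REPORT_PATH


def fetch_performance_report(base_url: str, timeout: float = 5.0):
    """GET the report and decode it as JSON (method and format are assumptions)."""
    with urllib.request.urlopen(report_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `fetch_performance_report("http://<host>:<port>")`, using the address your instance actually listens on.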
Historical Versions
2025-06-15 – v0.1.0: Basic Service Framework
Built the core framework of the Seed-VC-based real-time voice conversion service, implementing WebSocket streaming inference, performance monitoring, multi-format audio support, and the other basic functions.
- Real-time streaming voice conversion service
- WebSocket API support for PCM and Opus formats
- Complete performance monitoring and statistics system
- Flexible configuration management and environment variable support
- Multi-Worker concurrent processing capability
- Concurrent performance testing framework
TODO
v0.2 – Improve inference efficiency, reduce RTF – v2025-xx
- Optimize timeline_lognize, add delay items for same events
- Add SLOW tags in logs for monitoring receive interval, send interval, and VC-E2E latency
- Optimize the session tool's file naming
- Add adaptive pitch extraction with a corresponding toggle switch
- Change VAD to use ONNX-GPU to improve inference speed
- Complete support for the Seed-VC V2.0 model
- Explore solutions to reduce model inference latency (e.g., new model architectures, quantization)
- Use torchaudio to load reference audio directly onto the GPU, eliminating transfer steps
- Fix the file_vc issue with the last block
- Create Docker and AutoDL images
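The planned SLOW tags could work as a simple threshold check on each measured interval. The sketch below is hypothetical: the metric names mirror the TODO item, but the threshold values, logger name, and log format are placeholders rather than the project's.

```python
import logging

# Placeholder thresholds in milliseconds (hypothetical values, tune per deployment).
SLOW_THRESHOLDS_MS = {
    "receive_interval": 600.0,
    "send_interval": 600.0,
    "vc_e2e": 300.0,
}

logger = logging.getLogger("fast_vc.timing")  # hypothetical logger name


def timing_message(metric: str, value_ms: float) -> str:
    """Format a timing log line, prefixed with SLOW when above the metric's threshold."""
    threshold = SLOW_THRESHOLDS_MS.get(metric)
    tag = "SLOW " if threshold is not None and value_ms > threshold else ""
    return f"{tag}{metric}={value_ms:.1f}ms"


def log_timing(metric: str, value_ms: float) -> None:
    """Log slow measurements at WARNING so they are easy to filter, others at INFO."""
    msg = timing_message(metric, value_ms)
    if msg.startswith("SLOW"):
        logger.warning(msg)
    else:
        logger.info(msg)
```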
Acknowledgements
- Seed-VC – Provides the powerful underlying voice conversion model
- RVC – Provides the basic streaming voice conversion pipeline
