sherpa-onnx

2025-12-10

Supported functions

Speech recognition | Speech synthesis | Source separation
✔️ | ✔️ | ✔️
Speaker identification | Speaker diarization | Speaker verification
✔️ | ✔️ | ✔️
Spoken language identification | Audio tagging | Voice activity detection
✔️ | ✔️ | ✔️
Keyword spotting | Add punctuation | Speech enhancement
✔️ | ✔️ | ✔️

Supported platforms

Architecture | Supported operating systems
x64 | Android, Windows, macOS, Linux, HarmonyOS
x86 | Android, Windows
arm64 | Android, iOS, Windows, macOS, Linux, HarmonyOS
arm32 | Android, Linux, HarmonyOS
riscv64 | Linux

Supported programming languages

1. C++ | 2. C | 3. Python | 4. JavaScript
✔️ | ✔️ | ✔️ | ✔️
5. Java | 6. C# | 7. Kotlin | 8. Swift
✔️ | ✔️ | ✔️ | ✔️
9. Go | 10. Dart | 11. Rust | 12. Pascal
✔️ | ✔️ | ✔️ | ✔️

For Rust support, please see sherpa-rs

It also supports WebAssembly.

Introduction

This repository supports running the following functions locally:

  • Speech-to-text (i.e., ASR); both streaming and non-streaming are supported
  • Text-to-speech (i.e., TTS)
  • Speaker diarization
  • Speaker identification
  • Speaker verification
  • Spoken language identification
  • Audio tagging
  • VAD (e.g., silero-vad)
  • Speech enhancement (e.g., gtcrn)
  • Keyword spotting
  • Source separation (e.g., spleeter, UVR)

on the following platforms and operating systems:

  • x86, x86_64, 32-bit ARM, 64-bit ARM (arm64, aarch64), RISC-V (riscv64), RK NPU
  • Linux, macOS, Windows, openKylin
  • Android, WearOS
  • iOS
  • HarmonyOS
  • NodeJS
  • WebAssembly
  • NVIDIA Jetson Orin NX (supports running on both CPU and GPU)
  • NVIDIA Jetson Nano B01 (supports running on both CPU and GPU)
  • Raspberry Pi
  • RV1126
  • LicheePi4A
  • VisionFive 2
  • 旭日X3派 (Horizon Sunrise X3 Pi)
  • 爱芯派 (AXera-Pi)
  • RK3588
  • etc.

with the following APIs:

  • C++, C, Python, Go, C#
  • Java, Kotlin, JavaScript
  • Swift, Rust
  • Dart, Object Pascal
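
The feature list above distinguishes streaming from non-streaming speech recognition. As a rough illustration of that difference only, the sketch below feeds audio samples to a decoder in small chunks (streaming) and as one whole utterance (non-streaming). `ToyDecoder`, `chunk_samples`, `run_streaming`, `run_non_streaming`, and the 100 ms chunk size are hypothetical names and values chosen for this example; they are not part of the sherpa-onnx API, and no real recognition is performed.

```python
from typing import Iterator, Sequence


def chunk_samples(samples: Sequence[float], sample_rate: int,
                  chunk_ms: int = 100) -> Iterator[Sequence[float]]:
    """Yield consecutive chunks of roughly chunk_ms milliseconds each."""
    chunk_size = max(1, sample_rate * chunk_ms // 1000)
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


class ToyDecoder:
    """Hypothetical stand-in for a recognizer; it only counts what it receives."""

    def __init__(self) -> None:
        self.num_samples = 0
        self.num_calls = 0

    def accept(self, chunk: Sequence[float]) -> None:
        self.num_samples += len(chunk)
        self.num_calls += 1


def run_streaming(samples: Sequence[float], sample_rate: int) -> ToyDecoder:
    """Streaming: audio arrives piece by piece, as from a microphone."""
    decoder = ToyDecoder()
    for chunk in chunk_samples(samples, sample_rate):
        decoder.accept(chunk)  # a real recognizer could emit partial text here
    return decoder


def run_non_streaming(samples: Sequence[float]) -> ToyDecoder:
    """Non-streaming: the whole utterance is available before decoding."""
    decoder = ToyDecoder()
    decoder.accept(samples)
    return decoder
```

In this sketch both paths see the same samples; the only difference is how many calls deliver them, which is why a streaming recognizer can report partial results while audio is still arriving.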

Links for Hugging Face Spaces

You can visit the following Hugging Face Spaces to try sherpa-onnx without
installing anything. All you need is a browser.

Description | URL | Mirror for China
Speaker diarization | Click me | Mirror
Speech recognition | Click me | Mirror
Speech recognition with Whisper | Click me | Mirror
Speech synthesis | Click me | Mirror
Generate subtitles | Click me | Mirror
Audio tagging | Click me | Mirror
Source separation | Click me | Mirror
Spoken language identification with Whisper | Click me | Mirror

We also have spaces built using WebAssembly. They are listed below:

Description | Hugging Face space | ModelScope space
Voice activity detection with silero-vad | Click me | Address
Real-time speech recognition (Chinese + English) with Zipformer | Click me | Address
Real-time speech recognition (Chinese + English) with Paraformer | Click me | Address
Real-time speech recognition (Chinese + English + Cantonese) with Paraformer-large | Click me | Address
Real-time speech recognition (English) | Click me | Address
VAD + speech recognition (Chinese) with Zipformer CTC | Click me | Address
VAD + speech recognition (Chinese + English + Korean + Japanese + Cantonese) with SenseVoice | Click me | Address
VAD + speech recognition (English) with Whisper tiny.en | Click me | Address
VAD + speech recognition (English) with Moonshine tiny | Click me | Address
VAD + speech recognition (English) with Zipformer trained with GigaSpeech | Click me | Address
VAD + speech recognition (Chinese) with Zipformer trained with WenetSpeech | Click me | Address
VAD + speech recognition (Japanese) with Zipformer trained with ReazonSpeech | Click me | Address
VAD + speech recognition (Thai) with Zipformer trained with GigaSpeech2 | Click me | Address
VAD + speech recognition (Chinese, multiple dialects) with a TeleSpeech-ASR CTC model | Click me | Address
VAD + speech recognition (English + Chinese, plus multiple Chinese dialects) with Paraformer-large | Click me | Address
VAD + speech recognition (English + Chinese, plus multiple Chinese dialects) with Paraformer-small | Click me | Address
VAD + speech recognition (multilingual, plus multiple Chinese dialects) with Dolphin-base | Click me | Address
Speech synthesis (English) | Click me | Address
Speech synthesis (German) | Click me | Address
Speaker diarization | Click me | Address

Links for pre-built Android APKs

You can find pre-built Android APKs for this repository in the following table:

Description | URL | For users in China
Speaker diarization | Address | Click here
Streaming speech recognition | Address | Click here
Simulated-streaming speech recognition | Address | Click here
Text-to-speech | Address | Click here
Voice activity detection (VAD) | Address | Click here
VAD + non-streaming speech recognition | Address | Click here
Two-pass speech recognition | Address | Click here
Audio tagging | Address | Click here
Audio tagging (WearOS) | Address | Click here
Speaker identification | Address | Click here
Spoken language identification | Address | Click here
Keyword spotting | Address | Click here

Links for pre-built Flutter APPs

Real-time speech recognition

Description | URL | For users in China
Streaming speech recognition | Address | Click here

Text-to-speech

Description | URL | For users in China
Android (arm64-v8a, armeabi-v7a, x86_64) | Address | Click here
Linux (x64) | Address | Click here
macOS (x64) | Address | Click here
macOS (arm64) | Address | Click here
Windows (x64) | Address | Click here

Note: You need to build from source for iOS.

Links for pre-built Lazarus APPs

Generating subtitles

Description | URL | For users in China
Generate subtitles | Address | Click here

Links for pre-trained models

Description | URL
Speech recognition (speech to text, ASR) | Address
Text-to-speech (TTS) | Address
VAD | Address
Keyword spotting | Address
Audio tagging | Address
Speaker identification (Speaker ID) | Address
Spoken language identification (Language ID) | See multi-lingual Whisper ASR models from Speech recognition
Punctuation | Address
Speaker segmentation | Address
Speech enhancement | Address
Source separation | Address

Some pre-trained ASR models (Streaming)

Please see

  • https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-transducer/index.html
  • https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-paraformer/index.html
  • https://k2-fsa.github.io/sherpa/onnx/pretrained_models/online-ctc/index.html

for more models. The following table lists only SOME of them.

Name | Supported Languages | Description
sherpa-onnx-streaming-zipformer-bilingual-zh-en-2023-02-20 | Chinese, English | See also
sherpa-onnx-streaming-zipformer-small-bilingual-zh-en-2023-02-16 | Chinese, English | See also
sherpa-onnx-streaming-zipformer-zh-14M-2023-02-23 | Chinese | Suitable for Cortex A7 CPU. See also
sherpa-onnx-streaming-zipformer-en-20M-2023-02-17 | English | Suitable for Cortex A7 CPU. See also
sherpa-onnx-streaming-zipformer-korean-2024-06-16 | Korean | See also
sherpa-onnx-streaming-zipformer-fr-2023-04-14 | French | See also

Some pre-trained ASR models (Non-Streaming)

Please see

  • https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-transducer/index.html
  • https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-paraformer/index.html
  • https://k2-fsa.github.io/sherpa/onnx/pretrained_models/offline-ctc/index.html
  • https://k2-fsa.github.io/sherpa/onnx/pretrained_models/telespeech/index.html
  • https://k2-fsa.github.io/sherpa/onnx/pretrained_models/whisper/index.html

for more models. The following table lists only SOME of them.

Name | Supported Languages | Description
sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8 | English | It is converted from https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2
Whisper tiny.en | English | See also
Moonshine tiny | English | See also
sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03 | Chinese | A Zipformer CTC model
sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17 | Chinese, Cantonese, English, Korean, Japanese | Supports multiple Chinese dialects. See also
sherpa-onnx-paraformer-zh-2024-03-09 | Chinese, English | Also supports multiple Chinese dialects. See also
sherpa-onnx-zipformer-ja-reazonspeech-2024-08-01 | Japanese | See also
sherpa-onnx-nemo-transducer-giga-am-russian-2024-10-24 | Russian | See also
sherpa-onnx-nemo-ctc-giga-am-russian-2024-10-24 | Russian | See also
sherpa-onnx-zipformer-ru-2024-09-18 | Russian | See also
sherpa-onnx-zipformer-korean-2024-06-24 | Korean | See also
sherpa-onnx-zipformer-thai-2024-06-20 | Thai | See also
sherpa-onnx-telespeech-ctc-int8-zh-2024-06-04 | Chinese | Supports multiple dialects. See also
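
The model names in the tables above follow a visible pattern: streaming models contain `-streaming-`, and many names end in a date. The sketch below shows one way to work with such names in a script. It assumes, without confirmation from this document, that each `sherpa-onnx-...` model is published as a `.tar.bz2` archive under an `asr-models` release tag of the GitHub repository; `ASSUMED_BASE` and all three helper functions are hypothetical, so check the documentation pages listed above for the authoritative download locations.

```python
import re
from typing import Optional

# Assumption: archive location is not stated in this README.
ASSUMED_BASE = (
    "https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models"
)


def model_archive_url(name: str) -> str:
    """Build the assumed archive URL for a model name from the tables."""
    if not name.startswith("sherpa-onnx-"):
        # Entries such as "Whisper tiny.en" are descriptions, not archive names.
        raise ValueError(f"not an archive-style model name: {name!r}")
    return f"{ASSUMED_BASE}/{name}.tar.bz2"


def is_streaming(name: str) -> bool:
    """Streaming models in the tables carry '-streaming-' in their names."""
    return "-streaming-" in name


def date_suffix(name: str) -> Optional[str]:
    """Return the trailing YYYY-MM-DD part of a model name, if present."""
    match = re.search(r"(\d{4}-\d{2}-\d{2})$", name)
    return match.group(1) if match else None
```

For example, `is_streaming("sherpa-onnx-streaming-zipformer-fr-2023-04-14")` is true, while `date_suffix("sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8")` returns `None` because that name carries no date.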

Useful links

  • Documentation: https://k2-fsa.github.io/sherpa/onnx/
  • Bilibili demo videos: https://search.bilibili.com/all?keyword=%E6%96%B0%E4%B8%80%E4%BB%A3Kaldi

How to reach us

Please see
https://k2-fsa.github.io/sherpa/social-groups.html
for the Next-gen Kaldi (新一代 Kaldi) WeChat and QQ discussion groups.

Projects using sherpa-onnx

BreezeApp from MediaTek Research

BreezeApp is a mobile AI application developed for both Android and iOS platforms.
Users can download it directly from the App Store and enjoy a variety of features
offline, including speech-to-text, text-to-speech, text-based chatbot interactions,
and image question-answering.

  • Download APK for BreezeApp
  • APK mirror for China

Open-LLM-VTuber

Talk to any LLM with hands-free voice interaction, voice interruption, and a Live2D talking
face, running locally across platforms.

See also Open-LLM-VTuber/Open-LLM-VTuber#50

voiceapi

Streaming ASR and TTS based on FastAPI

It shows how to use the ASR and TTS Python APIs with FastAPI.

TMSpeech (a slacking-off tool for Tencent Meeting)

It uses streaming ASR in C# with a graphical user interface.

Video demo in Chinese: 【开源】Windows实时字幕软件(网课/开会必备)

lol互动助手 (LoL interactive assistant)

It uses the JavaScript API of sherpa-onnx along with Electron.

Video demo in Chinese: 爆了!炫神教你开打字挂!真正影响胜率的英雄联盟工具!英雄联盟的最后一块拼图!和游戏中的每个人无障碍沟通!

Sherpa-ONNX 语音识别服务器 (Sherpa-ONNX speech recognition server)

A Node.js-based server providing a RESTful API for speech recognition.

QSmartAssistant

A modular, low-resource conversational robot / smart speaker that can run fully offline.

It uses Qt. Both ASR and TTS are used.

Flutter-EasySpeechRecognition

It extends ./flutter-examples/streaming_asr by
downloading models inside the app to reduce the size of the app.

Note: [Team B] Sherpa AI backend also uses
sherpa-onnx in a Flutter app.

sherpa-onnx-unity

sherpa-onnx in Unity. See also #1695,
#1892, and #1859

xiaozhi-esp32-server

A backend service for xiaozhi-esp32 that helps you quickly build an ESP32 device control server.

See also

  • ASR: add lightweight sherpa-onnx-asr
  • feat: add sherpa-onnx model to ASR

KaithemAutomation

Pure Python, GUI-focused home automation/consumer grade SCADA.

It uses TTS from sherpa-onnx. See also Speak command that uses the new globally configured TTS model.

Open-XiaoAI KWS

Enable custom wake words for XiaoAi speakers.

Video demo in Chinese: 小爱同学启动~˶╹ꇴ╹˶!

C++ WebSocket ASR Server

It provides a WebSocket server based on C++ for ASR using sherpa-onnx.

Go WebSocket Server

It provides a WebSocket server based on the Go programming language for sherpa-onnx.

Making robot Paimon, Ep10 "The AI Part 1"

It is a YouTube video,
showing how the author tried to use AI so he can have a conversation with Paimon.

It uses sherpa-onnx for speech-to-text and text-to-speech.


Download the source code

Clone the project from the command line:

git clone https://github.com/k2-fsa/sherpa-onnx.git
