mirror of https://github.com/LCTT/TranslateProject.git (synced 2025-01-25 23:11:02 +08:00)

translated

parent 342e735b76
commit 39c126158f
@@ -1,131 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (luuming)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (5 Good Open Source Speech Recognition/Speech-to-Text Systems)
[#]: via: (https://fosspost.org/lists/open-source-speech-recognition-speech-to-text)
[#]: author: (Simon James https://fosspost.org/author/simonjames)

5 Good Open Source Speech Recognition/Speech-to-Text Systems
======

![](https://i0.wp.com/fosspost.org/wp-content/uploads/2019/02/open-source-speech-recognition-speech-to-text.png?resize=1237%2C527&ssl=1)

A speech-to-text (STT) system is what its name implies: a way of turning spoken words into text files that can be used later for any purpose.

Speech-to-text technology is extremely useful. It can be used for many applications, such as automating transcription, writing books and other texts using only your voice, and running complicated analyses on the generated text files.

In the past, speech-to-text technology was dominated by proprietary software and libraries; open source alternatives either didn’t exist or came with severe limitations and no community around them. This is changing: today there are many open source speech-to-text tools and libraries that you can use right now.

Here we list 5 of them.
### Open Source Speech Recognition Libraries

#### Project DeepSpeech

![5 Good Open Source Speech Recognition/Speech-to-Text Systems 15 open source speech recognition][1]

This project is made by Mozilla, the organization behind the Firefox browser. It’s a 100% free and open source speech-to-text library that applies machine learning, using the TensorFlow framework, to fulfill its mission.

In other words, you can use it to train models yourself to improve the underlying speech-to-text technology and get better results, or even to bring it to other languages if you want. You can also easily integrate it into your other machine learning projects built on TensorFlow. Sadly, it sounds like the project currently supports only English by default.

It’s also available for many programming languages, such as Python (3.6), which allows you to have it working in seconds:

```
pip3 install deepspeech
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav
```

You can also install it using npm:

```
npm install deepspeech
```

For more information, refer to the [project’s homepage][2].
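Before running the `deepspeech` command shown above, it can help to check that the audio file is in a format the model accepts. The sketch below is a minimal, unofficial helper built only on Python’s standard `wave` module. It assumes the pre-trained English model expects 16 kHz, mono, 16-bit PCM WAV input (check the project’s documentation for the release you install). The function names `wav_problems` and `deepspeech_command` are hypothetical and not part of DeepSpeech; the flags are copied from the command above and may differ in other releases.

```python
import wave

# Assumption: the pre-trained English model wants 16 kHz, mono, 16-bit PCM audio.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def wav_problems(path):
    """Return a list of mismatches between a WAV file and the expected format."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()

    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(
            f"{8 * width}-bit samples, expected {8 * EXPECTED_SAMPLE_WIDTH}-bit"
        )
    return problems


def deepspeech_command(audio_path, model_dir="models"):
    """Build the command line from the article as an argument list (hypothetical helper)."""
    return [
        "deepspeech",
        "--model", f"{model_dir}/output_graph.pbmm",
        "--alphabet", f"{model_dir}/alphabet.txt",
        "--lm", f"{model_dir}/lm.binary",
        "--trie", f"{model_dir}/trie",
        "--audio", audio_path,
    ]
```

If `wav_problems("my_audio_file.wav")` returns an empty list, the argument list from `deepspeech_command("my_audio_file.wav")` can be handed to `subprocess.run`; otherwise the file probably needs to be resampled or converted first.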
#### Kaldi

![5 Good Open Source Speech Recognition/Speech-to-Text Systems 17 open source speech recognition][3]

Kaldi is open source speech recognition software written in C++ and released under the Apache License. It works on Windows, macOS and Linux. Its development started back in 2009.

Kaldi’s main advantage over some other speech recognition software is that it is extensible and modular: the community provides tons of third-party modules that you can use for your tasks. Kaldi also supports deep neural networks and offers [excellent documentation on its website][4].

While the code is mainly written in C++, it’s “wrapped” by Bash and Python scripts. So if you are just looking for the basic usage of converting speech to text, you’ll find it easy to accomplish via either Python or Bash.

[Project’s homepage][5].
#### Julius

![5 Good Open Source Speech Recognition/Speech-to-Text Systems 19 open source speech recognition][6]

Probably one of the oldest speech recognition programs ever: its development started in 1991 at Kyoto University, and its ownership was transferred to an independent project team in 2005.

Julius’s main features include the ability to perform real-time STT, low memory usage (less than 64 MB for 20,000 words), the ability to produce N-best/word-graph output, the ability to work as a server unit and a lot more. This software was built mainly for academic and research purposes. It is written in C and works on Linux, Windows, macOS and even Android (on smartphones).

Currently it supports only English and Japanese. The software is probably available in your Linux distribution’s repository; just search for the julius package in your package manager. The latest version was [released][7] around a month and a half ago.

[Project’s homepage][8].
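For readers unfamiliar with the term, “N-best output” means the recognizer returns its N highest-scoring candidate transcripts instead of only the single best one. The toy sketch below illustrates the idea only: the candidate sentences and scores are made up, the `n_best` function is hypothetical, and none of it reflects Julius’s actual API or output format.

```python
def n_best(hypotheses, n):
    """Return the n highest-scoring (transcript, score) pairs, best first."""
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)[:n]


# Made-up candidates with made-up log-probability scores (higher is better).
candidates = [
    ("recognize speech", -12.4),
    ("wreck a nice beach", -13.1),
    ("recognise peach", -17.8),
    ("wreck an ice beach", -15.0),
]

for rank, (text, score) in enumerate(n_best(candidates, 3), start=1):
    print(f"{rank}. {text} ({score})")
```

An application can then rescore or filter the alternatives itself, for example by preferring the candidate that matches a known command.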
#### Wav2Letter++

![5 Good Open Source Speech Recognition/Speech-to-Text Systems 21 open source speech recognition][9]

If you are looking for something modern, then this one is for you. Wav2Letter++ is open source speech recognition software that was released by Facebook’s AI Research team just 2 months ago. The code is released under the BSD license.

Facebook [describes][10] its library as “the fastest state-of-the-art speech recognition system available”. The concepts on which this tool is built make it optimized for performance by default; Facebook’s also-new machine learning library [FlashLight][11] is used as the underlying core of Wav2Letter++.

Wav2Letter++ requires you to first train a model yourself for the language you want. No pre-built support for any language (including English) is available; it’s just a machine-learning-driven tool for converting speech to text. It was written in C++, hence the name (Wav2Letter++).

[Project’s homepage][12].
#### DeepSpeech2

![5 Good Open Source Speech Recognition/Speech-to-Text Systems 23 open source speech recognition][13]

Researchers at the Chinese giant Baidu are also working on their own speech-to-text engine, called DeepSpeech2. It’s an end-to-end open source engine that uses the “PaddlePaddle” deep learning framework to convert both English and Mandarin Chinese speech into text. The code is released under the BSD license.

The engine can be trained on any model and for any language you desire. The models are not released with the code; you’ll have to build them yourself, just like with the other software. DeepSpeech2’s source code is written in Python, so it should be easy to get familiar with if that’s the language you use.

[Project’s homepage][14].
### Conclusion

The speech recognition category is still dominated mainly by proprietary software giants like Google and IBM (which provide their own closed-source commercial services for this), but the open source alternatives are promising. These 5 open source speech recognition engines should get you going in building your application, and all of them are still under heavy development. In a few years, we expect open source to become the norm for these technologies, just as in other industries.

If you have any other recommendations for this list, or comments in general, we’d love to hear them below!

--------------------------------------------------------------------------------

via: https://fosspost.org/lists/open-source-speech-recognition-speech-to-text

Author: [Simon James][a]
Topic selection: [lujun9972][b]
Translator: [Translator ID](https://github.com/译者ID)
Proofreader: [Proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)

[a]: https://fosspost.org/author/simonjames
[b]: https://github.com/lujun9972
[1]: https://i0.wp.com/fosspost.org/wp-content/uploads/2019/02/hero_speech-machine-learning2.png?resize=820%2C280&ssl=1 (5 Good Open Source Speech Recognition/Speech-to-Text Systems 16 open source speech recognition)
[2]: https://github.com/mozilla/DeepSpeech
[3]: https://i0.wp.com/fosspost.org/wp-content/uploads/2019/02/Screenshot-at-2019-02-19-1134.png?resize=591%2C138&ssl=1 (5 Good Open Source Speech Recognition/Speech-to-Text Systems 18 open source speech recognition)
[4]: http://kaldi-asr.org/doc/index.html
[5]: http://kaldi-asr.org
[6]: https://i2.wp.com/fosspost.org/wp-content/uploads/2019/02/mic_web.png?resize=385%2C100&ssl=1 (5 Good Open Source Speech Recognition/Speech-to-Text Systems 20 open source speech recognition)
[7]: https://github.com/julius-speech/julius/releases
[8]: https://github.com/julius-speech/julius
[9]: https://i2.wp.com/fosspost.org/wp-content/uploads/2019/02/fully_convolutional_ASR.png?resize=850%2C177&ssl=1 (5 Good Open Source Speech Recognition/Speech-to-Text Systems 22 open source speech recognition)
[10]: https://code.fb.com/ai-research/wav2letter/
[11]: https://github.com/facebookresearch/flashlight
[12]: https://github.com/facebookresearch/wav2letter
[13]: https://i2.wp.com/fosspost.org/wp-content/uploads/2019/02/ds2.png?resize=850%2C313&ssl=1 (5 Good Open Source Speech Recognition/Speech-to-Text Systems 24 open source speech recognition)
[14]: https://github.com/PaddlePaddle/DeepSpeech
@@ -0,0 +1,127 @@
[#]: collector: (lujun9972)
[#]: translator: (luuming)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (5 Good Open Source Speech Recognition/Speech-to-Text Systems)
[#]: via: (https://fosspost.org/lists/open-source-speech-recognition-speech-to-text)
[#]: author: (Simon James https://fosspost.org/author/simonjames)

5 Good Open Source Speech Recognition/Speech-to-Text Systems
======

![](https://i0.wp.com/fosspost.org/wp-content/uploads/2019/02/open-source-speech-recognition-speech-to-text.png?resize=1237%2C527&ssl=1)

A speech-to-text (STT) system is, as its name implies, a way of converting spoken words into text files for later use.

Speech-to-text technology is very useful. It can be used in many applications, such as automatic transcription, writing books or other texts with your own voice, and running complex analyses on the generated text files with other tools.

In the past, speech-to-text technology was dominated by proprietary software and libraries; open source alternatives either did not exist or were severely limited and had no community. This is changing: today there are many open source speech-to-text tools and libraries that you can use right away.

Here I list 5 of them.
### Open Source Speech Recognition Libraries

#### Project DeepSpeech

![5 Good Open Source Speech Recognition/Speech-to-Text Systems 15 open source speech recognition][1]

This project is developed by Mozilla, the organization behind the Firefox browser. It is 100% free and is implemented with the TensorFlow machine learning framework.

In other words, you can use it to train your own models for better results, or even use it to transcribe other languages. You can also easily integrate it into your own TensorFlow machine learning projects. Unfortunately, the project currently supports only English by default.

It also supports many programming languages, such as Python (3.6), which lets you get it within seconds:

```
pip3 install deepspeech
deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio my_audio_file.wav
```

You can also install it via npm:

```
npm install deepspeech
```

For more information, see the [project homepage][2].
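#### Kaldi

![5 Good Open Source Speech Recognition/Speech-to-Text Systems 17 open source speech recognition][3]

Kaldi is open source speech recognition software written in C++ and released under the Apache License. It runs on Windows, macOS and Linux. Its development began in 2009.

Kaldi’s main advantage over other speech recognition software is that it is extensible and modular. The community provides a large number of third-party modules that you can use for your tasks. Kaldi also supports deep neural networks and provides [excellent documentation][4] on its website.

Although the code is written mainly in C++, it is wrapped by Bash and Python scripts. So if you only want basic speech-to-text conversion, you will find it easy to do through Python or Bash.

[Project’s homepage][5].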
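#### Julius

![5 Good Open Source Speech Recognition/Speech-to-Text Systems 19 open source speech recognition][6]

Possibly one of the oldest speech recognition programs ever. Its development began in 1991 at Kyoto University, and ownership was transferred to an independent project team in 2005.

Julius’s main features include the ability to perform real-time STT, low memory usage (less than 64 MB for 20,000 words), the ability to output N-best/word-graph results, the ability to run as a server unit and much more. The software was designed mainly for academic and research use. It is written in C and runs on Linux, Windows, macOS and even Android (on smartphones).

It currently supports only English and Japanese. The software is probably easy to install from your Linux distribution’s repository; just search for julius in your package manager. The latest version was [released][7] about a month and a half ago.

[Project’s homepage][8].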
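#### Wav2Letter++

![5 Good Open Source Speech Recognition/Speech-to-Text Systems 21 open source speech recognition][9]

If you are looking for something more modern, this one is sure to fit. Wav2Letter++ is open source speech recognition software released by Facebook’s AI research team 2 months ago. The code is released under the BSD license.

Facebook describes its library as “the fastest state-of-the-art speech recognition system”. The ideas behind its design make it optimized for performance by default. Facebook’s newest machine learning library, [FlashLight][11], is also used as the underlying core of Wav2Letter++.

Wav2Letter++ requires you to first build a model for the language in question in order to train the algorithms. There is no pre-trained model for any language (including English); it is simply a machine-learning-driven speech-to-text tool. It is written in C++, hence the name Wav2Letter++.

[Project’s homepage][12].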
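#### DeepSpeech2

![5 Good Open Source Speech Recognition/Speech-to-Text Systems 23 open source speech recognition][13]

Researchers at the Chinese giant Baidu are also developing their own speech-to-text engine, called “DeepSpeech2”. It is an end-to-end open source engine that uses the “PaddlePaddle” deep learning framework to convert English or Chinese speech into text. The code is released under the BSD license.

The engine can be trained on any model and used for any language you want. The models are not released with the code; you have to build them yourself, as with the other software. DeepSpeech2’s source code is written in Python, so it will be very easy to pick up if you have used the language.

[Project’s homepage][14].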
### Conclusion

The speech recognition field is still occupied mainly by proprietary software giants such as Google and IBM (which provide closed-source commercial services for this), but the open source alternatives are promising. These 5 open source speech recognition engines should help you build your application, and they will keep developing over time. In a few years, we hope open source will become the norm for these technologies, just as in other industries.

If you have other suggestions or comments on this list, we would love to hear them below.

--------------------------------------------------------------------------------

via: https://fosspost.org/lists/open-source-speech-recognition-speech-to-text

Author: [Simon James][a]
Topic selection: [lujun9972][b]
Translator: [Translator ID](https://github.com/LuuMing)
Proofreader: [Proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux中国](https://linux.cn/)

[a]: https://fosspost.org/author/simonjames
[b]: https://github.com/lujun9972
[1]: https://i0.wp.com/fosspost.org/wp-content/uploads/2019/02/hero_speech-machine-learning2.png?resize=820%2C280&ssl=1 (5 Good Open Source Speech Recognition/Speech-to-Text Systems 16 open source speech recognition)
[2]: https://github.com/mozilla/DeepSpeech
[3]: https://i0.wp.com/fosspost.org/wp-content/uploads/2019/02/Screenshot-at-2019-02-19-1134.png?resize=591%2C138&ssl=1 (5 Good Open Source Speech Recognition/Speech-to-Text Systems 18 open source speech recognition)
[4]: http://kaldi-asr.org/doc/index.html
[5]: http://kaldi-asr.org
[6]: https://i2.wp.com/fosspost.org/wp-content/uploads/2019/02/mic_web.png?resize=385%2C100&ssl=1 (5 Good Open Source Speech Recognition/Speech-to-Text Systems 20 open source speech recognition)
[7]: https://github.com/julius-speech/julius/releases
[8]: https://github.com/julius-speech/julius
[9]: https://i2.wp.com/fosspost.org/wp-content/uploads/2019/02/fully_convolutional_ASR.png?resize=850%2C177&ssl=1 (5 Good Open Source Speech Recognition/Speech-to-Text Systems 22 open source speech recognition)
[10]: https://code.fb.com/ai-research/wav2letter/
[11]: https://github.com/facebookresearch/flashlight
[12]: https://github.com/facebookresearch/wav2letter
[13]: https://i2.wp.com/fosspost.org/wp-content/uploads/2019/02/ds2.png?resize=850%2C313&ssl=1 (5 Good Open Source Speech Recognition/Speech-to-Text Systems 24 open source speech recognition)
[14]: https://github.com/PaddlePaddle/DeepSpeech