mirror of
https://github.com/LCTT/TranslateProject.git
synced 2025-01-13 22:30:37 +08:00
commit
275b0bfca7
[#]: subject: "Use Mozilla DeepSpeech to enable speech to text in your application"
[#]: via: "https://opensource.com/article/22/1/voice-text-mozilla-deepspeech"
[#]: author: "Seth Kenlon https://opensource.com/users/seth"
[#]: collector: "lujun9972"
[#]: translator: "geekpi"
[#]: reviewer: " "
[#]: publisher: " "
[#]: url: " "

Use Mozilla DeepSpeech to enable speech to text in your application
======
Speech recognition in applications isn't just a fun trick but an important accessibility feature.

![Colorful sound wave graph][1]

One of the primary functions of computers is to parse data. Some data is easier to parse than other data, and voice input continues to be a work in progress. There have been many improvements in the area in recent years, though, and one of them is in the form of DeepSpeech, a project by Mozilla, the foundation that maintains the Firefox web browser. DeepSpeech is a voice-to-text command and library, making it useful for users who need to transform voice input into text and developers who want to provide voice input for their applications.

### Install DeepSpeech

DeepSpeech is open source, released under the Mozilla Public License (MPL). You can download the source code from its [GitHub][2] page.

To install it, use `pip` (the `--user` flag installs it into your user environment):

```
$ python3 -m pip install deepspeech --user
```

DeepSpeech relies on machine learning. You can train it yourself, but it's easiest to download pre-trained model files when you're just starting out.

```
$ mkdir DeepSpeech
$ cd DeepSpeech
$ curl -LO \
  https://github.com/mozilla/DeepSpeech/releases/download/vX.Y.Z/deepspeech-X.Y.Z-models.pbmm
$ curl -LO \
  https://github.com/mozilla/DeepSpeech/releases/download/vX.Y.Z/deepspeech-X.Y.Z-models.scorer
```

### Applications for users

With DeepSpeech, you can transcribe recordings of speech to written text. You get the best results from speech recorded cleanly under optimal conditions. However, in a pinch, you can try any recording, and you'll probably get something you can use as a starting point for manual transcription.

For test purposes, you might record an audio file containing the simple phrase "This is a test. Hello world, this is a test." Save the audio as a `.wav` file called `hello-test.wav`.

In your DeepSpeech folder, launch a transcription by providing the model file, the scorer file, and your audio:

```
$ deepspeech --model deepspeech*pbmm \
  --scorer deepspeech*scorer \
  --audio hello-test.wav
```
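
If you want to drive the same transcription from a script, a thin wrapper around the CLI is enough. This is a minimal sketch: the model and scorer file names follow the placeholder examples above, and `build_command`/`transcribe` are illustrative helper names, not part of DeepSpeech itself.

```python
import shutil
import subprocess

def build_command(wav_path, model, scorer):
    """Assemble the deepspeech CLI invocation shown above."""
    return ["deepspeech", "--model", model, "--scorer", scorer, "--audio", wav_path]

def transcribe(wav_path, model, scorer):
    """Run the CLI and return the transcription from standard output."""
    result = subprocess.run(build_command(wav_path, model, scorer),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Only attempt a real run when the deepspeech CLI is actually installed.
if shutil.which("deepspeech"):
    print(transcribe("hello-test.wav",
                     "deepspeech-X.Y.Z-models.pbmm",
                     "deepspeech-X.Y.Z-models.scorer"))
```

Because `check=True` is set, a failed transcription raises `CalledProcessError` instead of silently returning empty text.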

Output is sent to standard output (your terminal):

```
this is a test hello world this is a test
```

You can get the output in JSON format by using the `--json` option:

```
$ deepspeech --model deepspeech*pbmm \
  --json \
  --scorer deepspeech*scorer \
  --audio hello-test.wav
```

This renders each word along with a timestamp:

```
{
  "transcripts": [
    {
      "confidence": -42.7990608215332,
      "words": [
        {
          "word": "this",
          "start_time": 2.54,
          "duration": 0.12
        },
        {
          "word": "is",
          "start_time": 2.74,
          "duration": 0.1
        },
        {
          "word": "a",
          "start_time": 2.94,
          "duration": 0.04
        },
        {
          "word": "test",
          "start_time": 3.06,
          "duration": 0.74
        },
        [...]
```
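
Because the JSON is machine-readable, it's easy to post-process. As a minimal sketch assuming only the output shape shown above (the `timestamped_lines` helper is illustrative, not part of DeepSpeech), you can turn the best transcript into timestamped lines with nothing but the standard library:

```python
import json

# Sample --json output in the shape shown above (truncated here).
raw = """
{
  "transcripts": [
    {
      "confidence": -42.7990608215332,
      "words": [
        {"word": "this", "start_time": 2.54, "duration": 0.12},
        {"word": "is",   "start_time": 2.74, "duration": 0.1},
        {"word": "a",    "start_time": 2.94, "duration": 0.04},
        {"word": "test", "start_time": 3.06, "duration": 0.74}
      ]
    }
  ]
}
"""

def timestamped_lines(transcript_json):
    """Format the first (best) transcript's words as 'start  word' lines."""
    best = json.loads(transcript_json)["transcripts"][0]
    return [f"{w['start_time']:6.2f}  {w['word']}" for w in best["words"]]

for line in timestamped_lines(raw):
    print(line)
```

The same loop is a natural starting point for generating subtitle formats such as SRT, since each word carries both a start time and a duration.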

### Developers

DeepSpeech isn't just a command for transcribing pre-recorded audio. You can also use it to process audio streams in real time. The GitHub repository [DeepSpeech-examples][3] is full of examples in JavaScript, Python, C#, and Java for Android.

Most of the hard work is already done, so integrating DeepSpeech is usually just a matter of referencing the DeepSpeech library and knowing how to obtain audio from the host device (which you generally do through the `/dev` filesystem on Linux, or through an SDK on Android and other platforms).
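
Whatever the capture mechanism, the pre-trained DeepSpeech models expect raw 16-bit, 16 kHz, mono PCM samples. As an illustrative sketch using only Python's standard library (the file name and test tone are made up for demonstration, not taken from the article), this writes audio in that format and reads it back as the kind of sample buffer a speech-to-text engine consumes:

```python
import array
import math
import wave

RATE = 16000  # pre-trained DeepSpeech models expect 16 kHz, 16-bit, mono PCM

# Write one second of a 440 Hz sine tone as a stand-in for a real recording.
tone = array.array("h", (int(12000 * math.sin(2 * math.pi * 440 * n / RATE))
                         for n in range(RATE)))
with wave.open("tone-test.wav", "wb") as w:
    w.setnchannels(1)     # mono
    w.setsampwidth(2)     # 16-bit samples
    w.setframerate(RATE)  # 16 kHz
    w.writeframes(tone.tobytes())

# Read it back as a buffer of signed 16-bit samples.
with wave.open("tone-test.wav", "rb") as w:
    assert (w.getnchannels(), w.getsampwidth(), w.getframerate()) == (1, 2, RATE)
    samples = array.array("h", w.readframes(w.getnframes()))

print(len(samples))  # one second of audio: 16000 samples
```

If a recording doesn't match this format, resample it first (for example with a tool like SoX or FFmpeg) before handing it to the model.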

### Speech recognition

As a developer, enabling speech recognition in your application isn't just a fun trick but an important accessibility feature: it makes your application easier to use for people with mobility issues, people with low vision, and chronic multi-taskers who like to keep their hands full. As a user, DeepSpeech is a useful transcription tool for converting audio files into text. Whatever your use case, try DeepSpeech and see what it can do for you.

--------------------------------------------------------------------------------

via: https://opensource.com/article/22/1/voice-text-mozilla-deepspeech

Author: [Seth Kenlon][a]
Topic selection: [lujun9972][b]
Translator: [geekpi](https://github.com/geekpi)
Proofreader: [校对者ID](https://github.com/校对者ID)

This article was translated by [LCTT](https://github.com/LCTT/TranslateProject) and proudly presented by [Linux中国](https://linux.cn/)

[a]: https://opensource.com/users/seth
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/colorful_sound_wave.png?itok=jlUJG0bM (Colorful sound wave graph)
[2]: https://github.com/mozilla/DeepSpeech
[3]: https://github.com/mozilla/DeepSpeech-examples