1. Introduction
Learn how to use the SpeechRecognition Python library to perform speech recognition and convert audio speech to text in Python.
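2. Speech AI Library
2.1 A Capable Speech-to-Text Library
Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them into human-readable text. In this tutorial, you will learn how to convert speech to text in Python using the SpeechRecognition library.

We do not need to build any machine learning model from scratch: the library provides convenient wrappers for several well-known public speech recognition APIs (such as Google Cloud Speech API and IBM Speech to Text).

Note that if you would rather run inference directly on a machine learning model instead of calling an API, be sure to check out this tutorial, in which I show you how to perform speech recognition in Python with current state-of-the-art models. If you want other ways to perform ASR, see this comprehensive speech recognition tutorial. Also learn how to translate text in Python.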
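2.2 Installation
Let's start by installing the libraries with pip:

    pip3 install SpeechRecognition pydub

Now open a new Python file and import the library:

    import speech_recognition as sr

A nice feature of this library is that it supports several recognition engines: CMU Sphinx (offline), Google Speech Recognition, Google Cloud Speech API, Wit.ai, Microsoft Bing Voice Recognition, Houndify API, IBM Speech to Text, and Snowboy Hotword Detection (offline).

We will use Google Speech Recognition here because it is simple to use and does not require an API key.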
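2.3 Transcribing an audio file
Make sure the current directory contains an audio file with English speech (if you want to follow along with me, get the audio file here):

    filename = "16-122828-0002.wav"

This file was taken from the LibriSpeech dataset, but you can use any WAV audio file you like; just change the file name. Let's initialize our speech recognizer:

    # initialize the recognizer
    r = sr.Recognizer()

The code below loads the audio file and converts the speech to text using Google Speech Recognition:

    # open the file
    with sr.AudioFile(filename) as source:
        # listen for the data (load audio to memory)
        audio_data = r.record(source)
        # recognize (convert from speech to text)
        text = r.recognize_google(audio_data)
        print(text)

This takes a few seconds to finish, because it uploads the file to Google and fetches the output. Here is my result:

    I believe you're just talking nonsense

The code above works well for small or medium-sized audio files. In the next section, we will write code for large files.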
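To decide whether a file counts as "small or medium", it can help to check its duration before uploading anything. The following is a minimal sketch using only the standard-library wave module. The helper names and the 60-second cutoff are assumptions made for illustration, not limits documented by the SpeechRecognition library.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds, read from its header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def needs_chunking(path, max_seconds=60.0):
    """Return True if the file is longer than max_seconds.

    max_seconds is a hypothetical cutoff chosen for this sketch;
    tune it for your own audio and network conditions.
    """
    return wav_duration_seconds(path) > max_seconds
```

If needs_chunking() returns True for your file, the chunked approach in the next section is probably the better fit.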
2.4 Transcribing large audio files
If you want to perform speech recognition on a long audio file, the function below handles that quite well:

    # importing libraries
    import speech_recognition as sr
    import os
    from pydub import AudioSegment
    from pydub.silence import split_on_silence

    # create a speech recognition object
    r = sr.Recognizer()

    # a function to recognize speech in the audio file
    # so that we don't repeat ourselves in other functions
    def transcribe_audio(path):
        # use the audio file as the audio source
        with sr.AudioFile(path) as source:
            audio_listened = r.record(source)
            # try converting it to text
            text = r.recognize_google(audio_listened)
        return text

    # a function that splits the audio file into chunks on silence
    # and applies speech recognition
    def get_large_audio_transcription_on_silence(path):
        """Split the large audio file into chunks
        and apply speech recognition on each of these chunks."""
        # open the audio file using pydub
        sound = AudioSegment.from_file(path)
        # split the audio where silence lasts 500 milliseconds or more and get chunks
        chunks = split_on_silence(sound,
            # experiment with this value for your target audio file
            min_silence_len=500,
            # adjust this per requirement
            silence_thresh=sound.dBFS-14,
            # keep 500 ms of silence around each chunk, adjustable as well
            keep_silence=500,
        )
        folder_name = "audio-chunks"
        # create a directory to store the audio chunks
        if not os.path.isdir(folder_name):
            os.mkdir(folder_name)
        whole_text = ""
        # process each chunk
        for i, audio_chunk in enumerate(chunks, start=1):
            # export the audio chunk and save it in
            # the `folder_name` directory
            chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
            audio_chunk.export(chunk_filename, format="wav")
            # recognize the chunk
            try:
                text = transcribe_audio(chunk_filename)
            except sr.UnknownValueError as e:
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text
        # return the text for all chunks detected
        return whole_text

Note: you need to install Pydub with pip for the code above to work.

The function above uses split_on_silence() from the pydub.silence module to split the audio data into chunks on silence. The min_silence_len parameter is the minimum length of silence, in milliseconds, used for a split. silence_thresh is the threshold: anything quieter than this is considered silence, and I set it to the average dBFS minus 14. The keep_silence parameter is the amount of silence, in milliseconds, to leave at the beginning and end of each detected chunk.

These values will not suit every audio file, so experiment with them for your own large audio files. After that, we iterate over all the chunks, convert the speech in each one to text, and add the results together. Here is an example run:

    path = "7601-291468-0006.wav"
    print("\nFull text:", get_large_audio_transcription_on_silence(path))

Note: you can get the 7601-291468-0006.wav file here.

Output:
audio-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat.
audio-chunks\chunk2.wav : At a short distance from the city.
audio-chunks\chunk3.wav : Just at what is now called dutch street.
audio-chunks\chunk4.wav : Sooner bounded with proofs of his ingenuity.
audio-chunks\chunk5.wav : Patent smokejacks.
audio-chunks\chunk6.wav : It required a horse to work some.
audio-chunks\chunk7.wav : Dutch oven roasted meat without fire.
audio-chunks\chunk8.wav : Carts that went before the horses.
audio-chunks\chunk9.wav : Weather cox that turned against the wind and other wrongheaded contrivances.
audio-chunks\chunk10.wav : So just understand can found it all beholders.

Full text: His abode which you had fixed in a bowery or country seat. At a short distance from the city. Just at what is now called dutch street. Sooner bounded with proofs of his ingenuity. Patent smokejacks. It required a horse to work some. Dutch oven roasted meat without fire. Carts that went before the horses. Weather cox that turned against the wind and other wrongheaded contrivances. So just understand can found it all beholders.

So the function automatically creates a folder for us, stores the chunks of the original audio file we specified in it, and then runs speech recognition on all of those files.
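To build intuition for how min_silence_len and silence_thresh interact, here is a toy splitter in plain Python. It is not pydub's implementation: it assumes the input is one loudness value (in dBFS) per millisecond, it ignores keep_silence, and the name find_speech_ranges is made up for this sketch.

```python
def find_speech_ranges(levels, min_silence_len, silence_thresh):
    """Return (start_ms, end_ms) ranges of non-silent audio.

    levels: one loudness value in dBFS per millisecond (an assumption
    of this sketch). A split happens only where at least
    min_silence_len consecutive values are below silence_thresh.
    """
    # First pass: collect every quiet run that is long enough to split on.
    silent_runs = []
    run_start = None
    for i, level in enumerate(levels):
        if level < silence_thresh:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_silence_len:
                silent_runs.append((run_start, i))
            run_start = None
    if run_start is not None and len(levels) - run_start >= min_silence_len:
        silent_runs.append((run_start, len(levels)))

    # Second pass: the speech chunks are the gaps between those runs.
    chunks = []
    position = 0
    for start, end in silent_runs:
        if start > position:
            chunks.append((position, start))
        position = end
    if position < len(levels):
        chunks.append((position, len(levels)))
    return chunks


# 1 s of speech, 600 ms pause, 800 ms speech, 200 ms pause, 400 ms speech
levels = [-20] * 1000 + [-60] * 600 + [-20] * 800 + [-60] * 200 + [-20] * 400
ranges = find_speech_ranges(levels, min_silence_len=500, silence_thresh=-34)
print(ranges)  # [(0, 1000), (1600, 3000)]
```

The 600 ms pause is long enough to cause a split, while the 200 ms pause stays inside the second chunk. Lowering min_silence_len to 200 would split there as well, which is the kind of tuning the parameters call for.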
If you want to split the audio file into fixed intervals instead, we can use the following function:

    # a function that splits the audio file into fixed interval chunks
    # and applies speech recognition
    def get_large_audio_transcription_fixed_interval(path, minutes=5):
        """Split the large audio file into fixed interval chunks
        and apply speech recognition on each of these chunks."""
        # open the audio file using pydub
        sound = AudioSegment.from_file(path)
        # split the audio file into chunks
        chunk_length_ms = int(1000 * 60 * minutes)  # convert to milliseconds
        chunks = [sound[i:i + chunk_length_ms] for i in range(0, len(sound), chunk_length_ms)]
        folder_name = "audio-fixed-chunks"
        # create a directory to store the audio chunks
        if not os.path.isdir(folder_name):
            os.mkdir(folder_name)
        whole_text = ""
        # process each chunk
        for i, audio_chunk in enumerate(chunks, start=1):
            # export the audio chunk and save it in
            # the `folder_name` directory
            chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
            audio_chunk.export(chunk_filename, format="wav")
            # recognize the chunk
            try:
                text = transcribe_audio(chunk_filename)
            except sr.UnknownValueError as e:
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text
        # return the text for all chunks detected
        return whole_text

The function above splits a large audio file into 5-minute chunks. You can change the minutes parameter to suit your needs. Since my audio file is not that large, I tried splitting it into 10-second chunks:

    print("\nFull text:", get_large_audio_transcription_fixed_interval(path, minutes=1/6))

Output:
audio-fixed-chunks\chunk1.wav : His abode which you had fixed in a bowery or country seat at a short distance from the city just that one is now called.
audio-fixed-chunks\chunk2.wav : Dutch street soon abounded with proofs of his ingenuity patent smokejacks that required a horse to work some.
audio-fixed-chunks\chunk3.wav : Oven roasted meat without fire carts that went before the horses weather cox that turned against the wind and other wrong head.
audio-fixed-chunks\chunk4.wav : Contrivances that astonished and confound it all beholders.

Full text: His abode which you had fixed in a bowery or country seat at a short distance from the city just that one is now called. Dutch street soon abounded with proofs of his ingenuity patent smokejacks that required a horse to work some. Oven roasted meat without fire carts that went before the horses weather cox that turned against the wind and other wrong head. Contrivances that astonished and confound it all beholders.

2.5 Reading from the microphone
This requires PyAudio to be installed on your machine. The installation process depends on your operating system:
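Windows
You can install it directly with pip:

    $ pip3 install pyaudio

Linux
You need to install the dependencies first:

    $ sudo apt-get install python-pyaudio python3-pyaudio
    $ pip3 install pyaudio

macOS
You need to install portaudio first, and then you can install it with pip:

    $ brew install portaudio
    $ pip3 install pyaudio

Now let's use the microphone to convert our speech: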
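    import speech_recognition as sr

    # create a speech recognition object
    r = sr.Recognizer()

    with sr.Microphone() as source:
        # read the audio data from the default microphone
        audio_data = r.record(source, duration=5)
        print("Recognizing...")
        # convert speech to text
        text = r.recognize_google(audio_data)
        print(text)

This listens to your microphone for 5 seconds and then tries to convert the speech to text.

It is very similar to the previous code, but here we use the Microphone() object to read audio from the default microphone, and the duration parameter of record() to stop reading after 5 seconds. The audio data is then uploaded to Google to get the output text.

You can also use the offset parameter of record() to start recording after offset seconds.

In addition, you can recognize other languages by passing the language parameter to recognize_google(). For example, to recognize Spanish speech:

    text = r.recognize_google(audio_data, language="es-ES")

See the supported languages in this StackOverflow answer.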
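To make the offset and duration parameters described in this section more concrete, the sketch below extracts the equivalent window from a WAV file using only the standard-library wave module. It assumes that duration is counted from the end of the offset, so offset=2 and duration=5 cover seconds 2 through 7. The function name copy_window is hypothetical and is not part of the SpeechRecognition library.

```python
import wave


def copy_window(src_path, dst_path, offset=0.0, duration=None):
    """Copy a window of audio from one WAV file to another.

    Skips `offset` seconds, then copies up to `duration` seconds
    (or everything that remains when duration is None).
    Returns the number of seconds actually copied.
    """
    with wave.open(src_path, "rb") as src:
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp the start so an oversized offset yields an empty window.
        start = min(int(offset * rate), total)
        remaining = total - start
        if duration is None:
            count = remaining
        else:
            count = min(int(duration * rate), remaining)
        src.setpos(start)
        frames = src.readframes(count)
        params = src.getparams()

    with wave.open(dst_path, "wb") as dst:
        # Reuse the source format; the frame count is corrected on close.
        dst.setparams(params)
        dst.writeframes(frames)
    return count / rate
```

The resulting file can be passed to sr.AudioFile() like any other WAV file, which is handy when you want to inspect exactly which part of a recording is being sent for recognition.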
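3. Conclusion
As you can see, converting speech to text with this library is simple and straightforward. The library is widely used in the wild. Check out the official documentation.

If you also want to convert text to speech in Python, check out this tutorial.

Also read: how to recognize optical characters in images using Python.

Happy coding!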