Contents

1. Choosing a System
  1.1 Update the environment
2. Installing and Using Whisper
  2.1 Create the environment
  2.2 Installation
    2.2.1 Install the base package
    2.2.2 Install dependencies
  2.3 Test 1
  2.4 Test 2: speaker diarization
    Error: ModuleNotFoundError: No module named 'pyannote'
    Error: No module named 'pyannote_whisper'
3. Installing and Using FunASR
  1 Installation
    1.1 Install Conda (optional)
    1.2 Install PyTorch (version >= 1.11.0)
    1.3 Install FunASR
    1.4 Install modelscope (optional)
    1.5 Inference from a local model path (optional)
  2 Using FunASR
    2.1 Using FunASR
    2.2 Speaker diarization with pyannote.audio
    2.3 Integrating FunASR with pyannote.audio
  3. Fine-tuning

1. Choosing a System

This image works.
1.1 Update the environment

```bash
python -m pip install --upgrade pip
```
2. Installing and Using Whisper

2.1 Create the environment
```bash
# Log in to the system over SSH
# Switch to the root user
mkdir /opt/tools/
cd /opt/tools/
# Install miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh
# Follow the prompts; /opt/miniconda3 is the recommended install directory
# Create a symlink
ln -s /opt/miniconda3/bin/conda /usr/local/bin/conda
# Log out of the shell and log back in before continuing
# Create the environment
conda create -n whisper python=3.9
conda activate whisper
```
2.2 Installation

2.2.1 Install the base package
```bash
pip install -U openai-whisper
```

or:

```bash
pip install git+https://github.com/openai/whisper.git
```

or:

```bash
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple openai-whisper
```

2.2.2 Install dependencies

```bash
pip install tiktoken
pip install setuptools-rust
```
```bash
# Install ffmpeg (run this outside the conda whisper environment)
sudo apt update
sudo apt install ffmpeg
```

2.3 Test 1
```bash
whisper audio.mp3 --model medium --language Chinese
```
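The CLI can also write the transcript to disk in several formats. A minimal sketch, reusing the same audio.mp3 (--output_dir and --output_format are standard openai-whisper CLI options):

```bash
# Write an SRT subtitle file into ./out; txt, vtt, tsv, json, and all are also accepted
whisper audio.mp3 --model medium --language Chinese --output_dir ./out --output_format srt
```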
Invoking it from code:

```python
import whisper
import arrow  # third-party date/time library: pip install arrow

# Takes the model name, the audio path, and the recording start time
def excute(model_name, file_path, start_time):
    model = whisper.load_model(model_name)
    result = model.transcribe(file_path)
    for segment in result["segments"]:
        now = arrow.get(start_time)
        start = now.shift(seconds=segment["start"]).format("YYYY-MM-DD HH:mm:ss")
        end = now.shift(seconds=segment["end"]).format("YYYY-MM-DD HH:mm:ss")
        print(f"【{start}-{end}】{segment['text']}")

if __name__ == "__main__":
    excute("base", "1001.mp3", "2022-10-24 16:23:00")
```
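With a recording that started at 2022-10-24 16:23:00, the output looks roughly like this (the timestamps and text below are invented for illustration):

```
【2022-10-24 16:23:00-2022-10-24 16:23:04】大家好,欢迎收听本期节目
【2022-10-24 16:23:04-2022-10-24 16:23:09】今天我们来聊一聊语音识别
```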
2.4 Test 2: speaker diarization
Create the code:

```python
import os
import whisper
from pyannote.audio import Pipeline
from pyannote_whisper.utils import diarize_text
import concurrent.futures

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="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")
output_dir = "/root/autodl-tmp/pyannote-whisper"

def process_audio(file_path):
    model = whisper.load_model("large")
    asr_result = model.transcribe(file_path, initial_prompt="语音转换")
    diarization_result = pipeline(file_path)
    final_result = diarize_text(asr_result, diarization_result)
    output_file = os.path.join(output_dir, os.path.basename(file_path)[:-4] + ".txt")
    with open(output_file, "w") as f:
        for seg, spk, sent in final_result:
            line = f"{seg.start:.2f} {seg.end:.2f} {spk} {sent}\n"
            f.write(line)

if not os.path.exists(output_dir):
    os.makedirs(output_dir)

wave_dir = "/root/autodl-tmp/pyannote-whisper"
# Collect every .wav file in the directory
wav_files = [os.path.join(wave_dir, file) for file in os.listdir(wave_dir) if file.endswith(".wav")]

# Process the wav files in parallel
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    executor.map(process_audio, wav_files)
print("Done")
```

Error: ModuleNotFoundError: No module named 'pyannote'

Solution:

```bash
pip install pyannote.audio
```

Error: No module named 'pyannote_whisper'
If you are using the AutoDL platform, you can speed up the clone with its academic proxy:

```bash
source /etc/network_turbo
git clone https://github.com/yinruiqing/pyannote-whisper.git
```

Then write your code inside the cloned project, or copy its pyannote_whisper.utils module into your own project; a third option is sketched below.
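A minimal alternative sketch, assuming the repo was cloned into the current directory: put it on PYTHONPATH so the import resolves without copying any files.

```bash
# Make `from pyannote_whisper.utils import diarize_text` importable from anywhere
export PYTHONPATH=$PWD/pyannote-whisper:$PYTHONPATH
```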
3. Installing and Using FunASR

1 Installation

Official repository: https://github.com/alibaba/FunASR
1.1 Install Conda (optional)

```bash
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc
conda create -n funasr python=3.8
conda activate funasr
```

1.2 Install PyTorch (version >= 1.11.0)

```bash
pip3 install torch torchaudio
```

If CUDA is present in your environment, you should install the PyTorch build that matches your CUDA version; the compatibility matrix can be found in the docs.
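For example, a hedged sketch of matching the install to your driver (the cu118 wheel index is one of the indexes PyTorch publishes; pick the one matching what nvidia-smi reports):

```bash
# Check which CUDA version the driver supports
nvidia-smi
# Install CUDA 11.8 builds of torch and torchaudio
pip3 install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
```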
1.3 Install FunASR

Install from pip:

```bash
pip3 install -U funasr
# Users in China can install from this mirror:
# pip3 install -U funasr -i https://mirror.sjtu.edu.cn/pypi/web/simple
```

Or install FunASR from source:

```bash
git clone https://github.com/alibaba/FunASR.git && cd FunASR
pip3 install -e ./
```

1.4 Install modelscope (optional)
If you want to use the pretrained models from ModelScope, you should install modelscope:

```bash
pip3 install -U modelscope
# Users in China can install from this mirror:
# pip3 install -U modelscope -i https://mirror.sjtu.edu.cn/pypi/web/simple
```

1.5 Inference from a local model path (optional)
Download the model to a local directory via the modelscope SDK:

```python
from modelscope.hub.snapshot_download import snapshot_download

local_dir_root = "./models_from_modelscope"
model_dir = snapshot_download("damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
                              cache_dir=local_dir_root)
```

Or download the model to a local directory via git lfs:

```bash
git lfs install
# git clone https://www.modelscope.cn/namespace/model-name.git
git clone https://www.modelscope.cn/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch.git
```
Run inference with the local model path:

```python
# pipeline and Tasks come from modelscope, as in section 2.1 below
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

local_dir_root = "./models_from_modelscope/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch"
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model=local_dir_root,
)
```

2 Using FunASR
2.1 Using FunASR
```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model="damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    model_revision="v1.2.4")

rec_result = inference_pipeline(audio_in="1001.wav")
print(rec_result["sentences"])
with open("result.txt", "w", encoding="utf-8") as f:
    print(rec_result, file=f)
print(rec_result)
```
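Note that each entry in rec_result["sentences"] carries start/end timestamps in milliseconds; the integration code in section 2.3 divides them by 1000 to get seconds. An invented illustration of the shape:

```python
# Illustrative only; the values are made up
rec_result = {
    "text": "今天天气不错。",
    "sentences": [
        {"text": "今天天气不错。", "start": 0, "end": 2000},  # milliseconds
    ],
}
```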
2.2 Speaker diarization with pyannote.audio

Step 1: install the dependency

```bash
pip install pyannote.audio
```

Step 2: create an access token
https://huggingface.co/settings/tokens
Step 3: test pyannote.audio

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token="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")

# send pipeline to GPU (when available)
import torch
pipeline.to(torch.device("cuda"))

# apply pretrained pipeline
diarization = pipeline("1002.wav")
print(diarization)

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
```

2.3 Integrating FunASR with pyannote.audio
1.1 The algorithm
```python
from pyannote.core import Segment, Annotation, Timeline

def get_text_with_timestamp(transcribe_res):
    # Whisper-style results: timestamps are in seconds
    timestamp_texts = []
    for item in transcribe_res["segments"]:
        start = item["start"]
        end = item["end"]
        text = item["text"]
        timestamp_texts.append((Segment(start, end), text))
    print(timestamp_texts)
    return timestamp_texts

def get_text_with_timestampFun(transcribe_res):
    # FunASR-style results: timestamps are in milliseconds
    print(transcribe_res["sentences"])
    timestamp_texts = []
    for item in transcribe_res["sentences"]:
        start = item["start"] / 1000.0
        end = item["end"] / 1000.0
        text = item["text"]
        timestamp_texts.append((Segment(start, end), text))
    return timestamp_texts

def add_speaker_info_to_text(timestamp_texts, ann):
    spk_text = []
    for seg, text in timestamp_texts:
        # For the given segment `seg`, pick the speaker who talks the most
        # within it according to the diarization result `ann`.
        spk = ann.crop(seg).argmax()
        spk_text.append((seg, spk, text))
    return spk_text

def merge_cache(text_cache):
    sentence = ''.join([item[-1] for item in text_cache])
    spk = text_cache[0][1]
    start = text_cache[0][0].start
    end = text_cache[-1][0].end
    return Segment(start, end), spk, sentence

PUNC_SENT_END = ['.', '?', '!', '。', '?', '!']

def merge_sentence(spk_text):
    merged_spk_text = []
    pre_spk = None
    text_cache = []
    for seg, spk, text in spk_text:
        if spk != pre_spk and pre_spk is not None and len(text_cache) > 0:
            merged_spk_text.append(merge_cache(text_cache))
            text_cache = [(seg, spk, text)]
            pre_spk = spk
        elif text[-1] in PUNC_SENT_END:
            text_cache.append((seg, spk, text))
            merged_spk_text.append(merge_cache(text_cache))
            text_cache = []
            pre_spk = spk
        else:
            text_cache.append((seg, spk, text))
            pre_spk = spk
    if len(text_cache) > 0:
        merged_spk_text.append(merge_cache(text_cache))
    return merged_spk_text

def diarize_text(transcribe_res, diarization_result):
    timestamp_texts = get_text_with_timestampFun(transcribe_res)
    spk_text = add_speaker_info_to_text(timestamp_texts, diarization_result)
    res_processed = merge_sentence(spk_text)
    return res_processed

def write_to_txt(spk_sent, file):
    with open(file, 'w') as fp:
        for seg, spk, sentence in spk_sent:
            line = f"{seg.start:.2f} {seg.end:.2f} {spk} {sentence}\n"
            fp.write(line)
```
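To see what merge_sentence does, here is a tiny made-up run (the segments, speaker labels, and text are invented; it assumes the definitions above are in scope): fragments from the same speaker accumulate until a sentence-ending punctuation mark, then merge into one line.

```python
from pyannote.core import Segment

demo = [
    (Segment(0.0, 1.2), "SPEAKER_00", "今天天气"),
    (Segment(1.2, 2.0), "SPEAKER_00", "不错。"),
    (Segment(2.1, 3.5), "SPEAKER_01", "是的,出去走走吧。"),
]
for seg, spk, sentence in merge_sentence(demo):
    print(f"{seg.start:.2f} {seg.end:.2f} {spk} {sentence}")
# 0.00 2.00 SPEAKER_00 今天天气不错。
# 2.10 3.50 SPEAKER_01 是的,出去走走吧。
```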
1.2 Invocation
```python
import os
from pyannote.audio import Pipeline
from pyannote_funasr.utils import diarize_text
import concurrent.futures
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks
import torch

# Output directory
output_dir = "/root/autodl-tmp/pyannote-whisper"

# Speech-to-text model
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model="damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch",
    model_revision="v1.2.4")

# rec_result = inference_pipeline(audio_in="1002.wav")
# with open("result.txt", "w", encoding="utf-8") as f:
#     print(rec_result, file=f)
# print(rec_result)

def process_audio(file_path):
    print("----------1")
    asr_result = inference_pipeline(audio_in=file_path)
    print("-----------2.2")
    # Speaker-diarization pipeline (named dia_pipeline so it does not
    # shadow modelscope's pipeline() function)
    dia_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                            use_auth_token="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")
    # Use GPU acceleration
    dia_pipeline.to(torch.device("cuda"))
    # num_speakers: the number of speakers; it can be omitted
    diarization_result = dia_pipeline(file_path, num_speakers=2)
    print(diarization_result)
    # Merge the transcription with the diarization result
    final_result = diarize_text(asr_result, diarization_result)
    print("-----------5")
    # Write the output
    output_file = os.path.join(output_dir, os.path.basename(file_path)[:-4] + ".txt")
    with open(output_file, "w") as f:
        for seg, spk, sent in final_result:
            line = f"{seg.start:.2f} {seg.end:.2f} {spk} {sent}\n"
            f.write(line)
            print(line)

# Make sure the output directory exists
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

wave_dir = "/root/autodl-tmp/pyannote-whisper"
# Collect every .wav file in the directory
wav_files = [os.path.join(wave_dir, file) for file in os.listdir(wave_dir) if file.endswith(".wav")]

# Process the wav files
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(process_audio, wav_files)
print("Done")
```
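One possible refinement, not in the original script: Pipeline.from_pretrained runs inside process_audio, so every file reloads the diarization model. Hoisting it to module level, next to inference_pipeline, loads it once:

```python
# Sketch: load the diarization pipeline once, then reuse it inside process_audio
dia_pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                        use_auth_token="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")
dia_pipeline.to(torch.device("cuda"))
```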
3. Fine-tuning

finetune.py:
```python
import os
from modelscope.metainfo import Trainers
from modelscope.trainers import build_trainer
from modelscope.msdatasets.audio.asr_dataset import ASRDataset

def modelscope_finetune(params):
    if not os.path.exists(params.output_dir):
        os.makedirs(params.output_dir, exist_ok=True)
    # dataset splits: ["train", "validation"]
    ds_dict = ASRDataset.load(params.data_path, namespace="speech_asr")
    kwargs = dict(
        model=params.model,
        data_dir=ds_dict,
        dataset_type=params.dataset_type,
        work_dir=params.output_dir,
        batch_bins=params.batch_bins,
        max_epoch=params.max_epoch,
        lr=params.lr)
    trainer = build_trainer(Trainers.speech_asr_trainer, default_args=kwargs)
    trainer.train()

if __name__ == "__main__":
    from funasr.utils.modelscope_param import modelscope_args
    params = modelscope_args(model="damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch")
    params.output_dir = "./checkpoint"                   # where to save the model
    params.data_path = "speech_asr_aishell1_trainsets"   # data path: a dataset uploaded to ModelScope, or local data
    params.dataset_type = "small"                        # "small" for small datasets; use "large" for more than 1000 hours
    params.batch_bins = 2000                             # batch size; with dataset_type="small" the unit is fbank feature frames, with "large" it is milliseconds
    params.max_epoch = 50                                # maximum number of training epochs
    params.lr = 0.00005                                  # learning rate
    modelscope_finetune(params)
```
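After training, the checkpoint directory can be used like the local model path in section 1.5. A minimal sketch, assuming the directory written by modelscope_finetune is directly loadable as a model path:

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# "./checkpoint" is the output_dir set in the fine-tuning script above
inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model="./checkpoint",
)
rec_result = inference_pipeline(audio_in="1001.wav")
print(rec_result)
```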