Spark-TTS Local Text-to-Speech Debugging Notes
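Contents
- Preface
- I. Install Conda
- II. Create a Conda Environment
- III. Install Dependencies
- IV. Install PyTorch
- V. Download the Spark-TTS Model
- VI. Run Spark-TTS
- VII. Debugging and Issues
- VIII. Possible Causes of webui.py Failing to Start
- IX. CUDA Runtime Version Mismatch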
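Preface
In May, all I wanted was for a child to be able to talk with an AI agent in Zhou Shen's voice, so I set up a machine and built a local Spark-TTS. Most of the effort went into matching old CUDA and PyTorch versions: I downloaded and installed different versions and drivers many times, and after several evenings finally settled on a combination that works.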
I. Install Conda
If Conda is not installed yet:
- Download Miniconda and install it.
- Make sure to check “Add Conda to PATH” during installation.
Download Spark-TTS
You have two options to get the files:
Option 1 (Recommended for Windows): Download ZIP manually
- Go to the Spark-TTS GitHub repository (https://github.com/SparkAudio/Spark-TTS).
- Click “Code” > “Download ZIP”, then extract it.
Option 2: Use Git (Optional)
- If you prefer using Git, install Git and run:
git clone https://github.com/SparkAudio/Spark-TTS.git
II. Create a Conda Environment
Open Command Prompt (cmd) and run:
conda create -n sparktts python=3.12 -y
conda activate sparktts
This creates and activates a Python 3.12 environment for Spark-TTS.
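To confirm that later commands really run inside this environment, a small check like the one below can help. It is a sketch that relies on conda exporting the active environment's name in the CONDA_DEFAULT_ENV variable after conda activate; the function name describe_env is made up for illustration.

```python
import os
import sys


def describe_env(expected_env="sparktts", min_python=(3, 12)):
    """Return a list of problems with the current interpreter and conda env."""
    problems = []
    # `conda activate <name>` sets CONDA_DEFAULT_ENV to the environment name
    active = os.environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active conda env is {active!r}, expected {expected_env!r}")
    # compare only (major, minor) against the required minimum
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(n) for n in sys.version_info[:3])
        problems.append(f"Python {found} is older than {min_python[0]}.{min_python[1]}")
    return problems


if __name__ == "__main__":
    issues = describe_env()
    print("\n".join(issues) if issues else "Environment looks fine.")
```

An empty list means both the environment name and the Python version are as expected.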
III. Install Dependencies
Inside the Spark-TTS folder (whether from ZIP or Git), run:
pip install -r requirements.txt
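To confirm that everything in requirements.txt was actually installed, the sketch below reads package names from the file and asks importlib.metadata whether each one is present. The name parsing is deliberately simple (it handles lines such as name==1.0 or name>=1.0 and skips comments and pip options), and the helper names are hypothetical.

```python
import os
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare package names from requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        # the name ends at the first extras bracket, version operator, marker, or space
        names.append(re.split(r"[\[<>=!~;@\s]", line, maxsplit=1)[0])
    return names


def missing_packages(lines):
    """Return the requirement names that importlib.metadata cannot find."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__" and os.path.exists("requirements.txt"):
    with open("requirements.txt", encoding="utf-8") as f:
        print(missing_packages(f) or "Everything in requirements.txt is installed.")
```

Run it from the Spark-TTS folder inside the activated sparktts environment.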
IV. Install PyTorch
Note: this command is supposed to detect CUDA or CPU automatically and install the matching PyTorch build.
pip install torch torchvision torchaudio --index-url https://pytorch.org/get-started/previous-versions/
# OR Manually install a specific CUDA version (if needed)
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 # Older GPUs
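On some older machines the installed CUDA version is low, so no suitable torch build is found. The log below also has a more direct cause: https://pytorch.org/get-started/previous-versions/ is a documentation web page, not a pip package index, so pip finds no packages there at all ("from versions: none"); torch only shows up because it was already installed.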
(sparktts) I:\AI-src\Spark-TTS>pip install torch torchvision torchaudio --index-url https://pytorch.org/get-started/previous-versions/
Looking in indexes: https://pytorch.org/get-started/previous-versions/
Requirement already satisfied: torch in d:\anaconda3\envs\sparktts\lib\site-packages (2.5.1)
ERROR: Could not find a version that satisfies the requirement torchvision (from versions: none)
ERROR: No matching distribution found for torchvision
Fix:
First check the local CUDA version:
(sparktts) I:\AI-src\Spark-TTS>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
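Note: the CUDA runtime was version 10 at first and could later be upgraded to at most 12.1. With CUDA 12.1, however, pip kept installing the +cpu build of PyTorch, which has no CUDA support, so I had to use CUDA 11.8.
Reinstall using the index URL for CUDA 11.8: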
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
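Picking the index URL by hand from the nvcc output can also be scripted. The sketch below extracts "release X.Y" from nvcc --version output and builds an index URL following the cuXYZ pattern of the cu118 and cu121 URLs used in this post. That pattern is an assumption: PyTorch only publishes indexes for some CUDA versions, so check that the resulting URL exists.

```python
import re


def nvcc_release(nvcc_output):
    """Return the 'X.Y' from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return f"{match.group(1)}.{match.group(2)}"


def torch_index_url(cuda_release):
    """Build a PyTorch wheel index URL, e.g. '11.8' -> .../whl/cu118.

    Assumes the cuXYZ naming of the cu118/cu121 URLs above; not every CUDA
    release has an index, so confirm the URL exists before relying on it.
    """
    major, minor = cuda_release.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


# sample line copied from the nvcc output shown above
sample = "Cuda compilation tools, release 11.8, V11.8.89"
url = torch_index_url(nvcc_release(sample))
print("pip install torch torchvision torchaudio --index-url " + url)
```

On a real machine the input would come from subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout instead of the hard-coded sample.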
Verify:
(sparktts) I:\AI-src\Spark-TTS>python ver.py
PyTorch version: 2.6.0+cu118
TorchVision version: 0.21.0+cu118
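The part after "+" in these version strings (the local version tag) tells you which build is installed: "+cu118" is a CUDA 11.8 build and "+cpu" is the CPU-only build mentioned in the note above. A minimal sketch that classifies version strings this way; build_flavor is a hypothetical helper, and strings without a tag are reported as unknown.

```python
def build_flavor(version):
    """Classify a torch/torchvision version string by its local '+tag'.

    '2.6.0+cu118' -> 'cuda 11.8', '2.5.1+cpu' -> 'cpu', anything else -> 'unknown'.
    """
    _, _, tag = version.partition("+")
    if tag == "cpu":
        return "cpu"
    digits = tag[2:]
    if tag.startswith("cu") and digits.isdigit() and len(digits) >= 2:
        # the last digit is the CUDA minor version: cu118 -> 11.8, cu121 -> 12.1
        return f"cuda {digits[:-1]}.{digits[-1]}"
    return "unknown"


for v in ("2.6.0+cu118", "0.21.0+cu118", "2.5.1+cpu", "0.2.0"):
    print(v, "->", build_flavor(v))
```

Inside the environment itself, torch.version.cuda (None for CPU builds) and torch.cuda.is_available() give the same answer directly from PyTorch.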
V. Download the Spark-TTS Model
There are two ways to get the model files. Pick one:
Option 1 (Recommended): Using Python
Create a new file called download_model.py in the Spark-TTS folder, paste the following code into it, and run it:
from huggingface_hub import snapshot_download
import os

# Set download path
model_dir = "pretrained_models/Spark-TTS-0.5B"

# Check if model already exists
if os.path.exists(model_dir) and len(os.listdir(model_dir)) > 0:
    print("Model files already exist. Skipping download.")
else:
    print("Downloading model files...")
    # Recent huggingface_hub releases resume interrupted downloads automatically,
    # so the deprecated resume_download=True argument is left out.
    snapshot_download(
        repo_id="SparkAudio/Spark-TTS-0.5B",
        local_dir=model_dir,
    )
    print("Download complete!")
Run it with:
python download_model.py
✅ Option 2: Using Git (If You Installed It)
mkdir pretrained_models
git clone https://huggingface.co/SparkAudio/Spark-TTS-0.5B pretrained_models/Spark-TTS-0.5B
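Either method works; choose whichever is easier for you.
I recommend using Git directly. Note that in the log below the command was run from inside the pretrained_models folder, so the clone lands in pretrained_models\pretrained_models\Spark-TTS-0.5B; running it from the Spark-TTS root folder, as in Option 2, gives the expected pretrained_models\Spark-TTS-0.5B.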
(sparktts) I:\AI-src\Spark-TTS\pretrained_models>git clone https://huggingface.co/SparkAudio/Spark-TTS-0.5B pretrained_models/Spark-TTS-0.5B
Cloning into 'pretrained_models/Spark-TTS-0.5B'...
remote: Enumerating objects: 80, done.
remote: Counting objects: 100% (76/76), done.
remote: Compressing objects: 100% (76/76), done.
remote: Total 80 (delta 21), reused 0 (delta 0), pack-reused 4 (from 1)
Unpacking objects: 100% (80/80), 3.63 MiB | 1.43 MiB/s, done.
Updating files: 100% (31/31), done.
Filtering content: 100% (4/4), 3.66 GiB | 8.44 MiB/s, done.
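Hugging Face stores the large weight files with Git LFS, which is what the "Filtering content" line above reflects. If Git LFS is not installed, git clone still succeeds but leaves small text pointer files in place of the weights. The sketch below lists such pointer stubs by checking for the LFS spec line at the start of small files; the 1 KB size cutoff and the file names used in demo() are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# first line of every Git LFS pointer file
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def lfs_pointer_files(model_dir):
    """List files under model_dir that are still Git LFS pointer stubs."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    stubs = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if ".git" in rel.parts or not path.is_file():
            continue  # skip the clone's own .git metadata and directories
        if path.stat().st_size > 1024:
            continue  # assumed cutoff: pointer stubs are well under 1 KB
        with open(path, "rb") as f:
            if f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                stubs.append(rel.as_posix())
    return sorted(stubs)


def demo():
    """Scan a throwaway folder holding one stub and one ordinary file."""
    with tempfile.TemporaryDirectory() as tmp:
        stub = LFS_POINTER_PREFIX + b"\noid sha256:" + b"0" * 64 + b"\nsize 123\n"
        (Path(tmp) / "model.safetensors").write_bytes(stub)  # hypothetical file name
        (Path(tmp) / "config.json").write_text('{"ok": true}')
        return lfs_pointer_files(tmp)


print(demo())  # the stub is reported, config.json is not
```

After a complete clone, lfs_pointer_files("pretrained_models/Spark-TTS-0.5B") should return an empty list.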
VI. Run Spark-TTS
Web UI (Recommended)
For an interactive browser-based interface, run:
python webui.py
This launches a local web server where you can enter text and generate speech or clone a voice.
(sparktts) I:\AI-src\Spark-TTS>python webui.py
D:\anaconda3\envs\sparktts\Lib\site-packages\torch\nn\utils\weight_norm.py:143: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
WeightNorm.apply(module, name, dim)
Missing tensor: mel_transformer.spectrogram.window
Missing tensor: mel_transformer.mel_scale.fb
* Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
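The log says the server listens on 0.0.0.0:7860, which means all network interfaces; in a browser on the same machine you would open http://127.0.0.1:7860 (or http://localhost:7860). To check from a script whether the Web UI is up, a plain TCP connection attempt is enough. A minimal sketch using only the standard library; port_is_open is a hypothetical helper and port 7860 is taken from the log.

```python
import socket


def port_is_open(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False


if __name__ == "__main__":
    state = "up" if port_is_open() else "not reachable"
    print(f"Spark-TTS web UI on http://127.0.0.1:7860 is {state}")
```

This only shows that something is listening on the port, not that speech generation works.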
VII. Debugging and Issues
🔎 Before Asking for Help
Many common issues are already covered in existing discussions, documentation, or online resources. Please:
- Search GitHub issues first 🕵️♂️
- Check the documentation 📖
- Google or use AI tools (ChatGPT, DeepSeek, etc.)
If you still need help, please explain what you’ve already tried so we can assist you better!
Now you’re good to go! 🚀🔥
Happy TTS-ing.
VIII. Possible Causes of webui.py Failing to Start
Mismatched torch and torchvision versions
The error message:
(sparktts) I:\AI-src\Spark-TTS>python webui.py
Traceback (most recent call last):
File "D:\anaconda3\envs\sparktts\Lib\site-packages\transformers\utils\import_utils.py", line 1778, in _get_module
return importlib.import_module("." + module_name, self.__name__)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\envs\sparktts\Lib\importlib\__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 999, in exec_module
File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
File "D:\anaconda3\envs\sparktts\Lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py", line 40, in <module>
from ...modeling_utils import PreTrainedModel
File "D:\anaconda3\envs\sparktts\Lib\site-packages\transformers\modeling_utils.py", line 48, in <module>
from .loss.loss_utils import LOSS_MAPPING
File "D:\anaconda3\envs\sparktts\Lib\site-packages\transformers\loss\loss_utils.py", line 19, in <module>
from .loss_deformable_detr import DeformableDetrForObjectDetectionLoss, DeformableDetrForSegmentationLoss
File "D:\anaconda3\envs\sparktts\Lib\site-packages\transformers\loss\loss_deformable_detr.py", line 4, in <module>
from ..image_transforms import center_to_corners_format
File "D:\anaconda3\envs\sparktts\Lib\site-packages\transformers\image_transforms.py", line 22, in <module>
from .image_utils import (
File "D:\anaconda3\envs\sparktts\Lib\site-packages\transformers\image_utils.py", line 58, in <module>
from torchvision.transforms import InterpolationMode
ImportError: cannot import name 'InterpolationMode' from 'torchvision.transforms' (D:\anaconda3\envs\sparktts\Lib\site-packages\torchvision\transforms\__init__.py)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "I:\AI-src\Spark-TTS\webui.py", line 23, in <module>
from cli.SparkTTS import SparkTTS
File "I:\AI-src\Spark-TTS\cli\SparkTTS.py", line 23, in <module>
from sparktts.models.audio_tokenizer import BiCodecTokenizer
File "I:\AI-src\Spark-TTS\sparktts\models\audio_tokenizer.py", line 22, in <module>
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
File "D:\anaconda3\envs\sparktts\Lib\site-packages\transformers\utils\import_utils.py", line 1767, in __getattr__
value = getattr(module, name)
^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\envs\sparktts\Lib\site-packages\transformers\utils\import_utils.py", line 1766, in __getattr__
module = self._get_module(self._class_to_module[name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\anaconda3\envs\sparktts\Lib\site-packages\transformers\utils\import_utils.py", line 1780, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.wav2vec2.modeling_wav2vec2 because of the following error (look up to see its traceback):
cannot import name 'InterpolationMode' from 'torchvision.transforms' (D:\anaconda3\envs\sparktts\Lib\site-packages\torchvision\transforms\__init__.py)
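To check whether the torch and torchvision versions match, follow these steps:
1. Check the installed torch and torchvision versions
Run the following code in the Python environment (saved as ver.py in the runs below) to print the installed versions: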
import torch
import torchvision
print("PyTorch version:", torch.__version__)
print("TorchVision version:", torchvision.__version__)
Output:
(sparktts) I:\AI-src\Spark-TTS>python ver.py
PyTorch version: 2.5.1+cpu
TorchVision version: 0.2.0
2. Check whether the versions match
Each PyTorch release has a corresponding compatible torchvision release. Common pairings:
PyTorch version | TorchVision version |
---|---|
2.6.x | 0.21.x |
2.5.x | 0.20.x |
2.4.x | 0.19.x |
2.3.x | 0.18.x |
2.2.x | 0.17.x |
2.1.x | 0.16.x |
2.0.x | 0.15.x |
1.13.x | 0.14.x |
1.12.x | 0.13.x |
1.11.x | 0.12.x |
1.10.x | 0.11.x |
1.9.x | 0.10.x |
1.8.x | 0.9.x |
1.7.x | 0.8.x |
1.6.x | 0.7.x |
1.5.x | 0.6.x |
1.4.x | 0.5.x |
1.3.x | 0.4.x |
1.2.x | 0.4.x |
1.1.x | 0.3.x |
1.0.x | 0.2.x |
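The table can also be checked in code. The sketch below encodes the same pairings (plus 2.6.x with 0.21.x, matching the 2.6.0+cu118 / 0.21.0+cu118 ver.py output earlier in this post) and compares only the major.minor numbers. It also shows why the torchvision 0.2.0 found above is wrong for torch 2.5.1: 0.2 is not 0.20. COMPAT, versions_match and suggested_pin are illustrative names, not part of any library.

```python
# torch (major, minor) -> torchvision (major, minor), following the table above
COMPAT = {(2, m): (0, 15 + m) for m in range(7)}  # 2.0 to 2.6 -> 0.15 to 0.21
COMPAT.update({(1, m): (0, m + 1) for m in range(4, 14)})  # 1.4 to 1.13 -> 0.5 to 0.14
COMPAT.update({(1, 3): (0, 4), (1, 2): (0, 4), (1, 1): (0, 3), (1, 0): (0, 2)})


def major_minor(version):
    """'2.5.1+cpu' -> (2, 5); the '+tag' and the patch number are ignored."""
    parts = version.split("+", 1)[0].split(".")
    return int(parts[0]), int(parts[1])


def versions_match(torch_version, torchvision_version):
    """True if torchvision's major.minor is the one paired with this torch."""
    expected = COMPAT.get(major_minor(torch_version))
    if expected is None:
        raise ValueError(f"torch {torch_version} is not covered by the table")
    return major_minor(torchvision_version) == expected


def suggested_pin(torch_version):
    """A pip requirement for the matching torchvision, e.g. 'torchvision==0.20.*'."""
    major, minor = COMPAT[major_minor(torch_version)]
    return f"torchvision=={major}.{minor}.*"


print(versions_match("2.5.1+cpu", "0.2.0"))       # False: 0.2 is not 0.20
print(versions_match("2.5.1+cpu", "0.20.1+cpu"))  # True
print(suggested_pin("2.5.1+cpu"))                 # torchvision==0.20.*
```

The suggested pin can be passed straight to pip, for example pip install "torchvision==0.20.*"; pip's wildcard matching then picks the newest 0.20 patch release available for your index.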
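If torch and torchvision do not match, you can get compatibility errors such as the ImportError above. Note that the installed torchvision 0.2.0 is not 0.20: it is a far older release, which the table pairs with torch 1.0.x.
3. Adjust the versions so they match
If the versions do not match, adjust them with one of the following methods:
Method 1: Upgrade or downgrade torchvision
Install the torchvision version that matches your torch version. For example:
pip install torchvision==0.20.1 # matches torch 2.5.x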
(sparktts) I:\AI-src\Spark-TTS>pip install torchvision==0.20.1
Collecting torchvision==0.20.1
Downloading torchvision-0.20.1-cp312-cp312-win_amd64.whl.metadata (6.2 kB)
Requirement already satisfied: numpy in d:\anaconda3\envs\sparktts\lib\site-packages (from torchvision==0.20.1) (2.2.3)
Requirement already satisfied: torch==2.5.1 in d:\anaconda3\envs\sparktts\lib\site-packages (from torchvision==0.20.1) (2.5.1)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in d:\anaconda3\envs\sparktts\lib\site-packages (from torchvision==0.20.1) (11.1.0)
Requirement already satisfied: filelock in d:\anaconda3\envs\sparktts\lib\site-packages (from torch==2.5.1->torchvision==0.20.1) (3.17.0)
Requirement already satisfied: typing-extensions>=4.8.0 in d:\anaconda3\envs\sparktts\lib\site-packages (from torch==2.5.1->torchvision==0.20.1) (4.12.2)
Requirement already satisfied: networkx in d:\anaconda3\envs\sparktts\lib\site-packages (from torch==2.5.1->torchvision==0.20.1) (3.4.2)
Requirement already satisfied: jinja2 in d:\anaconda3\envs\sparktts\lib\site-packages (from torch==2.5.1->torchvision==0.20.1) (3.1.6)
Requirement already satisfied: fsspec in d:\anaconda3\envs\sparktts\lib\site-packages (from torch==2.5.1->torchvision==0.20.1) (2025.3.0)
Requirement already satisfied: setuptools in d:\anaconda3\envs\sparktts\lib\site-packages (from torch==2.5.1->torchvision==0.20.1) (75.8.0)
Requirement already satisfied: sympy==1.13.1 in d:\anaconda3\envs\sparktts\lib\site-packages (from torch==2.5.1->torchvision==0.20.1) (1.13.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in d:\anaconda3\envs\sparktts\lib\site-packages (from sympy==1.13.1->torch==2.5.1->torchvision==0.20.1) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in d:\anaconda3\envs\sparktts\lib\site-packages (from jinja2->torch==2.5.1->torchvision==0.20.1) (2.1.5)
Downloading torchvision-0.20.1-cp312-cp312-win_amd64.whl (1.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 53.9 kB/s eta 0:00:00
Installing collected packages: torchvision
Attempting uninstall: torchvision
Found existing installation: torchvision 0.2.0
Uninstalling torchvision-0.2.0:
Successfully uninstalled torchvision-0.2.0
Successfully installed torchvision-0.20.1
Method 2: Upgrade or downgrade torch
If you need a specific torchvision version, change torch to match it. For example:
pip install torch==2.4.1 torchvision==0.19.1
Method 3: Manage versions with Conda
If you use Conda, you can install a matching pair with:
conda install pytorch==2.4.1 torchvision==0.19.1 -c pytorch
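4. Verify the installation
After adjusting the versions, run ver.py again to confirm that they match and that no compatibility problems remain: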
(sparktts) I:\AI-src\Spark-TTS>python ver.py
PyTorch version: 2.5.1+cpu
TorchVision version: 0.20.1+cpu
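If the versions match but you still run into problems, try the following:
- Clear pip's cache of old downloads:
pip cache purge
- Reinstall the dependencies in a brand-new virtual environment:
conda create -n new_env python=3.9
conda activate new_env
pip install torch torchvision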
Following these steps keeps torch and torchvision matched and avoids compatibility problems.
IX. CUDA Runtime Version Mismatch
PS C:\Users\jm> nvidia-smi
Mon Mar 10 22:12:53 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 538.78 Driver Version: 538.78 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce MX150 WDDM | 00000000:01:00.0 Off | N/A |
| N/A 50C P0 N/A / ERR! | 0MiB / 2048MiB | 1% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
PS C:\Users\jm> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
Cuda compilation tools, release 10.0, V10.0.130
PS C:\Users\jm>
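The driver supports CUDA up to 12.2 (the "CUDA Version" shown by nvidia-smi), but the CUDA toolkit actually installed, as reported by nvcc, is 10.0, so a 12.2 toolkit has to be installed.
Download it from the CUDA Toolkit Archive on NVIDIA Developer (https://developer.nvidia.com/cuda-toolkit-archive): find the matching version, download it, and install it locally. The system rebooted during installation and the install did not complete, so I ran it once more; the whole process took about an hour.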
C:\Users\jm>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:42:34_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
Note: depending on the machine, some will need the newer 12.1 release while others can only use the older 11.8.
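The "CUDA Version" in the nvidia-smi header is the highest CUDA version the installed driver supports, while nvcc reports the separately installed CUDA Toolkit. PyTorch's pip wheels bundle their own CUDA runtime libraries, so for those wheels the driver limit is the number to compare against. A rough sketch of that comparison, assuming the nvidia-smi header format shown above and the cuXYZ wheel tags used in this post; the helper names are hypothetical.

```python
import re


def driver_max_cuda(nvidia_smi_output):
    """Read the 'CUDA Version: X.Y' field from the nvidia-smi header as (X, Y)."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(match.group(1)), int(match.group(2))


def wheel_cuda(tag):
    """'cu118' -> (11, 8), 'cu121' -> (12, 1); the last digit is the minor version."""
    digits = tag[2:] if tag.startswith("cu") else tag
    return int(digits[:-1]), int(digits[-1])


# header line copied from the nvidia-smi output above
header = "| NVIDIA-SMI 538.78    Driver Version: 538.78    CUDA Version: 12.2 |"
limit = driver_max_cuda(header)
for tag in ("cu118", "cu121"):
    verdict = "within" if wheel_cuda(tag) <= limit else "above"
    print(f"{tag}: {verdict} the driver limit {limit[0]}.{limit[1]}")
```

Both cu118 and cu121 fall within the 12.2 limit of this driver; a wheel tag above the limit would call for a driver update first.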
9ong@TsingChan. This article was written with AI assistance.