This article shows how to clone and build whisper.cpp with Git, covering both the CPU and GPU build steps and the accompanying code.
Cloning the project:
Clone the repository to your local machine with Git:
git clone https://github.com/ggml-org/whisper.cpp.git
Building the project:
Build notes
There are two builds, CPU and GPU: one serves the model with CPU compute, the other with GPU compute.
If your PC has a GPU, the GPU build is strongly recommended; it speeds up transcription dramatically.
The author's environment:
- IDE: VS2022
- C++ standard: C++17
- Build tool: CMake
- GPU: RTX 3060 (6 GB VRAM)
CPU:
Build commands (prefer a Release build):
Release: [screenshot of the build command]
Debug: [screenshot of the build command]
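For reference, the standard CPU-only build from the whisper.cpp README looks like this (an assumption on my part; the author's exact commands appear only in the screenshots, so adjust directories and generator to your setup):

```shell
# Configure and build the Release version (run inside the whisper.cpp checkout)
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j --config Release

# Debug variant
cmake -S . -B build-debug -DCMAKE_BUILD_TYPE=Debug
cmake --build build-debug -j --config Debug
```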
GPU:
To use the GPU, make sure CUDA is installed first (see the CUDA website). After installing, run the version check in a command prompt:
[screenshot of the version-check command]
If the command reports the CUDA version as expected, the environment is fine.
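The exact command is only visible in the screenshot; the commonly used checks for a CUDA install (assumed here) are:

```shell
nvcc --version   # prints the CUDA compiler toolchain version
nvidia-smi       # prints the driver version and the visible NVIDIA GPUs
```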
Build commands (prefer Release over Debug, and run them from the "x64 Native Tools Command Prompt for VS 2022", which you can find in the Start menu):
Release:
cmake -S . -B build-gpu -G "Ninja" -DGGML_CUDA=ON -DGGML_CUBLAS=ON -DGGML_CUDA_KERNELS=ON -DGGML_CUDA_ARCHITECTURES="86" -DCMAKE_CUDA_COMPILER="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/bin/nvcc.exe" -DCUDA_TOOLKIT_ROOT_DIR="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0" -DCMAKE_BUILD_TYPE=Release
cmake --build build-gpu -j
Debug:
cmake -S . -B build-gpu-debug -G "Visual Studio 17 2022" -A x64 -DGGML_CUDA=ON -DGGML_CUBLAS=ON -DGGML_CUDA_KERNELS=ON -DCMAKE_CUDA_ARCHITECTURES="86" -DCUDAToolkit_ROOT="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0"
cmake --build build-gpu-debug --config Debug --target ALL_BUILD -j
Build success:
If the build succeeds, you will see the build-gpu directory, and the build log will contain no errors.

Importing the libraries:
After the build finishes, locate the following under /build-gpu:

Create a library folder, and inside it create the two folders shown in Figure 1 (lib and include):
- Copy all of the .dll files under the bin directory above (Figure 2) into the lib folder;
- Copy the .lib file under the src directory above (Figure 3) into the lib folder;
- Copy the .lib file under the ggml directory above (Figure 4) into the lib folder;
[Figures 1-4: screenshots of the folders and the copied .dll/.lib files described above]
In the project root (the whisper.cpp directory, Figure 1 below):
- Copy the ggml directory in full into the include folder created earlier;
- Copy the contents of the include directory in the project root (Figure 2 below) into that same include folder.
[Figures 1-2: screenshots of the project root and its include directory]
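Assuming the library folder is named whisper-libs and sits next to the whisper.cpp checkout (the name and exact build-tree paths are hypothetical; the article only shows them in screenshots), the copy steps above can be sketched as:

```shell
mkdir -p whisper-libs/lib whisper-libs/include
cp build-gpu/bin/*.dll        whisper-libs/lib/      # runtime DLLs
cp build-gpu/src/*.lib        whisper-libs/lib/      # whisper.lib import library
cp build-gpu/ggml/src/*.lib   whisper-libs/lib/      # ggml*.lib import libraries
cp -r ggml                    whisper-libs/include/  # ggml headers
cp -r include/*               whisper-libs/include/  # whisper.h and friends
```

Exact output locations vary with the CMake generator: Ninja puts libraries directly under src/ and ggml/src/, while the Visual Studio generator adds a Release/ or Debug/ subfolder.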
Linking the libraries:
Create an empty VS project and place the library folder you just made inside the project directory:
- Solution -> Properties -> Configuration Properties -> General -> C++ Language Standard -> select C++17 (Figure 1 below);
- In the project property pages: C/C++ -> General -> Additional Include Directories, add the include and include/ggml/include paths (see Figure 2);
- Properties -> Linker -> General -> Additional Library Directories, add the lib folder (see Figure 3);
- Properties -> Linker -> Input -> Additional Dependencies, add the dependencies below (see Figure 4):
whisper.lib
ggml.lib
ggml-base.lib
[Figures 1-4: screenshots of the VS property pages described above]
Code implementation:
Header file (QAppWhisper.h):
/*
*
* ******** WhisperEngine ********
* ******** By Ciallo ********
* ******** 2025/11/23 ********
* ******** V1.0 ********
* ******** https://wang-sz.cn ********
* * Github:https://github.com/WinterShadowy *
*
*/
#pragma once
#include <string>
#include <vector>
#include <functional>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <whisper.h>
class WhisperEngine {
public:
    // Singleton access
    static WhisperEngine& instance();
    // Copy/move disabled
    WhisperEngine(const WhisperEngine&) = delete;
    WhisperEngine& operator=(const WhisperEngine&) = delete;
    WhisperEngine(WhisperEngine&&) = delete;
    WhisperEngine& operator=(WhisperEngine&&) = delete;
    // Asynchronous model initialization: returns immediately, loading runs on a background thread
    // model_path: path to the model file (e.g. "ggml-whisper-large.bin")
    void InitializeAsync(const std::string& model_path);
    // Status queries
    bool IsInitializing() const;
    bool IsInitialized() const;
    bool InitSucceeded() const;
    void SetNumThreads(unsigned int n);
    void SetGpuLayers(int n);
    void SetUseMmap(bool v);
    // Synchronous transcription (blocks the calling thread)
    // waitInitMs: if the model is not ready yet, wait up to waitInitMs milliseconds (0 = do not wait)
    // Returns the recognized text, or an error message starting with "[Error]"
    std::string TranscribeFromWav(const std::string& wav_path, int waitInitMs = 0);
    // Asynchronous transcription: recognition runs on a background thread, and the callback is
    // invoked on that thread (forward to the UI thread inside the callback if needed)
    // callback signature: void(const std::string& result)
    void TranscribeAsync(const std::string& wav_path, std::function<void(std::string)> callback);
    // Stop and release the model (call InitializeAsync again to reload)
    void Shutdown();
    ~WhisperEngine();
private:
    WhisperEngine();
    // Background initialization thread entry points
    void initThreadFunc_v_1_1(const std::string model_path);
    void initThreadFunc_v_1_2(const std::string model_path);
    void initThreadFunc_v_1_3(const std::string model_path);
    void initThreadFunc_v_1_4(const std::string model_path);
    // Internal transcription worker; assumes the model is loaded and the caller has locked or serialized access
    std::string doTranscribe_v_1_1(const std::string& wav_path);
private:
    // whisper context
    whisper_context* m_ctx = nullptr;
    unsigned int m_n_threads = 0;  // 0 means use hardware_concurrency()
    int m_n_gpu_layers = -1;       // -1 means default / no GPU offload (or auto-select)
    bool m_use_mmap = false;
    // Initialization state
    std::atomic<bool> m_initializing{ false };
    std::atomic<bool> m_initialized{ false };
    std::atomic<bool> m_init_success{ false };
    // Mutex and condition variable used to wait for initialization to finish
    std::mutex m_init_mutex;
    std::condition_variable m_init_cv;
    // Serializes calls to whisper_full
    std::mutex m_whisper_call_mutex;
    // Background initialization thread (kept so it can be joined)
    std::thread m_init_thread;
    // TranscribeAsync could use a thread pool or a single worker thread; this example spawns one std::thread per task
};
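The class above couples atomic status flags with a condition variable so callers can wait a bounded time for initialization. That handshake, stripped of all whisper specifics, is a self-contained sketch (names here are illustrative, not the article's):

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal model of the engine's init handshake: a worker flips the flags
// under the mutex and notifies; a caller waits with a timeout.
struct InitState {
    std::atomic<bool> initializing{false};
    std::atomic<bool> initialized{false};
    std::mutex m;
    std::condition_variable cv;

    void begin() { initializing.store(true); }

    // Worker side: mark done while holding the lock, then wake all waiters.
    void finish() {
        {
            std::lock_guard<std::mutex> lk(m);
            initialized.store(true);
            initializing.store(false);
        }
        cv.notify_all();
    }

    // Caller side: wait up to timeout_ms for init to settle
    // (either finished, or no longer in progress).
    bool wait_ready(int timeout_ms) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait_for(lk, std::chrono::milliseconds(timeout_ms), [this] {
            return initialized.load() || !initializing.load();
        });
        return initialized.load();
    }
};
```

Storing the flags while holding the mutex matters: it guarantees a waiter cannot evaluate the predicate between the worker's store and its notify and then sleep past the wake-up.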
Source file (QAppWhisper.cpp):
/*
*
* ******** WhisperEngine ********
* ******** By Ciallo ********
* ******** 2025/11/23 ********
* ******** V1.0 ********
* ******** https://wang-sz.cn ********
* * Github:https://github.com/WinterShadowy *
*
*/
#define _CRT_SECURE_NO_WARNINGS // must be defined before any CRT header to take effect
#pragma warning(disable:4996)
#include "QAppWhisper.h"
#include <whisper.h>
#include <algorithm> // std::max / std::min in the PCM conversion below
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <QDebug>
#include <chrono>
#include <iostream>
#pragma comment(lib, "whisper.lib")
__declspec(deprecated("read_wav_v_1_0 will be removed in a future version; use read_wav_v_1_1() instead."))
bool read_wav_v_1_0(const char* fname, std::vector<float>& pcmf32, int expect_sample_rate = 16000);
bool read_wav_v_1_1(const char* fname, std::vector<float>& pcmf32, int expect_sample_rate = 16000);
bool read_wav_v_1_2(const char* fname, std::vector<float>& pcmf32, int expect_sample_rate = 16000);
// Singleton implementation
WhisperEngine& WhisperEngine::instance()
{
static WhisperEngine inst;
return inst;
}
WhisperEngine::WhisperEngine()
: m_ctx(nullptr)
{
}
WhisperEngine::~WhisperEngine()
{
Shutdown();
}
void WhisperEngine::InitializeAsync(const std::string& model_path)
{
    // Already loaded: nothing to do (a reload API could be added here)
    if (m_initialized.load())
    {
        return;
    }
    // Atomically claim the "initializing" flag so two racing callers cannot both start a load
    bool expected = false;
    if (!m_initializing.compare_exchange_strong(expected, true))
    {
        return;
    }
    m_init_success.store(false);
    m_initialized.store(false);
    // A previous (finished) init thread must be joined before reassigning;
    // assigning over a joinable std::thread calls std::terminate
    if (m_init_thread.joinable())
    {
        m_init_thread.join();
    }
    // Run the initialization on a background thread
    m_init_thread = std::thread([this, model_path]() {
        this->initThreadFunc_v_1_4(model_path);
    });
    // Could detach here instead, but joining in the destructor is safer
}
void WhisperEngine::initThreadFunc_v_1_1(const std::string model_path)
{
    // Actually load the model on the background thread
    // Note: whisper_init_from_file_with_params can take a while
    whisper_context_params cparams = whisper_context_default_params();
    // Adjust cparams here if needed
    whisper_context* ctx = whisper_init_from_file_with_params(model_path.c_str(), cparams);
    {
        std::lock_guard<std::mutex> lk(m_init_mutex);
        m_ctx = ctx;
        if (ctx)
        {
            m_init_success.store(true);
            m_initialized.store(true);
        }
        else
        {
            m_init_success.store(false);
            m_initialized.store(false);
        }
        m_initializing.store(false);
    }
    m_init_cv.notify_all();
}
void WhisperEngine::initThreadFunc_v_1_2(const std::string model_path)
{
    whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = true;
#ifdef WHISPER_HAS_USE_MMAP
    cparams.use_mmap = m_use_mmap;
#endif
    if (m_n_gpu_layers >= 0) {
#ifdef WHISPER_HAS_N_GPU_LAYERS
        cparams.n_gpu_layers = m_n_gpu_layers;
#else
        // The field does not exist in this build of the library; log a hint
        // that the user needs a whisper.cpp build with GPU support
        qDebug() << "whisper lib: n_gpu_layers not available in whisper_context_params; ignoring";
#endif
    }
    qDebug() << "Initializing model:" << QString::fromStdString(model_path)
             << " gpu_layers=" << m_n_gpu_layers << " use_mmap=" << m_use_mmap;
    whisper_context* ctx = whisper_init_from_file_with_params(model_path.c_str(), cparams);
    {
        std::lock_guard<std::mutex> lk(m_init_mutex);
        m_ctx = ctx;
        m_init_success.store(ctx != nullptr);
        m_initialized.store(true); // initialization finished (whether it succeeded or not)
        m_initializing.store(false);
    }
    m_init_cv.notify_all();
}
void WhisperEngine::initThreadFunc_v_1_3(const std::string model_path)
{
    whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = true;
    cparams.gpu_device = 0;    // a single RTX 3060 is usually device 0
    cparams.flash_attn = true; // enable Flash Attention (takes effect when the build supports it)
#ifdef WHISPER_HAS_USE_MMAP
    cparams.use_mmap = m_use_mmap;
#endif
#ifdef WHISPER_HAS_N_GPU_LAYERS
    // When m_n_gpu_layers < 0, offload as many layers as possible to the GPU
    // (the library clamps this to the maximum it supports)
    cparams.n_gpu_layers = (m_n_gpu_layers < 0) ? 999 : m_n_gpu_layers;
#else
    qDebug() << "whisper lib: n_gpu_layers not available in whisper_context_params; ignoring";
#endif
    qDebug() << "Initializing model:" << QString::fromStdString(model_path)
             << " gpu_layers=" << m_n_gpu_layers << " use_mmap=" << m_use_mmap
             << " flash_attn=" << cparams.flash_attn << " gpu_device=" << cparams.gpu_device;
    whisper_context* ctx = whisper_init_from_file_with_params(model_path.c_str(), cparams);
    {
        std::lock_guard<std::mutex> lk(m_init_mutex);
        m_ctx = ctx;
        m_init_success.store(ctx != nullptr);
        m_initialized.store(true);
        m_initializing.store(false);
    }
    if (ctx) {
        qDebug() << "Whisper backend:" << whisper_print_system_info();
    }
    m_init_cv.notify_all();
}
void WhisperEngine::initThreadFunc_v_1_4(const std::string model_path)
{
    // Build the GPU parameters
    whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = true;    // enable CUDA
    cparams.gpu_device = 0;    // single 3060
    cparams.flash_attn = true; // fine on Ampere; requires build support
    // The actual load (may take several seconds)
    whisper_context* ctx = whisper_init_from_file_with_params(model_path.c_str(), cparams);
    // Publish the result
    {
        std::lock_guard<std::mutex> lk(m_init_mutex);
        m_ctx = ctx;
        if (ctx)
        {
            m_init_success.store(true);
            m_initialized.store(true);
            std::fprintf(stdout, "[WhisperEngine] GPU model loaded successfully from %s\n", model_path.c_str());
        }
        else
        {
            m_init_success.store(false);
            m_initialized.store(false);
            std::fprintf(stderr, "[WhisperEngine] GPU model load FAILED from %s\n", model_path.c_str());
        }
        m_initializing.store(false);
    }
    m_init_cv.notify_all();
}
bool WhisperEngine::IsInitializing() const
{
return m_initializing.load();
}
bool WhisperEngine::IsInitialized() const
{
return m_initialized.load();
}
bool WhisperEngine::InitSucceeded() const
{
return m_init_success.load();
}
void WhisperEngine::SetNumThreads(unsigned int n)
{
m_n_threads = n;
return;
}
void WhisperEngine::SetGpuLayers(int n)
{
m_n_gpu_layers = n;
return;
}
void WhisperEngine::SetUseMmap(bool v)
{
m_use_mmap = v;
return;
}
std::string WhisperEngine::doTranscribe_v_1_1(const std::string& wav_path)
{
    if (!m_ctx)
        return std::string("[Error] model not loaded");
    // Read the WAV into a float buffer (using the reader defined below)
    std::vector<float> pcmf32;
    if (!read_wav_v_1_1(wav_path.c_str(), pcmf32, 16000))
    {
        return std::string("[Error] read WAV failed");
    }
    // Call whisper_full
    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    wparams.language = "zh";
    wparams.no_timestamps = true;
    // Honor SetNumThreads(); 0 means use hardware_concurrency()
    wparams.n_threads = (m_n_threads > 0) ? (int)m_n_threads : (int)std::thread::hardware_concurrency();
    wparams.n_max_text_ctx = 16384; // max text-context tokens carried over between segments
    // Serialize concurrent calls to whisper_full
    std::lock_guard<std::mutex> lk(m_whisper_call_mutex);
    int rc = whisper_full(m_ctx, wparams, pcmf32.data(), (int)pcmf32.size());
    if (rc != 0)
    {
        return std::string("[Error] whisper_full failed");
    }
    // Collect the result
    std::string result;
    int n = whisper_full_n_segments(m_ctx);
    for (int i = 0; i < n; ++i)
    {
        const char* seg = whisper_full_get_segment_text(m_ctx, i);
        if (seg)
            result += seg;
    }
    return result;
}
std::string WhisperEngine::TranscribeFromWav(const std::string& wav_path, int waitInitMs)
{
    // If the model is still initializing and the caller asked to wait, wait
    if (!m_initialized.load())
    {
        if (waitInitMs > 0)
        {
            std::unique_lock<std::mutex> lk(m_init_mutex);
            m_init_cv.wait_for(lk, std::chrono::milliseconds(waitInitMs), [this]() {
                return this->m_initialized.load() || !this->m_initializing.load();
            });
        }
    }
    if (!m_initialized.load() || !m_init_success.load())
    {
        return std::string("[Error] model not ready");
    }
    return doTranscribe_v_1_1(wav_path);
}
void WhisperEngine::TranscribeAsync(const std::string& wav_path, std::function<void(std::string)> callback)
{
    // Run the recognition on a background thread
    std::thread task([this, wav_path, callback]() {
        // Wait up to 30 s for the model to become ready (tune as needed)
        {
            std::unique_lock<std::mutex> lk(m_init_mutex);
            if (!m_initialized.load())
            {
                m_init_cv.wait_for(lk, std::chrono::seconds(30), [this]() {
                    return this->m_initialized.load() || !this->m_initializing.load();
                });
            }
        }
        if (!m_initialized.load() || !m_init_success.load())
        {
            if (callback)
                callback("[Error] model not ready");
            return;
        }
        std::string res = doTranscribe_v_1_1(wav_path);
        if (callback)
            callback(res);
    });
    task.detach(); // fire-and-forget; swap in a thread pool to manage task threads
}
void WhisperEngine::Shutdown()
{
    // If the init thread is still running, wait for it to finish
    if (m_init_thread.joinable())
    {
        // A whisper init in progress cannot be force-stopped; just wait for it
        m_init_thread.join();
    }
    {
        std::lock_guard<std::mutex> lk(m_init_mutex);
        if (m_ctx)
        {
            whisper_free(m_ctx);
            m_ctx = nullptr;
        }
        m_initialized.store(false);
        m_init_success.store(false);
        m_initializing.store(false);
    }
    m_init_cv.notify_all();
}
// Supported input:
// 16 kHz / 32-bit float / mono WAV
// Other formats are not supported
bool read_wav_v_1_0(const char* fname, std::vector<float>& pcmf32, int expect_sample_rate)
{
    FILE* fp = std::fopen(fname, "rb");
    if (!fp)
    { // could not open the file
        std::fprintf(stderr, "[read_wav] fopen fail: %s\n", fname);
        return false;
    }
    struct {
        char riff[4] = {};
        uint32_t size = 0;
        char wave[4] = {};
        char fmt[4] = {};
        uint32_t fmt_sz = 0;
        uint16_t fmt_tag = 0;
        uint16_t ch = 0;
        uint32_t sr = 0;
        uint32_t br = 0;
        uint16_t ba = 0;
        uint16_t bps = 0;
        char data[4] = {};
        uint32_t data_sz = 0;
    } h;
    bool ok = true;
    // Read the fixed-layout header
    if (ok && std::fread(&h, sizeof(h), 1, fp) != 1)
        ok = false;
    if (ok && (std::memcmp(h.riff, "RIFF", 4) || std::memcmp(h.wave, "WAVE", 4)))
    {
        std::fprintf(stderr, "[read_wav] not a RIFF/WAVE file\n");
        ok = false;
    }
    if (ok && h.fmt_tag != 1)
    { // 1 = PCM
        std::fprintf(stderr, "[read_wav] not PCM (fmt_tag=%u)\n", h.fmt_tag);
        ok = false;
    }
    // Skip any fmt extension bytes
    if (ok && h.fmt_sz > 16)
        std::fseek(fp, h.fmt_sz - 16, SEEK_CUR);
    // Scan for the data chunk (replaces the original fixed-offset lookup)
    bool found = false;
    while (ok)
    {
        char id[4];
        uint32_t sz = 0;
        if (std::fread(id, 4, 1, fp) != 1)
            break;
        if (std::fread(&sz, 4, 1, fp) != 1)
            break;
        if (memcmp(id, "data", 4) == 0)
        {
            h.data_sz = sz;
            found = true;
            break;
        }
        // Skip this chunk's payload (chunks are padded to an even byte count)
        std::fseek(fp, sz + (sz & 1), SEEK_CUR);
    }
    if (!found)
    {
        fprintf(stderr, "[read_wav] no 'data' chunk\n");
        ok = false;
    }
    // Format checks
    if (ok && h.sr != (uint32_t)expect_sample_rate)
    {
        std::fprintf(stderr, "[read_wav] sample rate mismatch: file=%u, expected=%d\n", h.sr, expect_sample_rate);
        ok = false;
    }
    if (ok && h.ch != 1)
    {
        std::fprintf(stderr, "[read_wav] not mono (channels=%u)\n", h.ch);
        ok = false;
    }
    if (ok && h.bps != 32)
    {
        std::fprintf(stderr, "[read_wav] not 32-bit float (bits=%u)\n", h.bps);
        ok = false;
    }
    // Read the samples
    if (ok)
    {
        size_t n = h.data_sz / sizeof(float);
        pcmf32.resize(n);
        if (std::fread(pcmf32.data(), sizeof(float), n, fp) != n)
        {
            std::fprintf(stderr, "[read_wav] fread samples failed\n");
            ok = false;
        }
    }
    std::fclose(fp);
    return ok && !pcmf32.empty();
}
// Supported input:
// sample_rate == expect_sample_rate (default 16000)
// format == IEEE float (fmt_tag == 3)
// bits_per_sample == 32
// channels == 1 (mono)
bool read_wav_v_1_1(const char* fname, std::vector<float>& pcmf32, int expect_sample_rate)
{
FILE* fp = std::fopen(fname, "rb");
if (!fp)
{
std::fprintf(stderr, "[read_wav] fopen fail: %s", fname);
return false;
}
// Read the RIFF header (12 bytes)
char riff[4];
uint32_t riff_sz = 0;
char wave[4];
if (std::fread(riff, 1, 4, fp) != 4 ||
std::fread(&riff_sz, sizeof(riff_sz), 1, fp) != 1 ||
std::fread(wave, 1, 4, fp) != 4)
{
std::fprintf(stderr, "[read_wav] read header failed");
std::fclose(fp);
return false;
}
if (std::memcmp(riff, "RIFF", 4) != 0 || std::memcmp(wave, "WAVE", 4) != 0)
{
std::fprintf(stderr, "[read_wav] not a RIFF/WAVE file");
std::fclose(fp);
return false;
}
// variables to hold fmt/data info
bool got_fmt = false;
bool got_data = false;
uint16_t fmt_tag = 0;
uint16_t channels = 0;
uint32_t sample_rate = 0;
uint16_t bits_per_sample = 0;
uint32_t data_sz = 0;
std::vector<uint8_t> data_buf;
// Walk the chunks looking for "fmt " and "data"
while (!got_data)
{
char id[4];
uint32_t chunk_sz = 0;
if (std::fread(id, 1, 4, fp) != 4)
break;
if (std::fread(&chunk_sz, sizeof(chunk_sz), 1, fp) != 1)
break;
if (std::memcmp(id, "fmt ", 4) == 0)
{
// Read the fmt chunk (chunk_sz >= 16 is typical)
std::vector<uint8_t> fmt(chunk_sz);
if (chunk_sz > 0)
{
if (std::fread(fmt.data(), 1, chunk_sz, fp) != chunk_sz)
{
std::fprintf(stderr, "[read_wav] read fmt chunk failed");
std::fclose(fp);
return false;
}
}
if (chunk_sz < 16)
{
std::fprintf(stderr, "[read_wav] fmt chunk too small");
std::fclose(fp);
return false;
}
// Parse the common fields as little-endian (RIFF/WAV data is little-endian)
fmt_tag = *reinterpret_cast<const uint16_t*>(&fmt[0]);
channels = *reinterpret_cast<const uint16_t*>(&fmt[2]);
sample_rate = *reinterpret_cast<const uint32_t*>(&fmt[4]);
// skip byte rate (4) block align (2)
bits_per_sample = *reinterpret_cast<const uint16_t*>(&fmt[14]);
got_fmt = true;
// If the fmt chunk was larger than 16 bytes, the read above already consumed the extension
}
else if (std::memcmp(id, "data", 4) == 0)
{
// Read the data chunk
if (chunk_sz > 0)
{
data_buf.resize(chunk_sz);
if (std::fread(data_buf.data(), 1, chunk_sz, fp) != chunk_sz)
{
std::fprintf(stderr, "[read_wav] read data chunk failed");
std::fclose(fp);
return false;
}
data_sz = chunk_sz;
}
else
{
data_buf.clear();
data_sz = 0;
}
got_data = true;
}
else
{
// Skip unknown chunks (padded to an even byte count)
if (chunk_sz > 0)
{
if (std::fseek(fp, chunk_sz, SEEK_CUR) != 0)
{
std::fprintf(stderr, "[read_wav] fseek failed while skipping chunk");
std::fclose(fp);
return false;
}
}
}
// If the chunk size is odd, the file contains a pad byte
if (chunk_sz & 1)
{
std::fseek(fp, 1, SEEK_CUR);
}
}
if (!got_fmt)
{
std::fprintf(stderr, "[read_wav] no 'fmt ' chunk");
std::fclose(fp);
return false;
}
if (!got_data)
{
std::fprintf(stderr, "[read_wav] no 'data' chunk");
std::fclose(fp);
return false;
}
// Format check: only IEEE float (fmt_tag == 3) is accepted
if (fmt_tag != 3)
{
std::fprintf(stderr, "[read_wav] not IEEE float (fmt_tag=%u)\n", fmt_tag);
std::fclose(fp);
return false;
}
if (bits_per_sample != 32)
{
std::fprintf(stderr, "[read_wav] not 32-bit float (bits=%u)\n", bits_per_sample);
std::fclose(fp);
return false;
}
if (channels != 1)
{
std::fprintf(stderr, "[read_wav] not mono (channels=%u)\n", channels);
std::fclose(fp);
return false;
}
if (sample_rate != (uint32_t)expect_sample_rate)
{
std::fprintf(stderr, "[read_wav] sample rate mismatch: file=%u, expected=%d", sample_rate, expect_sample_rate);
std::fclose(fp);
return false;
}
// Read the samples (each sample is a 4-byte float)
if (data_sz == 0)
{
std::fprintf(stderr, "[read_wav] data chunk is empty");
std::fclose(fp);
return false;
}
size_t n_samples = data_sz / sizeof(float);
pcmf32.resize(n_samples);
// Plain memory copy (assumes little-endian IEEE floats in the file)
std::memcpy(pcmf32.data(), data_buf.data(), n_samples * sizeof(float));
std::fclose(fp);
return !pcmf32.empty();
}
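The fmt-chunk field offsets used above (format tag at byte 0, channels at 2, sample rate at 4, bits per sample at 14) can be checked against a synthetic buffer. This sketch uses memcpy rather than the article's reinterpret_cast, which sidesteps unaligned-access and strict-aliasing pitfalls:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Parsed subset of a WAV "fmt " chunk, matching the offsets read_wav_v_1_1 uses.
struct FmtInfo {
    uint16_t fmt_tag;
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
};

// Parse the first 16 bytes of a little-endian fmt chunk.
FmtInfo parse_fmt(const std::vector<uint8_t>& fmt) {
    FmtInfo info{};
    std::memcpy(&info.fmt_tag,         &fmt[0],  2);
    std::memcpy(&info.channels,        &fmt[2],  2);
    std::memcpy(&info.sample_rate,     &fmt[4],  4);
    // bytes 8..13 hold byte rate and block align, which the reader skips
    std::memcpy(&info.bits_per_sample, &fmt[14], 2);
    return info;
}
```

Note this direct copy assumes a little-endian host, the same assumption the article's readers make.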
bool read_wav_v_1_2(const char* fname, std::vector<float>& pcmf32, int expect_sample_rate)
{
FILE* fp = std::fopen(fname, "rb");
if (!fp)
{
std::fprintf(stderr, "[read_wav] fopen fail: %s", fname);
return false;
}
// read RIFF header (12 bytes)
char riff[4];
uint32_t riff_sz;
char wave[4];
if (std::fread(riff, 1, 4, fp) != 4 ||
std::fread(&riff_sz, sizeof(riff_sz), 1, fp) != 1 ||
std::fread(wave, 1, 4, fp) != 4)
{
std::fprintf(stderr, "[read_wav] read header failed");
std::fclose(fp);
return false;
}
if (std::memcmp(riff, "RIFF", 4) != 0 || std::memcmp(wave, "WAVE", 4) != 0)
{
std::fprintf(stderr, "[read_wav] not a RIFF/WAVE file");
std::fclose(fp);
return false;
}
// variables to hold fmt and data info
bool got_fmt = false;
bool got_data = false;
uint16_t audio_format = 0;
uint16_t num_channels = 0;
uint32_t sample_rate = 0;
uint16_t bits_per_sample = 0;
std::vector<uint8_t> data_buf;
uint32_t data_size = 0;
// iterate chunks
while (!got_data) {
char id[4];
uint32_t chunk_sz = 0;
if (std::fread(id, 1, 4, fp) != 4) break;
if (std::fread(&chunk_sz, sizeof(chunk_sz), 1, fp) != 1) break;
// handle chunk id
if (std::memcmp(id, "fmt ", 4) == 0)
{
// read fmt chunk
std::vector<uint8_t> fmt(chunk_sz);
if (chunk_sz > 0 && std::fread(fmt.data(), 1, chunk_sz, fp) != chunk_sz) {
std::fprintf(stderr, "[read_wav] read fmt chunk failed");
std::fclose(fp);
return false;
}
// parse basic fields (first 16 bytes expected)
if (chunk_sz < 16) {
std::fprintf(stderr, "[read_wav] fmt chunk too small");
std::fclose(fp);
return false;
}
// little-endian parsing
audio_format = *reinterpret_cast<const uint16_t*>(&fmt[0]);
num_channels = *reinterpret_cast<const uint16_t*>(&fmt[2]);
sample_rate = *reinterpret_cast<const uint32_t*>(&fmt[4]);
// skip byte rate (4 bytes) and block align (2 bytes)
bits_per_sample = *reinterpret_cast<const uint16_t*>(&fmt[14]);
// if fmt chunk has extension (e.g., WAVE_FORMAT_EXTENSIBLE), we could inspect subformat,
// but for typical files, audio_format == 1 (PCM) or 3 (IEEE float) is enough.
got_fmt = true;
}
else if (std::memcmp(id, "data", 4) == 0)
{
// read data chunk into buffer
if (chunk_sz > 0)
{
data_buf.resize(chunk_sz);
if (std::fread(data_buf.data(), 1, chunk_sz, fp) != chunk_sz)
{
std::fprintf(stderr, "[read_wav] read data chunk failed");
std::fclose(fp);
return false;
}
data_size = chunk_sz;
got_data = true;
}
else
{
// empty data chunk
data_buf.clear();
data_size = 0;
got_data = true;
}
}
else
{
// skip unknown chunk (pad to even)
// chunk_sz may be odd; skip chunk_sz bytes, plus pad byte if needed
uint32_t skip = chunk_sz;
if (skip > 0)
{
if (std::fseek(fp, skip, SEEK_CUR) != 0)
{
std::fprintf(stderr, "[read_wav] fseek failed while skipping chunk");
std::fclose(fp);
return false;
}
}
}
// chunk sizes are word aligned; if chunk_sz is odd, there may be a pad byte
if (chunk_sz & 1)
{
std::fseek(fp, 1, SEEK_CUR);
}
}
if (!got_fmt)
{
std::fprintf(stderr, "[read_wav] no 'fmt ' chunk");
std::fclose(fp);
return false;
}
if (!got_data)
{
std::fprintf(stderr, "[read_wav] no 'data' chunk");
std::fclose(fp);
return false;
}
// sample rate check
if (expect_sample_rate > 0 && sample_rate != (uint32_t)expect_sample_rate)
{
std::fprintf(stderr, "[read_wav] sample rate mismatch: file=%u, expected=%d", sample_rate, expect_sample_rate);
// To accept other sample rates instead, turn the return below into a warning
std::fclose(fp);
return false;
}
// channels check (we will accept multi-channel but downmix to mono)
if (num_channels < 1)
{
std::fprintf(stderr, "[read_wav] invalid channel count: %u", num_channels);
std::fclose(fp);
return false;
}
// convert data buffer into float vector
pcmf32.clear();
if (audio_format == 3)
{
// IEEE float
if (bits_per_sample != 32)
{
std::fprintf(stderr, "[read_wav] unexpected float bits: %u", bits_per_sample);
std::fclose(fp);
return false;
}
size_t total_samples = data_size / sizeof(float);
if (total_samples == 0)
{
std::fprintf(stderr, "[read_wav] no samples");
std::fclose(fp);
return false;
}
// if multi-channel, average channels to mono
size_t frames = total_samples / num_channels;
pcmf32.resize(frames);
const float* src = reinterpret_cast<const float*>(data_buf.data());
for (size_t i = 0; i < frames; ++i)
{
float acc = 0.0f;
for (uint16_t c = 0; c < num_channels; ++c)
{
acc += src[i * num_channels + c];
}
pcmf32[i] = acc / static_cast<float>(num_channels);
}
}
else if (audio_format == 1)
{
// PCM integer
if (bits_per_sample == 16)
{
size_t total_samples = data_size / sizeof(int16_t);
size_t frames = total_samples / num_channels;
pcmf32.resize(frames);
const int16_t* src = reinterpret_cast<const int16_t*>(data_buf.data());
for (size_t i = 0; i < frames; ++i)
{
int64_t acc = 0;
for (uint16_t c = 0; c < num_channels; ++c)
{
acc += src[i * num_channels + c];
}
float v = static_cast<float>(acc) / (32768.0f * static_cast<float>(num_channels));
pcmf32[i] = std::max(-1.0f, std::min(1.0f, v));
}
}
else if (bits_per_sample == 24)
{
// 24-bit packed little-endian
size_t bytes_per_sample = 3;
size_t total_samples = data_size / bytes_per_sample;
size_t frames = total_samples / num_channels;
pcmf32.resize(frames);
const uint8_t* b = data_buf.data();
for (size_t i = 0; i < frames; ++i)
{
int64_t acc = 0;
for (uint16_t c = 0; c < num_channels; ++c) {
size_t idx = (i * num_channels + c) * 3;
int32_t sample = (int32_t)((b[idx]) | (b[idx + 1] << 8) | (b[idx + 2] << 16));
// sign extension for 24-bit
if (sample & 0x800000) sample |= ~0xFFFFFF;
acc += sample;
}
float v = static_cast<float>(acc) / (8388608.0f * static_cast<float>(num_channels)); // 2^23
pcmf32[i] = std::max(-1.0f, std::min(1.0f, v));
}
}
else if (bits_per_sample == 32)
{
// 32-bit PCM integer
size_t total_samples = data_size / sizeof(int32_t);
size_t frames = total_samples / num_channels;
pcmf32.resize(frames);
const int32_t* src = reinterpret_cast<const int32_t*>(data_buf.data());
for (size_t i = 0; i < frames; ++i)
{
int64_t acc = 0;
for (uint16_t c = 0; c < num_channels; ++c)
{
acc += src[i * num_channels + c];
}
float v = static_cast<float>(acc) / (2147483648.0f * static_cast<float>(num_channels));
pcmf32[i] = std::max(-1.0f, std::min(1.0f, v));
}
}
else
{
std::fprintf(stderr, "[read_wav] unsupported PCM bit depth: %u", bits_per_sample);
std::fclose(fp);
return false;
}
}
else
{
std::fprintf(stderr, "[read_wav] unsupported audio format (fmt_tag=%u)\n", audio_format);
std::fclose(fp);
return false;
}
std::fclose(fp);
return !pcmf32.empty();
}
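read_wav_v_1_2 normalizes integer PCM by averaging the channels of each frame and dividing by full scale. The 16-bit path in isolation, as a self-contained sketch of that conversion:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Downmix interleaved 16-bit PCM to mono float in [-1, 1],
// mirroring the averaging and scaling used in read_wav_v_1_2.
std::vector<float> pcm16_to_mono_float(const std::vector<int16_t>& samples,
                                       uint16_t num_channels) {
    std::vector<float> out;
    if (num_channels == 0) return out;
    size_t frames = samples.size() / num_channels;
    out.resize(frames);
    for (size_t i = 0; i < frames; ++i) {
        int64_t acc = 0; // 64-bit accumulator avoids overflow while summing channels
        for (uint16_t c = 0; c < num_channels; ++c)
            acc += samples[i * num_channels + c];
        float v = static_cast<float>(acc) / (32768.0f * num_channels);
        out[i] = std::max(-1.0f, std::min(1.0f, v)); // clamp, as the article does
    }
    return out;
}
```

Dividing by 32768 maps the int16 range onto [-1, 1); the clamp guards the asymmetric positive edge (32767 maps just below 1.0, but summing before dividing can overshoot with rounding).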
Usage:
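The article ends before filling in this section. A minimal caller, assuming the two files above are in the project and a model file exists (the paths and model name here are placeholders, and this cannot build without the linked whisper libraries), might look like:

```cpp
#include "QAppWhisper.h"
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    WhisperEngine& engine = WhisperEngine::instance();

    // Kick off background loading; the model path is a placeholder
    engine.InitializeAsync("models/ggml-base.bin");

    // Synchronous call: wait up to 30 s for the model, then transcribe
    std::string text = engine.TranscribeFromWav("test.wav", 30000);
    std::cout << text << std::endl;

    // Asynchronous call: the callback runs on a worker thread
    engine.TranscribeAsync("test.wav", [](std::string result) {
        std::cout << result << std::endl;
    });

    // Crude wait for the async result; a real app would signal completion instead
    std::this_thread::sleep_for(std::chrono::seconds(5));
    engine.Shutdown();
    return 0;
}
```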