Whisper.cpp: A Speech-to-Text Example with VS2022

Summary from the AI assistant
This article explains how to clone and build the project with Git, covering the build steps for the CPU and GPU versions along with the accompanying code.

Cloning the project:

Using Git, clone the project to your local machine:

git clone https://github.com/ggml-org/whisper.cpp.git

Building the project:

Build notes

There is a CPU build and a GPU build: one serves the model with CPU compute, the other with GPU compute.

If your PC has a GPU, the GPU build is strongly recommended; it speeds up transcription dramatically.

The author's environment:

  • IDE: VS2022
  • C++ standard: C++17
  • Build tool: CMake
  • GPU: RTX 3060 (6 GB VRAM)

CPU:

Build commands (build Release whenever possible):

Release:
cmake -B build -G "Visual Studio 17 2022" -A x64 -D WHISPER_SHARED_LIB=ON
cmake --build build --config Release
Debug:
cmake -B build-cpu-dbg -G "Visual Studio 17 2022" -A x64 -DWHISPER_SHARED_LIB=ON
cmake --build build-cpu-dbg --config Debug -j%NUMBER_OF_PROCESSORS%

GPU:

To use the GPU, make sure CUDA is installed (see the CUDA website). Once it is installed, run the following in a command prompt:

nvcc --version
nvidia-smi

If both commands print output like the screenshots below, the environment is fine.

[screenshots: nvcc --version and nvidia-smi output]

Build commands (prefer Release over Debug, and run them from the "x64 Native Tools Command Prompt for VS 2022", which you can find via the Start menu):

Release:

cmake -S . -B build-gpu -G "Ninja" -DGGML_CUDA=ON -DGGML_CUBLAS=ON -DGGML_CUDA_KERNELS=ON -DGGML_CUDA_ARCHITECTURES="86" -DCMAKE_CUDA_COMPILER="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/bin/nvcc.exe" -DCUDA_TOOLKIT_ROOT_DIR="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0" -DCMAKE_BUILD_TYPE=Release
cmake --build build-gpu -j

Debug:

cmake -S . -B build-gpu-debug -G "Visual Studio 17 2022" -A x64 -DGGML_CUDA=ON -DGGML_CUBLAS=ON -DGGML_CUDA_KERNELS=ON -DCMAKE_CUDA_ARCHITECTURES="86" -DCUDAToolkit_ROOT="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0"
cmake --build build-gpu-debug --config Debug --target ALL_BUILD -j

Build succeeded:

If the build succeeds, you will see the build-gpu directory, and no errors will have appeared during the build.


Importing the libraries:

After the build completes, go to the /build-gpu directory and locate the following:

[screenshot: build-gpu output directories]

Create a library folder, and inside it create the two subfolders shown in Figure 1.

  1. Copy all of the .dll files from the bin directory above (Figure 2) into the lib folder;
  2. Copy the .lib file from the src directory above (Figure 3) into the lib folder;
  3. Copy the .lib file from the ggml directory above (Figure 4) into the lib folder;

[Figure 1]

[Figure 2]

[Figure 3]

[Figure 4]

In the project root (the whisper.cpp directory, Figure 1 below):

  1. Copy the ggml directory in full into the include folder you created earlier;
  2. Copy the files in the include directory at the project root (Figure 2 below) in full into the include folder you created earlier.

[Figure 1]

[Figure 2]

Linking the libraries:

Create an empty VS project and place the library folder from the previous step into the project directory:

  1. Solution -> Properties -> Configuration Properties -> General -> C++ Language Standard -> select C++17 (Figure 1 below);
  2. In the project property pages: C/C++ -> General -> Additional Include Directories, add the include and include/ggml/include paths (see Figure 2);
  3. Properties -> Linker -> General -> Additional Library Directories, add the lib folder (see Figure 3);
  4. Properties -> Linker -> Input -> Additional Dependencies, add the dependencies below (see Figure 4):
whisper.lib
ggml.lib
ggml-base.lib

[Figure 1]

[Figure 2]

[Figure 3]

[Figure 4]
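If you prefer driving the same setup from CMake instead of the VS property pages, the four steps above map to a short script. This is a hypothetical fragment: the target and file names (whisper_demo, main.cpp) are placeholders, while the include, include/ggml/include, and lib folders are the ones created earlier.

```cmake
# Hypothetical CMakeLists.txt mirroring the VS property-page settings above
cmake_minimum_required(VERSION 3.20)
project(whisper_demo CXX)

set(CMAKE_CXX_STANDARD 17)              # step 1: C++17

add_executable(whisper_demo main.cpp QAppWhisper.cpp)

# step 2: additional include directories
target_include_directories(whisper_demo PRIVATE include include/ggml/include)

# step 3: additional library directories
target_link_directories(whisper_demo PRIVATE lib)

# step 4: additional dependencies
target_link_libraries(whisper_demo PRIVATE whisper ggml ggml-base)
```

Remember that the .dll files still need to sit next to the built executable at runtime, just as with the VS project.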

Implementation:

Header file (QAppWhisper.h):

/*
*
* ********  WhisperEngine            ********
* ********  By Ciallo                ********
* ********  2025/11/23               ********
* ********  V1.0                     ********
* ********  https://wang-sz.cn       ********
* * Github:https://github.com/WinterShadowy *
*
*/

#pragma once

#include <string>
#include <vector>
#include <functional>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <atomic>
#include <whisper.h>

class WhisperEngine {
public:
    // Singleton access
    static WhisperEngine& instance();

    // Non-copyable / non-movable
    WhisperEngine(const WhisperEngine&) = delete;
    WhisperEngine& operator=(const WhisperEngine&) = delete;
    WhisperEngine(WhisperEngine&&) = delete;
    WhisperEngine& operator=(WhisperEngine&&) = delete;

    // Asynchronous model initialization: returns immediately; loading runs on a background thread
    // model_path: path to the model file (e.g. "ggml-whisper-large.bin")
    void InitializeAsync(const std::string& model_path);

    // Status queries
    bool IsInitializing() const;
    bool IsInitialized() const;
    bool InitSucceeded() const;
    void SetNumThreads(unsigned int n);
    void SetGpuLayers(int n);
    void SetUseMmap(bool v);
    // Synchronous transcription (blocks the calling thread)
    // waitInitMs: if the model is not ready, wait up to waitInitMs milliseconds (0 = don't wait)
    // Returns: the recognized text, or an error message starting with "[Error]"
    std::string TranscribeFromWav(const std::string& wav_path, int waitInitMs = 0);

    // Asynchronous transcription: recognition runs on a background thread, and the
    // callback is invoked on that thread (forward inside the callback if you need the result on the UI thread)
    // callback signature: void(const std::string& result)
    void TranscribeAsync(const std::string& wav_path, std::function<void(std::string)> callback);

    // Stop / release the model (call InitializeAsync again to reload)
    void Shutdown();

    ~WhisperEngine();

private:
    WhisperEngine();

    // Background initialization thread entry points
    void initThreadFunc_v_1_1(const std::string model_path);
    void initThreadFunc_v_1_2(const std::string model_path);
    void initThreadFunc_v_1_3(const std::string model_path);

    void initThreadFunc_v_1_4(const std::string model_path);

    // Internal transcription worker; assumes the model is loaded and the caller has locked or serialized access
    std::string doTranscribe_v_1_1(const std::string& wav_path);
private:
    // whisper context
    whisper_context* m_ctx = nullptr;

    unsigned int m_n_threads = 0;    // 0 = use hardware_concurrency()
    int m_n_gpu_layers = -1;         // -1 = default / GPU not forced (or auto-select)
    bool m_use_mmap = false;
    // Initialization state
    std::atomic<bool> m_initializing{ false };
    std::atomic<bool> m_initialized{ false };
    std::atomic<bool> m_init_success{ false };

    // Mutex and condition variable for waiting on initialization
    std::mutex m_init_mutex;
    std::condition_variable m_init_cv;

    // Serializes calls to whisper_full
    std::mutex m_whisper_call_mutex;

    // Background init thread (kept as a member so it can be joined)
    std::thread m_init_thread;

    // If you use TranscribeAsync, a thread pool or a single worker thread could run the tasks; this example spawns one std::thread per task
};
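The init/wait handshake that TranscribeFromWav relies on (background thread flips flags under a mutex, waiter blocks on a condition variable with a deadline) can be seen in isolation. This is a self-contained sketch independent of whisper; the names InitGate and demo are ours.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Minimal model of the handshake in WhisperEngine: a background thread flips
// 'initialized' under the mutex and notifies; a waiter blocks for at most a
// deadline, exactly like TranscribeFromWav(wav_path, waitInitMs).
struct InitGate {
    std::mutex m;
    std::condition_variable cv;
    std::atomic<bool> initialized{false};

    void finish() {
        { std::lock_guard<std::mutex> lk(m); initialized.store(true); }
        cv.notify_all();
    }
    bool wait_ready(int ms) {
        std::unique_lock<std::mutex> lk(m);
        return cv.wait_for(lk, std::chrono::milliseconds(ms),
                           [this] { return initialized.load(); });
    }
};

bool demo() {
    InitGate g;
    std::thread t([&g] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50)); // simulated model load
        g.finish();
    });
    bool ok = g.wait_ready(2000);   // the waiter sees the flag within the deadline
    t.join();
    return ok;
}
```

Setting the flag inside the lock before notifying is what makes the wait_for predicate race-free; WhisperEngine follows the same discipline with m_init_mutex and m_init_cv.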

Source file (QAppWhisper.cpp):

/*
*
* ********  WhisperEngine            ********
* ********  By Ciallo                ********
* ********  2025/11/23               ********
* ********  V1.0                     ********
* ********  https://wang-sz.cn       ********
* * Github:https://github.com/WinterShadowy *
*
*/

#define _CRT_SECURE_NO_WARNINGS   // must be defined before any CRT header is included
#pragma warning(disable:4996)

#include "QAppWhisper.h"

#include <whisper.h>
#include <algorithm>   // std::max / std::min in read_wav_v_1_2
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <QDebug>
#include <chrono>
#include <iostream>

#pragma comment(lib, "whisper.lib")

__declspec(deprecated("This function will be removed in a future version. Use read_wav_v_1_1() instead."))
bool read_wav_v_1_0(const char* fname, std::vector<float>& pcmf32, int expect_sample_rate = 16000);
bool read_wav_v_1_1(const char* fname, std::vector<float>& pcmf32, int expect_sample_rate = 16000);
bool read_wav_v_1_2(const char* fname, std::vector<float>& pcmf32, int expect_sample_rate = 16000);

// Singleton
WhisperEngine& WhisperEngine::instance()
{
    static WhisperEngine inst;
    return inst;
}

WhisperEngine::WhisperEngine()
    : m_ctx(nullptr)
{
}

WhisperEngine::~WhisperEngine()
{
    Shutdown();
}

void WhisperEngine::InitializeAsync(const std::string& model_path)
{
    // Ignore the call if initialization already completed; a real application
    // could choose to reload here instead
    if (m_initialized.load())
        return;

    // Atomically claim the "initializing" flag so concurrent callers bail out
    bool expected = false;
    if (!m_initializing.compare_exchange_strong(expected, true))
        return;

    m_init_success.store(false);
    m_initialized.store(false);

    // Join any previous (failed) init thread before reusing the member,
    // otherwise assigning to m_init_thread would call std::terminate
    if (m_init_thread.joinable())
        m_init_thread.join();

    // Launch the background initialization thread
    m_init_thread = std::thread([this, model_path]() {
        this->initThreadFunc_v_1_4(model_path);
        });

    // You could detach here instead, but joining in the destructor is safer
}

void WhisperEngine::initThreadFunc_v_1_1(const std::string model_path)
{
    // The model is actually loaded here, on the background thread
    // Note: whisper_init_from_file_with_params can take a while
    whisper_context_params cparams = whisper_context_default_params();
    // Adjust cparams here if needed

    whisper_context* ctx = whisper_init_from_file_with_params(model_path.c_str(), cparams);
    {
        std::lock_guard<std::mutex> lk(m_init_mutex);
        m_ctx = ctx;
        if (ctx) 
        {
            m_init_success.store(true);
            m_initialized.store(true);
        }
        else 
        {
            m_init_success.store(false);
            m_initialized.store(false);
        }
        m_initializing.store(false);
    }
    m_init_cv.notify_all();
}

void WhisperEngine::initThreadFunc_v_1_2(const std::string model_path)
{
    whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = true;
#ifdef WHISPER_HAS_USE_MMAP
    cparams.use_mmap = m_use_mmap;
#endif

    if (m_n_gpu_layers >= 0) {
#ifdef WHISPER_HAS_N_GPU_LAYERS
        cparams.n_gpu_layers = m_n_gpu_layers;
#else
        // If the library lacks this field, ignore it or log a hint
        // telling the user to rebuild whisper.cpp with GPU support
        qDebug() << "whisper lib: n_gpu_layers not available in whisper_context_params; ignoring";
#endif
    }

    qDebug() << "Initializing model:" << QString::fromStdString(model_path)
        << " gpu_layers=" << m_n_gpu_layers << " use_mmap=" << m_use_mmap;

    whisper_context* ctx = whisper_init_from_file_with_params(model_path.c_str(), cparams);

    {
        std::lock_guard<std::mutex> lk(m_init_mutex);
        m_ctx = ctx;
        m_init_success.store(ctx != nullptr);
        m_initialized.store(true);   // mark initialization finished (whether it succeeded or failed)
        m_initializing.store(false);
    }
    m_init_cv.notify_all();
}

void WhisperEngine::initThreadFunc_v_1_3(const std::string model_path)
{
    whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = true;
    cparams.gpu_device = 0;      // an RTX 3060 is usually device 0
    cparams.flash_attn = true;   // enable Flash Attention (takes effect if the build supports it)
#ifdef WHISPER_HAS_USE_MMAP
    cparams.use_mmap = m_use_mmap;
#endif

#ifdef WHISPER_HAS_N_GPU_LAYERS
    // With m_n_gpu_layers < 0, offload as many layers as possible to the GPU (the library clamps to the supported maximum)
    cparams.n_gpu_layers = (m_n_gpu_layers < 0) ? 999 : m_n_gpu_layers;
#else
    qDebug() << "whisper lib: n_gpu_layers not available in whisper_context_params; ignoring";
#endif

    qDebug() << "Initializing model:" << QString::fromStdString(model_path)
        << " gpu_layers=" << m_n_gpu_layers << " use_mmap=" << m_use_mmap
        << " flash_attn=" << cparams.flash_attn << " gpu_device=" << cparams.gpu_device;

    whisper_context* ctx = whisper_init_from_file_with_params(model_path.c_str(), cparams);

    {
        std::lock_guard<std::mutex> lk(m_init_mutex);
        m_ctx = ctx;
        m_init_success.store(ctx != nullptr);
        m_initialized.store(true);
        m_initializing.store(false);
    }

    if (ctx) {
        qDebug() << "Whisper backend:" << whisper_print_system_info();
    }
    m_init_cv.notify_all();
}

void WhisperEngine::initThreadFunc_v_1_4(const std::string model_path)
{
    // Set up the GPU parameters
    whisper_context_params cparams = whisper_context_default_params();
    cparams.use_gpu = true;     // enable CUDA
    cparams.gpu_device = 0;     // single RTX 3060
    cparams.flash_attn = true;  // fine to enable on Ampere; requires build support
    // The actual load (may take several seconds)
    whisper_context* ctx = whisper_init_from_file_with_params(model_path.c_str(), cparams);

    // Write the state back
    {
        std::lock_guard<std::mutex> lk(m_init_mutex);
        m_ctx = ctx;
        if (ctx)
        {
            m_init_success.store(true);
            m_initialized.store(true);
            std::fprintf(stdout, "[WhisperEngine] GPU model loaded successfully from %s\n", model_path.c_str());
        }
        else
        {
            m_init_success.store(false);
            m_initialized.store(false);
            std::fprintf(stderr, "[WhisperEngine] GPU model load FAILED from %s\n", model_path.c_str());
        }
        m_initializing.store(false);
    }
    m_init_cv.notify_all();
}

bool WhisperEngine::IsInitializing() const 
{ 
    return m_initializing.load(); 
}
bool WhisperEngine::IsInitialized() const
{
    return m_initialized.load();
}
bool WhisperEngine::InitSucceeded() const 
{ 
    return m_init_success.load(); 
}
void WhisperEngine::SetNumThreads(unsigned int n)
{
    m_n_threads = n;
    return;
}
void WhisperEngine::SetGpuLayers(int n)
{
    m_n_gpu_layers = n;
    return;
}
void WhisperEngine::SetUseMmap(bool v)
{
    m_use_mmap = v;
    return;
}

std::string WhisperEngine::doTranscribe_v_1_1(const std::string& wav_path)
{
    if (!m_ctx) 
        return std::string("[Error] model not loaded");

    // Read the wav into a float buffer (using the read_wav helper defined below)
    std::vector<float> pcmf32;
    if (!read_wav_v_1_1(wav_path.c_str(), pcmf32, 16000)) 
    {
        return std::string("[Error] read WAV failed");
    }

    // Call whisper_full
    whisper_full_params wparams = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    wparams.language = "zh";
    wparams.no_timestamps = true;
    wparams.n_threads = 4;          // CPU parallelism
    wparams.n_max_text_ctx = 16384; // text context size for ~30 s chunks
    // Serialize concurrent calls to whisper_full
    std::lock_guard<std::mutex> lk(m_whisper_call_mutex);

    int rc = whisper_full(m_ctx, wparams, pcmf32.data(), static_cast<int>(pcmf32.size()));
    if (rc != 0) 
    {
        return std::string("[Error] whisper_full failed");
    }

    // Collect the result
    std::string result;
    int n = whisper_full_n_segments(m_ctx);
    for (int i = 0; i < n; ++i) 
    {
        const char* seg = whisper_full_get_segment_text(m_ctx, i);
        if (seg) 
            result += seg;
    }
    return result;
}
std::string WhisperEngine::TranscribeFromWav(const std::string& wav_path, int waitInitMs)
{
    // If the model is not yet initialized and the caller allowed waiting, wait here
    if (!m_initialized.load()) 
    {
        if (waitInitMs > 0)
        {
            std::unique_lock<std::mutex> lk(m_init_mutex);
            m_init_cv.wait_for(lk, std::chrono::milliseconds(waitInitMs), [this]() {
                return this->m_initialized.load() || !this->m_initializing.load();
                });
        }
    }

    if (!m_initialized.load() || !m_init_success.load()) 
    {
        return std::string("[Error] model not ready");
    }

    return doTranscribe_v_1_1(wav_path);
}
void WhisperEngine::TranscribeAsync(const std::string& wav_path, std::function<void(std::string)> callback)
{
    // Run the recognition on a background thread
    std::thread task([this, wav_path, callback]() {
        // Wait for the model to become ready, up to 30 s (adjustable)
        {
            std::unique_lock<std::mutex> lk(m_init_mutex);
            if (!m_initialized.load()) 
            {
                m_init_cv.wait_for(lk, std::chrono::seconds(30), [this]() {
                    return this->m_initialized.load() || !this->m_initializing.load();
                    });
            }
        }
        if (!m_initialized.load() || !m_init_success.load()) 
        {
            if (callback) 
                callback("[Error] model not ready");
            return;
        }
        std::string res = doTranscribe_v_1_1(wav_path);
        if (callback) 
            callback(res);
        });
    task.detach(); // fire-and-forget; swap in a thread pool if you need managed task threads
}

void WhisperEngine::Shutdown()
{
    // If the init thread is still running, wait for it to finish
    if (m_init_thread.joinable()) 
    {
        // whisper's init cannot be interrupted; just wait for it
        m_init_thread.join();
    }

    {
        std::lock_guard<std::mutex> lk(m_init_mutex);
        if (m_ctx) 
        {
            whisper_free(m_ctx);
            m_ctx = nullptr;
        }
        m_initialized.store(false);
        m_init_success.store(false);
        m_initializing.store(false);
    }
    m_init_cv.notify_all();
}

// Supported input:
// 16 kHz / 32-bit float / mono WAV
// other formats are rejected
bool read_wav_v_1_0(const char* fname, std::vector<float>& pcmf32, int expect_sample_rate)
{
    FILE* fp = std::fopen(fname, "rb");
    if (!fp) 
    {                                    // file could not be opened
        std::fprintf(stderr, "[read_wav] fopen fail: %s\n", fname);
        return false;
    }

    // canonical 44-byte WAV header; only the first 36 bytes (RIFF + "fmt ")
    // are read here -- the data fields are filled in by the chunk scan below
    struct {
        char     riff[4] = {};
        uint32_t size = 0;
        char     wave[4] = {};
        char     fmt[4] = {};
        uint32_t fmt_sz = 0;
        uint16_t fmt_tag = 0;
        uint16_t ch = 0;
        uint32_t sr = 0;
        uint32_t br = 0;
        uint16_t ba = 0;
        uint16_t bps = 0;
        char     data[4] = {};
        uint32_t data_sz = 0;
    } h;

    bool ok = true;

    // read only the fixed 36-byte part (stopping before the data fields, so the
    // chunk scan below starts at the first chunk after "fmt ")
    if (ok && std::fread(&h, 36, 1, fp) != 1) 
        ok = false;
    if (ok && (std::memcmp(h.riff, "RIFF", 4) || std::memcmp(h.wave, "WAVE", 4))) 
    {
        std::fprintf(stderr, "[read_wav] not a RIFF/WAVE file\n");
        ok = false;
    }
    if (ok && h.fmt_tag != 1) 
    {                       // 1 = PCM
        std::fprintf(stderr, "[read_wav] not PCM (fmt_tag=%u)\n", h.fmt_tag);
        ok = false;
    }

    // skip any fmt extension bytes
    if (ok && h.fmt_sz > 16) 
        std::fseek(fp, h.fmt_sz - 16, SEEK_CUR);

    // scan the remaining chunks for the "data" chunk
    bool found = false;
    while (ok) 
    {
        char id[4];
        uint32_t sz = 0;
        if (std::fread(id, 4, 1, fp) != 1) 
            break;
        if (std::fread(&sz, 4, 1, fp) != 1) 
            break;
        if (memcmp(id, "data", 4) == 0) 
        {
            h.data_sz = sz;
            found = true;
            break;
        }
        // skip this chunk's payload (padded to an even byte count)
        std::fseek(fp, sz + (sz & 1), SEEK_CUR);
    }
    if (!found) 
    {
        fprintf(stderr, "[read_wav] no 'data' chunk\n");
        ok = false;
    }

    // format checks
    if (ok && h.sr != (uint32_t)expect_sample_rate) 
    {
        std::fprintf(stderr, "[read_wav] sample rate mismatch: file=%u, expected=%d\n", h.sr, expect_sample_rate);
        ok = false;
    }
    if (ok && h.ch != 1) 
    {
        std::fprintf(stderr, "[read_wav] not mono (channels=%u)\n", h.ch);
        ok = false;
    }
    if (ok && h.bps != 32) 
    {
        std::fprintf(stderr, "[read_wav] not 32-bit float (bits=%u)\n", h.bps);
        ok = false;
    }

    // read the samples
    if (ok) 
    {
        size_t n = h.data_sz / sizeof(float);
        pcmf32.resize(n);
        if (std::fread(pcmf32.data(), sizeof(float), n, fp) != n) 
        {
            std::fprintf(stderr, "[read_wav] fread samples failed\n");
            ok = false;
        }
    }
    std::fclose(fp);
    return ok && !pcmf32.empty();
}

// Supported input:
// sample_rate == expect_sample_rate (default 16000)
// format == IEEE float (fmt_tag == 3)
//           bits_per_sample == 32
//           channels == 1 (mono)
bool read_wav_v_1_1(const char* fname, std::vector<float>& pcmf32, int expect_sample_rate)
{
    FILE* fp = std::fopen(fname, "rb");
    if (!fp)
    {
        std::fprintf(stderr, "[read_wav] fopen fail: %s\n", fname);
        return false;
    }

    // read the RIFF header (12 bytes)
    char riff[4];
    uint32_t riff_sz = 0;
    char wave[4];
    if (std::fread(riff, 1, 4, fp) != 4 ||
        std::fread(&riff_sz, sizeof(riff_sz), 1, fp) != 1 ||
        std::fread(wave, 1, 4, fp) != 4)
    {
        std::fprintf(stderr, "[read_wav] read header failed\n");
        std::fclose(fp);
        return false;
    }
    if (std::memcmp(riff, "RIFF", 4) != 0 || std::memcmp(wave, "WAVE", 4) != 0)
    {
        std::fprintf(stderr, "[read_wav] not a RIFF/WAVE file\n");
        std::fclose(fp);
        return false;
    }

    // variables to hold fmt/data info
    bool got_fmt = false;
    bool got_data = false;
    uint16_t fmt_tag = 0;
    uint16_t channels = 0;
    uint32_t sample_rate = 0;
    uint16_t bits_per_sample = 0;
    uint32_t data_sz = 0;
    std::vector<uint8_t> data_buf;

    // walk the chunks looking for "fmt " and "data"
    while (!got_data)
    {
        char id[4];
        uint32_t chunk_sz = 0;
        if (std::fread(id, 1, 4, fp) != 4)
            break;
        if (std::fread(&chunk_sz, sizeof(chunk_sz), 1, fp) != 1)
            break;

        if (std::memcmp(id, "fmt ", 4) == 0)
        {
            // read the fmt chunk (chunk_sz >= 16 is typical)
            std::vector<uint8_t> fmt(chunk_sz);
            if (chunk_sz > 0)
            {
                if (std::fread(fmt.data(), 1, chunk_sz, fp) != chunk_sz)
                {
                    std::fprintf(stderr, "[read_wav] read fmt chunk failed\n");
                    std::fclose(fp);
                    return false;
                }
            }
            if (chunk_sz < 16)
            {
                std::fprintf(stderr, "[read_wav] fmt chunk too small\n");
                std::fclose(fp);
                return false;
            }
            // parse the common fields directly as little-endian (RIFF/WAV is little-endian)
            fmt_tag = *reinterpret_cast<const uint16_t*>(&fmt[0]);
            channels = *reinterpret_cast<const uint16_t*>(&fmt[2]);
            sample_rate = *reinterpret_cast<const uint32_t*>(&fmt[4]);
            // skip byte rate (4) block align (2)
            bits_per_sample = *reinterpret_cast<const uint16_t*>(&fmt[14]);

            got_fmt = true;
            // if the fmt chunk is larger than 16 bytes, the read above already consumed the extension
        }
        else if (std::memcmp(id, "data", 4) == 0)
        {
            // read the data chunk
            if (chunk_sz > 0)
            {
                data_buf.resize(chunk_sz);
                if (std::fread(data_buf.data(), 1, chunk_sz, fp) != chunk_sz)
                {
                    std::fprintf(stderr, "[read_wav] read data chunk failed\n");
                    std::fclose(fp);
                    return false;
                }
                data_sz = chunk_sz;
            }
            else
            {
                data_buf.clear();
                data_sz = 0;
            }
            got_data = true;
        }
        else
        {
            // skip unknown chunks (padded to an even byte count)
            if (chunk_sz > 0)
            {
                if (std::fseek(fp, chunk_sz, SEEK_CUR) != 0)
                {
                    std::fprintf(stderr, "[read_wav] fseek failed while skipping chunk\n");
                    std::fclose(fp);
                    return false;
                }
            }
        }

        // if the chunk size is odd, the file contains a pad byte
        if (chunk_sz & 1)
        {
            std::fseek(fp, 1, SEEK_CUR);
        }
    }

    if (!got_fmt)
    {
        std::fprintf(stderr, "[read_wav] no 'fmt ' chunk\n");
        std::fclose(fp);
        return false;
    }
    if (!got_data)
    {
        std::fprintf(stderr, "[read_wav] no 'data' chunk\n");
        std::fclose(fp);
        return false;
    }

    // format check: only IEEE float is accepted (fmt_tag == 3)
    if (fmt_tag != 3)
    {
        std::fprintf(stderr, "[read_wav] not IEEE float (fmt_tag=%u)\n", fmt_tag);
        std::fclose(fp);
        return false;
    }
    if (bits_per_sample != 32)
    {
        std::fprintf(stderr, "[read_wav] not 32-bit float (bits=%u)\n", bits_per_sample);
        std::fclose(fp);
        return false;
    }
    if (channels != 1)
    {
        std::fprintf(stderr, "[read_wav] not mono (channels=%u)\n", channels);
        std::fclose(fp);
        return false;
    }
    if (sample_rate != (uint32_t)expect_sample_rate)
    {
        std::fprintf(stderr, "[read_wav] sample rate mismatch: file=%u, expected=%d\n", sample_rate, expect_sample_rate);
        std::fclose(fp);
        return false;
    }

    // read the samples (4-byte floats)
    if (data_sz == 0)
    {
        std::fprintf(stderr, "[read_wav] data chunk is empty\n");
        std::fclose(fp);
        return false;
    }
    size_t n_samples = data_sz / sizeof(float);
    pcmf32.resize(n_samples);
    // straight memory copy (assumes the file holds little-endian IEEE floats)
    std::memcpy(pcmf32.data(), data_buf.data(), n_samples * sizeof(float));

    std::fclose(fp);
    return !pcmf32.empty();
}

bool read_wav_v_1_2(const char* fname, std::vector<float>& pcmf32, int expect_sample_rate)
{
    FILE* fp = std::fopen(fname, "rb");
    if (!fp)
    {
        std::fprintf(stderr, "[read_wav] fopen fail: %s\n", fname);
        return false;
    }

    // read RIFF header (12 bytes)
    char riff[4];
    uint32_t riff_sz;
    char wave[4];
    if (std::fread(riff, 1, 4, fp) != 4 ||
        std::fread(&riff_sz, sizeof(riff_sz), 1, fp) != 1 ||
        std::fread(wave, 1, 4, fp) != 4)
    {
        std::fprintf(stderr, "[read_wav] read header failed\n");
        std::fclose(fp);
        return false;
    }
    if (std::memcmp(riff, "RIFF", 4) != 0 || std::memcmp(wave, "WAVE", 4) != 0)
    {
        std::fprintf(stderr, "[read_wav] not a RIFF/WAVE file\n");
        std::fclose(fp);
        return false;
    }

    // variables to hold fmt and data info
    bool got_fmt = false;
    bool got_data = false;
    uint16_t audio_format = 0;
    uint16_t num_channels = 0;
    uint32_t sample_rate = 0;
    uint16_t bits_per_sample = 0;
    std::vector<uint8_t> data_buf;
    uint32_t data_size = 0;

    // iterate chunks
    while (!got_data) {
        char id[4];
        uint32_t chunk_sz = 0;
        if (std::fread(id, 1, 4, fp) != 4) break;
        if (std::fread(&chunk_sz, sizeof(chunk_sz), 1, fp) != 1) break;

        // handle chunk id
        if (std::memcmp(id, "fmt ", 4) == 0) 
        {
            // read fmt chunk
            std::vector<uint8_t> fmt(chunk_sz);
            if (chunk_sz > 0 && std::fread(fmt.data(), 1, chunk_sz, fp) != chunk_sz) {
                std::fprintf(stderr, "[read_wav] read fmt chunk failed\n");
                std::fclose(fp);
                return false;
            }
            // parse basic fields (first 16 bytes expected)
            if (chunk_sz < 16) {
                std::fprintf(stderr, "[read_wav] fmt chunk too small\n");
                std::fclose(fp);
                return false;
            }
            // little-endian parsing
            audio_format = *reinterpret_cast<const uint16_t*>(&fmt[0]);
            num_channels = *reinterpret_cast<const uint16_t*>(&fmt[2]);
            sample_rate = *reinterpret_cast<const uint32_t*>(&fmt[4]);
            // skip byte rate (4 bytes) and block align (2 bytes)
            bits_per_sample = *reinterpret_cast<const uint16_t*>(&fmt[14]);

            // if fmt chunk has extension (e.g., WAVE_FORMAT_EXTENSIBLE), we could inspect subformat,
            // but for typical files, audio_format == 1 (PCM) or 3 (IEEE float) is enough.
            got_fmt = true;
        }
        else if (std::memcmp(id, "data", 4) == 0)
        {
            // read data chunk into buffer
            if (chunk_sz > 0)
            {
                data_buf.resize(chunk_sz);
                if (std::fread(data_buf.data(), 1, chunk_sz, fp) != chunk_sz)
                {
                    std::fprintf(stderr, "[read_wav] read data chunk failed\n");
                    std::fclose(fp);
                    return false;
                }
                data_size = chunk_sz;
                got_data = true;
            }
            else
            {
                // empty data chunk
                data_buf.clear();
                data_size = 0;
                got_data = true;
            }
        }
        else
        {
            // skip unknown chunk (pad to even)
            // chunk_sz may be odd; skip chunk_sz bytes, plus pad byte if needed
            uint32_t skip = chunk_sz;
            if (skip > 0)
            {
                if (std::fseek(fp, skip, SEEK_CUR) != 0)
                {
                    std::fprintf(stderr, "[read_wav] fseek failed while skipping chunk\n");
                    std::fclose(fp);
                    return false;
                }
            }
        }

        // chunk sizes are word aligned; if chunk_sz is odd, there may be a pad byte
        if (chunk_sz & 1)
        {
            std::fseek(fp, 1, SEEK_CUR);
        }
    }

    if (!got_fmt)
    {
        std::fprintf(stderr, "[read_wav] no 'fmt ' chunk\n");
        std::fclose(fp);
        return false;
    }
    if (!got_data)
    {
        std::fprintf(stderr, "[read_wav] no 'data' chunk\n");
        std::fclose(fp);
        return false;
    }

    // sample rate check
    if (expect_sample_rate > 0 && sample_rate != (uint32_t)expect_sample_rate)
    {
        std::fprintf(stderr, "[read_wav] sample rate mismatch: file=%u, expected=%d\n", sample_rate, expect_sample_rate);
        // to accept other sample rates instead, drop the return below and just warn
        std::fclose(fp);
        return false;
    }

    // channels check (we will accept multi-channel but downmix to mono)
    if (num_channels < 1)
    {
        std::fprintf(stderr, "[read_wav] invalid channel count: %u\n", num_channels);
        std::fclose(fp);
        return false;
    }

    // convert data buffer into float vector
    pcmf32.clear();
    if (audio_format == 3)
    {
        // IEEE float
        if (bits_per_sample != 32)
        {
            std::fprintf(stderr, "[read_wav] unexpected float bits: %u\n", bits_per_sample);
            std::fclose(fp);
            return false;
        }
        size_t total_samples = data_size / sizeof(float);
        if (total_samples == 0)
        {
            std::fprintf(stderr, "[read_wav] no samples\n");
            std::fclose(fp);
            return false;
        }
        // if multi-channel, average channels to mono
        size_t frames = total_samples / num_channels;
        pcmf32.resize(frames);
        const float* src = reinterpret_cast<const float*>(data_buf.data());
        for (size_t i = 0; i < frames; ++i)
        {
            float acc = 0.0f;
            for (uint16_t c = 0; c < num_channels; ++c)
            {
                acc += src[i * num_channels + c];
            }
            pcmf32[i] = acc / static_cast<float>(num_channels);
        }
    }
    else if (audio_format == 1)
    {
        // PCM integer
        if (bits_per_sample == 16)
        {
            size_t total_samples = data_size / sizeof(int16_t);
            size_t frames = total_samples / num_channels;
            pcmf32.resize(frames);
            const int16_t* src = reinterpret_cast<const int16_t*>(data_buf.data());
            for (size_t i = 0; i < frames; ++i)
            {
                int64_t acc = 0;
                for (uint16_t c = 0; c < num_channels; ++c)
                {
                    acc += src[i * num_channels + c];
                }
                float v = static_cast<float>(acc) / (32768.0f * static_cast<float>(num_channels));
                pcmf32[i] = std::max(-1.0f, std::min(1.0f, v));
            }
        }
        else if (bits_per_sample == 24)
        {
            // 24-bit packed little-endian
            size_t bytes_per_sample = 3;
            size_t total_samples = data_size / bytes_per_sample;
            size_t frames = total_samples / num_channels;
            pcmf32.resize(frames);
            const uint8_t* b = data_buf.data();
            for (size_t i = 0; i < frames; ++i)
            {
                int64_t acc = 0;
                for (uint16_t c = 0; c < num_channels; ++c) {
                    size_t idx = (i * num_channels + c) * 3;
                    int32_t sample = (int32_t)((b[idx]) | (b[idx + 1] << 8) | (b[idx + 2] << 16));
                    // sign extension for 24-bit
                    if (sample & 0x800000) sample |= ~0xFFFFFF;
                    acc += sample;
                }
                float v = static_cast<float>(acc) / (8388608.0f * static_cast<float>(num_channels)); // 2^23
                pcmf32[i] = std::max(-1.0f, std::min(1.0f, v));
            }
        }
        else if (bits_per_sample == 32)
        {
            // 32-bit PCM integer
            size_t total_samples = data_size / sizeof(int32_t);
            size_t frames = total_samples / num_channels;
            pcmf32.resize(frames);
            const int32_t* src = reinterpret_cast<const int32_t*>(data_buf.data());
            for (size_t i = 0; i < frames; ++i)
            {
                int64_t acc = 0;
                for (uint16_t c = 0; c < num_channels; ++c)
                {
                    acc += src[i * num_channels + c];
                }
                float v = static_cast<float>(acc) / (2147483648.0f * static_cast<float>(num_channels));
                pcmf32[i] = std::max(-1.0f, std::min(1.0f, v));
            }
        }
        else
        {
            std::fprintf(stderr, "[read_wav] unsupported PCM bit depth: %u\n", bits_per_sample);
            std::fclose(fp);
            return false;
        }
    }
    else
    {
        std::fprintf(stderr, "[read_wav] unsupported audio format (fmt_tag=%u)\n", audio_format);
        std::fclose(fp);
        return false;
    }

    std::fclose(fp);
    return !pcmf32.empty();
}
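read_wav_v_1_2 above folds 24-bit PCM into floats; the sign-extension step is the part that is easy to get wrong, so here it is in isolation. A self-contained sketch; the helper names (s24le_to_i32, s24le_to_f32) are ours, not part of whisper.cpp.

```cpp
#include <cstdint>

// Sign-extend a packed 24-bit little-endian sample into a 32-bit signed
// integer, exactly as read_wav_v_1_2 does inside its 24-bit branch.
inline int32_t s24le_to_i32(uint8_t b0, uint8_t b1, uint8_t b2) {
    int32_t s = (int32_t)(b0 | (b1 << 8) | (b2 << 16));
    if (s & 0x800000)       // bit 23 is the sign bit of a 24-bit sample
        s |= ~0xFFFFFF;     // propagate it into the top byte
    return s;
}

// Normalize to [-1, 1) by dividing by 2^23 = 8388608.
inline float s24le_to_f32(uint8_t b0, uint8_t b1, uint8_t b2) {
    return (float)s24le_to_i32(b0, b1, b2) / 8388608.0f;
}
```

Without the sign-extension line, 0xFF 0xFF 0xFF would decode as 16777215 instead of -1, and every negative half-wave would be mangled.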

Usage:

 
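A minimal sketch of driving the engine from a console program, assuming a model file ggml-medium.bin and a 16 kHz mono float WAV test.wav sit next to the executable (both file names are ours, not from the article):

```cpp
#include "QAppWhisper.h"
#include <iostream>

int main() {
    auto& engine = WhisperEngine::instance();

    // Kick off model loading on a background thread (path is an assumption)
    engine.InitializeAsync("ggml-medium.bin");

    // Block until the model is ready (up to 30 s), then transcribe
    std::string text = engine.TranscribeFromWav("test.wav", 30000);
    std::cout << text << std::endl;

    // TranscribeAsync(wav, callback) is the non-blocking variant; remember that
    // the callback fires on a background thread, so forward the result to the
    // UI thread yourself, and keep the process alive until it has fired
    engine.Shutdown();
    return 0;
}
```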
