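Preface
In NAudio, the commonly used types are WaveIn, WaveOut, WaveStream, WaveFileWriter, WaveFileReader and AudioFileReader, together with the interfaces IWaveProvider, ISampleProvider, IWaveIn and IWavePlayer.
WaveIn represents wave input and implements IWaveIn. It records from an input device such as a microphone. Capturing the audio the computer is currently playing is done with WasapiLoopbackCapture, which also implements IWaveIn.
WaveOut represents wave output and implements IWavePlayer. It plays audio from an IWaveProvider source and, through an extension method, from an ISampleProvider source as well.
WaveStream represents a wave stream. It implements IWaveProvider, so it can be used as a playback source.
WaveFileReader derives from WaveStream and reads WAV files.
WaveFileWriter derives from Stream and writes WAV files. It is commonly used to save recorded audio data.
AudioFileReader is a general-purpose audio file reader. It reads WAV files as well as other audio formats such as AIFF and MP3.
IWaveProvider is the wave provider mentioned above, the source that supplies audio for playback. An extension method converts it to an ISampleProvider.
ISampleProvider is the sample provider mentioned above. Through an extension method it can also be used as the playback source of WaveOut.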
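The conversions between the two provider interfaces can be sketched as follows. This is a minimal illustration rather than code from the original post: the class and method names are made up for the example, and it assumes NAudio 1.9 with the extension methods ToWaveProvider16, ToSampleProvider and Init(ISampleProvider) from the NAudio.Wave namespace. A generated test tone is used so that no audio file is needed.

```csharp
using System;
using System.Threading;
using NAudio.Wave;
using NAudio.Wave.SampleProviders;

// Hypothetical demo class, not part of NAudio or of the original post.
static class ProviderConversionDemo
{
    // Builds a short chain that crosses both interfaces.
    public static ISampleProvider BuildChain()
    {
        // SignalGenerator is an ISampleProvider (32-bit float samples).
        ISampleProvider tone = new SignalGenerator(16000, 1)
        {
            Type = SignalGeneratorType.Sin,
            Frequency = 440,
            Gain = 0.2
        };

        // ISampleProvider -> IWaveProvider (16-bit PCM bytes).
        IWaveProvider pcm16 = tone.ToWaveProvider16();

        // IWaveProvider -> ISampleProvider again, via the extension method.
        return pcm16.ToSampleProvider();
    }

    public static void PlayOneSecond()
    {
        using (var output = new WaveOutEvent())
        {
            // Init(ISampleProvider) is itself an extension method on IWavePlayer.
            output.Init(BuildChain());
            output.Play();
            Thread.Sleep(1000);   // the generator never ends, so stop after one second
            output.Stop();
        }
    }
}
```

Calling ProviderConversionDemo.PlayOneSecond() should play a quiet 440 Hz tone on the default output device. WaveOutEvent is used here instead of WaveOut because it does not need a window message loop.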
NAudio
Using NAudio
Installation
    Install-Package NAudio -Version 1.9.0
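Microphone list
    using System;
    using NAudio.Wave;

    public static void GetAudioMicrophone2()
    {
        // Device -1 is the Microsoft Sound Mapper; real devices start at 0.
        for (int n = -1; n < WaveIn.DeviceCount; n++)
        {
            var caps = WaveIn.GetCapabilities(n);
            Console.WriteLine($"{n}: {caps.ProductName}");
        }
    }
The output looks like this:
    -1: Microsoft Sound Mapper
    0: 麦克风 (Realtek(R) Audio)
Note that the loop above starts at -1, which is the Microsoft Sound Mapper entry. To list only the actual microphone devices, start the loop at 0.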
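Listing only the real devices, as described above, might look like the sketch below. The class and method names are invented for the example. WaveIn.DeviceCount and WaveIn.GetCapabilities are the same NAudio calls used in the listing above.

```csharp
using System;
using System.Collections.Generic;
using NAudio.Wave;

// Hypothetical helper, not part of NAudio.
static class MicrophoneList
{
    // Returns the product names of all recording devices, skipping the
    // Microsoft Sound Mapper entry at index -1.
    public static List<string> GetMicrophoneNames()
    {
        var names = new List<string>();
        for (int n = 0; n < WaveIn.DeviceCount; n++)
        {
            WaveInCapabilities caps = WaveIn.GetCapabilities(n);
            names.Add(caps.ProductName);
        }
        return names;
    }
}
```

The index of an entry in the returned list is the device number, so it can be assigned to the DeviceNumber property of WaveIn or WaveInEvent to record from that specific microphone.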
Default devices
    // Needs: using System; using NAudio.CoreAudioApi; using NAudio.Wave;
    MMDevice defaultCaptureDevice = WasapiCapture.GetDefaultCaptureDevice();
    Console.WriteLine($"Default microphone: {defaultCaptureDevice.FriendlyName}");
    MMDevice defaultLoopbackCaptureDevice = WasapiLoopbackCapture.GetDefaultLoopbackCaptureDevice();
    Console.WriteLine($"Default speaker: {defaultLoopbackCaptureDevice.FriendlyName}");
Or:
    using NAudio.CoreAudioApi;

    class Program
    {
        private static MMDevice GetDefaultAudioDevice(DataFlow dataFlow)
        {
            MMDeviceEnumerator enumerator = new MMDeviceEnumerator();
            return enumerator.GetDefaultAudioEndpoint(dataFlow, Role.Multimedia);
        }

        static void Main()
        {
            // DataFlow.Capture selects recording devices.
            MMDevice microphone = GetDefaultAudioDevice(DataFlow.Capture);
            if (microphone != null)
            {
                System.Console.WriteLine($"Default microphone device name: {microphone.FriendlyName}");
            }

            // DataFlow.Render selects playback devices.
            MMDevice speaker = GetDefaultAudioDevice(DataFlow.Render);
            if (speaker != null)
            {
                System.Console.WriteLine($"Default speaker device name: {speaker.FriendlyName}");
            }
        }
    }
Getting the supported sample rates
    // Needs: using System; using System.Collections.Generic;
    //        using NAudio.CoreAudioApi; using NAudio.Wave;
    public static List<WaveFormat> GetCompatibleFormat()
    {
        List<WaveFormat> formatList = new List<WaveFormat>();
        MMDeviceEnumerator enumerator = new MMDeviceEnumerator();
        MMDevice device = enumerator.GetDefaultAudioEndpoint(DataFlow.Capture, Role.Multimedia);
        AudioClient audioClient = device.AudioClient;

        int[] commonSampleRates = new[] { 8000, 11025, 16000, 22050, 44100, 48000 };
        int[] commonChannelCounts = new[] { 1, 2 };

        foreach (int sampleRate in commonSampleRates)
        {
            foreach (int channelCount in commonChannelCounts)
            {
                // WaveFormat(rate, channels) describes 16-bit PCM.
                WaveFormat format = new WaveFormat(sampleRate, channelCount);
                if (audioClient.IsFormatSupported(AudioClientShareMode.Shared, format, out _))
                {
                    Console.WriteLine($"Found compatible format: sample rate {sampleRate} Hz, channels {channelCount}");
                    formatList.Add(format);
                }
            }
        }
        return formatList;
    }
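Capture
Both WaveIn and WasapiCapture can capture from the microphone.
The differences between the two:
WaveIn:
The recording format, including the sample rate, can be set freely.
WasapiCapture:
Supported on Windows Vista and later.
Can achieve very low latency.
Uses more resources.
Can only record at the sample rate configured in the system.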
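Recording with WaveIn
WaveIn lets you change the sample rate.
    // Needs: using System; using NAudio.Wave;
    // WaveIn uses window callbacks by default; in a console application
    // without a message loop, WaveInEvent (same API) is the usual substitute.
    WaveIn cap = new WaveIn();
    cap.WaveFormat = new WaveFormat(16000, 1);   // 16 kHz, mono, 16-bit PCM
    WaveFileWriter writer = new WaveFileWriter("recorded_audio.wav", cap.WaveFormat);
    cap.DataAvailable += (s, e) => writer.Write(e.Buffer, 0, e.BytesRecorded);
    // Close the file only after recording has actually stopped,
    // so that no late buffer is written to a closed writer.
    cap.RecordingStopped += (s, e) =>
    {
        writer.Dispose();
        cap.Dispose();
    };
    Console.WriteLine("Recording started");
    cap.StartRecording();

    // Stop after 3 seconds.
    System.Timers.Timer timer = new System.Timers.Timer(3000);
    timer.AutoReset = false;
    timer.Elapsed += (sender, e) =>
    {
        Console.WriteLine("Recording finished");
        cap.StopRecording();
    };
    timer.Start();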
Recording with WasapiCapture
WasapiCapture cannot change the sample rate. It records at the sample rate configured in the system.
    // Needs: using System; using NAudio.CoreAudioApi; using NAudio.Wave;
    WasapiCapture cap = new WasapiCapture();
    // cap.WaveFormat is the device format chosen by the system.
    WaveFileWriter writer = new WaveFileWriter("recorded_audio.wav", cap.WaveFormat);
    cap.DataAvailable += (s, e) => writer.Write(e.Buffer, 0, e.BytesRecorded);
    // Close the file only after recording has actually stopped.
    cap.RecordingStopped += (s, e) =>
    {
        writer.Dispose();
        cap.Dispose();
    };
    Console.WriteLine("Recording started");
    cap.StartRecording();

    // Stop after 3 seconds.
    System.Timers.Timer timer = new System.Timers.Timer(3000);
    timer.AutoReset = false;
    timer.Elapsed += (sender, e) =>
    {
        Console.WriteLine("Recording finished");
        cap.StopRecording();
    };
    timer.Start();
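FFT
    // Needs: using System; using NAudio.Dsp; using NAudio.Wave;
    // The snippet relies on fields declared elsewhere in the class
    // (types inferred from how they are used here):
    //   BUFFER_SIZE  FFT length in samples, a power of two
    //   _buffer      a BufferedWaveProvider holding 16-bit mono PCM
    //   FftBuffer    a Complex[] (NAudio.Dsp.Complex) of length BUFFER_SIZE
    //   _fftPos      the current write index into FftBuffer
    private static void Capture_DataAvailable(object sender, WaveInEventArgs e)
    {
        _buffer.AddSamples(e.Buffer, 0, e.BytesRecorded);

        // Process one frame of BUFFER_SIZE samples (2 bytes each) at a time.
        while (_buffer.BufferedBytes >= BUFFER_SIZE * 2)
        {
            byte[] readBuffer = new byte[BUFFER_SIZE * 2];
            _buffer.Read(readBuffer, 0, BUFFER_SIZE * 2);
            for (int i = 0; i < BUFFER_SIZE; i++)
            {
                // Little-endian byte pair -> signed 16-bit sample, scaled to [-1, 1).
                short sample = (short)((readBuffer[i * 2 + 1] << 8) | readBuffer[i * 2]);
                FftBuffer[_fftPos].X = sample / 32768.0f;
                FftBuffer[_fftPos].Y = 0;
                _fftPos++;
                if (_fftPos >= BUFFER_SIZE)
                {
                    FFT();
                    _fftPos = 0;
                }
            }
        }
    }

    private static void FFT()
    {
        // Apply a Hamming window to reduce spectral leakage.
        for (int i = 0; i < BUFFER_SIZE; i++)
        {
            FftBuffer[i].X *= (float)FastFourierTransform.HammingWindow(i, BUFFER_SIZE);
        }

        // m = log2(BUFFER_SIZE); rounding guards against floating-point error.
        FastFourierTransform.FFT(true, (int)Math.Round(Math.Log(BUFFER_SIZE, 2)), FftBuffer);

        // For real input only the first half of the bins is unique.
        for (int i = 0; i < BUFFER_SIZE / 2; i++)
        {
            double magnitude = Math.Sqrt(FftBuffer[i].X * FftBuffer[i].X + FftBuffer[i].Y * FftBuffer[i].Y);
            // magnitude is the amplitude of bin i; consume it here.
        }
    }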
Matching code
Feature extraction
ZMfcc
    namespace ZKeywordSpotting.utils
    {
        using System;
        using System.Collections;
        using System.Collections.Generic;
        using System.IO;
        using System.Linq;
        using System.Text;

        class ZMfcc
        {
            public static float SP_EMPHASIS_FACTOR = 0.97f; // pre-emphasis coefficient
            public static int FS = 16;                      // sample rate in kHz
            public static long FrmLen = 1024;               // frame length in samples
            public static ulong FFTLen = 512;               // FFT length; only the first FFTLen samples of a frame reach the FFT
            public static double PI = 3.1415926536;
            public static int FiltNum = 40;                 // number of filter-bank channels
            public static int PCEP = 13;                    // number of cepstral coefficients
            public static double[] Hamming = new double[FrmLen];
            public static int temp_1;
            public static List<double> MFCCcoefficient = new List<double>();
            static double _last;

            // Reads the file as raw 16-bit PCM (a WAV header, if present, is
            // treated as samples too) and runs the same processing as the
            // byte[] overload.
            public static List<double> GetMfcc(string infilename)
            {
                return GetMfcc(File.ReadAllBytes(infilename));
            }

            public static List<double> GetMfcc(byte[] inBytes)
            {
                // C# arrays are zero-initialised, so no explicit clearing is needed.
                short[] buffer = new short[FrmLen];
                double[] dBuff = new double[FrmLen];
                double[] result = new double[FrmLen];
                double[] data = new double[FrmLen];
                double[] filtCoe1 = new double[FFTLen / 2 + 1];
                double[] filtCoe2 = new double[FFTLen / 2 + 1];
                int[] num = new int[FFTLen / 2 + 1];
                double[] en = new double[FiltNum + 1];
                double[] cep = new double[PCEP];
                temp_1 = 0;

                List<ZComplex> vecList = new List<ZComplex>();
                InitHamming();
                InitFilt(filtCoe1, filtCoe2, num);

                long byteRead = 0;
                int counter = 0;
                List<double> resultList = new List<double>();
                // Stop when fewer than two bytes remain, so an odd trailing byte
                // cannot cause an out-of-range read.
                while (byteRead + 1 < inBytes.Length)
                {
                    short temp = BitConverter.ToInt16(inBytes, (int)byteRead);
                    byteRead += 2;
                    if (counter < 1024)
                    {
                        buffer[counter] = temp;
                        dBuff[counter] = buffer[counter];
                        counter++;
                    }
                    else
                    {
                        // Frame full: process it, then step back FrmLen / 2 bytes
                        // (256 samples) before refilling.
                        counter = 0;
                        Preemphasis(dBuff, result, (short)FrmLen);
                        HammingWindow(result, data);
                        compute_fft(data, vecList);
                        CFilt(filtCoe1, filtCoe2, num, en, vecList);
                        Mfcc(en, cep);
                        // Only the second coefficient, cep[1], of each frame is kept.
                        for (int j = 0; j < PCEP - 1; j++)
                        {
                            if (j == 1)
                            {
                                cep[j] = Math.Round(cep[j], 1);
                                resultList.Add(cep[j]);
                                temp_1++;
                            }
                        }
                        vecList.Clear();
                        byteRead += (-FrmLen / 2);
                    }
                }
                Console.WriteLine($"resultList.Count:{resultList.Count}");
                return resultList;
            }

            // Pre-emphasis: result[i] = buf[i] - factor * buf[i - 1].
            private static void Preemphasis(double[] buf, double[] result, short frmLen)
            {
                int i;
                result[0] = buf[0] - SP_EMPHASIS_FACTOR * _last;
                for (i = 1; i < frmLen; i++)
                {
                    result[i] = buf[i] - SP_EMPHASIS_FACTOR * buf[i - 1];
                }
                _last = buf[(frmLen - 1) / 2];
            }

            private static void InitHamming()
            {
                int i;
                double twopi = 2 * PI;
                for (i = 0; i < FrmLen; i++)
                {
                    Hamming[i] = 0.54 - 0.46 * Math.Cos(i * twopi / (FrmLen - 1));
                }
            }

            private static void HammingWindow(double[] result, double[] data)
            {
                int i;
                for (i = 0; i < FrmLen; i++)
                {
                    data[i] = result[i] * Hamming[i];
                }
            }

            // Copies the first FFTLen samples into vecList (zero-padding if the
            // frame were shorter) and transforms them in place.
            private static void compute_fft(double[] data, List<ZComplex> vecList)
            {
                for (int i = 0; i < (int)FFTLen; ++i)
                {
                    if (i < FrmLen)
                    {
                        ZComplex temp = new ZComplex(data[i]);
                        vecList.Add(temp);
                    }
                    else
                    {
                        ZComplex temp = new ZComplex(0);
                        vecList.Add(temp);
                    }
                }
                Fft(512, vecList);
            }

            // In-place radix-2 FFT of length ulN (a power of two).
            private static void Fft(uint ulN, List<ZComplex> vecList)
            {
                // ulPower = log2(ulN)
                uint ulPower = 0;
                uint ulN1 = ulN - 1;
                while (ulN1 > 0)
                {
                    ulPower++;
                    ulN1 /= 2;
                }

                // Bit-reversal permutation.
                for (ulong p = 0; p < ulN; p++)
                {
                    uint ulIndex = 0;
                    uint ulK = 1;
                    BitArray bsIndex = new BitArray(BitConverter.GetBytes((uint)p));
                    for (uint j = 0; j < ulPower; j++)
                    {
                        ulIndex += bsIndex[(int)(ulPower - j - 1)] ? ulK : 0;
                        ulK *= 2;
                    }
                    if (ulIndex > p)
                    {
                        ZComplex c = vecList[(int)p];
                        vecList[(int)p] = vecList[(int)ulIndex];
                        vecList[(int)ulIndex] = c;
                    }
                }

                // Twiddle factors.
                List<ZComplex> vecW = new List<ZComplex>();
                for (uint i = 0; i < ulN / 2; i++)
                {
                    vecW.Add(new ZComplex(Math.Cos(2 * i * PI / ulN), -1 * Math.Sin(2 * i * PI / ulN)));
                }

                // Butterfly stages.
                uint ulGroupLength = 1;
                ZComplex[] vecW1 = vecW.ToArray<ZComplex>();
                for (uint b = 0; b < ulPower; b++)
                {
                    uint ulHalfLength = ulGroupLength;
                    ulGroupLength *= 2;
                    for (int j = 0; j < ulN; j += (int)ulGroupLength)
                    {
                        for (int k = 0; k < (int)ulHalfLength; k++)
                        {
                            ZComplex cw = vecW1[k * ulN / ulGroupLength] * vecList[j + k + (int)ulHalfLength];
                            ZComplex c1 = vecList[j + k] + cw;
                            ZComplex c2 = vecList[j + k] - cw;
                            vecList[j + k] = c1;
                            vecList[j + k + (int)ulHalfLength] = c2;
                        }
                    }
                }
            }

            // Builds the triangular filter bank: 13 linearly spaced channels
            // followed by 27 logarithmically spaced ones.
            private static void InitFilt(double[] filtCoe1, double[] filtCoe2, int[] num)
            {
                int i, k;
                double[] filtFreq = new double[FiltNum + 2];
                double[] bw = new double[FiltNum + 1];
                double low = 400.0 / 3.0;
                short lin = 13;
                const double linSpacing = 200.0 / 3.0;
                short log = 27;
                const double logSpacing = 1.0711703f;

                for (i = 0; i < lin; i++)
                {
                    filtFreq[i] = low + i * linSpacing;
                }
                for (i = lin; i < lin + log + 2; i++)
                {
                    filtFreq[i] = filtFreq[lin - 1] * Math.Pow(logSpacing, i - lin + 1);
                }
                for (i = 0; i < FiltNum + 1; i++)
                {
                    bw[i] = filtFreq[i + 1] - filtFreq[i];
                }
                for (i = 0; i <= (int)FFTLen / 2; i++)
                {
                    num[i] = 0;
                }

                bool bFindFilt;
                for (i = 0; i <= (int)FFTLen / 2; i++)
                {
                    // Centre frequency of FFT bin i in Hz.
                    double freq = FS * 1000.0F * i / (double)(FFTLen);
                    bFindFilt = false;
                    for (k = 0; k <= FiltNum; k++)
                    {
                        if (freq >= filtFreq[k] && freq <= filtFreq[k + 1])
                        {
                            bFindFilt = true;
                            if (k == FiltNum)
                            {
                                filtCoe1[i] = 0.0F;
                            }
                            else
                            {
                                filtCoe1[i] = (freq - filtFreq[k]) / bw[k] * 2.0f / (bw[k] + bw[k + 1]);
                            }
                            if (k == 0)
                            {
                                filtCoe2[i] = 0.0F;
                            }
                            else
                            {
                                filtCoe2[i] = (filtFreq[k + 1] - freq) / bw[k] * 2.0f / (bw[k] + bw[k - 1]);
                            }
                            num[i] = k;
                            break;
                        }
                    }
                    if (!bFindFilt)
                    {
                        num[i] = 0;
                        filtCoe1[i] = 0.0F;
                        filtCoe2[i] = 0.0F;
                    }
                }
            }

            // Accumulates the power spectrum into the filter-bank channels
            // and takes the base-10 logarithm of each channel energy.
            private static void CFilt(double[] filtCoe1, double[] filtCoe2, int[] num, double[] en, List<ZComplex> vecList)
            {
                double temp;
                int id, id1, id2;
                for (id = 0; id < FiltNum; id++)
                {
                    en[id] = 0.0F;
                }
                for (id = 0; id <= (int)FFTLen / 2; id++)
                {
                    temp = vecList[id].Real * vecList[id].Real + vecList[id].Image * vecList[id].Image;
                    temp = temp / ((FrmLen / 2) * (FrmLen / 2));
                    id1 = num[id];
                    if (id1 == 0)
                        en[id1] = en[id1] + filtCoe1[id] * temp;
                    if (id1 == FiltNum)
                        en[id1 - 1] = en[id1 - 1] + filtCoe2[id] * temp;
                    if ((id1 > 0) && (id1 < FiltNum))
                    {
                        id2 = id1 - 1;
                        en[id1] = en[id1] + filtCoe1[id] * temp;
                        en[id2] = en[id2] + filtCoe2[id] * temp;
                    }
                }
                for (id = 0; id < FiltNum; id++)
                {
                    if (en[id] != 0)
                        en[id] = Math.Log10(en[id]);
                }
            }

            // Discrete cosine transform of the log energies into PCEP cepstral coefficients.
            private static void Mfcc(double[] en, double[] cep)
            {
                int idcep, iden;
                for (idcep = 0; idcep < PCEP; idcep++)
                {
                    cep[idcep] = 0.0f;
                    for (iden = 0; iden < FiltNum; iden++)
                    {
                        if (iden == 0)
                            cep[idcep] = cep[idcep] + en[iden] * Math.Cos(idcep * (iden + 0.5f) * PI / (FiltNum)) * 10.0f * Math.Sqrt(1 / (double)FiltNum);
                        else
                            cep[idcep] = cep[idcep] + en[iden] * Math.Cos(idcep * (iden + 0.5f) * PI / (FiltNum)) * 10.0f * Math.Sqrt(2 / (double)FiltNum);
                    }
                    MFCCcoefficient.Add(cep[idcep]);
                }
            }
        }
    }
ZComplex
    namespace ZKeywordSpotting.utils
    {
        using System;

        public class ZComplex
        {
            public ZComplex() : this(0, 0)
            {
            }

            public ZComplex(double real) : this(real, 0)
            {
            }

            public ZComplex(double real, double image)
            {
                this._real = real;
                this._image = image;
            }

            private double _real;

            public double Real
            {
                get => _real;
                set => _real = value;
            }

            private double _image;

            public double Image
            {
                get => _image;
                set => _image = value;
            }

            public static ZComplex operator +(ZComplex c1, ZComplex c2)
            {
                return new ZComplex(c1._real + c2._real, c1._image + c2._image);
            }

            public static ZComplex operator -(ZComplex c1, ZComplex c2)
            {
                return new ZComplex(c1._real - c2._real, c1._image - c2._image);
            }

            public static ZComplex operator *(ZComplex c1, ZComplex c2)
            {
                return new ZComplex(c1._real * c2._real - c1._image * c2._image, c1._image * c2._real + c1._real * c2._image);
            }

            public double ToModul()
            {
                return Math.Sqrt(_real * _real + _image * _image);
            }

            public override string ToString()
            {
                if (Real == 0 && Image == 0)
                {
                    return "0";
                }
                if (Real == 0 && (Math.Abs(Image) - 1.0 != 0))
                {
                    return $"{Image} i";
                }
                if (Image == 0)
                {
                    return $"{Real}";
                }
                if (Image - 1 == 0)
                {
                    return "i";
                }
                if (Image + 1 == 0)
                {
                    return "- i";
                }
                if (Image < 0)
                {
                    return string.Format("{0} - {1} i", Real, -Image);
                }
                return string.Format("{0} + {1} i", Real, Image);
            }
        }
    }
Similarity
ZDtw
    namespace ZKeywordSpotting.utils
    {
        using System;

        class ZDtw
        {
            protected const int DTWMAXNUM = 3000;
            protected static double DTWVERYBIG = 100000000.0;

            // Two 3000 x 3000 double arrays (about 72 MB each) are allocated per instance.
            protected double[,] distance = new double[DTWMAXNUM, DTWMAXNUM];
            protected double[,] dtwpath = new double[DTWMAXNUM, DTWMAXNUM];

            public static double GetDtw(double[] a, double[] b)
            {
                ZDtw dtw = new ZDtw();
                double ret1 = dtw.DtwDistanceFun(a, b);
                return ret1;
            }

            // Returns the length-normalised DTW distance between a and b,
            // or -1.0 if either sequence is longer than DTWMAXNUM.
            public double DtwDistanceFun(double[] a, double[] b)
            {
                int aLength = a.Length;
                int bLength = b.Length;
                // Half-width of the band around the diagonal that is searched.
                int r = Math.Min(aLength, bLength) / 30;
                int i, j;
                int r2 = r + Math.Abs(aLength - bLength);

                if (aLength > DTWMAXNUM || bLength > DTWMAXNUM)
                {
                    return -1.0;
                }

                for (i = 0; i < aLength; i++)
                {
                    for (j = 0; j < bLength; j++)
                    {
                        dtwpath[i, j] = 0;
                        distance[i, j] = DTWVERYBIG;
                    }
                }

                // First cell, first column and first row inside the band.
                distance[0, 0] = 2 * Math.Abs(a[0] - b[0]);
                for (i = 1; i < r2; i++)
                {
                    if (i < aLength)
                        distance[i, 0] = distance[i - 1, 0] + Math.Abs(a[i] - b[0]);
                    else
                        break;
                }
                for (j = 1; j < r2; j++)
                {
                    if (j < bLength)
                        distance[0, j] = distance[0, j - 1] + Math.Abs(a[0] - b[j]);
                    else
                        break;
                }

                for (j = 1; j < bLength; j++)
                {
                    int istart = j - r2;
                    if (j <= r2) istart = 1;
                    int imax = j + r2;
                    if (imax >= aLength) imax = aLength - 1;
                    for (i = istart; i <= imax; i++)
                    {
                        // Vertical and horizontal steps cost 1x the local distance,
                        // the diagonal step costs 2x.
                        double g1 = distance[i - 1, j] + Math.Abs(a[i] - b[j]);
                        double g2 = distance[i - 1, j - 1] + 2 * Math.Abs(a[i] - b[j]);
                        double g3 = distance[i, j - 1] + Math.Abs(a[i] - b[j]);
                        g2 = (g1 > g2) ? g2 : g1;
                        g3 = (g2 > g3) ? g3 : g2;
                        distance[i, j] = g3;
                    }
                }

                double dist = distance[aLength - 1, bLength - 1] / (aLength + bLength);
                return dist;
            }
        }
    }
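Concepts
byte/short
A byte is 1 byte (8 bits).
A short is 2 bytes (16 bits).
In C#, short is simply an alias for System.Int16.
Sample rate
The sample rate is the number of times the audio signal is sampled per unit of time, measured in hertz (Hz).
A sample rate of 16000 Hz means the audio signal is sampled 16000 times per second.
Audio units
In audio processing, audio data is commonly stored as 16-bit (2-byte) integers.
1 byte = 8 bits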
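Since each 16-bit sample occupies two bytes, a recorded byte buffer has to be regrouped into short values before it can be processed. The sketch below shows one way to do that. The class and method names are invented for the example, and it assumes little-endian 16-bit PCM, which is the layout WAV files and the 16-bit capture formats above use.

```csharp
using System;

// Hypothetical helper for 16-bit PCM data.
static class Pcm16
{
    // Converts the first `count` bytes of little-endian 16-bit PCM into samples.
    // A trailing odd byte is ignored.
    public static short[] ToSamples(byte[] bytes, int count)
    {
        var samples = new short[count / 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Low byte first, then high byte.
            samples[i] = (short)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        }
        return samples;
    }

    // Scales a sample to the range [-1, 1).
    public static float Normalize(short sample)
    {
        return sample / 32768f;
    }
}
```

In a DataAvailable handler this could be called as Pcm16.ToSamples(e.Buffer, e.BytesRecorded), provided the capture format is 16-bit PCM.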
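Calculation
The audio data rate is given by:
data rate (bytes/second) = sample rate (Hz) × channels × bytes per sample
Assuming mono (1 channel), 16-bit samples (2 bytes each) and a sample rate of 16000 Hz, the data produced per second is:
data rate = 16000 × 1 × 2 = 32000 bytes/second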
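The formula translates directly into code. The helper below is only an illustration with an invented name; it assumes uncompressed PCM with a whole number of bytes per sample.

```csharp
using System;

// Hypothetical helper, not part of NAudio.
static class AudioMath
{
    // data rate (bytes/second) = sample rate × channels × bytes per sample
    public static int BytesPerSecond(int sampleRate, int channels, int bitsPerSample)
    {
        return sampleRate * channels * (bitsPerSample / 8);
    }
}
```

For PCM formats NAudio exposes the same value as WaveFormat.AverageBytesPerSecond, so the helper is mainly useful for checking the arithmetic by hand.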
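The buffer size is bufferSize * 2 bytes. With bufferSize = 1024, the buffer holds 1024 × 2 = 2048 bytes.
The time needed to fill it is:
time (seconds) = buffer size (bytes) / data rate (bytes/second)
Substituting the buffer size of 2048 bytes and the data rate of 32000 bytes/second gives:
2048 / 32000 = 0.064
So at a sample rate of 16000 Hz, mono, 16 bits per sample and a bufferSize of 1024, filling the bufferSize * 2 byte buffer takes 0.064 seconds, that is 64 milliseconds.
For comparison, a bufferSize of 4096 at the same format takes 4096 × 2 / 32000 = 0.256 seconds, and at 48000 Hz (96000 bytes/second) the 2048-byte buffer fills in 2048 / 96000 ≈ 0.02133 seconds, about 21.33 milliseconds.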
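The fill time of a buffer is its size in bytes divided by the data rate in bytes per second. A small sketch of that calculation, with an invented class name and assuming uncompressed PCM:

```csharp
using System;

// Hypothetical helper, not part of NAudio.
static class BufferTiming
{
    // Returns how many milliseconds of audio fit into bufferBytes.
    public static double FillTimeMs(int bufferBytes, int sampleRate, int channels, int bitsPerSample)
    {
        int bytesPerSecond = sampleRate * channels * (bitsPerSample / 8);
        return 1000.0 * bufferBytes / bytesPerSecond;
    }
}
```

For example, BufferTiming.FillTimeMs(1024 * 2, 16000, 1, 16) covers the case of a 1024-sample frame of 16 kHz mono 16-bit audio. The same function can be used to pick a frame size that matches a desired analysis interval.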
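DTW
Overview of the Dynamic Time Warping (DTW) algorithm
Dynamic Time Warping (DTW) is an algorithm for measuring the similarity between two time series. It is widely used in speech recognition, gesture recognition and similar fields.
When two time series are compared, they may differ in length or be stretched and compressed differently along the time axis, so comparing them element by element can give inaccurate results.
DTW finds the optimal alignment path between the two sequences and derives their similarity from the cost of that path.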
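To make the idea concrete, here is a minimal sketch of the textbook DTW recurrence for two sequences of numbers. It is not the ZDtw class from this article: it has no search band, weights all three step directions equally and does not normalise by length. The class name is invented for the example.

```csharp
using System;

// Hypothetical minimal DTW, for illustration only.
static class SimpleDtw
{
    // Returns the accumulated cost of the cheapest alignment of a and b,
    // using |a[i] - b[j]| as the local distance.
    public static double Distance(double[] a, double[] b)
    {
        int n = a.Length, m = b.Length;
        var cost = new double[n + 1, m + 1];
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= m; j++)
                cost[i, j] = double.PositiveInfinity;
        cost[0, 0] = 0;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                double d = Math.Abs(a[i - 1] - b[j - 1]);
                // Extend the cheapest of: diagonal match, step in a, step in b.
                double best = Math.Min(cost[i - 1, j - 1],
                              Math.Min(cost[i - 1, j], cost[i, j - 1]));
                cost[i, j] = d + best;
            }
        }
        return cost[n, m];
    }
}
```

A sequence and a time-stretched copy of it, such as {1, 2, 3} and {1, 2, 2, 3}, align with zero cost, which is exactly the tolerance to timing differences described above.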
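The meaning of the feature vector dimension
In speech processing, features are usually extracted from the speech signal, turning it into a sequence of feature vectors. Each feature vector describes the speech at one point in time. With Mel-frequency cepstral coefficients (MFCC) as features, for example, every MFCC feature vector has a fixed number of dimensions.
The 13 here (PCEP = 13 in the ZMfcc code) is the number of dimensions of each feature vector. In other words, when DTW matching is performed on full MFCC vectors, the input at every time step is a 13-dimensional vector. For example, if you extract MFCC features from the live speech and from the template speech, each MFCC feature vector has 13 elements, and the DTW algorithm comparing the two feature sequences has to be told that each vector has 13 dimensions. Note that GetMfcc as listed above adds only the second coefficient (cep[1]) of each frame to its result, so ZDtw as written compares one-dimensional sequences.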
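When every time step is a feature vector rather than a single number, the only part of DTW that changes is the local distance between two frames. The sketch below uses the Euclidean distance for that. It is an assumption made for illustration: the article does not say which vector distance it intends, and the class name is invented.

```csharp
using System;

// Hypothetical DTW over sequences of feature vectors (e.g. 13-dimensional MFCC frames).
static class VectorDtw
{
    // Euclidean distance between two feature vectors of equal dimension.
    public static double FrameDistance(double[] x, double[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Feature vectors must have the same dimension.");
        double sum = 0;
        for (int k = 0; k < x.Length; k++)
        {
            double diff = x[k] - y[k];
            sum += diff * diff;
        }
        return Math.Sqrt(sum);
    }

    // Textbook DTW recurrence; a[i] and b[j] are whole frames.
    public static double Distance(double[][] a, double[][] b)
    {
        int n = a.Length, m = b.Length;
        var cost = new double[n + 1, m + 1];
        for (int i = 0; i <= n; i++)
            for (int j = 0; j <= m; j++)
                cost[i, j] = double.PositiveInfinity;
        cost[0, 0] = 0;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                double d = FrameDistance(a[i - 1], b[j - 1]);
                double best = Math.Min(cost[i - 1, j - 1],
                              Math.Min(cost[i - 1, j], cost[i, j - 1]));
                cost[i, j] = d + best;
            }
        }
        return cost[n, m];
    }
}
```

The dimension is never passed explicitly here because each frame carries its own length. An implementation that stores all frames in one flat array would need the dimension (13 for the MFCC setup above) as a separate parameter.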