Towards Bitrate-Efficient and Noise-Robust Speech Coding with Variable Bitrate RVQ

¹ Interdisciplinary Program in Artificial Intelligence
² Department of Intelligence and Information
³ Artificial Intelligence Institute
Seoul National University, Republic of Korea

Abstract

Residual Vector Quantization (RVQ) has become a dominant approach in neural speech and audio coding, providing high-fidelity compression. However, speech coding presents additional challenges due to real-world noise, which degrades compression efficiency. Standard codecs allocate bits uniformly, wasting bitrate on noise components that do not contribute to intelligibility. This paper introduces a Variable Bitrate RVQ (VRVQ) framework for noise-robust speech coding that dynamically adjusts the bitrate per frame to optimize the rate-distortion trade-off. Unlike constant bitrate (CBR) RVQ, our method prioritizes critical speech components while suppressing residual noise. Additionally, we integrate a feature denoiser to further improve noise robustness. Experimental results show that VRVQ improves the rate-distortion trade-off over conventional methods, achieving better compression efficiency and perceptual quality in noisy conditions.
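To make the CBR versus per-frame VBR distinction concrete, the following minimal sketch estimates an average bitrate from the number of RVQ codebooks used in each frame. The function name `estimated_kbps`, the hop length of 512 samples, and the 10 bits per codebook are assumptions for illustration; this page does not state them. They were picked only because they are consistent with the CBR "Est. kbps" values listed for the samples below (for example, 0.31 kbps for $N_q = 1$ at 16 kHz). The sketch also ignores any side information a real VBR codec might need to signal the per-frame codebook count.

```python
def estimated_kbps(codebooks_per_frame, sample_rate, hop_length=512, bits_per_codebook=10):
    """Average bitrate in kbps for a sequence of per-frame codebook counts.

    hop_length and bits_per_codebook are assumed values, not taken from the page.
    """
    frame_rate = sample_rate / hop_length  # frames per second
    mean_codebooks = sum(codebooks_per_frame) / len(codebooks_per_frame)
    return frame_rate * bits_per_codebook * mean_codebooks / 1000.0


if __name__ == "__main__":
    # CBR: every frame uses the same number of codebooks (here N_q = 3).
    cbr_counts = [3] * 100
    print("CBR kbps:", estimated_kbps(cbr_counts, sample_rate=16_000))

    # VBR: a hypothetical allocation in which 60 frames need one codebook
    # and 40 frames need seven.
    vbr_counts = [1] * 60 + [7] * 40
    print("VBR kbps:", estimated_kbps(vbr_counts, sample_rate=16_000))
```

Under these assumptions a CBR stream costs the same in every frame, while the VBR average depends on how many frames the codec judges to need more codebooks.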

16 kHz samples

Sample 1

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -7.07 | 1.0 | 0.97 | 4.44 |
| 3 | 0.94 | 1.01 | 2.0 | 1.14 | 4.53 |
| 5 | 1.56 | 3.19 | 4.0 | 1.24 | 4.73 |
| 7 | 2.19 | 4.3 | 6.0 | 1.28 | 4.77 |
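
The SI-SDR columns report the scale-invariant signal-to-distortion ratio in dB, where higher is better. As a reading aid, here is a minimal sketch of the standard SI-SDR definition in plain Python. It is not necessarily the exact evaluation code used for this page; the function name `si_sdr` and the zero-mean step are assumptions.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    n = len(reference)
    # Remove the mean of each signal (a common convention, assumed here).
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]

    # Project the estimate onto the reference to get the scaled target.
    ref_energy = sum(r * r for r in ref)
    alpha = sum(e * r for e, r in zip(est, ref)) / ref_energy
    target = [alpha * r for r in ref]

    # Everything in the estimate that is not the scaled target counts as distortion.
    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(est, target))
    if noise_energy == 0.0:
        return math.inf
    return 10.0 * math.log10(target_energy / noise_energy)


if __name__ == "__main__":
    clean = [1.0, 0.0, -1.0, 0.0]
    decoded = [1.0, 0.5, -1.0, -0.5]
    print("SI-SDR (dB):", si_sdr(clean, decoded))
    # Rescaling the estimate leaves the score unchanged.
    print("SI-SDR (dB), rescaled:", si_sdr(clean, [2.0 * x for x in decoded]))
```

Because the estimate is projected onto the reference before the ratio is taken, a codec cannot raise this score simply by changing the output gain.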

Sample 2

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -4.57 | 1.0 | 1.52 | 5.73 |
| 3 | 0.94 | 3.2 | 2.0 | 1.89 | 5.98 |
| 5 | 1.56 | 4.76 | 4.0 | 2.09 | 5.98 |
| 7 | 2.19 | 5.85 | 6.0 | 2.15 | 6.01 |

Sample 3

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -4.52 | 1.0 | 1.88 | 7.66 |
| 3 | 0.94 | 3.79 | 2.0 | 2.26 | 7.82 |
| 5 | 1.56 | 6.34 | 4.0 | 2.39 | 7.85 |
| 7 | 2.19 | 7.42 | 6.0 | 2.41 | 7.87 |

Sample 4

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -3.28 | 1.0 | 1.49 | 9.32 |
| 3 | 0.94 | 5.34 | 2.0 | 1.76 | 9.5 |
| 5 | 1.56 | 8.25 | 4.0 | 1.89 | 9.51 |
| 7 | 2.19 | 9.33 | 6.0 | 1.95 | 9.51 |

Sample 5

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -2.57 | 1.0 | 1.21 | 8.22 |
| 3 | 0.94 | 3.88 | 2.0 | 1.4 | 8.45 |
| 5 | 1.56 | 6.83 | 4.0 | 1.56 | 8.46 |
| 7 | 2.19 | 7.75 | 6.0 | 1.63 | 8.49 |

Sample 6

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -4.08 | 1.0 | 1.87 | 6.86 |
| 3 | 0.94 | 2.52 | 2.0 | 2.16 | 7.01 |
| 5 | 1.56 | 5.35 | 4.0 | 2.28 | 7.01 |
| 7 | 2.19 | 6.58 | 6.0 | 2.33 | 7.02 |

Sample 7

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -1.97 | 1.0 | 1.0 | 5.0 |
| 3 | 0.94 | 3.82 | 2.0 | 1.24 | 5.39 |
| 5 | 1.56 | 5.29 | 4.0 | 1.38 | 5.41 |
| 7 | 2.19 | 5.9 | 6.0 | 1.44 | 5.51 |

Sample 8

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -3.1 | 1.0 | 1.58 | 7.95 |
| 3 | 0.94 | 4.87 | 2.0 | 1.9 | 8.12 |
| 5 | 1.56 | 6.93 | 4.0 | 2.05 | 8.15 |
| 7 | 2.19 | 8.08 | 6.0 | 2.1 | 8.15 |

Sample 9

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -0.47 | 1.0 | 1.17 | 9.55 |
| 3 | 0.94 | 6.32 | 2.0 | 1.41 | 9.87 |
| 5 | 1.56 | 8.33 | 4.0 | 1.57 | 9.9 |
| 7 | 2.19 | 9.5 | 6.0 | 1.64 | 9.91 |

Sample 10

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -3.39 | 1.0 | 1.27 | 5.72 |
| 3 | 0.94 | 0.95 | 2.0 | 1.57 | 5.84 |
| 5 | 1.56 | 4.88 | 4.0 | 1.73 | 5.78 |
| 7 | 2.19 | 5.37 | 6.0 | 1.78 | 5.85 |

Sample 11

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -0.1 | 1.0 | 1.34 | 10.62 |
| 3 | 0.94 | 6.92 | 2.0 | 1.6 | 10.71 |
| 5 | 1.56 | 9.9 | 4.0 | 1.74 | 10.8 |
| 7 | 2.19 | 11.01 | 6.0 | 1.79 | 10.81 |

Sample 12

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -7.69 | 1.0 | 1.6 | 9.26 |
| 3 | 0.94 | 4.36 | 2.0 | 1.89 | 9.48 |
| 5 | 1.56 | 7.79 | 4.0 | 2.05 | 9.51 |
| 7 | 2.19 | 8.84 | 6.0 | 2.11 | 9.52 |

Sample 13

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -0.55 | 1.0 | 1.13 | 7.95 |
| 3 | 0.94 | 5.92 | 2.0 | 1.4 | 8.27 |
| 5 | 1.56 | 7.78 | 4.0 | 1.55 | 8.37 |
| 7 | 2.19 | 8.49 | 6.0 | 1.61 | 8.41 |

Sample 14

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | -1.31 | 1.0 | 1.19 | 5.74 |
| 3 | 0.94 | 4.44 | 2.0 | 1.5 | 5.84 |
| 5 | 1.56 | 5.45 | 4.0 | 1.68 | 5.92 |
| 7 | 2.19 | 6.01 | 6.0 | 1.77 | 5.94 |

Sample 15

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.31 | 1.0 | 1.0 | 2.11 | 12.02 |
| 3 | 0.94 | 7.79 | 2.0 | 2.45 | 12.4 |
| 5 | 1.56 | 10.8 | 4.0 | 2.54 | 12.43 |
| 7 | 2.19 | 12.16 | 6.0 | 2.56 | 12.43 |

48 kHz samples

Sample 1

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 0.27 | 0.5 | 1.77 | 4.37 |
| 3 | 2.81 | 4.06 | 1.0 | 2.51 | 5.73 |
| 5 | 4.69 | 5.67 | 2.0 | 3.11 | 6.08 |
| 7 | 6.56 | 6.63 | 4.0 | 3.43 | 6.16 |

Sample 2

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 0.51 | 0.5 | 2.19 | 5.39 |
| 3 | 2.81 | 5.34 | 1.0 | 3.52 | 6.78 |
| 5 | 4.69 | 7.0 | 2.0 | 4.58 | 7.32 |
| 7 | 6.56 | 7.87 | 4.0 | 5.19 | 7.46 |

Sample 3

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 1.31 | 0.5 | 2.56 | 6.73 |
| 3 | 2.81 | 6.5 | 1.0 | 4.34 | 8.78 |
| 5 | 4.69 | 8.13 | 2.0 | 5.61 | 9.34 |
| 7 | 6.56 | 9.22 | 4.0 | 6.24 | 9.43 |

Sample 4

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 2.62 | 0.5 | 2.26 | 8.92 |
| 3 | 2.81 | 8.27 | 1.0 | 3.66 | 11.11 |
| 5 | 4.69 | 9.87 | 2.0 | 4.68 | 11.75 |
| 7 | 6.56 | 11.35 | 4.0 | 5.21 | 11.86 |

Sample 5

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 1.37 | 0.5 | 1.99 | 7.25 |
| 3 | 2.81 | 7.09 | 1.0 | 3.0 | 9.17 |
| 5 | 4.69 | 8.77 | 2.0 | 3.65 | 9.62 |
| 7 | 6.56 | 9.66 | 4.0 | 3.99 | 9.66 |

Sample 6

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 0.31 | 0.5 | 2.61 | 5.87 |
| 3 | 2.81 | 5.78 | 1.0 | 4.45 | 8.26 |
| 5 | 4.69 | 8.13 | 2.0 | 5.71 | 8.87 |
| 7 | 6.56 | 9.42 | 4.0 | 6.29 | 8.97 |

Sample 7

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 1.54 | 0.5 | 1.75 | 4.82 |
| 3 | 2.81 | 5.21 | 1.0 | 2.47 | 5.84 |
| 5 | 4.69 | 6.34 | 2.0 | 3.19 | 6.27 |
| 7 | 6.56 | 6.74 | 4.0 | 3.52 | 6.47 |

Sample 8

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 2.75 | 0.5 | 2.26 | 7.22 |
| 3 | 2.81 | 7.28 | 1.0 | 3.65 | 9.09 |
| 5 | 4.69 | 9.05 | 2.0 | 4.73 | 9.77 |
| 7 | 6.56 | 10.07 | 4.0 | 5.26 | 9.91 |

Sample 9

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 5.45 | 0.5 | 1.9 | 8.87 |
| 3 | 2.81 | 9.62 | 1.0 | 2.81 | 10.68 |
| 5 | 4.69 | 11.07 | 2.0 | 3.72 | 11.57 |
| 7 | 6.56 | 11.99 | 4.0 | 4.14 | 11.72 |

Sample 10

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | -1.84 | 0.5 | 1.97 | 6.15 |
| 3 | 2.81 | 5.81 | 1.0 | 3.0 | 7.52 |
| 5 | 4.69 | 7.84 | 2.0 | 3.95 | 7.48 |
| 7 | 6.56 | 8.74 | 4.0 | 4.5 | 7.5 |

Sample 11

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 4.33 | 0.5 | 2.05 | 9.46 |
| 3 | 2.81 | 9.28 | 1.0 | 3.2 | 11.51 |
| 5 | 4.69 | 11.21 | 2.0 | 4.15 | 12.0 |
| 7 | 6.56 | 11.86 | 4.0 | 4.79 | 12.15 |

Sample 12

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | -1.01 | 0.5 | 2.38 | 9.79 |
| 3 | 2.81 | 7.91 | 1.0 | 3.87 | 12.29 |
| 5 | 4.69 | 10.32 | 2.0 | 4.79 | 12.75 |
| 7 | 6.56 | 11.86 | 4.0 | 5.38 | 12.85 |

Sample 13

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 4.81 | 0.5 | 1.8 | 7.11 |
| 3 | 2.81 | 8.02 | 1.0 | 2.59 | 8.11 |
| 5 | 4.69 | 9.28 | 2.0 | 3.53 | 8.55 |
| 7 | 6.56 | 9.95 | 4.0 | 4.17 | 8.73 |

Sample 14

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 2.74 | 0.5 | 1.95 | 5.06 |
| 3 | 2.81 | 5.75 | 1.0 | 2.94 | 6.47 |
| 5 | 4.69 | 7.04 | 2.0 | 4.0 | 7.11 |
| 7 | 6.56 | 7.67 | 4.0 | 4.66 | 7.29 |

Sample 15

Media: noisy input and clean speech (audio and spectrograms); one importance-map image per VBR level.

| CBR $N_q$ | CBR est. kbps | CBR SI-SDR (dB) | VBR level $l$ | VBR est. kbps | VBR SI-SDR (dB) |
|---|---|---|---|---|---|
| 1 | 0.94 | 6.43 | 0.5 | 2.64 | 11.17 |
| 3 | 2.81 | 11.03 | 1.0 | 4.53 | 13.54 |
| 5 | 4.69 | 13.18 | 2.0 | 6.3 | 14.63 |
| 7 | 6.56 | 14.46 | 4.0 | 7.1 | 14.88 |