v1.6.0

@ggerganov released this 15 May 07:13
· 12 commits to master since this release
08981d1

Overview

  • Flash Attention (FA) can optionally be enabled for faster processing on CUDA and Metal devices (#2152)
  • Faster ppc64 performance (40aeeee) (not tested)
  • Fix main slowdown bug (#2070)

Shoutout to @JohannesGaessler for contributing efficient FA CUDA kernels

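Some performance numbers for this release. Th is the thread count and FA indicates whether Flash Attention is disabled (0) or enabled (1); Enc., Dec., Bch5, and PP are the encoder, decoder, batch-5 decoding, and prompt-processing timings, where lower is better: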

M1 Pro

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M1 Pro METAL tiny 1 0 39.21 1.74 0.61 0.04 22c96b4
M1 Pro METAL base 1 0 70.76 2.60 0.93 0.06 22c96b4
M1 Pro METAL small 1 0 217.28 6.42 2.14 0.17 22c96b4
M1 Pro METAL medium 1 0 596.74 14.43 4.75 0.45 22c96b4

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M1 Pro METAL tiny 1 1 30.77 1.59 0.54 0.03 22c96b4
M1 Pro METAL base 1 1 60.42 2.29 0.81 0.05 22c96b4
M1 Pro METAL small 1 1 183.82 5.12 1.81 0.14 22c96b4
M1 Pro METAL medium 1 1 517.92 11.60 4.01 0.38 22c96b4
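
To make the two M1 Pro tables easier to compare, the following sketch computes the encoder speedup from enabling Flash Attention. The numbers are copied from the Enc. column above; the `enc_ms` dictionary and the `fa_speedup` helper are illustrative names, not part of whisper.cpp, and the code assumes the Enc. values are times where lower is better.

```python
# Encoder timings copied from the M1 Pro tables above.
# Each entry is (FA = 0, FA = 1). Names here are illustrative only.
enc_ms = {
    "tiny":   (39.21, 30.77),
    "base":   (70.76, 60.42),
    "small":  (217.28, 183.82),
    "medium": (596.74, 517.92),
}


def fa_speedup(without_fa: float, with_fa: float) -> float:
    """Return how many times faster the FA run is (values > 1 mean FA helps)."""
    return without_fa / with_fa


speedups = {model: fa_speedup(off, on) for model, (off, on) in enc_ms.items()}

for model, factor in speedups.items():
    print(f"{model:<6} {factor:.2f}x")
```

On these four rows the encoder speedup falls between roughly 1.15x and 1.27x, with the largest relative gain on the tiny model.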

M2 Ultra

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 ULTRA METAL tiny 1 0 12.32 1.35 0.49 0.01 22c96b4
M2 ULTRA METAL tiny-q5_0 1 0 11.65 1.30 0.51 0.01 22c96b4
M2 ULTRA METAL tiny-q5_1 1 0 12.08 1.30 0.51 0.01 22c96b4
M2 ULTRA METAL base 1 0 17.58 1.90 0.76 0.02 22c96b4
M2 ULTRA METAL base-q5_0 1 0 18.89 1.86 0.79 0.02 22c96b4
M2 ULTRA METAL base-q5_1 1 0 20.69 1.88 0.79 0.02 22c96b4
M2 ULTRA METAL small 1 0 49.32 3.85 1.71 0.05 22c96b4
M2 ULTRA METAL small-q5_0 1 0 54.91 3.81 1.82 0.06 22c96b4
M2 ULTRA METAL small-q5_1 1 0 54.92 3.81 1.79 0.06 22c96b4
M2 ULTRA METAL medium 1 0 134.34 8.04 3.82 0.13 22c96b4
M2 ULTRA METAL medium-q5_0 1 0 151.68 7.59 4.07 0.14 22c96b4
M2 ULTRA METAL medium-q5_1 1 0 151.58 7.67 4.07 0.14 22c96b4
M2 ULTRA METAL medium-dis 1 0 120.82 1.07 0.41 0.02 22c96b4
M2 ULTRA METAL large-v2 1 0 235.63 12.27 5.85 0.22 22c96b4
M2 ULTRA METAL large-v2-q5_0 1 0 273.38 11.17 6.40 0.26 22c96b4
M2 ULTRA METAL large-v2-q5_1 1 0 272.44 11.32 6.29 0.26 22c96b4
M2 ULTRA METAL large-v2-dis 1 0 212.51 1.20 0.47 0.02 22c96b4

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
M2 ULTRA METAL tiny 1 1 9.07 1.33 0.45 0.01 22c96b4
M2 ULTRA METAL tiny-q5_0 1 1 9.74 1.33 0.47 0.01 22c96b4
M2 ULTRA METAL tiny-q5_1 1 1 8.93 1.31 0.46 0.01 22c96b4
M2 ULTRA METAL base 1 1 15.75 1.87 0.71 0.02 22c96b4
M2 ULTRA METAL base-q5_0 1 1 17.04 1.83 0.74 0.02 22c96b4
M2 ULTRA METAL base-q5_1 1 1 17.17 1.83 0.74 0.02 22c96b4
M2 ULTRA METAL small 1 1 42.33 3.64 1.60 0.05 22c96b4
M2 ULTRA METAL small-q5_0 1 1 47.61 3.63 1.70 0.05 22c96b4
M2 ULTRA METAL small-q5_1 1 1 47.70 3.66 1.68 0.05 22c96b4
M2 ULTRA METAL medium 1 1 114.42 7.53 3.55 0.11 22c96b4
M2 ULTRA METAL medium-q5_0 1 1 132.63 7.02 3.77 0.13 22c96b4
M2 ULTRA METAL medium-q5_1 1 1 132.28 7.10 3.76 0.13 22c96b4
M2 ULTRA METAL medium-dis 1 1 102.34 1.01 0.42 0.01 22c96b4
M2 ULTRA METAL large-v2 1 1 203.01 11.03 5.45 0.20 22c96b4
M2 ULTRA METAL large-v2-q5_0 1 1 240.05 10.18 5.98 0.23 22c96b4
M2 ULTRA METAL large-v2-q5_1 1 1 239.22 10.23 5.87 0.23 22c96b4
M2 ULTRA METAL large-v2-dis 1 1 181.14 1.14 0.48 0.02 22c96b4

Ryzen 9 5950X + RTX 2060

CPU Config Model Th FA Enc. Dec. Bch5 PP Commit
Ryzen 9 5950X AVX2 tiny 8 0 195.29 1.57 0.51 0.26 22c96b4
Ryzen 9 5950X AVX2 tiny-q5_0 8 0 213.33 1.10 0.50 0.30 22c96b4
Ryzen 9 5950X AVX2 tiny-q5_1 8 0 219.38 1.18 0.53 0.32 22c96b4
Ryzen 9 5950X AVX2 base 8 0 424.85 3.71 1.03 0.46 22c96b4
Ryzen 9 5950X AVX2 base-q5_0 8 0 473.61 1.81 0.82 0.52 22c96b4
Ryzen 9 5950X AVX2 base-q5_1 8 0 484.14 1.92 0.85 0.56 22c96b4
Ryzen 9 5950X AVX2 small 8 0 1458.32 12.66 3.09 1.26 22c96b4
Ryzen 9 5950X AVX2 small-q5_0 8 0 1673.22 6.42 2.18 1.45 22c96b4
Ryzen 9 5950X AVX2 small-q5_1 8 0 1724.78 6.72 2.32 1.52 22c96b4
Ryzen 9 5950X AVX2 medium 8 0 4333.87 36.80 8.56 3.37 22c96b4
Ryzen 9 5950X AVX2 medium-q5_0 8 0 5194.09 19.21 5.71 3.97 22c96b4
Ryzen 9 5950X AVX2 medium-q5_1 8 0 5450.39 20.01 5.99 4.17 22c96b4
Ryzen 9 5950X AVX2 medium-dis 8 0 3995.19 5.08 1.21 0.55 22c96b4
Ryzen 9 5950X AVX2 large-v2 8 0 8056.16 69.74 16.11 6.13 22c96b4
Ryzen 9 5950X AVX2 large-v2-q5_0 8 0 9799.58 35.16 10.49 7.28 22c96b4
Ryzen 9 5950X AVX2 large-v2-q5_1 8 0 ms 36.74 11.02 7.65 22c96b4
Ryzen 9 5950X AVX2 large-v2-dis 8 0 7490.03 7.40 1.70 0.72 22c96b4

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 AVX2 CUDA tiny 8 0 12.54 0.93 0.29 0.02 22c96b4
RTX 2060 AVX2 CUDA tiny-q5_0 8 0 12.73 0.98 0.24 0.02 22c96b4
RTX 2060 AVX2 CUDA tiny-q5_1 8 0 12.72 0.99 0.24 0.02 22c96b4
RTX 2060 AVX2 CUDA base 8 0 24.14 1.28 0.41 0.03 22c96b4
RTX 2060 AVX2 CUDA base-q5_0 8 0 24.58 1.38 0.35 0.03 22c96b4
RTX 2060 AVX2 CUDA base-q5_1 8 0 24.58 1.37 0.35 0.03 22c96b4
RTX 2060 AVX2 CUDA small 8 0 74.70 2.91 0.84 0.07 22c96b4
RTX 2060 AVX2 CUDA small-q5_0 8 0 76.12 2.84 0.77 0.08 22c96b4
RTX 2060 AVX2 CUDA small-q5_1 8 0 76.14 2.84 0.76 0.08 22c96b4
RTX 2060 AVX2 CUDA medium 8 0 200.69 6.46 1.83 0.17 22c96b4
RTX 2060 AVX2 CUDA medium-q5_0 8 0 204.80 5.90 1.65 0.19 22c96b4
RTX 2060 AVX2 CUDA medium-q5_1 8 0 205.61 5.85 1.61 0.19 22c96b4
RTX 2060 AVX2 CUDA medium-dis 8 0 186.17 0.86 0.24 0.02 22c96b4
RTX 2060 AVX2 CUDA large-v2 8 0 347.22 10.36 2.82 0.29 22c96b4
RTX 2060 AVX2 CUDA large-v2-q5_0 8 0 357.06 8.81 2.58 0.34 22c96b4
RTX 2060 AVX2 CUDA large-v2-q5_1 8 0 356.97 8.62 2.49 0.33 22c96b4
RTX 2060 AVX2 CUDA large-v2-dis 8 0 318.05 1.03 0.34 0.04 22c96b4

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
RTX 2060 AVX2 CUDA tiny 8 1 7.21 0.76 0.29 0.02 22c96b4
RTX 2060 AVX2 CUDA tiny-q5_0 8 1 7.42 0.82 0.18 0.02 22c96b4
RTX 2060 AVX2 CUDA tiny-q5_1 8 1 7.38 0.82 0.18 0.02 22c96b4
RTX 2060 AVX2 CUDA base 8 1 13.49 1.04 0.36 0.02 22c96b4
RTX 2060 AVX2 CUDA base-q5_0 8 1 13.94 1.13 0.26 0.03 22c96b4
RTX 2060 AVX2 CUDA base-q5_1 8 1 13.94 1.14 0.26 0.03 22c96b4
RTX 2060 AVX2 CUDA small 8 1 42.81 2.33 0.69 0.05 22c96b4
RTX 2060 AVX2 CUDA small-q5_0 8 1 44.43 2.25 0.59 0.06 22c96b4
RTX 2060 AVX2 CUDA small-q5_1 8 1 44.11 2.24 0.58 0.06 22c96b4
RTX 2060 AVX2 CUDA medium 8 1 115.47 5.17 1.45 0.11 22c96b4
RTX 2060 AVX2 CUDA medium-q5_0 8 1 120.37 4.63 1.25 0.13 22c96b4
RTX 2060 AVX2 CUDA medium-q5_1 8 1 120.28 4.55 1.21 0.13 22c96b4
RTX 2060 AVX2 CUDA medium-dis 8 1 101.69 0.75 0.20 0.02 22c96b4
RTX 2060 AVX2 CUDA large-v2 8 1 205.67 8.49 2.19 0.18 22c96b4
RTX 2060 AVX2 CUDA large-v2-q5_0 8 1 214.07 6.88 1.94 0.22 22c96b4
RTX 2060 AVX2 CUDA large-v2-q5_1 8 1 213.98 6.70 1.86 0.22 22c96b4
RTX 2060 AVX2 CUDA large-v2-dis 8 1 176.71 0.91 0.31 0.03 22c96b4
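
The rows above are whitespace-separated, but the device and config names contain spaces (for example "RTX 2060 AVX2 CUDA"), so splitting from the left is ambiguous. A minimal sketch of one way to parse them, assuming the last eight fields are always Model, Th, FA, Enc., Dec., Bch5, PP, and Commit, is to unpack from the right. `BenchRow` and `parse_row` are hypothetical helpers written for this illustration. Non-numeric timings, such as the `ms` placeholder in the large-v2-q5_1 CPU row, are kept as `None` rather than guessed.

```python
from typing import NamedTuple, Optional


class BenchRow(NamedTuple):
    device: str                 # device plus build config, e.g. "RTX 2060 AVX2 CUDA"
    model: str
    threads: int
    flash_attn: bool
    enc: Optional[float]
    dec: Optional[float]
    bch5: Optional[float]
    pp: Optional[float]
    commit: str


def _to_float(text: str) -> Optional[float]:
    """Convert a timing field, returning None for placeholders such as 'ms'."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_row(line: str) -> BenchRow:
    """Parse one data row; header rows are not supported and raise ValueError."""
    fields = line.split()
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(fields)}")
    # Everything before the last eight fields belongs to the device/config name.
    *device, model, th, fa, enc, dec, bch5, pp, commit = fields
    return BenchRow(
        device=" ".join(device),
        model=model,
        threads=int(th),
        flash_attn=(fa == "1"),
        enc=_to_float(enc),
        dec=_to_float(dec),
        bch5=_to_float(bch5),
        pp=_to_float(pp),
        commit=commit,
    )


row = parse_row("RTX 2060 AVX2 CUDA tiny 8 1 7.21 0.76 0.29 0.02 22c96b4")
gap = parse_row("Ryzen 9 5950X AVX2 large-v2-q5_1 8 0 ms 36.74 11.02 7.65 22c96b4")
print(row.device, row.model, row.enc)
print(gap.device, gap.model, gap.enc)
```

Unpacking from the right keeps the parser independent of how many words the device name has, at the cost of assuming the column count never changes.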

V100

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
V100 AVX2 CUDA tiny 1 0 6.21 1.11 0.30 0.02 22c96b4
V100 AVX2 CUDA tiny-q5_1 1 0 5.97 1.10 0.26 0.02 22c96b4
V100 AVX2 CUDA base 1 0 10.95 1.47 0.42 0.03 22c96b4
V100 AVX2 CUDA base-q5_1 1 0 11.13 1.53 0.36 0.03 22c96b4
V100 AVX2 CUDA small 1 0 31.57 2.96 0.84 0.05 22c96b4
V100 AVX2 CUDA small-q5_1 1 0 32.19 3.14 0.75 0.05 22c96b4
V100 AVX2 CUDA medium 1 0 85.88 6.49 1.80 0.10 22c96b4
V100 AVX2 CUDA medium-q5_0 1 0 87.53 5.82 1.37 0.10 22c96b4
V100 AVX2 CUDA large-v2 1 0 142.23 8.92 2.62 0.15 22c96b4

GPU Config Model Th FA Enc. Dec. Bch5 PP Commit
V100 AVX2 CUDA tiny 1 1 3.96 0.82 0.24 0.02 22c96b4
V100 AVX2 CUDA tiny-q5_1 1 1 4.05 0.85 0.18 0.02 22c96b4
V100 AVX2 CUDA base 1 1 7.21 1.16 0.36 0.02 22c96b4
V100 AVX2 CUDA base-q5_1 1 1 7.39 1.21 0.26 0.02 22c96b4
V100 AVX2 CUDA small 1 1 19.81 2.41 0.71 0.04 22c96b4
V100 AVX2 CUDA small-q5_1 1 1 20.50 2.31 0.51 0.04 22c96b4
V100 AVX2 CUDA medium 1 1 56.02 4.89 1.44 0.07 22c96b4
V100 AVX2 CUDA medium-q5_0 1 1 57.85 4.73 1.09 0.08 22c96b4
V100 AVX2 CUDA large-v2 1 1 92.73 7.18 2.14 0.10 22c96b4

For reference, here is the performance for v1.5.0

What's Changed

New Contributors

Full Changelog: v1.5.5...v1.6.0

Binaries

https://github.com/ggerganov/whisper.cpp/actions/runs/9091347125