1. RTX 3090
- Date: 2020 Ampere
- Pipelines / CUDA cores: 10496
2. V100
- Date: 2018 Volta
- Pipelines / CUDA cores: 5120
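The core counts above map onto a theoretical FP32 peak through the usual rule of thumb of two floating-point operations (one fused multiply-add) per CUDA core per clock. A minimal sketch, using 5120 CUDA cores for the V100 and assuming boost clocks of 1695 MHz (RTX 3090) and 1380 MHz (V100 PCIe). These clocks are not stated in these notes, and the function name `peak_fp32_tflops` is purely illustrative:

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_mhz: float) -> float:
    """Theoretical FP32 peak: cores x 2 FLOPs per clock (one FMA) x clock rate."""
    flops_per_second = cuda_cores * 2 * boost_clock_mhz * 1e6
    return flops_per_second / 1e12


# Boost clocks below are assumptions, not values taken from these notes.
rtx_3090 = peak_fp32_tflops(10496, 1695)
v100_pcie = peak_fp32_tflops(5120, 1380)

print(f"RTX 3090:  {rtx_3090:.2f} TFLOPS")
print(f"V100 PCIe: {v100_pcie:.2f} TFLOPS")
```

Under these assumptions the RTX 3090 result reproduces the 35.58 TFLOPS FP32 figure listed in section 4. Real workloads rarely reach this peak, so it is best read as an upper bound.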
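3. Architecture & core comparison:
- V100 advantages:
  - Lower power draw
  - Faster double precision (fp64) and mixed precision (fp16 + fp32)
  - The NVLink on the PCIe version is identical to that of the 2080 Ti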
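- V100 disadvantages:
  - No INT4/INT8 integer Tensor Core support, so no accelerated quantized inference; this needs Turing or a newer architecture
  - No bf16 support; fp16 is the only half-precision format available (bf16: Google Brain floating-point format, which can speed up training)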
  - No TF32 support (the TF32 Tensor Core mode arrived with Ampere); standard single-precision fp32 is supported
  - AWQ quantization is not supported; GPTQ quantization is supported
  - flash-attention is not supported; vLLM is supported
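Most of the limitations above follow from the V100's CUDA compute capability (7.0). The sketch below encodes, as a lookup table, the minimum compute capability commonly cited for each hardware feature. The thresholds and the names `COMPUTE_CAPABILITY`, `MIN_CAPABILITY`, and `supported_features` are assumptions for illustration and should be checked against NVIDIA's documentation for the exact card and library version:

```python
# Compute capability (major, minor) of the cards mentioned in these notes.
COMPUTE_CAPABILITY = {
    "V100": (7, 0),         # Volta
    "RTX 2080 Ti": (7, 5),  # Turing
    "A100": (8, 0),         # Ampere
    "RTX 3090": (8, 6),     # Ampere
}

# Assumed minimum compute capability for hardware-accelerated support.
MIN_CAPABILITY = {
    "fp16 tensor cores": (7, 0),
    "int8/int4 tensor cores": (7, 5),
    "bf16": (8, 0),
    "tf32": (8, 0),
}


def supported_features(card: str) -> list[str]:
    """List the features whose minimum capability the card meets."""
    capability = COMPUTE_CAPABILITY[card]
    # Tuples compare element by element, so (7, 5) >= (7, 0) is True.
    return [name for name, minimum in MIN_CAPABILITY.items()
            if capability >= minimum]


for card, capability in COMPUTE_CAPABILITY.items():
    print(card, capability, supported_features(card))
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability()` returns the same kind of `(major, minor)` tuple for the installed GPU, so the table can be checked against real hardware.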
4. Theoretical performance
- 3090 vs. A100

| Metric | NVIDIA RTX 3090 | NVIDIA A100 40 GB (PCIe) | Difference |
| --- | --- | --- | --- |
| FP16 (half) performance | 35.58 TFLOPS | 77.97 TFLOPS | 42.39 TFLOPS (119%) |
| FP32 (float) performance | 35.58 TFLOPS | 19.49 TFLOPS | 16.09 TFLOPS (-45%) |
| FP64 (double) performance | 556 GFLOPS | 9746 GFLOPS | 9190 GFLOPS (1653%) |
| Pixel Rate | 189.8 GPixel/s | 225.6 GPixel/s | 35.8 GPixel/s (19%) |
| Texture Rate | 556 GTexel/s | 609.1 GTexel/s | 53.1 GTexel/s (10%) |
- Comparison of multiple NVIDIA cards
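The Difference values in the 3090 vs. A100 comparison are the absolute gap between the two cards, followed by that gap as a percentage of the RTX 3090 value. A small sketch that recomputes them from the raw numbers (the function name `difference` and the `rows` dictionary are illustrative):

```python
def difference(rtx_3090: float, a100: float) -> tuple[float, int]:
    """Return (absolute gap, gap as a percentage of the RTX 3090 value)."""
    gap = a100 - rtx_3090
    percent = round(gap / rtx_3090 * 100)
    return round(gap, 2), percent


# (RTX 3090, A100 40 GB PCIe) pairs copied from the comparison above.
rows = {
    "FP16 (TFLOPS)": (35.58, 77.97),
    "FP32 (TFLOPS)": (35.58, 19.49),
    "FP64 (GFLOPS)": (556, 9746),
    "Pixel rate (GPixel/s)": (189.8, 225.6),
    "Texture rate (GTexel/s)": (556, 609.1),
}

for name, (rtx, a100) in rows.items():
    gap, percent = difference(rtx, a100)
    print(f"{name}: {gap:+.2f} ({percent:+d}%)")
```

The sign makes the direction explicit: a negative value means the RTX 3090 is ahead, which in this comparison only happens for FP32.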
5. Detailed performance comparison
Reference
- https://technical.city/en/video/GeForce-RTX-3090-vs-Tesla-V100-PCIe-32-GB
- https://zhuanlan.zhihu.com/p/667255235
- https://bizon-tech.com/gpu-benchmarks/NVIDIA-RTX-3090-vs-NVIDIA-A100-40-GB-(PCIe)/579vs592
- https://www.bilibili.com/read/cv33373992/?from=readlist