[News] AI / HPC Weekly Clips — 2026.04.14
A quick summary of notable updates in deep learning inference, GPU architecture, and HPC this week.
Highlights
1. NVIDIA Blackwell 2nd-Gen Inference Benchmarks Published
Newly published benchmarks report up to a 4× throughput improvement over the H100 on FP8 inference workloads. Improved memory bandwidth efficiency during the LLM decode stage appears to be the largest contributor to the gain.
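For context, FP8 inference on NVIDIA GPUs is commonly enabled through the Transformer Engine library (an assumption on my part; the clip does not say which software stack the benchmarks used). A minimal sketch of running a linear layer under FP8 autocast:

```python
import torch
import transformer_engine.pytorch as te  # NVIDIA Transformer Engine
from transformer_engine.common.recipe import DelayedScaling, Format

# A single FP8-capable linear layer; dimensions should be multiples of 16
# to hit the FP8 GEMM path.
layer = te.Linear(1024, 1024, bias=True).cuda()

# HYBRID = E4M3 for the forward pass, E5M2 for gradients; only the
# forward format matters for inference.
recipe = DelayedScaling(fp8_format=Format.HYBRID)

x = torch.randn(32, 1024, device="cuda")
with torch.no_grad(), te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = layer(x)
```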
2. FlashAttention-3 Posted on arXiv
The third iteration of the FlashAttention series is out. It leverages Hopper's Tensor Memory Accelerator (TMA) and asynchronous pipelining to further reduce attention kernel overhead on the H100.
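FlashAttention-3 ships as its own kernel package, but the easiest way to see where such fused kernels plug in is PyTorch's SDPA dispatcher, which can be pinned to its built-in FlashAttention backend (based on earlier FlashAttention kernels, not FA3 itself, as far as I know). A minimal sketch:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import sdpa_kernel, SDPBackend

# (batch, heads, seq_len, head_dim) layout expected by SDPA
q = torch.randn(1, 8, 2048, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Pin dispatch to the fused FlashAttention backend; this raises an
# error if the backend cannot handle these shapes/dtypes.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```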
3. PyTorch 2.7 Released
Stability improvements for torch.compile and enhanced CUDA Graph automation are the headline features.
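CUDA Graph capture through torch.compile is exposed via the "reduce-overhead" mode (a stable API across recent releases; I have not verified exactly what changed in 2.7). A minimal sketch:

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
).cuda()

# "reduce-overhead" asks the compiler to capture the model into a
# CUDA Graph, amortizing kernel-launch overhead across replays.
compiled = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 512, device="cuda")
for _ in range(3):  # first calls compile and record; later calls replay
    y = compiled(x)
```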
These are personal notes — please verify details from original sources!
