[News] AI / HPC Weekly Clips — 2026.04.14

· One min read
hwkim-dev
Developer

A quick summary of notable updates in deep learning inference, GPU architecture, and HPC this week.

Highlights

1. NVIDIA Blackwell 2nd-Gen Inference Benchmarks Published

New benchmarks show up to 4× throughput improvement over H100 for FP8 inference workloads. Improved memory-bandwidth efficiency during the LLM decode stage appears to be the biggest contributor to the gain.

2. FlashAttention-3 Posted on arXiv

The third iteration of the Flash Attention series is out. It leverages Hopper's Tensor Memory Accelerator (TMA) and asynchronous pipelines to further reduce attention kernel overhead on H100.
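As a quick reminder of what these kernels optimize: fused attention computes softmax(QKᵀ/√d)·V without materializing the full score matrix in HBM. The sketch below uses PyTorch's `scaled_dot_product_attention` (which dispatches to a FlashAttention-style fused kernel where available) purely as an illustration; it is not FlashAttention-3 itself, which ships as a separate Hopper-targeted kernel. Tensor shapes are made up for the example.

```python
import torch
import torch.nn.functional as F

# Toy shapes: (batch, heads, seq_len, head_dim)
q = torch.randn(1, 2, 16, 32)
k = torch.randn(1, 2, 16, 32)
v = torch.randn(1, 2, 16, 32)

# Fused path: dispatches to a FlashAttention-style kernel when one is available
# for the current device/dtype, otherwise falls back to a math implementation.
out = F.scaled_dot_product_attention(q, k, v)

# Reference: the same computation spelled out with plain ops, materializing
# the full (seq_len x seq_len) score matrix that the fused kernel avoids.
scores = (q @ k.transpose(-2, -1)) / (32 ** 0.5)
ref = torch.softmax(scores, dim=-1) @ v
```

The fused and reference outputs agree up to floating-point tolerance; the fused kernel's win is memory traffic, not a different result.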

3. PyTorch 2.7 Released

Stability improvements for torch.compile and enhanced CUDA Graph automation are the headline features.


These are personal notes — please verify details from original sources!