gau-nernst's blog
Thien Tran
A personal blog by Thien Tran

My first Multi-GPU kernel: Writing All-to-all for AMD MI300X

Use NVRTC to explore MMA instruction variants

Writing Speed-of-Light Flash Attention for 5090 in CUDA C++
