gau-nernst's blog
Thien Tran
A personal blog by Thien Tran
My first Multi-GPU kernel: Writing All-to-all for AMD MI300X
Nov 2, 2025
Use NVRTC to explore MMA instruction variants
Sep 8, 2025
Writing Speed-of-Light Flash Attention for 5090 in CUDA C++
Aug 23, 2025