Calling CUDA in 3000 Words
You’ve probably seen a CUDA tutorial like this one: a classic “Hello World” blending CPU and GPU code in a single “heterogeneous” CUDA C++ source file, with the kernel launched using NVCC’s now-iconic triple-bracket <<<>>> syntax:

    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void kernel() {
        printf("Hello World from block %d, thread %d\n", blockIdx.x, threadIdx.x);
    }

    int main() {
        kernel<<<1, 1>>>(); // Returns `void`?! 🤬
        return cudaDeviceSynchronize() == cudaSuccess ? 0 : -1;
    }

I still see this exact pattern in production code, and I’ll admit it shows up in some of my own toy projects too: one, two, and three. But relying on triple-bracket kernel launches in production isn’t ideal. They don’t return error codes, and they encourage a false sense of simplicity. So in the next ~25 KBytes of text, we’ll explore the less wrong ways to launch kernels.

...
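To make the "no error codes" complaint concrete, here is a minimal sketch of the same Hello World launched through the CUDA Runtime's `cudaLaunchKernel` function, which returns a `cudaError_t` instead of `void`. This is only one possible alternative and not necessarily the approach this article settles on; the error-handling layout and the messages printed to `stderr` are illustrative assumptions.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void kernel() {
    printf("Hello World from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    // The kernel takes no parameters, so the argument array can stay null.
    void **args = nullptr;

    // Unlike `kernel<<<1, 1>>>()`, this call reports launch failures directly.
    cudaError_t error = cudaLaunchKernel(
        reinterpret_cast<void const *>(&kernel), // kernel entry point
        dim3(1),                                 // grid size: 1 block
        dim3(1),                                 // block size: 1 thread
        args,                                    // kernel arguments: none
        0,                                       // dynamic shared memory, in bytes
        nullptr);                                // default stream
    if (error != cudaSuccess) {
        fprintf(stderr, "Launch failed: %s\n", cudaGetErrorString(error));
        return -1;
    }

    // The launch is asynchronous, so errors raised while the kernel runs
    // only surface once the host waits for the device.
    error = cudaDeviceSynchronize();
    if (error != cudaSuccess) {
        fprintf(stderr, "Execution failed: %s\n", cudaGetErrorString(error));
        return -1;
    }
    return 0;
}
```

With the triple-bracket form, the closest equivalent is calling `cudaGetLastError()` right after the launch, which is easy to forget precisely because the launch expression itself yields nothing to check.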