CLA is a simple toy library for basic vector/matrix operations in C. The project's main goal is to learn the foundations of CUDA and Python bindings, using ctypes as a wrapper, through simple Linear ...
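As a rough illustration of what a ctypes-friendly C routine in a library like this might look like, here is a minimal sketch. The names `vec_add` and `vec_dot` are hypothetical and are not taken from CLA's actual API. The routines use only plain pointers and a length so that Python's ctypes can call them without any wrapper structs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element-wise vector addition: out[i] = a[i] + b[i].
 * Plain pointers plus a length keep the signature easy to describe
 * from Python ctypes (POINTER(c_float) and c_size_t arguments). */
void vec_add(const float *a, const float *b, float *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

/* Hypothetical dot product of two length-n vectors. */
float vec_dot(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

Built as a shared object (e.g. `cc -shared -fPIC vec.c -o libvec.so`), such functions could be loaded with `ctypes.CDLL("./libvec.so")`, with `argtypes` and `restype` set on each function so ctypes converts arguments correctly. A CUDA variant would keep the same host-side signature and move the loop body into a kernel.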
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te… ...