Abstract: Numerous studies have proposed hardware architectures to accelerate sparse matrix multiplication, but these approaches often incur substantial area and power overhead, significantly ...
We took this version of HeCBench and are modifying it to build the CUDA and OMP codes to gather their roofline performance data. So far we have a large portion of the CUDA and OMP codes building ...
This project is a step-by-step learning journey where we implement various types of Triton kernels—from the simplest examples to more advanced applications—while exploring GPU programming with Triton.
Abstract: Matrix-matrix multiplication is one of the most important kernel in linear algebra operations with a multitude of applications in scientific and engineering computing. Sparse matrix ...
From the UCSB The Current article "Innovative Hardware for Rapidly Solving High-order Optimization Problems" The rise of AI, graphic processing, combinatorial optimization, and other data-intensive ...