MapReduce Matrix Multiplication in Java

DenSparSA: A Balanced Systolic Array Approach for Dense and Sparse Matrix Multiplication

Abstract: Numerous studies have proposed hardware architectures to accelerate sparse matrix multiplication, but these approaches often incur substantial area and power overhead, significantly ...

GitHub

Can Large Language Models Predict Parallel Code Performance?

We took this version of HeCBench and are modifying it to build the CUDA and OMP codes to gather their roofline performance data. So far we have a large portion of the CUDA and OMP codes building ...

GitHub

A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.

This project is a step-by-step learning journey where we implement various types of Triton kernels—from the simplest examples to more advanced applications—while exploring GPU programming with Triton.

IEEE

Multiplication of Sparse Matrices and their Transpose Using Compressed Sparse Diagonals

Abstract: Matrix-matrix multiplication is one of the most important kernels in linear algebra operations with a multitude of applications in scientific and engineering computing. Sparse matrix ...

ece.ucsb.edu

Bhattacharya – HW for Rapidly Solving High-order Optimization Problems

From the UCSB The Current article "Innovative Hardware for Rapidly Solving High-order Optimization Problems": The rise of AI, graphic processing, combinatorial optimization, and other data-intensive ...
