This project is a step-by-step learning journey through GPU programming with Triton: we implement Triton kernels of increasing complexity, from the simplest examples to more advanced applications.
It also includes automatic tuning, caching, and a Pythonic interface for ease of use. Tilus is pronounced tie-lus (/ˈtaɪləs/). Tilus supports the Ampere architecture, and we are actively working on the ...