Project description
Work on GPU support for OpenAI/Triton, a language and compiler for writing highly efficient custom deep-learning primitives. Collaborate with the open-source community to analyze, develop, test, and deploy performance improvements for neural networks implemented with Triton on GPUs with ROCm.
Responsibilities
* Develop new features for the OpenAI/Triton project on GPUs, and support and optimize the existing ones
* Communicate with other developers, customers, and project managers
* Implement tests, write project documentation, and verify the system with unit, component, and functional tests
Skills
Must have
* Strong C/C++ programming skills
* Experience with compiler internals (LLVM, GCC, or any other)
* Basic Python programming skills
* Experience in performance analysis
Nice to have
* Basic understanding of ML technologies
* Experience with GPGPU (general-purpose GPU) computing (HIP, CUDA, OpenCL, etc.)
* Experience with PyTorch
* Experience with the LLVM and MLIR compiler infrastructure, including implementing analyses or optimizations
* Knowledge of ROCm infrastructure
* Experience with CMake and make/Ninja build systems
* Understanding of GEMM (general matrix multiplication) performance fundamentals
* Experience with Docker
Languages
English - C1 Fluent