Technologies
Deep dives into the technologies that power AI/HPC infrastructure
NVIDIA CUDA
NVIDIA Corporation
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform and programming model that enables developers to use GPU acceleration for general-purpose computing.
Key Topics
- CUDA toolkit, runtime, and driver version compatibility
- Compute capability (sm_ levels) and GPU architecture mapping
- Tensor Core programming (FP16, BF16, FP8)
- NCCL for multi-GPU and multi-node collective communication
- cuDNN and cuBLAS for deep learning acceleration
- GPUDirect RDMA for zero-copy GPU↔NIC data transfers
- CUDA MPS and MIG for GPU sharing
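The compute-capability mapping above can be sketched as a small lookup table, the same correspondence developers consult when choosing `nvcc -arch`/`-gencode` flags (table values are the publicly documented architecture families):

```python
# Mapping of CUDA compute capabilities ("sm_" levels) to GPU
# architecture families, as used when selecting nvcc -arch targets.
SM_TO_ARCH = {
    "sm_70": "Volta (V100)",
    "sm_75": "Turing (T4)",
    "sm_80": "Ampere (A100)",
    "sm_86": "Ampere (RTX 30xx, A40)",
    "sm_89": "Ada Lovelace (L40, RTX 40xx)",
    "sm_90": "Hopper (H100)",
}

def arch_for_sm(sm: str) -> str:
    """Return the architecture family for a given sm_ target."""
    return SM_TO_ARCH.get(sm, "unknown")

print(arch_for_sm("sm_80"))  # Ampere (A100)
```

A binary compiled only for `sm_80` will not load on an `sm_70` GPU, which is why multi-architecture fat binaries are common in production toolchains.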
AMD ROCm / HIP
Advanced Micro Devices (AMD)
ROCm (Radeon Open Compute) is AMD's open-source platform for GPU-accelerated computing. HIP (Heterogeneous-Compute Interface for Portability) enables writing code that runs on both AMD and NVIDIA GPUs.
Key Topics
- ROCm stack components: HIP, rocBLAS, MIOpen, RCCL
- CDNA architecture (CDNA1/MI100, CDNA2/MI250, CDNA3/MI300)
- GFX target mapping (gfx908, gfx90a, gfx940/942)
- AMDGPU kernel driver and ROCm version compatibility
- HIP porting from CUDA
- ROCm-aware MPI with Open MPI + UCX
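The GFX target mapping listed above can likewise be expressed as a lookup table, mirroring the targets passed to `hipcc` via `--offload-arch`:

```python
# Mapping of AMD GFX compile targets to CDNA GPU generations,
# as passed to hipcc via --offload-arch=<gfx target>.
GFX_TO_GPU = {
    "gfx908": "CDNA1 (MI100)",
    "gfx90a": "CDNA2 (MI210/MI250/MI250X)",
    "gfx940": "CDNA3 (MI300 family)",
    "gfx942": "CDNA3 (MI300 family)",
}

def gpu_for_gfx(target: str) -> str:
    """Return the CDNA generation/GPU for a given gfx target."""
    return GFX_TO_GPU.get(target, "unknown")

print(gpu_for_gfx("gfx90a"))  # CDNA2 (MI210/MI250/MI250X)
```

As with CUDA sm_ levels, a code object built for one gfx target does not run on another, so matching the ROCm build target to the installed GPU is a routine compatibility check.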
InfiniBand
NVIDIA (Mellanox) · OpenFabrics Alliance
InfiniBand is a high-bandwidth, low-latency interconnect standard used in many of the Top500 supercomputers. It underpins GPU-to-GPU communication in AI training clusters at scale.
Key Topics
- InfiniBand fabric architecture: HCA, switches, subnet manager
- Speed generations: HDR (200 Gb/s), NDR (400 Gb/s), XDR (800 Gb/s)
- IB verbs API (libibverbs, rdma-core)
- Fat-tree and dragonfly+ topologies for HPC
- OpenSM subnet manager configuration
- Adaptive routing and congestion control
- InfiniBand with NCCL and RCCL
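The speed generations above follow directly from per-lane signaling rates: a standard InfiniBand port aggregates four lanes, so port bandwidth is lanes × lane rate. A quick sketch of that arithmetic:

```python
# Per-lane data rates (Gb/s) for recent InfiniBand generations.
# A standard port is a 4x link (four aggregated lanes).
LANE_RATE_GBPS = {"EDR": 25, "HDR": 50, "NDR": 100, "XDR": 200}

def port_bandwidth_gbps(generation: str, lanes: int = 4) -> int:
    """Aggregate port bandwidth for a generation and lane count."""
    return LANE_RATE_GBPS[generation] * lanes

print(port_bandwidth_gbps("NDR"))  # 400
```

This is why NDR is quoted as 400 Gb/s per port (4 × 100 Gb/s lanes) even though a single NDR lane runs at 100 Gb/s.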
RDMA & RoCE
RDMA over Converged Ethernet
RDMA (Remote Direct Memory Access) allows direct memory-to-memory transfers between nodes without CPU involvement, achieving microsecond latencies at line rate. RoCE extends this over standard Ethernet infrastructure.
Key Topics
- RDMA semantics: one-sided ops (READ, WRITE), two-sided (SEND/RECV)
- RoCEv1 vs RoCEv2 — L2 vs L3 routable
- Lossless Ethernet: PFC (Priority Flow Control), ECN, DCQCN
- GPUDirect RDMA — GPU memory directly mapped to NIC BAR
- RDMA programming with libibverbs and rdma-core
- NCCL / RCCL over RDMA for collective operations
- Network congestion and packet drop impact on AI training
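The one-sided vs two-sided distinction above is the key semantic split in RDMA verbs: one-sided READ/WRITE and atomics complete without the remote CPU, while SEND/RECV requires the target to have posted a matching receive. A toy classifier makes the split explicit (opcode names here are illustrative labels, not literal libibverbs enum values):

```python
# One-sided RDMA operations complete without remote CPU involvement;
# two-sided SEND/RECV needs the target to post a matching receive.
ONE_SIDED = {"RDMA_READ", "RDMA_WRITE",
             "ATOMIC_CMP_AND_SWP", "ATOMIC_FETCH_AND_ADD"}
TWO_SIDED = {"SEND", "RECV"}

def remote_cpu_involved(op: str) -> bool:
    """Return True if the remote CPU must participate in the transfer."""
    if op in TWO_SIDED:
        return True
    if op in ONE_SIDED:
        return False
    raise ValueError(f"unknown operation: {op}")

print(remote_cpu_involved("RDMA_WRITE"))  # False
```

This is why one-sided operations are attractive for GPU training traffic: the responder's CPU stays out of the data path entirely.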
OFED / MLNX_OFED
OpenFabrics Alliance · NVIDIA
OFED (OpenFabrics Enterprise Distribution) is the unified software stack for high-performance interconnects, including InfiniBand and RDMA-capable Ethernet. MLNX_OFED is NVIDIA's (Mellanox) bundled distribution.
Key Topics
- OFED components: kernel drivers, libibverbs, rdma-core, perftest
- MLNX_OFED vs upstream rdma-core differences
- Kernel version compatibility matrix
- OFED installation and driver configuration
- Performance tuning: MTU, GID index, RoCE mode
- OFED upgrades in production clusters
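One recurring detail in the RoCE tuning mentioned above: the IP traffic-class field carries DSCP in its upper six bits, so a DSCP value is shifted left by two when written as a traffic class (e.g. DSCP 26, a common choice for RoCE traffic, becomes traffic class 104). A one-line sketch of that conversion:

```python
def dscp_to_traffic_class(dscp: int) -> int:
    """DSCP occupies the upper 6 bits of the 8-bit traffic-class/ToS
    field, so converting DSCP -> traffic class is a left shift by 2."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp << 2

print(dscp_to_traffic_class(26))  # 104
```

Mixing up the two encodings (setting a DSCP value where a traffic class is expected, or vice versa) is a common cause of RoCE traffic landing in the wrong QoS queue.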
Open MPI
The Open MPI Project
Open MPI is one of the most widely used MPI implementations in HPC clusters. Its modular architecture integrates directly with UCX, OFED, CUDA, and ROCm for GPU-aware distributed computing.
Key Topics
- Open MPI architecture: MTL, BTL, PML, COLL framework
- UCX transport layer — InfiniBand, RoCE, shared memory
- GPU-aware MPI — CUDA-aware and ROCm-aware builds
- NCCL vs MPI for collective communication in AI training
- MPI process placement and binding on NUMA systems
- Open MPI version ↔ OFED ↔ CUDA compatibility
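The NUMA placement topic above can be illustrated with a minimal sketch, assuming local ranks are distributed round-robin across NUMA domains, which is the effect of something like `mpirun --map-by numa --bind-to numa` on a real cluster:

```python
# Minimal sketch (assumption: round-robin assignment) of mapping
# node-local MPI ranks onto NUMA domains, approximating what
# "mpirun --map-by numa --bind-to numa" arranges.
def numa_domain_for_rank(local_rank: int, numa_domains: int) -> int:
    """Round-robin a node-local rank onto a NUMA domain."""
    return local_rank % numa_domains

# 8 local ranks on a node exposing 4 NUMA domains:
placement = [numa_domain_for_rank(r, 4) for r in range(8)]
print(placement)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Keeping each rank pinned to the NUMA domain closest to its GPU and NIC avoids cross-socket memory traffic, which matters for GPU-aware MPI performance.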
NVIDIA DOCA
NVIDIA (BlueField DPU)
DOCA (Data Center-on-a-Chip Architecture) is NVIDIA's software framework for programming BlueField DPUs (Data Processing Units), enabling network, storage, and security offload from host CPUs.
Key Topics
- BlueField DPU architecture (BF2, BF3)
- DOCA SDK — flow engine, RegEx, compress, crypto
- Network offload: OVS-DOCA, ECMP, VXLAN/GRE acceleration
- Zero-trust security offload with DPU
- DOCA RDMA and DMA services
- Integrating DPU with AI cluster fabric
UCX – Unified Communication X
UCX Consortium (OpenUCX)
UCX is a high-performance communication library that auto-selects the best available transport (InfiniBand, RDMA, shared memory, TCP) for any given hardware configuration. It is the default and recommended transport layer for Open MPI on InfiniBand and RoCE fabrics.
Key Topics
- UCX architecture: UCP, UCT, UCS layers
- Transport selection and priority tuning
- GPU memory handles — CUDA IPC, ROCm IPC
- UCX diagnostics with ucx_info and ucx_perftest
- UCX with Open MPI and NCCL
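UCX's transport selection described above boils down to: from the transports usable on the current hardware, pick the highest-priority one (overridable via the `UCX_TLS` environment variable). An illustrative sketch of that logic, with made-up priority numbers:

```python
# Illustrative UCX-style transport selection: choose the
# highest-priority transport among those available on this host.
# Priority values here are invented for illustration only.
TRANSPORT_PRIORITY = {"shm": 4, "rc_mlx5": 3, "tcp": 1}

def select_transport(available: set) -> str:
    """Pick the best-known transport from the available set."""
    usable = available & TRANSPORT_PRIORITY.keys()
    if not usable:
        raise RuntimeError("no usable transport")
    return max(usable, key=TRANSPORT_PRIORITY.get)

print(select_transport({"tcp", "rc_mlx5"}))  # rc_mlx5
```

In practice `ucx_info -d` lists the transports UCX detected, and restricting them (e.g. `UCX_TLS=rc,sm`) is a common way to debug transport-selection problems.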