From GPU programming to high-speed fabrics, HiCAIN courses cover the technologies modern AI clusters depend on.
CUDA is NVIDIA's parallel programming model for GPUs: kernel launches, the memory hierarchy (registers, shared memory, global memory), unified memory, streams, events, and the broader runtime API. HiCAIN covers CUDA from first principles up to performance-optimised multi-GPU code.
ROCm is AMD's open-source GPU compute stack: HIP, rocBLAS, rocFFT, and the platform tooling. Courses cover HIP as a translation layer (CUDA→HIP) and native ROCm development for Instinct accelerators.
InfiniBand and RoCE are the two RDMA transports that dominate AI fabrics. Courses cover subnet management, LID routing, queue pairs (QPs), completion queues (CQs), memory regions, and the practical differences between native IB and RoCEv2 over Ethernet, including DCB (PFC/ETS) and ECN configuration.
The OpenFabrics Enterprise Distribution (OFED) and the libibverbs API, from ibv_open_device through to building production RDMA applications.
Open MPI is an MPI implementation widely used across academic and HPC sites. Courses cover collective operations, point-to-point patterns, the BTL/MTL transport layers, and tuning Open MPI on top of an RDMA fabric.
NVIDIA's DOCA SDK for DPU programming, plus the NCCL and RCCL collective libraries (from NVIDIA and AMD respectively) that power distributed training, and how these building blocks combine for production AI workloads.
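To give a flavour of what a collective library does during distributed training, here is a minimal pure-Python sketch of a ring all-reduce (sum). It is an illustration under stated assumptions, not NCCL's or RCCL's actual implementation: the function name `ring_allreduce` is hypothetical, ranks are simulated as plain Python lists, and "sending" is just copying between those lists.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum all-reduce over a ring; buffers[r] is rank r's data.

    Hypothetical teaching helper: no real network or GPU is involved.
    """
    n = len(buffers)
    if n == 0:
        return []
    size = len(buffers[0])
    if any(len(b) != size for b in buffers):
        raise ValueError("all ranks must hold buffers of equal length")

    data = [list(b) for b in buffers]  # work on copies, leave inputs intact
    if n == 1:
        return data

    # Split each buffer into n contiguous chunks (some may be empty).
    bounds = [c * size // n for c in range(n + 1)]

    def chunk(c: int) -> range:
        c %= n
        return range(bounds[c], bounds[c + 1])

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) to its
    # right neighbour, which adds it to its own copy. After n - 1 steps,
    # rank r holds the fully summed chunk (r + 1) mod n.
    for step in range(n - 1):
        msgs = [(chunk(r - step), [data[r][i] for i in chunk(r - step)])
                for r in range(n)]  # snapshot so all sends are simultaneous
        for r, (idx, payload) in enumerate(msgs):
            dst = (r + 1) % n
            for i, v in zip(idx, payload):
                data[dst][i] += v

    # Phase 2, all-gather: in step s, rank r forwards chunk (r + 1 - s),
    # and the right neighbour overwrites its copy with the reduced values.
    for step in range(n - 1):
        msgs = [(chunk(r + 1 - step), [data[r][i] for i in chunk(r + 1 - step)])
                for r in range(n)]
        for r, (idx, payload) in enumerate(msgs):
            dst = (r + 1) % n
            for i, v in zip(idx, payload):
                data[dst][i] = v

    return data
```

For example, calling `ring_allreduce([[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]])` leaves every simulated rank holding `[111, 222, 333, 444]`. Each rank only ever talks to its right-hand neighbour, which is why this pattern maps well onto ring-shaped interconnects.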
Every HiCAIN lab runs on the PacketFive Virtual Datacenter (VDC), a complete software-emulated AI/HPC cluster, so one workstation can run RDMA ping-pong tests, multi-GPU NCCL jobs, and Open MPI jobs without any physical NIC or GPU.