The oneAPI Collective Communications Library (oneCCL) provides primitives for the communication patterns that occur in deep learning applications. oneCCL supports both scale-up for platforms with multiple oneAPI devices and scale-out for clusters with multiple compute nodes.

oneCCL supports the following communication patterns used in deep learning (DL) algorithms:

  • Allreduce

  • Allgatherv

  • Broadcast

  • Reduce

  • Alltoall
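To make the most common of these patterns concrete, the sketch below simulates the semantics of Allreduce (here with an elementwise sum) without any communication library: every rank contributes a buffer, and every rank ends up with the same reduced result. The function name `simulate_allreduce` is illustrative only and is not part of the oneCCL API.

```python
def simulate_allreduce(buffers):
    """Elementwise-sum allreduce: all ranks receive the same reduced
    buffer. `buffers` holds one send buffer per rank; the return value
    holds one (identical) receive buffer per rank."""
    reduced = [sum(vals) for vals in zip(*buffers)]
    return [list(reduced) for _ in buffers]

# Three ranks, each contributing a gradient-like buffer.
ranks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(simulate_allreduce(ranks))  # every rank sees [9.0, 12.0]
```

In distributed DL training this is the pattern used to average gradients: each worker contributes its local gradients and all workers receive the identical sum (typically divided by the worker count afterward).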

oneCCL also exposes controls over additional optimizations and capabilities, such as:

  • User-defined pre-/post-processing of incoming buffers and user-defined reduction operations

  • Prioritization for communication operations

  • Persistent communication operations (decouples one-time initialization from repeated execution)

  • Fusion of multiple communication operations into a single one

  • Unordered communication operations

  • Allreduce on sparse data
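The fusion capability listed above can be illustrated with a small, library-free sketch: several small buffers are packed into one flat buffer so that a single large reduction replaces many small ones, amortizing per-operation latency. The helper names `fuse` and `unfuse` are illustrative assumptions, not oneCCL functions.

```python
def fuse(buffers):
    """Pack several small buffers into one flat buffer, recording the
    (offset, length) of each so it can be split back later."""
    flat, offsets = [], []
    for b in buffers:
        offsets.append((len(flat), len(b)))
        flat.extend(b)
    return flat, offsets

def unfuse(flat, offsets):
    """Split a flat buffer back into the original small buffers."""
    return [flat[start:start + n] for start, n in offsets]

# Two ranks, each holding two small buffers to reduce.
rank0 = [[1.0], [2.0, 3.0]]
rank1 = [[10.0], [20.0, 30.0]]
flat0, offs = fuse(rank0)
flat1, _ = fuse(rank1)

# One large elementwise-sum reduction instead of two small ones.
reduced = [a + b for a, b in zip(flat0, flat1)]
print(unfuse(reduced, offs))  # [[11.0], [22.0, 33.0]]
```

The design trade-off is that fusion delays the first small operation until the fused batch is assembled, in exchange for fewer, larger collectives on the wire.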

Intel has published an open source implementation under the Apache license. The open source implementation includes a comprehensive test suite; consult the README for build and usage directions.