oneDPL 0.7

The oneAPI DPC++ Library (oneDPL) provides the functionality specified in the C++ standard, extended to support data parallelism and offloading to devices and to simplify the implementation of data-parallel algorithms.

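The library comprises the following components, each described in its own section below:

  • Supported C++ Standard Library APIs and Algorithms

  • Extensions to Parallel STL

  • Specific API of oneDPL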

Namespaces

oneDPL uses namespace std for the Supported C++ Standard Library APIs and Algorithms, which include the Parallel STL algorithms and the subset of the standard C++ library available in kernels. It uses namespace dpstd for its extended functionality.

Supported C++ Standard Library APIs and Algorithms

For all C++ algorithms that accept execution policies (as defined by C++17), oneDPL provides an implementation that supports dpstd::execution::device_policy and SYCL buffers (passed via dpstd::begin and dpstd::end). (See Extensions to Parallel STL.)

Extensions to Parallel STL

oneDPL extends Parallel STL with the following APIs:

DPC++ Execution Policy

// Defined in <dpstd/execution>

namespace dpstd {
  namespace execution {

    template <typename BasePolicy, typename KernelName = /*unspecified*/>
    class device_policy;

    template <typename KernelName, typename Arg>
    device_policy<std::execution::parallel_unsequenced_policy, KernelName>
    make_device_policy( const Arg& );

    device_policy<parallel_unsequenced_policy, /*unspecified*/> default_policy;

  }
}

A DPC++ execution policy specifies where and how an algorithm runs. It inherits from a standard C++ execution policy and accepts an optional kernel name as a template parameter.

An object of a device_policy type encapsulates a SYCL queue, which runs algorithms on a DPC++-compliant device. You can create a policy object from a SYCL queue, device, or device selector, as well as from an existing policy object.

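The make_device_policy function simplifies device_policy creation: you specify only the kernel name, and the base policy is fixed to std::execution::parallel_unsequenced_policy.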

dpstd::execution::default_policy is a predefined DPC++ execution policy object that runs algorithms on the default SYCL device.

Examples:

using namespace dpstd::execution;

// Construct a policy explicitly from a queue, naming the kernel PolicyA
auto policy_a =
  device_policy<parallel_unsequenced_policy, class PolicyA>{cl::sycl::queue{}};
std::for_each(policy_a, …);

// Create the same kind of policy with make_device_policy
auto policy_b = make_device_policy<class PolicyB>(cl::sycl::queue{});
std::for_each(policy_b, …);

// Create a policy from a device chosen by a selector
auto policy_c =
  make_device_policy<class PolicyC>(cl::sycl::device{cl::sycl::gpu_selector{}});
std::for_each(policy_c, …);

// Create a policy directly from a device selector
auto policy_d = make_device_policy<class PolicyD>(cl::sycl::default_selector{});
std::for_each(policy_d, …);

// use the predefined dpstd::execution::default_policy policy object
std::for_each(default_policy, …);

Wrappers for SYCL Buffers

// Defined in <dpstd/iterators.h>

namespace dpstd {

  template <cl::sycl::access::mode = cl::sycl::access::mode::read_write, ... >
  /*unspecified*/ begin(cl::sycl::buffer<...>);

  template <cl::sycl::access::mode = cl::sycl::access::mode::read_write, ... >
  /*unspecified*/ end(cl::sycl::buffer<...>);

}

dpstd::begin and dpstd::end are helper functions for passing SYCL buffers to oneDPL algorithms. These functions accept a SYCL buffer and return an object of an unspecified type that satisfies the following requirements:

  • Is CopyConstructible, CopyAssignable, and comparable with operators == and !=.

  • Supports the expressions a + n, a - n, and a - b, where a and b are objects of the type and n is an integer value.

  • Provides a get_buffer() method that returns the SYCL buffer passed to the dpstd::begin or dpstd::end function.

Example:

#include <CL/sycl.hpp>
#include <dpstd/execution>
#include <dpstd/algorithm>
#include <dpstd/iterators.h>

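int main() {
    cl::sycl::queue q;                    // queue on the default device
    cl::sycl::buffer<int> buf { 1000 };   // buffer holding 1000 ints
    // Wrap the buffer so it can be passed where iterators are expected
    auto buf_begin = dpstd::begin(buf);
    auto buf_end   = dpstd::end(buf);
    // Run on queue q; Fill names the kernel
    auto policy = dpstd::execution::make_device_policy<class Fill>( q );
    // Set every element of the buffer to 42
    std::fill(policy, buf_begin, buf_end, 42);
    return 0;
}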
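
The requirements on the type returned by dpstd::begin and dpstd::end can be made concrete with a small stand-in written in standard C++. This is only a sketch and not the oneDPL implementation: mock_buffer, buffer_position, mock_begin, mock_end, and check_requirements are hypothetical names, mock_buffer merely stands in for cl::sycl::buffer, and treating a - b as a signed distance is an assumption, since the text only states that the expression is valid.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for cl::sycl::buffer<int>; not a SYCL type.
// The id field lets two positions tell whether they refer to the same buffer.
struct mock_buffer {
    int id;
    std::size_t size;
};

// Hypothetical model of the unspecified type returned by dpstd::begin/end.
class buffer_position {
public:
    buffer_position(mock_buffer buf, std::ptrdiff_t index)
        : buf_(buf), index_(index) {}

    // a + n and a - n: a new position in the same buffer.
    friend buffer_position operator+(const buffer_position& a, std::ptrdiff_t n) {
        return buffer_position(a.buf_, a.index_ + n);
    }
    friend buffer_position operator-(const buffer_position& a, std::ptrdiff_t n) {
        return buffer_position(a.buf_, a.index_ - n);
    }
    // a - b: assumed here to be the signed distance between two positions.
    friend std::ptrdiff_t operator-(const buffer_position& a, const buffer_position& b) {
        return a.index_ - b.index_;
    }
    // Comparable with == and !=.
    friend bool operator==(const buffer_position& a, const buffer_position& b) {
        return a.buf_.id == b.buf_.id && a.index_ == b.index_;
    }
    friend bool operator!=(const buffer_position& a, const buffer_position& b) {
        return !(a == b);
    }

    // Returns the buffer this position was created from.
    mock_buffer get_buffer() const { return buf_; }

private:
    mock_buffer buf_;
    std::ptrdiff_t index_;
};

inline buffer_position mock_begin(mock_buffer buf) {
    return buffer_position(buf, 0);
}
inline buffer_position mock_end(mock_buffer buf) {
    return buffer_position(buf, static_cast<std::ptrdiff_t>(buf.size));
}

// Exercises each listed requirement once.
inline void check_requirements() {
    mock_buffer buf{7, 1000};
    buffer_position first = mock_begin(buf);
    buffer_position last = mock_end(buf);
    buffer_position copy = first;              // CopyConstructible
    copy = last;                               // CopyAssignable
    assert(copy == last && first != last);     // == and !=
    assert(last - first == 1000);              // a - b
    assert(first + 1000 == last);              // a + n
    assert(last - 1000 == first);              // a - n
    assert(first.get_buffer().id == buf.id);   // get_buffer()
}
```

Calling check_requirements() from a test program exercises every bullet in the list. An algorithm that receives such objects can recover the underlying buffer through get_buffer() and the element range through a - b.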

Specific API of oneDPL

namespace dpstd {

// Defined in <dpstd/iterators.h>

  template <typename Integral>
  class counting_iterator;

  template <typename... Iterators>
  class zip_iterator;
  template <typename... Iterators>
  zip_iterator<Iterators...> make_zip_iterator(Iterators...);

  template <typename UnaryFunc, typename Iterator>
  class transform_iterator;
  template <typename UnaryFunc, typename Iterator>
  transform_iterator<UnaryFunc, Iterator> make_transform_iterator(Iterator, UnaryFunc);


// Defined in <dpstd/functional>

  struct identity;

// Defined in <dpstd/algorithm>

  template<typename Policy, typename InputKeyIt, typename InputValueIt, typename OutputValueIt>
  OutputValueIt
  exclusive_scan_by_segment(Policy&& policy, InputKeyIt keys_first, InputKeyIt keys_last,
                            InputValueIt values_first, OutputValueIt values_result);

  template<typename Policy, typename InputKeyIt, typename InputValueIt, typename OutputValueIt,
           typename T>
  OutputValueIt
  exclusive_scan_by_segment(Policy&& policy, InputKeyIt keys_first, InputKeyIt keys_last,
                            InputValueIt values_first, OutputValueIt values_result,
                            T init);

  template<typename Policy, typename InputKeyIt, typename InputValueIt, typename OutputValueIt,
           typename T, typename BinaryPredicate>
  OutputValueIt
  exclusive_scan_by_segment(Policy&& policy, InputKeyIt keys_first, InputKeyIt keys_last,
                            InputValueIt values_first, OutputValueIt values_result,
                            T init, BinaryPredicate binary_pred);

  template<typename Policy, typename InputKeyIt, typename InputValueIt, typename OutputValueIt,
           typename T, typename BinaryPredicate, typename BinaryOp>
  OutputValueIt
  exclusive_scan_by_segment(Policy&& policy, InputKeyIt keys_first, InputKeyIt keys_last,
                            InputValueIt values_first, OutputValueIt values_result,
                            T init, BinaryPredicate binary_pred, BinaryOp binary_op);

  template<typename Policy, typename InputKeyIt, typename InputValueIt, typename OutputValueIt>
  OutputValueIt
  inclusive_scan_by_segment(Policy&& policy, InputKeyIt keys_first, InputKeyIt keys_last,
                            InputValueIt values_first, OutputValueIt values_result);

  template<typename Policy, typename InputKeyIt, typename InputValueIt, typename OutputValueIt,
           typename BinaryPred>
  OutputValueIt
  inclusive_scan_by_segment(Policy&& policy, InputKeyIt keys_first, InputKeyIt keys_last,
                            InputValueIt values_first, OutputValueIt values_result,
                            BinaryPred binary_pred);

  template<typename Policy, typename InputKeyIt, typename InputValueIt, typename OutputValueIt,
           typename BinaryPred, typename BinaryOp>
  OutputValueIt
  inclusive_scan_by_segment(Policy&& policy, InputKeyIt keys_first, InputKeyIt keys_last,
                            InputValueIt values_first, OutputValueIt values_result,
                            BinaryPred binary_pred, BinaryOp binary_op);

  template<typename Policy, typename InputKeyIt, typename InputValueIt, typename OutputKeyIt,
           typename OutputValueIt>
  std::pair<OutputKeyIt,OutputValueIt>
  reduce_by_segment(Policy&& policy, InputKeyIt keys_first, InputKeyIt keys_last,
                    InputValueIt values_first, OutputKeyIt keys_result, OutputValueIt values_result);

  template<typename Policy, typename InputKeyIt, typename InputValueIt, typename OutputKeyIt,
           typename OutputValueIt, typename BinaryPred>
  std::pair<OutputKeyIt,OutputValueIt>
  reduce_by_segment(Policy&& policy, InputKeyIt keys_first, InputKeyIt keys_last,
                    InputValueIt values_first, OutputKeyIt keys_result, OutputValueIt values_result,
                    BinaryPred binary_pred);

  template<typename Policy, typename InputKeyIt, typename InputValueIt, typename OutputKeyIt,
           typename OutputValueIt, typename BinaryPred, typename BinaryOp>
  std::pair<OutputKeyIt,OutputValueIt>
  reduce_by_segment(Policy&& policy, InputKeyIt keys_first, InputKeyIt keys_last,
                    InputValueIt values_first, OutputKeyIt keys_result, OutputValueIt values_result,
                    BinaryPred binary_pred, BinaryOp binary_op);

}
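
The synopsis above gives only signatures. As a rough guide to what the segmented algorithms compute, the following sequential sketch in standard C++ assumes the conventional by-key meaning: consecutive keys for which binary_pred returns true form one segment, and the scan or reduction restarts at each segment boundary. That reading, the choice to report the first key of each segment, and all names ending in _ref are assumptions made for illustration. The sketch operates on std::vector rather than on iterators and policies, and it is not the oneDPL implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Assumed meaning of inclusive_scan_by_segment: a running total of the values
// that restarts whenever pred says two adjacent keys differ.
// Precondition: values has at least as many elements as keys.
template <typename K, typename V,
          typename BinaryPred = std::equal_to<K>, typename BinaryOp = std::plus<V>>
std::vector<V> inclusive_scan_by_segment_ref(const std::vector<K>& keys,
                                             const std::vector<V>& values,
                                             BinaryPred pred = BinaryPred{},
                                             BinaryOp op = BinaryOp{}) {
    std::vector<V> out(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i) {
        const bool same_segment = i > 0 && pred(keys[i - 1], keys[i]);
        out[i] = same_segment ? op(out[i - 1], values[i]) : values[i];
    }
    return out;
}

// Assumed meaning of exclusive_scan_by_segment: like the inclusive form, but
// each output excludes its own input and every segment starts from init.
template <typename K, typename V,
          typename BinaryPred = std::equal_to<K>, typename BinaryOp = std::plus<V>>
std::vector<V> exclusive_scan_by_segment_ref(const std::vector<K>& keys,
                                             const std::vector<V>& values,
                                             V init,
                                             BinaryPred pred = BinaryPred{},
                                             BinaryOp op = BinaryOp{}) {
    std::vector<V> out(keys.size());
    V running = init;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i == 0 || !pred(keys[i - 1], keys[i])) running = init;  // new segment
        out[i] = running;
        running = op(running, values[i]);
    }
    return out;
}

// Assumed meaning of reduce_by_segment: one (key, reduced value) pair per segment.
template <typename K, typename V,
          typename BinaryPred = std::equal_to<K>, typename BinaryOp = std::plus<V>>
std::pair<std::vector<K>, std::vector<V>>
reduce_by_segment_ref(const std::vector<K>& keys, const std::vector<V>& values,
                      BinaryPred pred = BinaryPred{}, BinaryOp op = BinaryOp{}) {
    std::vector<K> out_keys;
    std::vector<V> out_values;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (i > 0 && pred(keys[i - 1], keys[i])) {
            out_values.back() = op(out_values.back(), values[i]);
        } else {
            out_keys.push_back(keys[i]);      // first key of a new segment
            out_values.push_back(values[i]);
        }
    }
    return {out_keys, out_values};
}

// Small worked example: three segments with keys 1, 2, and 3.
inline void check_segment_examples() {
    const std::vector<int> keys{1, 1, 2, 2, 2, 3};
    const std::vector<int> vals{1, 2, 3, 4, 5, 6};
    assert((inclusive_scan_by_segment_ref(keys, vals) ==
            std::vector<int>{1, 3, 3, 7, 12, 6}));
    assert((exclusive_scan_by_segment_ref(keys, vals, 0) ==
            std::vector<int>{0, 1, 0, 3, 7, 0}));
    const auto reduced = reduce_by_segment_ref(keys, vals);
    assert((reduced.first == std::vector<int>{1, 2, 3}));
    assert((reduced.second == std::vector<int>{3, 12, 6}));
}
```

The default predicate and operation here are std::equal_to and std::plus. Because the overloads without those parameters do not state their defaults in the synopsis, treat these defaults as part of the assumption. The exclusive form takes init explicitly for the same reason.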