oneVPL 0.7

The oneAPI Video Processing Library (oneVPL) is a programming interface for video processing and video analytics, focused on building portable media pipelines on CPUs, GPUs, Deep Learning (DL) accelerators, and FPGAs. It aims for functional and performance portability, so that applications built with oneVPL can run efficiently on current and future generations of hardware without modification. The oneVPL library provides the following features:

  1. Cross-architecture building blocks for decode, video processing, encode, and DL-based video analytics.

  2. A VPL Memory library for passing image objects among the media, compute, and 3D rendering phases of a video analytics pipeline, with zero-copy buffer sharing among CPUs, GPUs, DL accelerators, and FPGAs.

  3. Device discovery and device selection query interface for video and DL operations.

Intel’s implementation of VPL will be hosted at https://software.intel.com/en-us/oneapi/vpl# with each public release.

See VPL API Reference for the detailed API description.

Device Discovery

vpl::DeviceInfo defines a device and property query interface. At runtime, VPL discovers available devices declared in VplDeviceType, the number of devices of each type (vpl::DeviceInfo::GetDeviceCount()), and vpl::VplDeviceProperty of each device (vpl::DeviceInfo::GetDeviceProperty()).

This class also provides two important member functions:

  1. vpl::DeviceInfo::GetPreferredDevice(): returns the preferred device instance (vpl::DeviceInstance) for a bit mask of workstream (vpl::Workstream) types

  2. vpl::DeviceInfo::GetPreset(): returns the preset configuration of a device instance for a VplWorkstreamType

We recommend the following code sequence to select a device and get the device’s preset configuration parameters before creating a Workstream on a device:

// Find all the devices in the system and select a preferred device for decoding
DeviceInfo *dinfo = new DeviceInfo();
DeviceInstance *dev = dinfo->GetPreferredDevice(VPL_WORKSTREAM_DECODE);

// Get the decode preset for the selected device (GetPreset() returns VplParams *)
VplParams *dconfig = dinfo->GetPreset(dev->m_dtype, VPL_WORKSTREAM_DECODE, dev->m_id);

Device Context

vpl::DeviceContext defines a device’s driver context and command queue for VPL operations. Depending on the underlying operating system and workstream operations, the low-level device contexts can be VAAPI, DX, OCL, or GL contexts for media, compute, and rendering operations. This class hides the details of the low-level contexts while enabling context sharing among several workstreams.

This class provides the following functions:

  1. vpl::DeviceContext::DeviceContext(): create a new device context on a device

  2. vpl::DeviceContext::AddWorkstream(): add a workstream sharing the device context

  3. vpl::DeviceContext::RemoveWorkstream(): remove a workstream from the device context

The device context is usually created implicitly inside the vpl::Workstream constructor. Application programmers can also create it explicitly to share the context with multiple workstreams.
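The bookkeeping behind AddWorkstream() and RemoveWorkstream() can be pictured with a small model. This is a minimal sketch, not the vpl::DeviceContext implementation: `SharedContext` and the placeholder `Workstream` struct are hypothetical, and the context simply records its users in a vector, mirroring the m_workstreams attribute listed in the API reference. Ignoring duplicate registrations is an assumption of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical placeholder for vpl::Workstream; only its identity matters here.
struct Workstream {
  int id;
};

// Hypothetical model of a shared device context: it tracks the workstreams
// that currently use it, like the m_workstreams member of vpl::DeviceContext.
class SharedContext {
 public:
  void AddWorkstream(Workstream *ws) {
    // Assumption: a workstream is registered at most once.
    if (std::find(users_.begin(), users_.end(), ws) == users_.end())
      users_.push_back(ws);
  }
  void RemoveWorkstream(Workstream *ws) {
    users_.erase(std::remove(users_.begin(), users_.end(), ws), users_.end());
  }
  std::size_t UserCount() const { return users_.size(); }

 private:
  std::vector<Workstream *> users_;
};

// Usage sketch: decode, video processing and encode share one context,
// then the video processing workstream stops using it.
inline std::size_t ShareOneContext() {
  SharedContext ctx;
  Workstream decode{1}, vpp{2}, encode{3};
  ctx.AddWorkstream(&decode);
  ctx.AddWorkstream(&vpp);
  ctx.AddWorkstream(&encode);
  ctx.RemoveWorkstream(&vpp);
  return ctx.UserCount();
}
```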

Workstream Settings

vpl::VplParams defines the parameter settings for decode, video processing, encode and DL inference. It addresses two aspects of video processing:

  • Easy-to-use, device-independent parameter settings for VPL application developers

  • Easy translation into device-dependent settings for VPL hardware plugin writers

This class provides the settings common to all video processing operations, such as input/output formats, resolutions, frame rates, crop factors, and aspect ratios. Operation-specific settings, such as the codec type, buffer sizes, encoding bit rate control, and GOP structure parameters, are provided by the subclasses vpl::VplParamsDecode and vpl::VplParamsEncode.

In the near future, oneVPL will support target devices running autonomously on different nodes, where parameter settings need to be passed across host boundaries similar to gRPC. We will extend vpl::VplParams to include serialization and deserialization of the parameters as messages. We are also evaluating general message packages such as protobuf as we finalize the API.

Vendor-specific extensions can be exposed to oneVPL application developers by subclassing vpl::VplParams.
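A minimal sketch of the subclassing approach, assuming a simplified stand-in for vpl::VplParams that keeps only the output format accessors and represents VplFourCC as a uint32_t. `ParamsBase`, `VendorParams`, and its low-latency flag are hypothetical; they only illustrate how a vendor could add a setting while code written against the base class keeps working.

```cpp
#include <cstdint>

// Simplified stand-in for vpl::VplParams (assumption: FourCC is a uint32_t).
class ParamsBase {
 public:
  virtual ~ParamsBase() = default;
  uint32_t GetOutputFormat() const { return oformat_; }
  void SetOutputFormat(uint32_t f) { oformat_ = f; }

 private:
  uint32_t oformat_ = 0;
};

// Hypothetical vendor extension: one vendor-specific setting layered on top
// of the common, device-independent ones.
class VendorParams : public ParamsBase {
 public:
  bool GetLowLatency() const { return low_latency_; }
  void SetLowLatency(bool on) { low_latency_ = on; }

 private:
  bool low_latency_ = false;
};

// Generic application code only needs the base class.
// 0x3231564E is the little-endian FourCC code for 'N','V','1','2'.
inline void ConfigureCommon(ParamsBase &p) { p.SetOutputFormat(0x3231564E); }
```

Device-independent code keeps passing the base class around, while a plugin that knows about `VendorParams` can read the extra field.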

Workstream

The vpl::Workstream class is the core of the VPL interface. It represents a media building block that runs decode, frame processing, DL inference, or encode operations on a single device context. Workstreams can be offloaded to different devices, and complex pipelines can be formed from multiple workstreams.

There are four subclasses of vpl::Workstream, which perform the following basic operations:

  • vpl::Decode: decode a bitstream to raw frames

  • vpl::VideoProcess: implement a video processing filter operation with raw frame input and output

  • vpl::Infer: invoke a Deep Learning model to infer on a raw frame. Details of this subclass will be provided in a future release.

  • vpl::Encode: encode raw frames to a bitstream

In the vpl::VideoProcess subclass, a sequence of single-input, single-output filters running on the same device context can be fused into a single workstream. Operations executing on different device contexts are separated into different workstreams, so that each workstream can always be dispatched to a single device.
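Conceptually, fusing single-input, single-output filters means composing them into one unit that runs on one device context. The sketch below models this with std::function over a hypothetical `Frame` struct; `FusedFilterChain`, `Resize`, and `ConvertTo` are illustrative names, not the vpl::VideoProcess implementation, and the filters only update frame metadata.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical frame descriptor: only the properties the filters change.
struct Frame {
  int width;
  int height;
  std::string format;
};

using Filter = std::function<Frame(const Frame &)>;

// A chain of single-input, single-output filters executed as one unit,
// the way fused filters behave as a single workstream.
class FusedFilterChain {
 public:
  FusedFilterChain &Then(Filter f) {
    filters_.push_back(std::move(f));
    return *this;
  }
  Frame Run(Frame in) const {
    for (const auto &f : filters_) in = f(in);  // each output feeds the next filter
    return in;
  }

 private:
  std::vector<Filter> filters_;
};

// Illustrative filters mirroring the supported operations (resize, color conversion).
inline Filter Resize(int w, int h) {
  return [w, h](const Frame &f) { return Frame{w, h, f.format}; };
}
inline Filter ConvertTo(std::string fmt) {
  return [fmt](const Frame &f) { return Frame{f.width, f.height, fmt}; };
}
```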

Workstream Internals

Each vpl::Workstream contains the following members:

  • vpl::VplParams: the parameter settings of the workstream, e.g., the output resolution per session or per frame.

  • vpl::DeviceContext: the execution device context of the workstream, e.g., VAAPI, DX, OCL, or GL contexts for media, compute, and rendering operations

Per-session workstream configurations are set up during workstream initialization using vpl::VplParams. A user program can also modify a workstream’s configuration dynamically for frame-level control.

Initialization Sequence

The standard sequence to create a workstream consists of the following steps:

  1. Select a device to execute the workstream

  2. Create a device context, or use an existing device context, to execute the workstream

  3. Get the configuration presets for the selected device

  4. Create the workstream with the configuration settings (vpl::VplParams) and the vpl::DeviceContext

The following transcoding example uses three workstreams: vpl::Decode, vpl::VideoProcess and vpl::Encode, and shares a single context to execute them together:

#include <cstdint>
#include <cstdio>
#include "vpl/vpl.hpp"

// Parenthesized so the macro expands safely inside expressions
#define BUFFER_SIZE (1024 * 1024 * 80)

using namespace vpl;

int main(int argc, char* argv[]) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s <input.h264>\n", argv[0]);
    return 1;
  }

  // Find all the devices in the system and select a preferred device
  // for the combined decode, video processing and encode workload
  DeviceInfo *dinfo = new DeviceInfo();
  DeviceInstance *dev = dinfo->GetPreferredDevice(
      VPL_WORKSTREAM_DECODE | VPL_WORKSTREAM_VIDEOPROC | VPL_WORKSTREAM_ENCODE);

  // Get the decode preset, then create a Decode workstream and a device context on the device
  VplParams *dconfig = dinfo->GetPreset(dev->m_dtype, VPL_WORKSTREAM_DECODE, dev->m_id);
  Decode *decode = new Decode(dconfig, *dev);

  // Create a VPP workstream that uses the same device context as Decode
  VplParams *pconfig = dinfo->GetPreset(dev->m_dtype, VPL_WORKSTREAM_VIDEOPROC, dev->m_id);
  VideoProcess *proc = new VideoProcess(pconfig, decode->GetContext());

  // Create an Encode workstream that uses the same device context as Decode
  VplParams *econfig = dinfo->GetPreset(dev->m_dtype, VPL_WORKSTREAM_ENCODE, dev->m_id);
  Encode *encode = new Encode(econfig, decode->GetContext());

  uint8_t* pbs = new uint8_t[BUFFER_SIZE];
  uint8_t* pbsout = new uint8_t[BUFFER_SIZE];

  FILE* fInput = fopen(argv[1], "rb");
  FILE* fOutput = fopen("out.h264", "wb");
  if (!fInput || !fOutput) {
    fprintf(stderr, "could not open input or output file\n");
    return 1;
  }

  int frameCount = 0;
  // Run the pipeline explicitly: decode -> video processing -> encode
  for (vplWorkstreamState decode_state = VPL_STATE_READ_INPUT;
       decode_state != VPL_STATE_END_OF_OPERATION && decode_state != VPL_STATE_ERROR;
       decode_state = decode->GetState()) {
    vplm_mem* dec_image =
        decode->DecodeFrame(pbs, fread(pbs, 1, BUFFER_SIZE, fInput));
    if (!dec_image) continue;
    frameCount++;

    vplm_mem* vpp_image = proc->ProcessFrame(dec_image);
    if (vpp_image) {
      size_t nbytesout = encode->EncodeFrame(vpp_image, pbsout);
      fwrite(pbsout, 1, nbytesout, fOutput);
      printf("%d\r", frameCount);
      fflush(stdout);
    }
  }

  printf("\ndone !\n");

  fclose(fInput);
  fclose(fOutput);

  delete[] pbs;
  delete[] pbsout;

  // Release the workstreams in reverse order of creation
  delete encode;
  delete proc;
  delete decode;
  delete dinfo;
  return 0;
}

Dynamic Setting Control

vpl::VplParams defines the workstream settings for the device. A user program can use its access functions to read and set the workstream parameters. After modifying configuration settings, the program must call vpl::Workstream::UpdateDeviceConfig() to propagate the settings to the device context. A configuration change takes effect at the next processing operation. The following example changes the output resolution in the middle of a decoding sequence:

// Decoding loop (bs_ptr, bs_size, need_resize and dconfig are set up elsewhere)
while (decode->GetState() != VPL_STATE_END_OF_OPERATION &&
       decode->GetState() != VPL_STATE_ERROR) {
   vplm_mem *image = decode->DecodeFrame(bs_ptr, bs_size);
   if (image) {
      // ... consume the decoded frame, then release it
      vplm_unref(image);
   }

   // Change the output resolution once; it takes effect at the next DecodeFrame() call
   if (need_resize) {
      VplVideoSurfaceResolution resol = {480, 780};
      dconfig->SetOutputResolution(resol);

      // propagate the new settings to the driver
      decode->UpdateDeviceConfig();
      need_resize = false;
   }
}
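The rule that a change is applied only after UpdateDeviceConfig(), and only from the next processing operation on, can be modelled as two copies of the settings: the copy the application edits and the copy the device uses. `Resolution` and `MockWorkstream` are hypothetical stand-ins for this sketch, not VPL classes.

```cpp
// Hypothetical stand-in for the resolution part of vpl::VplParams.
struct Resolution {
  int width;
  int height;
};

// Hypothetical workstream model with application-side and device-side settings.
class MockWorkstream {
 public:
  explicit MockWorkstream(Resolution r) : params_(r), device_(r) {}

  // Application edits its copy (like dconfig->SetOutputResolution()).
  void SetOutputResolution(Resolution r) { params_ = r; }

  // Copy the edited settings to the device (like UpdateDeviceConfig()).
  void UpdateDeviceConfig() { device_ = params_; }

  // Processing only ever uses the settings the device already has.
  Resolution ProcessFrame() const { return device_; }

 private:
  Resolution params_;  // edited by the application
  Resolution device_;  // used by processing operations
};
```

Until UpdateDeviceConfig() is called, frames keep the old resolution even though the application-side settings have changed.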

Decode

The vpl::Decode class implements elementary stream decoding from encoded bitstreams to raw frames.

Figure: Bitstream -> Decode -> Raw

Supported bitstream formats

  • h264

  • h265

DecodeFrame C++ interface:

The vpl::Decode::DecodeFrame() interface is:

vplm_mem* Decode::DecodeFrame(const void* pbs, size_t size);

States:

During execution, state communicates what the application should do next.

State transitions:

  • start -> READ_INPUT

  • READ_INPUT -> READ_INPUT: normal operation; almost everything happens here

  • READ_INPUT -> ERROR: internal error; ERROR -> READ_INPUT

  • READ_INPUT -> INPUT_BUFFER_FULL: input size > remaining buffer size; INPUT_BUFFER_FULL -> READ_INPUT

  • READ_INPUT -> INPUT_EXCEEDS_BUFFER_SIZE: input size > total buffer size; INPUT_EXCEEDS_BUFFER_SIZE -> READ_INPUT

  • READ_INPUT -> END_OF_OPERATION: no more to process; END_OF_OPERATION -> end

State meanings:

  • READ_INPUT: the workstream can read bitstream input

  • ERROR: an error occurred during processing; try again

  • INPUT_BUFFER_FULL: the input size exceeds the remaining space in the input bitstream buffer. The input is ignored, and the decode operation should be allowed to drain before more bitstream is added.

  • INPUT_EXCEEDS_BUFFER_SIZE: the input size exceeds the total input bitstream buffer size

  • END_OF_OPERATION: there is no more to process
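The input-side conditions above can be expressed as a small decision function. This is a sketch under stated assumptions: `NextDecodeState` and its parameters are hypothetical, the real decoder tracks its buffer internally and reports the result through GetState(), and the enum only covers the states involved here.

```cpp
#include <cstddef>

// Subset of the decode states above (names from the state list; values arbitrary).
enum class DecodeState {
  READ_INPUT,
  INPUT_BUFFER_FULL,
  INPUT_EXCEEDS_BUFFER_SIZE,
  END_OF_OPERATION,
};

// Hypothetical model of the transition conditions.
//   input_size: bytes the application tries to submit
//   used:       bytes already queued in the input bitstream buffer (<= capacity)
//   capacity:   total size of the input bitstream buffer
//   more_work:  whether anything is left to process
inline DecodeState NextDecodeState(std::size_t input_size, std::size_t used,
                                   std::size_t capacity, bool more_work) {
  if (!more_work) return DecodeState::END_OF_OPERATION;
  // Check the total size first: an input larger than the whole buffer can never fit.
  if (input_size > capacity) return DecodeState::INPUT_EXCEEDS_BUFFER_SIZE;
  if (input_size > capacity - used) return DecodeState::INPUT_BUFFER_FULL;
  return DecodeState::READ_INPUT;  // the input fits; keep feeding bitstream
}
```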

Settings:

Decode configuration parameters are specified in vpl::VplParamsDecode, which controls the per-session and per-frame settings of the decode operations.

Data model:

VPL allocates pools of VPL Memory used by decode and consumers of decoded frames. The application is expected to manage reference counts. Only surfaces with no external references can be used to decode new frames.

Figure: [surface pool | DecodeFrame] --VPL Memory--> rest of pipeline
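The reuse rule (only surfaces with no external references can receive newly decoded frames) can be illustrated with a small pool. `SurfacePool` is hypothetical: its integer reference counts stand in for VPL Memory objects and vplm_ref()/vplm_unref(), and it is not how VPL allocates surfaces internally.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical surface pool: each slot has a reference count held by consumers.
class SurfacePool {
 public:
  explicit SurfacePool(std::size_t n) : refs_(n, 0) {}

  // Returns the index of a surface with no references and hands one reference
  // to the caller, or -1 if every surface is still referenced downstream.
  int AcquireForDecode() {
    for (std::size_t i = 0; i < refs_.size(); ++i) {
      if (refs_[i] == 0) {
        refs_[i] = 1;
        return static_cast<int>(i);
      }
    }
    return -1;
  }
  void Ref(int i) { ++refs_[static_cast<std::size_t>(i)]; }    // like vplm_ref()
  void Unref(int i) { --refs_[static_cast<std::size_t>(i)]; }  // like vplm_unref()

 private:
  std::vector<int> refs_;
};
```

A consumer that forgets to release a surface therefore shrinks the pool available to the decoder.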

Sample code:

#include <cstdint>
#include <cstdio>
#include "vpl/vpl.hpp"

#define BUFFER_SIZE (1024 * 1024)

using namespace vpl;

int main(int argc, char *argv[]) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s <input.h264>\n", argv[0]);
    return 1;
  }

  // Find all the devices in the system and select a preferred device
  DeviceInfo *dinfo = new DeviceInfo();
  DeviceInstance *dev = dinfo->GetPreferredDevice(VPL_WORKSTREAM_DECODE);

  // Get decode preset (GetPreset() returns the VplParams base class)
  VplParamsDecode *dconfig = static_cast<VplParamsDecode *>(
      dinfo->GetPreset(dev->m_dtype, VPL_WORKSTREAM_DECODE, dev->m_id));

  // Create a Decode workstream and a device context on the device
  Decode *ws = new Decode(dconfig, *dev);

  uint8_t *pbs = new uint8_t[BUFFER_SIZE];

  FILE *fInput = fopen(argv[1], "rb");
  VplFile *fOutput = vplOpenFile("out.nv12", "wb");
  if (!fInput || !fOutput) {
    fprintf(stderr, "could not open input or output file\n");
    return 1;
  }

  vplm_mem *dec_image = nullptr;
  bool bdrain_mode = false;
  int frameCount = 0;
  vplWorkstreamState decode_state = VPL_STATE_READ_INPUT;

  while (1) {
    decode_state = ws->GetState();
    if (decode_state == VPL_STATE_END_OF_OPERATION ||
        decode_state == VPL_STATE_ERROR) {
      break;
    }

    // read more input if state indicates buffer space
    // is available
    uint32_t bs_size = 0;
    if ((decode_state == VPL_STATE_READ_INPUT) && (!bdrain_mode)) {
      bs_size = (uint32_t)fread(pbs, 1, BUFFER_SIZE, fInput);
    }

    // switch to drain mode at end of input or when the input buffer is full
    if (bs_size == 0 || decode_state == VPL_STATE_INPUT_BUFFER_FULL) {
      bdrain_mode = true;
    }

    // in drain mode, pass no new bitstream so buffered frames are emitted
    if (bdrain_mode)
      dec_image = ws->DecodeFrame(nullptr, 0);
    else
      dec_image = ws->DecodeFrame(pbs, bs_size);

    if (!dec_image)
      continue;

    frameCount++;

    vplWriteData(fOutput, dec_image);
    printf("%d\r", frameCount);
    fflush(stdout);

    // release the decoded surface so the pool can reuse it
    vplm_unref(dec_image);
  }

  printf("\ndone !\n");
  fclose(fInput);
  vplCloseFile(fOutput);

  delete[] pbs;
  delete ws;
  delete dinfo;
  return 0;
}

Encode

The vpl::Encode class implements elementary stream encoding from raw frames to an encoded bitstream.

Figure: Raw -> Encode -> Bitstream

Supported bitstream formats

  • h264

  • h265

EncodeFrame C++ interface:

The vpl::Encode::EncodeFrame() interface is:

size_t Encode::EncodeFrame(vplm_mem* image, void *pbs_out);

States:

During execution, state communicates what the application should do next.

State transitions:

  • start -> READ_INPUT

  • READ_INPUT -> READ_INPUT: normal operation; almost everything happens here

  • READ_INPUT -> ERROR: internal error; ERROR -> READ_INPUT

  • READ_INPUT -> OUTPUT_BUFFER_FULL: output size > remaining buffer size; OUTPUT_BUFFER_FULL -> READ_INPUT

  • READ_INPUT -> OUTPUT_EXCEEDS_BUFFER_SIZE: output size > total buffer size; OUTPUT_EXCEEDS_BUFFER_SIZE -> READ_INPUT

  • READ_INPUT -> END_OF_OPERATION: no more to process; END_OF_OPERATION -> end

State meanings:

  • READ_INPUT: the workstream can read raw frame input

  • ERROR: an error occurred during processing; try again

  • OUTPUT_BUFFER_FULL: the output size exceeds the remaining space in the output bitstream buffer

  • OUTPUT_EXCEEDS_BUFFER_SIZE: the output size exceeds the total output bitstream buffer size

  • END_OF_OPERATION: there is no more to process

Settings:

Encode configuration parameters are specified in vpl::VplParamsEncode, which controls per session and per frame settings of the encode operations.
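vpl::VplParamsEncode expresses the target bit rate in kilobits per second (SetBitRateKps()) and the frame rate as a numerator/denominator pair. Relating the two gives the average compressed frame size, a useful sanity check when sizing the output buffer passed to EncodeFrame(). `AverageBytesPerFrame` is a hypothetical helper, and 1 kbps = 1000 bit/s is an assumption; the result is an average, not a bound, since I-frames are typically much larger.

```cpp
#include <cstdint>

// Hypothetical helper: average encoded bytes per frame for a target bit rate.
// Assumes 1 kbps = 1000 bit/s and frame rate = fr_num / fr_denom (fr_num > 0).
inline uint64_t AverageBytesPerFrame(uint32_t bitrate_kbps, uint32_t fr_num,
                                     uint32_t fr_denom) {
  uint64_t bits_per_second = static_cast<uint64_t>(bitrate_kbps) * 1000;
  // bits per frame = bits per second / (fr_num / fr_denom)
  uint64_t bits_per_frame = bits_per_second * fr_denom / fr_num;
  return bits_per_frame / 8;  // integer division truncates
}
```

For example, 4000 kbps at 30000/1001 fps averages roughly 16.7 KB per frame.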

Data model:

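Encode consumes VPL Memory images, either created by the application (as in the sample below) or produced by upstream workstreams (as in the transcoding example), and writes the encoded bitstream into the output buffer passed to EncodeFrame(). The application is expected to manage the reference counts of the input images.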


Sample code:

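#include <cstdint>
#include <cstdio>
#include "vpl/vpl.hpp"

#define BUFFER_SIZE (1024 * 1024)

using namespace vpl;

int main(int argc, char* argv[]) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s <input.nv12>\n", argv[0]);
    return 1;
  }

  // Find all the devices in the system and select a preferred device
  DeviceInfo *dinfo = new DeviceInfo();
  DeviceInstance *dev = dinfo->GetPreferredDevice(VPL_WORKSTREAM_ENCODE);

  // Get encode preset (GetPreset() returns the VplParams base class)
  VplParamsEncode *econfig = static_cast<VplParamsEncode *>(
      dinfo->GetPreset(dev->m_dtype, VPL_WORKSTREAM_ENCODE, dev->m_id));

  // Create an Encode workstream and a device context on the device
  Encode *ws = new Encode(econfig, *dev);

  uint8_t *pbsout = new uint8_t[BUFFER_SIZE];
  VplFile* fInput = vplOpenFile(argv[1], "rb");
  FILE* fOutput = fopen("out.h264", "wb");
  if (!fInput || !fOutput) {
    fprintf(stderr, "could not open input or output file\n");
    return 1;
  }

  // Allocate one 300x300 NV12 CPU image and reuse it for every input frame
  vplm_mem* decimage;
  vplm_image_info info = {};
  info.width = 300;
  info.height = 300;
  info.format = VPLM_PIXEL_FORMAT_NV12;
  vplm_create_cpu_image(&info, &decimage);

  for (;;) {
    // Read the next raw frame; stop on a negative status
    vplStatus sts = vplReadData(fInput, decimage);
    if (sts < 0) break;
    // Take an extra reference before handing the frame to the encoder
    vplm_ref(decimage);

    size_t nbytesout = ws->EncodeFrame(decimage, pbsout);
    fwrite(pbsout, 1, nbytesout, fOutput);

    printf(".");
    fflush(stdout);
  }
  puts("");

  vplCloseFile(fInput);
  fclose(fOutput);

  delete[] pbsout;
  delete ws;
  delete dinfo;

  return 0;
}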

Video Processing

The vpl::VideoProcess class implements a variety of filter operations. The current API is limited to a single input and a single output.

Figure: raw -> VideoProcess -> raw

Supported Filter Operations

The list of filters will expand according to the implementation schedule. The current release supports the following filters:

  • Resize

  • Colorspace conversion (in=i420,nv12,bgra out=i420,nv12,bgra)
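Buffer sizes for the formats handled by the colorspace conversion filter follow from their layouts: I420 and NV12 store a full-resolution luma plane plus chroma subsampled by 2 in both directions (1.5 bytes per pixel in total), and BGRA stores 4 bytes per pixel. `FrameBytes` is a hypothetical helper that assumes even dimensions and tightly packed rows; real surfaces often add pitch alignment.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Bytes for one unpadded frame in the formats the colorspace conversion
// filter supports. Assumes even width/height and no row padding.
inline std::size_t FrameBytes(const std::string &fourcc, std::size_t width,
                              std::size_t height) {
  const std::size_t pixels = width * height;
  if (fourcc == "i420" || fourcc == "nv12") {
    // Y plane at 1 byte/pixel plus chroma at a quarter resolution per
    // component (two planes for I420, one interleaved UV plane for NV12).
    return pixels + pixels / 2;
  }
  if (fourcc == "bgra") return pixels * 4;  // 8 bits for each of 4 channels
  throw std::invalid_argument("unsupported format: " + fourcc);
}
```

For example, a 1920x1080 NV12 frame needs 3,110,400 bytes.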

ProcessFrame C++ interface:

The vpl::VideoProcess::ProcessFrame() interface is:

vplm_mem* VideoProcess::ProcessFrame(vplm_mem* image);

States:

During execution, state communicates what the application should do next.

State meanings:

  • READ_INPUT: the workstream can read raw input

  • ERROR: an error occurred during processing; try again

Settings:

Video processing configuration parameters are specified in vpl::VplParams, which controls the per-session and per-frame settings of the video processing operations. vpl::VideoProcess::ProcessFrame() converts the input image to the output image according to the output image specification in the settings.

Data model:

VPL allocates pools of VPL Memory used by frame processing and by consumers of processed frames. The application is expected to manage reference counts. Only surfaces with no external references can be used for new frame output.

Figure: [surface pool | ProcessFrame] --VPL Memory--> rest of pipeline

Sample code:

#include <cstdio>
#include "vpl/vpl.hpp"

using namespace vpl;

int main(int argc, char* argv[]) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s <input.rgba>\n", argv[0]);
    return 1;
  }

  // Find all the devices in the system and select a preferred device
  DeviceInfo *dinfo = new DeviceInfo();
  DeviceInstance *dev = dinfo->GetPreferredDevice(VPL_WORKSTREAM_VIDEOPROC);

  // Get video process preset
  VplParams *dconfig = dinfo->GetPreset(dev->m_dtype, VPL_WORKSTREAM_VIDEOPROC, dev->m_id);

  // Create a VideoProcess workstream and a device context on the device
  VideoProcess *ws = new VideoProcess(dconfig, *dev);

  // Set output image format: 300x300 I420
  VplVideoSurfaceResolution output_size = {300, 300};
  dconfig->SetOutputResolution(output_size);
  dconfig->SetOutputFormat(VPL_FOURCC_I420);

  // Propagate the settings to the device driver
  ws->UpdateDeviceConfig();

  VplFile* fInput = vplOpenFile(argv[1], "rb");
  VplFile* fOutput = vplOpenFile("out.i420", "wb");
  if (!fInput || !fOutput) {
    fprintf(stderr, "could not open input or output file\n");
    return 1;
  }

  // Allocate one 1280x720 RGBA CPU image and reuse it for every input frame
  vplm_mem* dec_image;
  vplm_image_info info = {};
  info.width = 1280;
  info.height = 720;
  info.format = VPLM_PIXEL_FORMAT_RGBA;
  vplm_create_cpu_image(&info, &dec_image);

  int frameCount = 0;
  for (;;) {
    vplStatus sts = vplReadData(fInput, dec_image);
    if (sts < 0) break;
    vplm_ref(dec_image);
    frameCount++;

    vplm_mem* vpp_image = ws->ProcessFrame(dec_image);
    if (!vpp_image) continue;  // skip if no output frame is available

    vplWriteData(fOutput, vpp_image);
    printf("%d\r", frameCount);
    fflush(stdout);
    vplm_unref(vpp_image);
  }
  printf("\ndone !\n");

  vplCloseFile(fInput);
  vplCloseFile(fOutput);

  delete ws;
  delete dinfo;
  return 0;
}

VPL Memory

Memory representation and memory allocation are important parts of the VPL API. By default:

  • Each workstream is responsible for allocating its output memory objects and passing them to its consumer workstreams.

  • For performance, memory objects are passed asynchronously to consumers before their producer finishes writing to them. Hence, a consumer workstream must first acquire a read access right before reading data from a memory object.

  • Each memory object may have multiple consumer workstreams. Hence, a consumer workstream should avoid modifying its input memory objects. If a consumer workstream needs to modify an input memory object, it must first acquire a write access right to avoid corrupting the memory object.
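These access rules behave like a readers-writer lock: many consumers may read at once, but a writer needs exclusive access. The sketch below uses std::shared_mutex purely as an analogy; `SharedFrame` is hypothetical, and VPL Memory's actual access mechanism may differ (it also has to synchronize with device command queues, not just CPU threads).

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical memory object shared by one producer and several consumers.
class SharedFrame {
 public:
  explicit SharedFrame(std::size_t n) : data_(n, 0) {}

  // Read access right: several consumers may hold one concurrently.
  std::shared_lock<std::shared_mutex> AcquireRead() const {
    return std::shared_lock<std::shared_mutex>(mutex_);
  }
  // Write access right: exclusive; waits until all readers have released.
  std::unique_lock<std::shared_mutex> AcquireWrite() {
    return std::unique_lock<std::shared_mutex>(mutex_);
  }

  // Only touch the data while holding one of the access rights above.
  uint8_t *data() { return data_.data(); }
  const uint8_t *data() const { return data_.data(); }

 private:
  mutable std::shared_mutex mutex_;
  std::vector<uint8_t> data_;
};
```

The returned lock objects release the access right when they go out of scope.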

VPL Memory API provides sharing of 1D buffers and 2D images across different frameworks (e.g. SYCL, OpenCL, VAAPI, DXVA2) and different devices (CPU, GPU). Buffer sharing across the decode, compute and encode pipeline is important for both performance and portability.

The buffer sharing mechanisms can be classified into 3 types:

  1. Direct sharing: access is granted to the primary object’s representation in physical memory, but through different framework-specific logical memory objects (such as a VAAPI surface or OpenCL memory). This is the case when a handle from one framework can be converted to a handle of another framework, for example via the OpenCL VAAPI sharing extension.

  2. Mapping: the object is mapped to device memory and a framework handle is generated. This is typical for CPU (HOST) access to video memory. The underlying implementation can vary significantly, resulting either in direct access or in access to a copy of the memory object with an associated set of copy and on-the-fly conversion operations. For example, OpenCL provides two sets of functions: clEnqueueReadBuffer and clEnqueueWriteBuffer for copying, and clEnqueueMapBuffer for mapping between the CPU and an OpenCL device.

  3. Coherent sharing: the memory object has unified addressing of the physical memory, and the underlying hardware and software layers ensure coherency between these representations, as in a unified shared memory mode.

The Memory API aims to provide the sharing mechanism with the highest performance. From this perspective, the library uses “direct” sharing whenever possible. However, many restrictions throughout the software stack currently make “direct” sharing unavailable in some cases:

  1. Framework restrictions, such as unsupported color formats or the inability to import certain memory handles

  2. Underlying driver implementation (or even HW) restrictions

As oneAPI software stack evolves, we intend to eliminate the direct sharing restrictions in the underlying frameworks and drivers through API extension and implementation enhancement.

The current VPL Memory Library provides the following direct sharing capabilities:

  • Sharing of CPU (HOST) allocated memory on Linux (via userptr):

    • With VAAPI driver

    • With OpenCL driver and SYCL

  • Sharing of VAAPI allocated memory:

    • With OpenCL, SYCL

  • Exporting dmabuf handle:

    • From VAAPI memory object

Recall that a oneAPI platform consists of a host and a collection of devices, and each device has an associated command queue. Operations on the devices are executed by submitting tasks to the devices’ command queues. In a video processing pipeline, each device may have multiple command queues, corresponding to the media driver, the OpenCL and SYCL compute drivers, and the 3D graphics drivers. For Intel GPUs on Linux, a VAAPI driver executes tasks for video decoding, post-processing, and encoding; an OpenCL driver executes compute tasks such as DL inference; and an OpenGL driver executes 3D rendering tasks. Buffers and images allocated by these drivers are initially accessible only in the context of their corresponding command queues. To share a buffer between a source and a sink command queue, the VPL Memory library must extract the buffer address from the source command queue context and map the address to the sink command queue context, so that tasks in the sink command queue can read or write the buffer. We call this <command_queue,buffer> pair a memory handler. Memory handlers are encapsulated in the vplm::memory class hierarchy.

To share a buffer with a device driver’s command queue context, simply construct a new memory handler of the corresponding derived class from the base vplm::memory object. For instance, constructing a VAAPI memory handler from CPU-allocated memory makes that memory available as a GPU VAAPI surface for media processing.
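A minimal model of the <command_queue,buffer> pair described above. `MemoryHandler` and `ShareTo` are hypothetical names for this sketch: a queue is identified by a string and the buffer by its address, whereas real sharing goes through driver-specific interop (such as the OpenCL VAAPI sharing extension) rather than copying a pointer.

```cpp
#include <string>

// Hypothetical memory handler: one physical buffer as seen from one command queue.
struct MemoryHandler {
  std::string queue;   // e.g. "cpu", "vaapi", "opencl"
  const void *buffer;  // identity of the underlying physical buffer
};

// Direct sharing: build a handler for the sink queue that refers to the
// same physical buffer, so no pixel data is copied.
inline MemoryHandler ShareTo(const MemoryHandler &src, const std::string &sink_queue) {
  return MemoryHandler{sink_queue, src.buffer};
}
```

The key property is that both handlers point at the same buffer while belonging to different queues.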

In addition to the buffer sharing API in the vplm::*::memory subclass constructors, the VPL Memory API also provides:

  • API to import memory already allocated by the application external to VPL Memory library

  • API to allocate memory

The memory import and allocation functions are defined in vplm::cpu::make_memory(), vplm::opencl::make_memory(), vplm::sycl::make_memory(), and vplm::vaapi::make_memory().

Creating memory objects

An application can use the VPL Memory Library to create a memory object for one of the supported frameworks. For example, the following code allocates memory in system memory (we count CPU (HOST) as one of the frameworks):

#include <om++.h>

vplm::cpu::memory yuv_image = vplm::cpu::make_memory(1920, 1080, VPLM_PIXEL_FORMAT_NV12);

Alternatively, the application can allocate memory externally and request the VPL Memory Library to manage it, as in the following example for VAAPI:

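#include <om++.h>
#include <om_vaapi++.h>
#include <va/va.h>

VADisplay dpy;  // assumed to be initialized, e.g. with vaGetDisplayDRM() and vaInitialize()
VASurfaceID id;
// attribs/num_attribs: application-defined surface attributes (may be nullptr/0)
VAStatus va_sts = vaCreateSurfaces(dpy, VA_RT_FORMAT_RGB32, 1920, 1080, &id, 1, attribs, num_attribs);
if (va_sts != VA_STATUS_SUCCESS) { /* handle the error */ }

vplm::vaapi::memory rgb_image = vplm::vaapi::make_surface(dpy, id);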

In either case, the application ends up with a framework-specific C++ object (vplm::cpu::memory or vplm::vaapi::memory in these examples) and therefore has access to the framework-specific API that VPL Memory defines for this object. For example, the following code accesses the CPU-allocated image through its vplm::cpu_image representation:

vplm::cpu_image image;
yuv_image.map(VPLM_ACCESS_MODE_READ, image);

// do something with the image since you have access to data pointers:
printf("Y data pointer: %p\n", image.data(0));

yuv_image.unmap(image);

Helper Class for Simplifying Image Data Access

In the examples above, we used a memory object directly to access its data. While this is possible, there is a simpler way. Most frameworks require a program to acquire and release access to data; in the CPU access example, the calls to map and unmap acquire and release that access. The vplm::cpu::image helper class eliminates the need to call map and unmap explicitly:

{
  vplm::cpu::image cpu_image(yuv_image, VPLM_ACCESS_MODE_READ);

  // do something with the image since you have access to data pointers:
  printf("Y data pointer: %p\n", cpu_image.data(0));
}

The helper class issues acquire and release operations in its constructor and destructor to mark the start and end of data access. These helper classes also accept a base memory object (vplm::memory) in their constructors, so they can perform implicit conversions between objects of different frameworks. For example, the following code maps the VAAPI image onto the CPU:

{
  vplm::cpu::image cpu_image(rgb_image, VPLM_ACCESS_MODE_WRITE);

  // do something with the image since you have access to data pointers:
  printf("R data pointer: %p\n", cpu_image.data(0));
}
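The helper classes follow the RAII idiom: access is acquired when the helper is constructed and released when it goes out of scope, even on early return. The sketch reproduces the pattern with a hypothetical `MappableMemory` that only counts map/unmap calls; neither it nor `ScopedImage` is the vplm::cpu::image implementation.

```cpp
// Hypothetical memory object with explicit map/unmap, like vplm::cpu::memory.
class MappableMemory {
 public:
  void map() { ++maps_; }
  void unmap() { ++unmaps_; }
  int maps() const { return maps_; }
  int unmaps() const { return unmaps_; }

 private:
  int maps_ = 0;
  int unmaps_ = 0;
};

// RAII helper: acquires access in the constructor, releases it in the destructor.
class ScopedImage {
 public:
  explicit ScopedImage(MappableMemory &mem) : mem_(mem) { mem_.map(); }
  ~ScopedImage() { mem_.unmap(); }
  // Non-copyable, so each acquire is matched by exactly one release.
  ScopedImage(const ScopedImage &) = delete;
  ScopedImage &operator=(const ScopedImage &) = delete;

 private:
  MappableMemory &mem_;
};
```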

Usage example

The following example summarizes the key usage scenario:

#include <om++.h>
#include <om_vaapi++.h>

VADisplay dpy = init_vaapi();            // application-defined helper
VASurfaceID id = create_per_my_needs();  // application-defined; calls vaCreateSurfaces inside

vplm::cpu::memory yuv_image = vplm::cpu::make_memory(1920, 1080, VPLM_PIXEL_FORMAT_NV12);
vplm::vaapi::memory rgb_image = vplm::vaapi::make_surface(dpy, id);

{
  vplm::cpu::image cpu_image(yuv_image, VPLM_ACCESS_MODE_WRITE);

  // do something with the image since you have access to data pointers:
  printf("Y data pointer: %p\n", cpu_image.data(0));
  // for example, write data into the surface
}

{
  vplm::vaapi::image vaapi_cpu_image(dpy, yuv_image);
  vplm::vaapi::image vaapi_rgb_image(dpy, rgb_image); // just for consistency

  // do something with surfaces via VAAPI since we have access to them
  // for example, convert yuv which we just wrote on CPU to rgb format
  convert_yuv_to_rgb(vaapi_cpu_image.id(), vaapi_rgb_image.id()); // application-defined
}

{
  vplm::cpu::image cpu_image(rgb_image, VPLM_ACCESS_MODE_READ);

  // now we can read from the CPU data which we got in rgb image after VAAPI conversion:
  printf("R data pointer: %p\n", cpu_image.data(0));
}

The VPL API defines four helper classes, one for each supported device context: vplm::cpu::image, vplm::opencl::image, vplm::sycl::image, and vplm::vaapi::image.

VPL API Reference

Device

vpl::DeviceInfo

enum VplDeviceType

Specifies which device extension is available and will be loaded.

Values:

VPL_DEVICE_CPU = 0
VPL_DEVICE_GEN
VPL_DEVICE_HDDL
VPL_DEVICE_FPGA
VPL_DEVICE_COUNT

struct VplDeviceProperty

VPL device property structure.

Public Members

VplDeviceType type
uint32_t vendorId
uint32_t deviceId
uint32_t subdeviceId
uint32_t coreClockRate
uint32_t maxCmdEngines
bool unifiedMemorySupported
bool eccMemorySupported
uint32_t numUnits
char name[VPL_MAX_DEVICE_NAME]

class DeviceInstance

VPL device instance.

Public Functions

DeviceInstance(VplDeviceType dtype, uint32_t id)

Public Members

VplDeviceType m_dtype
uint32_t m_id

class DeviceInfo

Public Functions

int GetDeviceCount(VplDeviceType dtype)

Get device count.

Return

number of devices found for the device type

Parameters
  • [in] dtype: device type

VplDeviceProperty *GetDeviceProperty(VplDeviceType dtype, uint32_t id)

Get device property.

Return

device property

Parameters
  • [in] dtype: device type

  • [in] id: device ID

DeviceInstance *GetPreferredDevice(uint32_t wstypes)

Get preferred device.

Return

preferred device instance for the workstream type

Parameters
  • [in] wstypes: a bit mask of workstream types

VplParams *GetPreset(VplDeviceType dtype, VplWorkstreamType wstype, uint32_t id = 0)

Get device preset to setup a workstream.

Return

config preset for the device and wstype

Parameters
  • [in] dtype: device type

  • [in] wstype: workstream type

  • [in] id: device id

DeviceInfo()
~DeviceInfo()

Protected Functions

vplDevice GetDeviceHandle(VplDeviceType dtype)

Protected Attributes

vplDevice m_device[VPL_DEVICE_COUNT]

Dispatch handle

Device Context

vpl::DeviceContext

class DeviceContext

Public Functions

DeviceContext(const DeviceInstance &dev)

DeviceContext constructor.

Parameters
  • [in] dev: device instance to run the workstream operations

DeviceInstance *GetDevice()

Get Device Instance.

Return

Device instance

void AddWorkstream(Workstream *ws)

Add workstream to context users.

Parameters
  • [in] ws: workstream

void RemoveWorkstream(Workstream *ws)

Remove workstream from context users.

Parameters
  • [in] ws: workstream

~DeviceContext()

DeviceContext destructor.

Protected Attributes

DeviceInstance m_device

running device

std::vector<Workstream *> m_workstreams

workstreams sharing the context

Workstream Settings

vpl::VplParams

class VplParams

VPL config API.

Workstream configuration interface: per-session and per-frame settings for Decode, VideoProcess, Encode, and Infer workstreams. It is usually initialized by DeviceInfo::GetPreset() and can be modified by the application.

Subclassed by vpl::VplParamsDecode, vpl::VplParamsEncode

Public Functions

VplFourCC GetInputFormat()

Get Input Format.

Return

Input image raw format

void SetInputFormat(VplFourCC format)
VplFourCC GetOutputFormat()
void SetOutputFormat(VplFourCC format)
VplFrameRate &GetInputFrameRate()
void SetInputFrameRate(VplFrameRate &fr)
VplFrameRate &GetOutputFrameRate()
void SetOutputFrameRate(VplFrameRate &fr)
VplVideoSurfaceResolution &GetInputResolution()
void SetInputResolution(VplVideoSurfaceResolution &res)
VplVideoSurfaceResolution &GetOutputResolution()
void SetOutputResolution(VplVideoSurfaceResolution &res)
VplCrop &GetInputCrop()
void SetInputCrop(VplCrop &crop)
VplCrop &GetOutputCrop()
void SetOutputCrop(VplCrop &crop)
VplAspectRatio &GetInputAspectRatio()
void SetInputAspectRatio(VplAspectRatio &ratio)
VplAspectRatio &GetOutputAspectRatio()
void SetOutputAspectRatio(VplAspectRatio &ratio)
uint16_t GetPictStruct()
void SetPictStruct(uint16_t ps)
VplParams()
~VplParams()

Protected Attributes

VplFourCC m_iformat
VplFourCC m_oformat
VplFrameRate m_iframerate
VplFrameRate m_oframerate
VplVideoSurfaceResolution m_iresolution
VplVideoSurfaceResolution m_oresolution
VplCrop m_icrop
VplCrop m_ocrop
VplAspectRatio m_iaspect
VplAspectRatio m_oaspect
uint16_t m_picstruct

Decode Settings

vpl::VplParamsDecode

class VplParamsDecode : public vpl::VplParams

Public Functions

VplCodecType GetCodecType()
void SetCodecType(VplCodecType codec)
uint32_t GetDecodeBufferSize()
void SetDecodeBufferSize(uint32_t s)
uint32_t GetMaxNumBuffers()
void SetMaxNumBuffers(uint32_t m)
VplParamsDecode()
~VplParamsDecode()

Protected Attributes

VplCodecType m_codec
uint32_t m_buffersz
uint32_t m_maxnumbuffers

Encode Settings

vpl::VplParamsEncode

class VplParamsEncode : public vpl::VplParams

Public Functions

VplCodecType GetCodecType()
void SetCodecType(VplCodecType codec)
uint32_t GetBitRateKps()
void SetBitRateKps(uint32_t r)
uint32_t GetIFrameInterval()
void SetIFrameInterval(uint32_t il)
uint32_t GetBFrameInterval()
void SetBFrameInterval(uint32_t il)
uint32_t GetFrameRateNumerator()
void SetFrameRateNumerator(uint32_t n)
uint32_t GetFrameRateDenominator()
void SetFrameRateDenominator(uint32_t d)
VplEncodePreset GetEncodePreset()
void SetEncodePreset(VplEncodePreset ps)
VplRateControl GetRateControl()
void SetRateControl(VplRateControl rc)
VplBRC GetBRC()
void SetBRC(VplBRC brc)
VplParamsEncode()
~VplParamsEncode()

Protected Attributes

VplCodecType m_codec
uint32_t m_bitratekbps
uint32_t m_iframeinterval
uint32_t m_bframeinterval
uint32_t m_fr_num
uint32_t m_fr_denom
VplEncodePreset m_encodepreset
VplRateControl m_ratecontrol
VplEncodeScenario m_scenario
VplBRC m_brc
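The bitrate and frame rate settings together determine the average size of an encoded frame: bytes per frame = (bitrate in bits/s / 8) / (numerator / denominator). The sketch below assumes GetBitRateKps() reports kilobits per second with 1 kbit = 1000 bits; AvgBytesPerFrame is a hypothetical helper, not part of VPL.

```cpp
#include <cstdint>

// Average encoded frame size in bytes for a target bitrate and a frame rate
// given as numerator / denominator (as with GetFrameRateNumerator() and
// GetFrameRateDenominator()). Assumes the bitrate is in kilobits per second
// with 1 kbit = 1000 bits. Returns 0 if the frame rate numerator is 0.
uint64_t AvgBytesPerFrame(uint32_t bitrate_kbps, uint32_t fr_num, uint32_t fr_denom) {
    if (fr_num == 0) return 0;
    const uint64_t bits_per_second = uint64_t(bitrate_kbps) * 1000;
    // bytes/frame = (bits/s / 8) / (fr_num / fr_denom), kept in integer math.
    return bits_per_second * fr_denom / (8ull * fr_num);
}
```

For example, 4000 kbps at 30/1 fps averages 16,666 bytes per frame (rounded down). Actual frame sizes vary with frame type and rate control.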

Workstreams

VplWorkstreamType

enum VplWorkstreamType

workstream type

Values:

VPL_WORKSTREAM_DECODE = 0x01
VPL_WORKSTREAM_VIDEOPROC = 0x02
VPL_WORKSTREAM_ENCODE = 0x04
VPL_WORKSTREAM_INFER = 0x08
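The VplWorkstreamType values are distinct powers of two, so they can be OR-ed together into a bit mask. This section does not say whether any VPL call accepts such a mask; the sketch below only assumes the application keeps one, for example to record which workstream types it plans to run. HasWorkstream and CountWorkstreams are hypothetical helpers, and the enum is redeclared so the example compiles without the VPL headers.

```cpp
#include <cstdint>

// Standalone copy of the documented values (not the real VPL header).
enum VplWorkstreamType : uint32_t {
    VPL_WORKSTREAM_DECODE = 0x01,
    VPL_WORKSTREAM_VIDEOPROC = 0x02,
    VPL_WORKSTREAM_ENCODE = 0x04,
    VPL_WORKSTREAM_INFER = 0x08,
};

// Hypothetical helper: does `mask` include workstream type `t`?
constexpr bool HasWorkstream(uint32_t mask, VplWorkstreamType t) {
    return (mask & static_cast<uint32_t>(t)) != 0;
}

// Hypothetical helper: how many of the four documented types are set in `mask`.
int CountWorkstreams(uint32_t mask) {
    int n = 0;
    for (uint32_t bit = VPL_WORKSTREAM_DECODE; bit <= VPL_WORKSTREAM_INFER; bit <<= 1)
        if (mask & bit) ++n;
    return n;
}
```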

Workstream

vpl::Workstream

class Workstream

Video processing workstream.

Workstreams are the core of VPL. A workstream implements a consistent interface for decode, frame processing, and encode across multiple accelerator types. It also provides consistent parameter handling and logging.

Subclassed by vpl::Decode, vpl::Encode, vpl::VideoProcess

Public Functions

Workstream(VplParams *config, DeviceInstance &dev, VplWorkstreamType wstype)

Workstream constructor.

Parameters
  • [in] config: workstream settings

  • [in] dev: device instance to create a context

  • [in] wstype: workstream type

Workstream(VplParams *config, DeviceContext *dcontext, VplWorkstreamType wstype)

Workstream constructor.

Parameters
  • [in] config: workstream settings

  • [in] dcontext: workstream device context

  • [in] wstype: workstream type

DeviceContext *GetContext()

Get workstream device context.

Return

workstream device context

vplWorkstreamState GetState()

Returns workstream state.

GetState returns the workstream state, which can be one of:

  • VPL_STATE_READ_INPUT: the workstream can accept more input

  • VPL_STATE_ERROR: an error occurred during the operation

  • VPL_STATE_INPUT_BUFFER_FULL: the input buffer is at capacity; drain it to make space

  • VPL_STATE_INPUT_EXCEEDS_BUFFER_SIZE: the input is larger than the buffer

  • VPL_STATE_END_OF_OPERATION: there are no more operations to perform

Return

workstream state
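A decode or encode loop can branch on GetState(). The sketch below maps each state to a hypothetical NextStep action using the descriptions above; NextStep, NextStepFor, and the chosen actions are illustrative assumptions, not part of VPL. For VPL_STATE_INPUT_EXCEEDS_BUFFER_SIZE an application might instead raise the buffer size with VplParamsDecode::SetDecodeBufferSize(); the sketch simply asks for smaller input.

```cpp
// Standalone copy of the documented vplWorkstreamState values.
enum vplWorkstreamState {
    VPL_STATE_READ_INPUT = 1000,
    VPL_STATE_CLOSED_TO_INPUT = 1001,
    VPL_STATE_ERROR = 1002,
    VPL_STATE_INPUT_BUFFER_FULL = 1003,
    VPL_STATE_INPUT_EXCEEDS_BUFFER_SIZE = 1004,
    VPL_STATE_END_OF_OPERATION = 1005,
    VPL_STATE_OK = 1006,
};

// Hypothetical set of actions an application loop might take.
enum class NextStep { FeedInput, DrainOutput, SplitInput, Finish, Abort };

// Maps the state returned by GetState() to an action. The entries for
// VPL_STATE_CLOSED_TO_INPUT and VPL_STATE_OK are guesses: their meaning is
// not documented in the state list above.
NextStep NextStepFor(vplWorkstreamState s) {
    switch (s) {
    case VPL_STATE_READ_INPUT:
    case VPL_STATE_OK:                 // assumption
        return NextStep::FeedInput;
    case VPL_STATE_INPUT_BUFFER_FULL:
    case VPL_STATE_CLOSED_TO_INPUT:    // assumption
        return NextStep::DrainOutput;
    case VPL_STATE_INPUT_EXCEEDS_BUFFER_SIZE:
        return NextStep::SplitInput;   // resend the data in smaller chunks
    case VPL_STATE_END_OF_OPERATION:
        return NextStep::Finish;
    case VPL_STATE_ERROR:
        break;
    }
    return NextStep::Abort;  // errors and unrecognized values
}
```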

vpl::VplParams *GetConfig()

Get workstream configuration setting.

Return

workstream configuration setting message

void SetConfig(vpl::VplParams *config)

Set the workstream configuration settings.

Parameters
  • [in] config: workstream setting

vplStatus UpdateDeviceConfig()

Propagate the workstream settings to the device.

Return

status

operator vplWorkstream()
~Workstream()

Workstream destructor.

Protected Attributes

vpl::DeviceContext *m_context

Running device context

VplParams *m_config

Workstream setting

VplWorkstreamType m_wstype

Workstream type

vplWorkstream m_workstream

Dispatch handle

Workstream::Decode

enum vplWorkstreamState

Workstream state, which tells the application what to do next

Values:

VPL_STATE_READ_INPUT = 1000
VPL_STATE_CLOSED_TO_INPUT = 1001
VPL_STATE_ERROR = 1002
VPL_STATE_INPUT_BUFFER_FULL = 1003
VPL_STATE_INPUT_EXCEEDS_BUFFER_SIZE = 1004
VPL_STATE_END_OF_OPERATION = 1005
VPL_STATE_OK = 1006

class Decode : public vpl::Workstream

Public Functions

Decode(VplParams *config, DeviceContext *dcontext)

Decode constructor.

Parameters
  • [in] config: decode settings

  • [in] dcontext: decode context

Decode(VplParams *config, DeviceInstance &dev)

Decode constructor.

Parameters
  • [in] config: workstream settings

  • [in] dev: device instance to create a context

vpl::VplParams *GetConfig()

Get the Decode workstream settings.

Retrieves the Decode workstream configuration settings.

Return

configuration settings

vplStatus UpdateDeviceConfig()

Update the Decode workstream settings on the device.

Propagates the Decode workstream settings to the device.

Return

status

vplm_mem *DecodeFrame(const void *pbs, size_t size)

Decode a video frame.

The DecodeFrame operation decodes a single frame of video when sufficient bitstream data is provided. Bitstream input does not need to be aligned to frame boundaries. If a frame cannot be parsed, a null pointer is returned.

Return

decoded frame (can be a null pointer)

Parameters
  • [in] pbs: bitstream input data

  • [in] size: size of input data in bytes

~Decode()
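Because DecodeFrame() does not require input aligned to frame boundaries, an application can feed a bitstream in arbitrary fixed-size chunks and collect frames as they appear. In the sketch below, DecodeFn is a hypothetical stand-in for Decode::DecodeFrame() (returning const void * instead of vplm_mem *) so the loop can be tested without VPL. How to drain frames still buffered at the end of the stream is not described in this section, so the sketch does not attempt it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for Decode::DecodeFrame(): consumes `size` bytes and returns a
// decoded frame, or nullptr when no frame is ready yet.
using DecodeFn = std::function<const void *(const void *pbs, size_t size)>;

// Feeds `bitstream` to `decode` in chunks of at most `chunk` bytes, ignoring
// frame boundaries, and collects every non-null frame returned.
std::vector<const void *> DecodeInChunks(const std::vector<uint8_t> &bitstream,
                                         size_t chunk, const DecodeFn &decode) {
    std::vector<const void *> frames;
    if (chunk == 0) return frames;  // a zero chunk size would never advance
    for (size_t off = 0; off < bitstream.size(); off += chunk) {
        size_t n = std::min(chunk, bitstream.size() - off);
        if (const void *frame = decode(bitstream.data() + off, n))
            frames.push_back(frame);
    }
    return frames;
}
```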

Workstream::Encode

class Encode : public vpl::Workstream

Public Functions

Encode(VplParams *config, DeviceContext *dcontext)

Encode constructor.

Parameters
  • [in] config: encode settings

  • [in] dcontext: encode context

Encode(VplParams *config, DeviceInstance &dev)

Encode constructor.

Parameters
  • [in] config: workstream settings

  • [in] dev: device instance to create a context

vpl::VplParams *GetConfig()

Get the Encode workstream settings.

Retrieves the Encode workstream configuration settings.

Return

configuration settings

vplStatus UpdateDeviceConfig()

Update the Encode workstream settings on the device.

Propagates the Encode workstream settings to the device.

Return

status

size_t EncodeFrame(vplm_mem *image, void *pbs_out)

Encode a video frame.

Encodes a raw video frame into a compressed bitstream. If no encoded output is available in this iteration, the function returns 0.

Return

Number of bytes in the encoded bitstream output

Parameters
  • [in] image: raw frame to encode

  • [out] pbs_out: buffer that receives the encoded bitstream

~Encode()
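EncodeFrame() returns 0 when it has no output for the current frame, which may happen while the encoder is still buffering input. The sketch below skips those calls and appends real output to a stream. EncodeFn is a hypothetical stand-in for Encode::EncodeFrame(), and because the documented signature has no capacity argument for pbs_out, the caller-chosen max_out bound is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-in for Encode::EncodeFrame(): writes encoded bytes to `pbs_out` and
// returns how many were written, or 0 when no output is ready yet.
using EncodeFn = std::function<size_t(const void *image, void *pbs_out)>;

// Encodes every frame and appends the produced bytes to `stream`.
// `max_out` is an assumed upper bound on the bytes one call can write.
// Returns the number of calls that produced output.
size_t EncodeAll(const std::vector<const void *> &frames, const EncodeFn &encode,
                 size_t max_out, std::vector<uint8_t> &stream) {
    std::vector<uint8_t> scratch(max_out);
    size_t produced = 0;
    for (const void *frame : frames) {
        size_t n = encode(frame, scratch.data());
        if (n == 0) continue;    // frame buffered; no bitstream from this call
        if (n > max_out) break;  // the assumed bound was wrong; stop copying
        stream.insert(stream.end(), scratch.data(), scratch.data() + n);
        ++produced;
    }
    return produced;
}
```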

Workstream::VideoProcess

class VideoProcess : public vpl::Workstream

Public Functions

VideoProcess(VplParams *config, DeviceContext *dcontext)

VideoProcess constructor.

Parameters
  • [in] config: video processing settings

  • [in] dcontext: video processing context

VideoProcess(VplParams *config, DeviceInstance &dev)

VideoProcess constructor.

Parameters
  • [in] config: workstream settings

  • [in] dev: device instance to create a context

vpl::VplParams *GetConfig()

Get the VideoProcess workstream settings.

Retrieves the VideoProcess workstream configuration settings.

Return

configuration settings

vplStatus UpdateDeviceConfig()

Update the VideoProcess workstream settings in the device driver.

Propagates the VideoProcess workstream settings to the device driver.

Return

status

vplm_mem *ProcessFrame(vplm_mem *in)

Process a video frame.

Processes a raw video frame to create an output raw frame.

Return

output raw frame

Parameters
  • [in] in: raw frame to process

~VideoProcess()

Memory

vplm::cpu_image

class cpu_image : public vplm::wrapper<vplm_cpu_image>

Subclassed by vplm::cpu::image

Public Functions

uint8_t *buffer()
size_t buffer_size()
uint8_t *data(size_t index)
std::vector<uint8_t *> data()
uint32_t stride(size_t index)
std::vector<uint32_t> strides()
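The data(index) and stride(index) accessors return per-plane base pointers and row pitches. Because a row can be padded beyond the visible width, pixels should be addressed with the stride rather than the width. The sketch below reads one NV12 pixel from such pointers; it assumes strides are in bytes, that plane 0 is Y, and that plane 1 is the interleaved UV plane. ReadNv12 and Nv12Sample are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// One NV12 pixel: its luma sample and the chroma pair it shares with its 2x2 block.
struct Nv12Sample {
    uint8_t y, u, v;
};

// Reads pixel (x, y) given plane pointers and row strides such as those returned
// by cpu_image::data(i) and cpu_image::stride(i). Assumes plane 0 is Y and
// plane 1 is interleaved UV subsampled 2x in both directions (standard NV12).
Nv12Sample ReadNv12(const uint8_t *y_plane, uint32_t y_stride,
                    const uint8_t *uv_plane, uint32_t uv_stride,
                    uint32_t x, uint32_t y) {
    const uint8_t *uv = uv_plane + size_t(y / 2) * uv_stride + size_t(x / 2) * 2;
    return {y_plane[size_t(y) * y_stride + x], uv[0], uv[1]};
}
```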

vplm::memory

class memory : public vplm::wrapper<const vplm_mem *>

Base C++ VPL Memory object.

This class wraps a C memory handle and provides access to the generic VPL Memory API calls that are independent of hardware frameworks, such as getting and setting properties.

To access the underlying memory object within a specific framework (such as OpenCL or VAAPI), the VPL Memory C++ API defines framework-specific memory handlers derived from this base class. Note that the end user can construct a framework handler from a base memory object. For example:

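#include <cstdio>
#include <vplmemory/vplm++.h>
#include <vplmemory/vplm_vaapi++.h>

VADisplay dpy = get_display_from_somewhere();  // placeholder: the application's VA display
vplm::memory mem = get_from_somewhere();       // placeholder: any VPL Memory object
vplm::vaapi::memory va_mem(dpy, mem);          // the VAAPI handler needs the display

printf(">>> VASurfaceID=%u\n", va_mem.getSurfaceId());  // VASurfaceID is unsigned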

Subclassed by vplm::cpu::memory, vplm::opencl::memory, vplm::vaapi::memory

Public Functions

memory()
memory(const vplm_mem *mem)
memory(const memory &mem)
~memory()
memory &operator=(const memory &mem)
vplm_status setProperty(int32_t key, vplm::variant value)
vplm::variant getProperty(int32_t key)
vplm_status clearProperties()

Protected Functions

void ref()
void unref()

vplm::cpu::memory

class memory : public vplm::memory

CPU (HOST) memory accessor.

Public Functions

memory(const vplm_mem *mem)
memory(const vplm::memory &mem)
vplm_status map(uint64_t flags, vplm::cpu_image &image)
vplm_status unmap(vplm::cpu_image &image)

vplm::cpu::image

class image : public vplm::cpu_image

CPU (HOST) image accessor.

Helper class that maps the memory object for CPU (HOST) access in its constructor and unmaps it in its destructor. Example:

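#include <cstdio>
#include <vplmemory/vplm++.h>

vplm::memory mem = get_from_somewhere();  // placeholder: any VPL Memory object

{
    // Mapped for CPU read access here; unmapped when `image` goes out of scope.
    vplm::cpu::image image(mem, VPLM_ACCESS_MODE_READ);

    printf(">>> first plane stride is: %u\n", image.stride(0));  // stride() returns uint32_t
}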

Public Functions

image(const vplm::memory &mem, uint64_t flags)
~image()

vplm::cpu::make_memory

namespace cpu

Functions

vplm::cpu::memory make_memory(const vplm::cpu_image &image)
vplm::cpu::memory make_memory(const vplm_image_info &info)
vplm::cpu::memory make_memory(uint32_t width, uint32_t height, vplm_pixel_format format)
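make_memory(width, height, format) allocates a CPU image. For planning memory use, the minimum size of a tightly packed frame can be computed as below. This is a sketch for the 8-bit formats NV12, I420, and RGB4 only; RawFormat and PackedFrameSize are hypothetical (RawFormat is not vplm_pixel_format), and a real allocation may be larger because of row or plane alignment.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical format tag for this sketch (not vplm_pixel_format).
enum class RawFormat { NV12, I420, RGB4 };

// Minimum bytes for a tightly packed frame: no row padding, no plane alignment.
size_t PackedFrameSize(uint32_t width, uint32_t height, RawFormat format) {
    const size_t luma = size_t(width) * height;
    // 4:2:0 chroma planes are half size in each direction, rounded up.
    const size_t chroma = size_t((width + 1) / 2) * ((height + 1) / 2);
    switch (format) {
    case RawFormat::NV12:  // Y plane + one interleaved UV plane
    case RawFormat::I420:  // Y plane + separate U and V planes
        return luma + 2 * chroma;
    case RawFormat::RGB4:  // 4 bytes per pixel, single plane
        return luma * 4;
    }
    return 0;
}
```

For 1920x1080 NV12 this gives 3,110,400 bytes, 1.5 bytes per pixel.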

vplm::opencl::memory

class memory : public vplm::memory

OpenCL memory accessor.

Public Functions

memory(cl_command_queue queue, const vplm_mem *mem)
memory(cl_command_queue queue, const vplm::memory &mem)
vplm_status get(vplm::cl_image &image)
vplm::cl_image get()
vplm_status begin_access(const vplm::cl_image &clmem, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)
vplm_status end_access(const vplm::cl_image &clmem, cl_uint num_events_in_wait_list, const cl_event *event_wait_list, cl_event *event)

vplm::opencl::image

class image : public vplm::cl_image

OpenCL image accessor.

Helper class that acquires access to the memory object as an OpenCL image in its constructor and releases access in its destructor. Example:

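#include <cstdio>
#include <vplmemory/vplm++.h>
#include <vplmemory/vplm_opencl++.h>

cl_command_queue queue = get_queue_from_somewhere();  // placeholder: the application's OpenCL queue
vplm::memory mem = get_from_somewhere();              // placeholder: any VPL Memory object

{
    // Access is acquired here and released when `image` goes out of scope.
    vplm::opencl::image image(queue, mem);

    printf(">>> first plane cl_mem: %p\n", (void *)image[0]);
}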

Public Functions

image(cl_command_queue queue, const vplm::memory &mem)
~image()

vplm::opencl::make_memory

namespace opencl

Functions

vplm::opencl::memory make_memory(cl_command_queue queue, vplm::cl_image &image)

vplm::sycl::memory

class memory

Public Functions

memory(const vplm_mem *vpl_mem)
memory(const vplm::memory &vpl_mem)
cl::sycl::image<2> acquire_image(const cl::sycl::queue &sycl_queue, cl::sycl::access::mode access_mode)
void release_image()
~memory()

Protected Functions

void release_handle()
cl::sycl::image_channel_order vplFormat2SYCL(int32_t format)
vplm_access_flags access_mode_sycl2vpl(cl::sycl::access::mode access_mode)

Protected Attributes

const vplm_mem *mem_
vplm_cpu_image cpu_image_
vplm_cl_image cl_image_

vplm::sycl::make_memory

namespace sycl

vplm::vaapi::memory

class memory : public vplm::memory

VAAPI memory accessor.

Public Functions

memory(VADisplay dpy, const vplm_mem *mem)
memory(VADisplay dpy, const vplm::memory &mem)
vplm_status getBufferId(VABufferID &id)
vplm_status getSurfaceId(VASurfaceID &id)
VABufferID getBufferId()
VASurfaceID getSurfaceId()

vplm::vaapi::image

class image

Public Functions

image(VADisplay dpy, const vplm::memory &mem)
VASurfaceID id() const

vplm::vaapi::make_memory

namespace vaapi

Functions

vplm::vaapi::memory make_surface(VADisplay dpy, VASurfaceID id)
vplm::vaapi::memory make_buffer(VADisplay dpy, VABufferID id)

Miscellaneous

vplStatus

enum vplStatus

status codes

Values:

VPL_OK = 0
VPL_ERR_NOT_SUPPORTED = -1
VPL_ERR_NULL_POINTER = -2
VPL_ERR_NOT_FOUND = -3
VPL_ERR_HW_UNAVALIBLE = -4
VPL_ERR_INVALID_FRAME = -5
VPL_ERR_OUT_OF_RESOURCES = -6
VPL_ERR_INTERNAL_ERROR = -7
VPL_ERR_INVALID_SIZE = -8
VPL_ERR_INVALID_PROPERTY = -9
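In the vplStatus list, VPL_OK is 0 and every error code is negative, so a sign check distinguishes success from failure for these values. The sketch below adds that check and a logging helper; the enum is redeclared so it compiles standalone, and IsError and Describe are hypothetical names.

```cpp
#include <string>

// Standalone copy of the documented vplStatus values.
enum vplStatus {
    VPL_OK = 0,
    VPL_ERR_NOT_SUPPORTED = -1,
    VPL_ERR_NULL_POINTER = -2,
    VPL_ERR_NOT_FOUND = -3,
    VPL_ERR_HW_UNAVALIBLE = -4,  // spelling as documented
    VPL_ERR_INVALID_FRAME = -5,
    VPL_ERR_OUT_OF_RESOURCES = -6,
    VPL_ERR_INTERNAL_ERROR = -7,
    VPL_ERR_INVALID_SIZE = -8,
    VPL_ERR_INVALID_PROPERTY = -9,
};

// Success is 0 and all listed error codes are negative.
inline bool IsError(vplStatus s) { return s < 0; }

// Formats a status as "NAME (value)" for log messages.
std::string Describe(vplStatus s) {
    const char *name = "UNKNOWN_STATUS";
    switch (s) {
    case VPL_OK: name = "VPL_OK"; break;
    case VPL_ERR_NOT_SUPPORTED: name = "VPL_ERR_NOT_SUPPORTED"; break;
    case VPL_ERR_NULL_POINTER: name = "VPL_ERR_NULL_POINTER"; break;
    case VPL_ERR_NOT_FOUND: name = "VPL_ERR_NOT_FOUND"; break;
    case VPL_ERR_HW_UNAVALIBLE: name = "VPL_ERR_HW_UNAVALIBLE"; break;
    case VPL_ERR_INVALID_FRAME: name = "VPL_ERR_INVALID_FRAME"; break;
    case VPL_ERR_OUT_OF_RESOURCES: name = "VPL_ERR_OUT_OF_RESOURCES"; break;
    case VPL_ERR_INTERNAL_ERROR: name = "VPL_ERR_INTERNAL_ERROR"; break;
    case VPL_ERR_INVALID_SIZE: name = "VPL_ERR_INVALID_SIZE"; break;
    case VPL_ERR_INVALID_PROPERTY: name = "VPL_ERR_INVALID_PROPERTY"; break;
    }
    return std::string(name) + " (" + std::to_string(static_cast<int>(s)) + ")";
}
```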

VplCodecType

enum VplCodecType

bitstream codec types

Values:

VPL_CODEC_H264 = 0
VPL_CODEC_H265 = 1
VPL_CODEC_MPEG2 = 2
VPL_CODEC_VC1 = 3
VPL_CODEC_VP9 = 4
VPL_CODEC_AV1 = 5
VPL_CODEC_COUNT = 6

VplFourCC

enum VplFourCC

bitstream and raw frame format FourCC codes

Values:

VPL_FOURCC_H264 = VPL_MAKEFOURCC('H', '2', '6', '4')
VPL_FOURCC_H265 = VPL_MAKEFOURCC('H', '2', '6', '5')
VPL_FOURCC_NV12 = VPL_MAKEFOURCC('N', 'V', '1', '2')
VPL_FOURCC_RGB4 = VPL_MAKEFOURCC('R', 'G', 'B', '4')
VPL_FOURCC_I420 = VPL_MAKEFOURCC('I', '4', '2', '0')
VPL_FOURCC_BGRA = VPL_MAKEFOURCC('B', 'G', 'R', 'A')
VPL_FOURCC_I010 = VPL_MAKEFOURCC('I', '0', '1', '0')
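VPL_MAKEFOURCC is not defined in this section. The sketch below assumes it packs the four characters with the first one in the least significant byte, as Intel Media SDK's MFX_MAKEFOURCC does; check the VPL header before relying on the numeric values. MakeFourCC and FourCCToString are hypothetical helpers.

```cpp
#include <cstdint>
#include <string>

// Assumed packing: first character in the lowest byte (Media SDK convention).
constexpr uint32_t MakeFourCC(char a, char b, char c, char d) {
    return uint32_t(uint8_t(a)) | (uint32_t(uint8_t(b)) << 8) |
           (uint32_t(uint8_t(c)) << 16) | (uint32_t(uint8_t(d)) << 24);
}

// Inverse of MakeFourCC, useful for printing a format code as text.
std::string FourCCToString(uint32_t fourcc) {
    std::string s(4, ' ');
    for (int i = 0; i < 4; ++i)
        s[i] = char((fourcc >> (8 * i)) & 0xFF);
    return s;
}
```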

VplCrop

struct VplCrop

Public Members

uint32_t crop_x
uint32_t crop_y
uint32_t crop_w
uint32_t crop_h
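The sketch below assumes crop_x and crop_y give the top-left corner of the crop rectangle and crop_w and crop_h its size, in pixels, following the Media SDK convention. It checks that a crop fits in a frame of a given width and height; the struct is a standalone copy and CropFits is hypothetical. The frame size is passed directly because the fields of VplVideoSurfaceResolution are not listed here.

```cpp
#include <cstdint>

// Standalone copy of the documented struct.
struct VplCrop {
    uint32_t crop_x;
    uint32_t crop_y;
    uint32_t crop_w;
    uint32_t crop_h;
};

// True if the crop is non-empty and lies inside a width x height frame.
// 64-bit sums keep crop_x + crop_w from wrapping around.
bool CropFits(const VplCrop &c, uint32_t width, uint32_t height) {
    return c.crop_w > 0 && c.crop_h > 0 &&
           uint64_t(c.crop_x) + c.crop_w <= width &&
           uint64_t(c.crop_y) + c.crop_h <= height;
}
```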

VplAspectRatio

struct VplAspectRatio

Public Members

uint32_t ratio_w
uint32_t ratio_h

VplEncodePreset

enum VplEncodePreset

Values:

VPL_BALANCED = 0
VPL_MAX_QUALITY = 1
VPL_HIGH_QUALITY = 2
VPL_QUALITY = 3
VPL_SPEED = 4
VPL_HIGH_SPEED = 5
VPL_MAX_SPEED = 6
VPL_LOW_LATENCY_MAX_QUALITY = 7
VPL_LOW_LATENCY_MAX_SPEED = 8
VPL_LOWEST_LATENCY_MAX_QUALITY = 9
VPL_LOWEST_LATENCY_MAX_SPEED = 10
VPL_EP_COUNT = 11

VplEncodeScenario

enum VplEncodeScenario

Values:

GAME_STREAMING
CLOUD_GAMING
VIDEO_CONFERENCING
LIVE_STREAMING
LIVE_STREAMING_LD
RECORDING
SURVEILLANCE
REMOTE_DISPLAY

VplBRC

enum VplBRC

Values:

VPL_BRC_CBR = 0
VPL_BRC_VBR = 1
VPL_BRC_CQP = 2
VPL_BRC_AVBR = 3
VPL_BRC_COUNT = 4

VplFrameRate

struct VplFrameRate

Public Members

uint32_t FrameRateExtN
uint32_t FrameRateExtD
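VplFrameRate stores the frame rate as a fraction, which represents rates such as 29.97 fps (30000/1001) exactly. The sketch below assumes FrameRateExtN is the numerator and FrameRateExtD the denominator; the helper names are hypothetical.

```cpp
#include <cstdint>

// Standalone copy of the documented struct.
struct VplFrameRate {
    uint32_t FrameRateExtN;
    uint32_t FrameRateExtD;
};

// Frames per second; 0.0 when the denominator is unset (zero).
double FramesPerSecond(const VplFrameRate &fr) {
    return fr.FrameRateExtD == 0
               ? 0.0
               : static_cast<double>(fr.FrameRateExtN) / fr.FrameRateExtD;
}

// Duration of one frame in microseconds (rounded down), or 0 if N is 0.
uint64_t FrameDurationUs(const VplFrameRate &fr) {
    if (fr.FrameRateExtN == 0) return 0;
    return uint64_t(fr.FrameRateExtD) * 1000000 / fr.FrameRateExtN;
}
```

For 30000/1001, FramesPerSecond() returns about 29.97 and FrameDurationUs() returns 33366.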

VplVersion

struct VplVersion

Public Members

uint32_t major
uint32_t major_update
uint32_t phase
uint32_t phase_update
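Assuming the VplVersion fields run from most to least significant in the order listed, two versions can be compared lexicographically, which is useful when checking for a minimum required version. The operator and the AtLeast helper below are hypothetical.

```cpp
#include <cstdint>
#include <tuple>

// Standalone copy of the documented struct.
struct VplVersion {
    uint32_t major;
    uint32_t major_update;
    uint32_t phase;
    uint32_t phase_update;
};

// Lexicographic order, assuming fields run from most to least significant.
bool operator<(const VplVersion &a, const VplVersion &b) {
    return std::tie(a.major, a.major_update, a.phase, a.phase_update) <
           std::tie(b.major, b.major_update, b.phase, b.phase_update);
}

// True if `have` is the same as or newer than `need`.
bool AtLeast(const VplVersion &have, const VplVersion &need) {
    return !(have < need);
}
```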

vplLogLevel

enum vplLogLevel

logging level

Values:

LOG_NONE = 0x0
LOG_INFO = 0x1
LOG_WARNING = 0x2
LOG_ERROR = 0x4
LOG_CRITICAL = 0x8
LOG_ALL = 0xFFFF
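The vplLogLevel values are single bits, with LOG_ALL covering all of them, which suggests they are meant to be combined into a mask. The sketch below assumes that use; ShouldLog is hypothetical. On POSIX systems <syslog.h> defines LOG_INFO and LOG_WARNING as macros, so these enumerator names would clash with that header in the same translation unit.

```cpp
#include <cstdint>

// Standalone copy of the documented vplLogLevel values.
enum vplLogLevel : uint32_t {
    LOG_NONE = 0x0,
    LOG_INFO = 0x1,
    LOG_WARNING = 0x2,
    LOG_ERROR = 0x4,
    LOG_CRITICAL = 0x8,
    LOG_ALL = 0xFFFF,
};

// Hypothetical filter: a message at `level` is emitted if its bit is set in `enabled`.
constexpr bool ShouldLog(uint32_t enabled, vplLogLevel level) {
    return (enabled & level) != 0;
}
```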