Tables

A table is a generic oneDAL concept over numerical data. It provides a uniform way to pass data to the library as inputs or parameters and to get data back as results.

A table object is a container of two entities: data and metadata.

  • Data is organized as an \(N \times p\) array, where \(N\) is the number of observations in the table and \(p\) is the number of features.

  • Metadata defines the detailed structure of the data and therefore helps oneDAL access the data efficiently (see the Metadata API section for details).

Table types in oneDAL

oneDAL defines a set of classes. Each class implements the table contract for a specific set of metadata values (see the Metadata API section for details):

Table type      Data layout              Data format   Is contiguous   Is homogeneous
-------------   ----------------------   -----------   -------------   --------------
homogen_table   row_major/column_major   dense         yes             yes
soa_table       column_major             dense         yes             yes/no
aos_table       row_major                dense         yes             yes/no
csr_table       row_major                csr           yes             yes

Requirements on table objects

Each table object in oneDAL follows these requirements:

  1. Table objects in oneDAL are immutable (it is not possible to change data or metadata values inside the table).

  2. To create complex table types or to modify table data, use builders.

  3. Table objects in oneDAL are reference-counted. Using the assignment operator or the copy constructor on a table object creates another reference to the same data:

    onedal::table table2 = table1;
    // table1 and table2 share the data (no data copy is performed)

    onedal::table table3;
    table3 = table2;
    // table1, table2 and table3 share the same data
    
  4. Every table type must be inherited from the base table class, which represents a generalized table API.

  5. Every table type is implemented over a particular set of metadata values and must hide other implementation details from the public API.
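
Requirements 1 and 3 can be illustrated with a minimal sketch. It assumes that shared ownership is implemented with std::shared_ptr to an immutable data block, which this section does not state. The names toy_data and toy_table are hypothetical and are not part of oneDAL.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the data owned by a table; not a oneDAL type.
struct toy_data {
    std::vector<float> values;
    std::int64_t row_count;
    std::int64_t column_count;
};

// Hypothetical handle: copying it copies a pointer, never the data itself.
class toy_table {
public:
    toy_table() = default;

    toy_table(std::vector<float> values, std::int64_t rows, std::int64_t cols)
        : impl_(std::make_shared<toy_data>(
              toy_data{ std::move(values), rows, cols })) {
        assert(rows >= 0 && cols >= 0);
        assert(static_cast<std::int64_t>(impl_->values.size()) == rows * cols);
    }

    // Only const access is exposed, so the shared data cannot be modified.
    const float* data() const noexcept {
        return impl_ ? impl_->values.data() : nullptr;
    }

    // Number of handles that currently share the data (0 for an empty handle).
    long reference_count() const noexcept {
        return impl_.use_count();
    }

private:
    std::shared_ptr<const toy_data> impl_;
};
```

With this sketch, copy construction and assignment leave every handle pointing at the same memory, and the data is released when the last handle goes away.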

Entities and their dependencies

This section describes dependencies between all the classes and structures related to tables.

TBD

Table API

class table {
public:
   table() = default;

   template <typename TableImpl,
            typename = std::enable_if_t<is_table_impl_v<TableImpl>>>
   table(TableImpl&&);

   table(const table&);
   table(table&&);

   table& operator=(const table&);

   std::int64_t get_feature_count() const noexcept;
   std::int64_t get_observation_count() const noexcept;
   bool is_empty() const noexcept;
   const dal::table_meta& get_metadata() const noexcept;
};
class table
table()

Creates an empty table with no data and a default-constructed table_meta

table(TableImpl&&)

Creates a table object using the entity passed as a parameter

Template Parameters

TableImpl – The class that contains the table’s implementation

Invariants
contract is_table_impl is satisfied
table(const table&)

Creates a new reference to the table data

table(table&&)

Moves one table object into another

table &operator=(const table&)

Makes the current object reference the data of another table

std::int64_t feature_count = 0

The number of features \(p\) in the table.

Getter
std::int64_t get_feature_count() const noexcept
Invariants
feature_count >= 0
std::int64_t observation_count = 0

The number of observations \(N\) in the table.

Getter
std::int64_t get_observation_count() const noexcept
Invariants
observation_count >= 0
bool is_empty = true

If feature_count or observation_count is zero, the table is empty.

Getter
bool is_empty() const noexcept
table_meta metadata = table_meta()

The object that describes the structure of the data inside the table

Getter
const dal::table_meta& get_metadata() const noexcept
Invariants
is_empty = false
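
The templated constructor accepts only types that satisfy the is_table_impl contract, and is_empty follows from the two counts. The sketch below shows one way such a constraint could work. The definition of is_table_impl used here (a type that provides both count getters) is an assumption made for illustration, since this section does not define the contract. The names toy_impl and the free function is_empty are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Assumed contract for this sketch: an implementation type provides both
// count getters. The real is_table_impl contract is not defined here.
template <typename T, typename = void>
struct is_table_impl : std::false_type {};

template <typename T>
struct is_table_impl<T, decltype(
    static_cast<void>(std::declval<const T&>().get_feature_count()),
    static_cast<void>(std::declval<const T&>().get_observation_count()))>
    : std::true_type {};

template <typename T>
constexpr bool is_table_impl_v = is_table_impl<std::decay_t<T>>::value;

// Hypothetical implementation type that satisfies the assumed contract.
struct toy_impl {
    std::int64_t feature_count;
    std::int64_t observation_count;
    std::int64_t get_feature_count() const noexcept { return feature_count; }
    std::int64_t get_observation_count() const noexcept { return observation_count; }
};

// Participates in overload resolution only for types that satisfy the
// contract, in the same way as the table(TableImpl&&) constructor.
template <typename TableImpl,
          typename = std::enable_if_t<is_table_impl_v<TableImpl>>>
bool is_empty(const TableImpl& impl) {
    assert(impl.get_feature_count() >= 0 && impl.get_observation_count() >= 0);
    return impl.get_feature_count() == 0 || impl.get_observation_count() == 0;
}
```

A call such as is_empty(42) does not compile with this sketch, because int does not provide the getters and the template is removed from overload resolution.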

Homogeneous table

Class homogen_table is an implementation of a table type for which the following is true:

  • Its data is dense and it is stored as one contiguous memory block

  • All features have the same data type (their feature types may still differ)

class homogen_table : public table {
public:
   // TODO:
   // Consider constructors with user-provided allocators & deleters

   homogen_table(const homogen_table&);
   homogen_table(homogen_table&&);

   homogen_table(std::int64_t N, std::int64_t p, data_layout layout);

   template <typename T>
   homogen_table(const T* const data_pointer, std::int64_t N, std::int64_t p, data_layout layout);

   homogen_table& operator=(const homogen_table&);

   data_type get_data_type() const noexcept;
   bool has_equal_feature_types() const noexcept;

   template <typename T>
   const T* get_data_pointer() const noexcept;
};
class homogen_table
homogen_table(const homogen_table&)

Creates a new reference to the table data

homogen_table(homogen_table&&)

Moves the current reference into another object

homogen_table(std::int64_t N, std::int64_t p, data_layout layout)

Creates a homogeneous table of shape \(N \times p\) with the default oneDAL allocator

homogen_table(const T *const data_pointer, std::int64_t N, std::int64_t p, data_layout layout)

Creates a homogeneous table of shape \(N \times p\) from user-provided data. The table uses the provided pointer to access the data (no copy is performed).

Template Parameters

T – The type of the data elements that data_pointer points to

homogen_table &operator=(const homogen_table&)

Makes the current object reference the data of another table

onedal::data_type data_type

The type of underlying data

Getter
data_type get_data_type() const noexcept
bool feature_types_equal

Flag that indicates whether the feature_type fields of the metadata are all equal

Getter
bool has_equal_feature_types() const noexcept
const T *data_pointer

The pointer to the underlying data

Template Parameters

T – The type of the data elements that the pointer refers to

Getter
const T* get_data_pointer() const noexcept
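
The pointer-based constructor and the typed getter can be sketched as follows. This is a simplified illustration under stated assumptions, not the oneDAL implementation: toy_homogen_table and data_type_of are hypothetical names, the layout argument is omitted, and the runtime check in get_data_pointer is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the data_type enumeration from this specification.
enum class data_type : std::int64_t { u32, u64, i32, i64, f32, f64 };

// Hypothetical compile-time mapping from a C++ type to its data_type tag.
// Using an unsupported type is a compile-time error (incomplete type).
template <typename T> struct data_type_of;
template <> struct data_type_of<std::uint32_t> { static constexpr data_type value = data_type::u32; };
template <> struct data_type_of<std::uint64_t> { static constexpr data_type value = data_type::u64; };
template <> struct data_type_of<std::int32_t>  { static constexpr data_type value = data_type::i32; };
template <> struct data_type_of<std::int64_t>  { static constexpr data_type value = data_type::i64; };
template <> struct data_type_of<float>         { static constexpr data_type value = data_type::f32; };
template <> struct data_type_of<double>        { static constexpr data_type value = data_type::f64; };

// Hypothetical non-owning wrapper: it stores the user's pointer, not a copy.
class toy_homogen_table {
public:
    template <typename T>
    toy_homogen_table(const T* data_pointer, std::int64_t N, std::int64_t p)
        : data_(data_pointer), type_(data_type_of<T>::value), N_(N), p_(p) {
        assert(N >= 0 && p >= 0);
    }

    data_type get_data_type() const noexcept { return type_; }
    std::int64_t get_observation_count() const noexcept { return N_; }
    std::int64_t get_feature_count() const noexcept { return p_; }

    // The requested T must match the type the table was created with.
    template <typename T>
    const T* get_data_pointer() const noexcept {
        assert(type_ == data_type_of<T>::value);
        return static_cast<const T*>(data_);
    }

private:
    const void* data_;
    data_type type_;
    std::int64_t N_;
    std::int64_t p_;
};
```

Because the wrapper in this sketch does not own the memory, the caller has to keep the original buffer alive for as long as the wrapper is used.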

Structure-of-arrays table

TBD

Array-of-structures table

TBD

Compressed-sparse-row table

TBD

Metadata API

Table metadata contains structures that describe how the data is stored inside the table and how to access it efficiently.

class table_meta {
public:
   table_meta();

   std::int64_t get_feature_count() const noexcept;
   table_meta& set_feature_count(std::int64_t);

   const feature_info& get_feature(std::int64_t index) const;
   table_meta& add_feature(const feature_info&);

   data_layout get_layout() const noexcept;
   table_meta& set_layout(data_layout);

   bool is_contiguous() const noexcept;
   table_meta& set_contiguous(bool);

   bool is_homogeneous() const noexcept;

   data_format get_format() const noexcept;
   table_meta& set_format(data_format);
};
class table_meta
std::int64_t feature_count = 0

The number of features \(p\) in the table.

Getter & Setter
std::int64_t get_feature_count() const noexcept
table_meta& set_feature_count(std::int64_t)
Invariants
feature_count >= 0
feature_info feature

Information about a particular feature in the table

Getter & Setter
const feature_info& get_feature(std::int64_t index) const
table_meta& add_feature(const feature_info&)
data_layout layout = data_layout::row_major

Indicates whether the data is stored in row-major or column-major order.

Getter & Setter
data_layout get_layout() const noexcept
table_meta& set_layout(data_layout)
bool is_contiguous = true

Flag that indicates whether the data is stored in contiguous blocks of memory along the axis given by the layout. For example, if is_contiguous == true and the layout is row_major, the data is stored contiguously within each row.

Getter & Setter
bool is_contiguous() const noexcept
table_meta& set_contiguous(bool)
bool is_homogeneous() const noexcept

Returns true if all features have the same data_type

data_format format = data_format::dense

Description of the format used for data representation inside the table

Getter & Setter
data_format get_format() const noexcept
table_meta& set_format(data_format)
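
The setters of table_meta return a reference to the object, which allows calls to be chained. The sketch below shows that pattern together with a possible is_homogeneous check. It is a reduced illustration: toy_table_meta is a hypothetical class that stores only the data type of each feature, and treating metadata with no features as homogeneous is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class data_type : std::int64_t { u32, u64, i32, i64, f32, f64 };
enum class data_layout : std::int64_t { row_major, column_major };

// Hypothetical, reduced version of table_meta that tracks data types only.
class toy_table_meta {
public:
    // Setters return *this so that calls can be chained.
    toy_table_meta& add_feature(data_type type) {
        feature_types_.push_back(type);
        return *this;
    }

    toy_table_meta& set_layout(data_layout layout) {
        layout_ = layout;
        return *this;
    }

    std::int64_t get_feature_count() const noexcept {
        return static_cast<std::int64_t>(feature_types_.size());
    }

    data_type get_feature(std::int64_t index) const {
        assert(index >= 0 && index < get_feature_count());
        return feature_types_[static_cast<std::size_t>(index)];
    }

    data_layout get_layout() const noexcept { return layout_; }

    // True when every feature has the same data_type
    // (assumed to be true when there are no features).
    bool is_homogeneous() const noexcept {
        for (data_type type : feature_types_) {
            if (type != feature_types_.front()) return false;
        }
        return true;
    }

private:
    std::vector<data_type> feature_types_;
    data_layout layout_ = data_layout::row_major;
};
```

In this sketch is_homogeneous has no setter because its value is derived from the features, which matches the getter-only declaration in table_meta.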

Data layout

enum class data_layout : std::int64_t {
   row_major,
   column_major
};
class data_layout

Enumeration that represents the underlying data layout
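
The difference between the two layouts can be shown by computing where element (row, column) lives in memory. The sketch assumes a dense, contiguous \(N \times p\) block without padding. The helper element_offset is hypothetical and is not part of oneDAL.

```cpp
#include <cassert>
#include <cstdint>

enum class data_layout : std::int64_t { row_major, column_major };

// Offset of element (row, column) in a contiguous N x p block.
// N is the number of observations (rows), p the number of features (columns).
inline std::int64_t element_offset(data_layout layout,
                                   std::int64_t N, std::int64_t p,
                                   std::int64_t row, std::int64_t column) {
    assert(row >= 0 && row < N && column >= 0 && column < p);
    return layout == data_layout::row_major ? row * p + column
                                            : column * N + row;
}
```

In row-major order the features of one observation are adjacent in memory. In column-major order the observations of one feature are adjacent.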

Data format

enum class data_format : std::int64_t {
   dense,
   csr
};
class data_format

Enumeration that represents the underlying format of the data
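
For readers unfamiliar with the csr format, the sketch below shows the general compressed-sparse-row technique: only non-zero values are stored, together with their column indices and the offset at which each row starts. The names toy_csr and csr_at are hypothetical, and zero-based indexing is an assumption of this sketch. The csr_table class itself is not specified in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CSR container with zero-based indices; not a oneDAL type.
struct toy_csr {
    std::vector<double> values;                // non-zero values, row by row
    std::vector<std::int64_t> column_indices;  // column of each stored value
    std::vector<std::int64_t> row_offsets;     // row i spans [row_offsets[i], row_offsets[i + 1])
};

// Returns element (row, column), or 0.0 if it is not stored explicitly.
inline double csr_at(const toy_csr& m, std::int64_t row, std::int64_t column) {
    assert(row >= 0 && static_cast<std::size_t>(row) + 1 < m.row_offsets.size());
    const std::int64_t begin = m.row_offsets[static_cast<std::size_t>(row)];
    const std::int64_t end = m.row_offsets[static_cast<std::size_t>(row) + 1];
    for (std::int64_t k = begin; k < end; ++k) {
        if (m.column_indices[static_cast<std::size_t>(k)] == column) {
            return m.values[static_cast<std::size_t>(k)];
        }
    }
    return 0.0;
}
```

For the \(2 \times 3\) matrix with rows (1, 0, 2) and (0, 0, 3), this representation stores values {1, 2, 3}, column indices {0, 2, 2}, and row offsets {0, 2, 3}.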

Feature info

class feature_info {
public:
   feature_info(data_type, feature_type);

   data_type get_data_type() const noexcept;
   feature_type get_type() const noexcept;
};
class feature_info

Structure that represents information about a particular feature

Invariants
feature_type::nominal and feature_type::ordinal are available only with an integer data_type
feature_type::contiguous is available only with a floating-point data_type
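
The invariants above can be expressed as a small validity check. This is a sketch, not the oneDAL implementation: is_floating_point, is_valid_feature and toy_feature_info are hypothetical names, and enforcing the invariants with assert in the constructor is a choice made for this illustration.

```cpp
#include <cassert>
#include <cstdint>

enum class data_type : std::int64_t { u32, u64, i32, i64, f32, f64 };
enum class feature_type : std::int64_t { nominal, ordinal, contiguous };

inline bool is_floating_point(data_type dt) noexcept {
    return dt == data_type::f32 || dt == data_type::f64;
}

// Checks the invariants listed for feature_info.
inline bool is_valid_feature(data_type dt, feature_type ft) noexcept {
    if (ft == feature_type::contiguous) return is_floating_point(dt);
    return !is_floating_point(dt);  // nominal and ordinal need an integer type
}

// Hypothetical counterpart of feature_info that enforces the invariants.
class toy_feature_info {
public:
    toy_feature_info(data_type dt, feature_type ft) : data_type_(dt), type_(ft) {
        assert(is_valid_feature(dt, ft));
    }

    data_type get_data_type() const noexcept { return data_type_; }
    feature_type get_type() const noexcept { return type_; }

private:
    data_type data_type_;
    feature_type type_;
};
```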

Data type

enum class data_type : std::int64_t {
   u32, u64,
   i32, i64,
   f32, f64
};
class data_type

Enumeration that represents runtime information about the data type of a feature.

oneDAL supports the following data types:

  • std::uint32_t

  • std::uint64_t

  • std::int32_t

  • std::int64_t

  • float

  • double

Feature type

enum class feature_type : std::int64_t {
   nominal,
   ordinal,
   contiguous
};
class feature_type

Enumeration that represents runtime information about the logical type of a feature.

feature_type::nominal

Discrete feature type, non-ordered

feature_type::ordinal

Discrete feature type, ordered

feature_type::contiguous

Continuous (non-discrete) feature type