# K-Means initialization

The K-Means initialization algorithm receives $$n$$ feature vectors as input and chooses $$k$$ initial centroids. After initialization, the K-Means algorithm uses the initialization result to partition the input data into $$k$$ clusters.

| Operation | Computational methods | Programming Interface |
|-----------|-----------------------|-----------------------|
| Computing | Dense | compute(…), compute_input, compute_result |

## Mathematical formulation

### Computing

Given the training set $$X = \{ x_1, \ldots, x_n \}$$ of $$p$$-dimensional feature vectors and a positive integer $$k$$, the problem is to find a set $$C = \{ c_1, \ldots, c_k \}$$ of $$p$$-dimensional initial centroids.

### Computing method: dense

The method chooses the first $$k$$ feature vectors from the training set $$X$$ as the initial centroids.
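The selection rule above can be illustrated with a minimal sketch that uses only the C++ standard library. The `matrix` alias and the `first_k_centroids` function are hypothetical names introduced for illustration and are not part of the library interface. The sketch also assumes that $$k$$ does not exceed $$n$$.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a table: each inner vector is one feature vector.
using matrix = std::vector<std::vector<float>>;

// Returns the first k rows of x as the initial centroids.
// Assumption: 0 < k <= n, where n is the number of rows in x.
matrix first_k_centroids(const matrix& x, std::size_t k) {
    if (k == 0 || k > x.size()) {
        throw std::invalid_argument("k must satisfy 0 < k <= n");
    }
    matrix centroids(x.begin(), x.begin() + static_cast<std::ptrdiff_t>(k));
    assert(centroids.size() == k);
    return centroids;
}
```

For three feature vectors and `k = 2`, the function returns copies of the first two vectors and leaves the input untouched.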

## Usage example

### Computing

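table run_compute(const table& data) {
    // Describe the initialization: single-precision floats, dense method, 10 centroids.
    const auto kmeans_desc =
        kmeans_init::descriptor<float, kmeans_init::method::dense>{}
            .set_cluster_count(10);

    // Choose the initial centroids from the input data.
    const auto result = compute(kmeans_desc, data);

    print_table("centroids", result.get_centroids());

    return result.get_centroids();
}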


## Programming Interface

All types and functions in this section shall be declared in the oneapi::dal::kmeans_init namespace and be available via inclusion of the oneapi/dal/algo/kmeans_init.hpp header file.

### Descriptor

template <typename Float = float,
          typename Method = method::by_default,
          typename Task = task::by_default>
class descriptor {
public:

explicit descriptor(std::int64_t cluster_count = 2);

std::int64_t get_cluster_count() const;
descriptor& set_cluster_count(std::int64_t);

};

template<typename Float = float, typename Method = method::by_default, typename Task = task::by_default>
class descriptor
Template Parameters
• Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

• Method – Tag-type that specifies an implementation of the K-Means initialization algorithm.

• Task – Tag-type that specifies the type of the problem to solve. Can be task::init.

Constructors

descriptor(std::int64_t cluster_count = 2)

Creates a new instance of the class with the given cluster_count.

Properties

std::int64_t cluster_count = 2

The number of clusters $$k$$.

Getter & Setter
std::int64_t get_cluster_count() const
descriptor & set_cluster_count(std::int64_t)
Invariants
cluster_count > 0
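The getter and chained-setter pattern together with the `cluster_count > 0` invariant can be sketched as follows. `mock_descriptor` is a hypothetical, self-contained class written for illustration. It is not the library's descriptor, and rejecting invalid values with `std::domain_error` is an assumption of this sketch rather than documented behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical mock that mirrors the shape of the descriptor interface only.
class mock_descriptor {
public:
    explicit mock_descriptor(std::int64_t cluster_count = 2) {
        set_cluster_count(cluster_count);
    }

    std::int64_t get_cluster_count() const {
        assert(cluster_count_ > 0); // the invariant holds for every live object
        return cluster_count_;
    }

    // Returns *this so that setter calls can be chained.
    mock_descriptor& set_cluster_count(std::int64_t value) {
        if (value <= 0) {
            // Assumption: an invalid value is reported with an exception.
            throw std::domain_error("cluster_count must be positive");
        }
        cluster_count_ = value;
        return *this;
    }

private:
    std::int64_t cluster_count_ = 2;
};
```

Because the setter validates before assigning, a rejected call leaves the previously stored value in place.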

#### Method tags

namespace method {
struct dense {};
using by_default = dense;
} // namespace method

struct dense

Tag-type that denotes the dense computational method.

using by_default = dense

Alias tag-type for the dense computational method.

#### Task tags

namespace task {
struct init {};
using by_default = init;
} // namespace task

struct init

Tag-type that parameterizes entities used for obtaining the initial K-Means centroids.

using by_default = init

Alias tag-type for the initialization task.
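Tag-types such as the ones above carry no data and exist only to select an implementation at compile time. The sketch below shows one common way to do this with template specialization. The `sketch` namespace, the `kernel` template, and the `selected_kernel` function are hypothetical names, and the sketch makes no claim about how the library dispatches internally.

```cpp
#include <string>
#include <type_traits>

namespace sketch {

namespace method {
struct dense {};
using by_default = dense;
} // namespace method

// Primary template: left undefined so that an unsupported tag fails to compile.
template <typename Method>
struct kernel;

// Specialization selected when the method tag is dense.
template <>
struct kernel<method::dense> {
    static std::string name() { return "dense"; }
};

// The tag is a template argument only; no object of the tag type is created.
template <typename Method = method::by_default>
std::string selected_kernel() {
    return kernel<Method>::name();
}

static_assert(std::is_same<method::by_default, method::dense>::value,
              "by_default aliases dense");

} // namespace sketch
```

Calling `sketch::selected_kernel()` without template arguments resolves to the `dense` specialization through the `by_default` alias.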

### Computing compute(...)

#### Input

template <typename Task = task::by_default>
class compute_input {
public:

compute_input(const table& data = table{});

const table& get_data() const;
compute_input& set_data(const table&);
};

template<typename Task = task::by_default>
class compute_input
Template Parameters

Task – Tag-type that specifies the type of the problem to solve. Can be task::init.

Constructors

compute_input(const table &data = table{})

Creates a new instance of the class with the given data.

Properties

const table &data = table{}

An $$n \times p$$ table with the data to be clustered, where each row stores one feature vector.

Getter & Setter
const table & get_data() const
compute_input & set_data(const table &)

#### Result

template <typename Task = task::by_default>
class compute_result {
public:

compute_result();

const table& get_centroids() const;
};

template<typename Task = task::by_default>
class compute_result
Template Parameters

Task – Tag-type that specifies the type of the problem to solve. Can be task::init.

Constructors

compute_result()

Creates a new instance of the class with the default property values.

Properties

const table &centroids = table{}

A $$k \times p$$ table with the initial centroids. Each row of the table stores one centroid.

Getter & Setter
const table & get_centroids() const

#### Operation

template<typename Float, typename Method, typename Task>
compute_result<Task> compute(const descriptor<Float, Method, Task> &desc, const compute_input<Task> &input)

Runs the computing operation for K-Means initialization. For more details, see oneapi::dal::compute.

Template Parameters
• Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.

• Method – Tag-type that specifies an implementation of the K-Means initialization algorithm.

• Task – Tag-type that specifies the type of the problem to solve. Can be task::init.

Parameters
• desc – The descriptor of the algorithm.

• input – Input data for the computing operation.

Preconditions
input.data.has_data == true
input.data.row_count >= desc.cluster_count
Postconditions
result.centroids.has_data == true
result.centroids.row_count == desc.cluster_count
result.centroids.column_count == input.data.column_count
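
The contract listed above can be made concrete with a small, self-contained sketch. `mock_table` and `mock_compute` are hypothetical stand-ins, not the library's `table` type or `compute` function. The sketch assumes a row-major layout, copies the first `cluster_count` rows, and accepts any input that has at least `cluster_count` rows.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical row-major table; not the oneDAL table type.
struct mock_table {
    std::vector<float> values;
    std::int64_t row_count = 0;
    std::int64_t column_count = 0;

    bool has_data() const {
        return row_count > 0 && column_count > 0;
    }
};

// Copies the first cluster_count rows of data and checks the contract on the way.
mock_table mock_compute(std::int64_t cluster_count, const mock_table& data) {
    // Preconditions: non-empty, consistent input with enough rows.
    if (!data.has_data()) {
        throw std::invalid_argument("input table is empty");
    }
    if (static_cast<std::int64_t>(data.values.size()) !=
        data.row_count * data.column_count) {
        throw std::invalid_argument("table shape does not match its storage");
    }
    if (cluster_count <= 0 || data.row_count < cluster_count) {
        throw std::invalid_argument("cluster_count must be in [1, row_count]");
    }

    mock_table centroids;
    centroids.row_count = cluster_count;
    centroids.column_count = data.column_count;
    centroids.values.assign(data.values.begin(),
                            data.values.begin() + cluster_count * data.column_count);

    // Postconditions: a non-empty k x p result.
    assert(centroids.has_data());
    assert(centroids.row_count == cluster_count);
    assert(centroids.column_count == data.column_count);
    return centroids;
}
```

Violated preconditions are reported with exceptions because they depend on caller input, while the postconditions use `assert` because a failure there would indicate a bug in the sketch itself.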