Tree-based algorithms

The module nnlearn.tree includes all tree-based models, along with the data structures on which they depend.

class nnlearn.tree.DecisionTree(criterion_name='gini', min_samples_split=2, max_features=None, random_state=42)

Bases: object

Decision Tree data structure.

Parameters
  • criterion_name (str, optional) – Name of the metric used to measure the impurity of tree nodes.

  • min_samples_split (int, optional) – Minimum number of samples a node must hold in order to be split and become an internal node.

  • max_features (int, optional) – Maximum number of features to consider when deciding how to split a node.

  • random_state (int, optional) – Seed for the random selection of features. When only a random subset of the features is used to split a node, this ensures reproducibility.
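
The interplay of max_features and random_state can be illustrated with a minimal sketch. The helper name choose_features is hypothetical and not part of nnlearn; it only shows how a seeded generator makes a random feature subset repeatable. A real tree would more likely create the generator once and share it across all splits.

```python
import random


def choose_features(n_features, max_features=None, random_state=42):
    """Return the feature indices to consider for one split (hypothetical helper)."""
    if max_features is None or max_features >= n_features:
        # No subsampling: every feature is a candidate.
        return list(range(n_features))
    # A generator seeded with random_state yields the same subset on every run.
    rng = random.Random(random_state)
    return sorted(rng.sample(range(n_features), max_features))


subset_a = choose_features(10, max_features=3, random_state=42)
subset_b = choose_features(10, max_features=3, random_state=42)
print(subset_a == subset_b)  # same seed, same subset
```

With max_features=None the sketch simply returns every index, which matches the idea of using all features for the split.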

Notes

This implementation uses node objects as its underlying data structure. Each internal node, including the root, has a left and a right child.

class nnlearn.tree.DecisionTreeClassifier(criterion_name='gini', min_samples_split=2, max_features=None, random_state=42)

Bases: object

The DecisionTreeClassifier is a tree-based ML model used for classification.

Parameters
  • criterion_name (str, optional) – Name of the metric used to measure the impurity of tree nodes. Options: {'gini', 'entropy'}

  • min_samples_split (int, optional) – Minimum number of samples a node must hold in order to be split and become an internal node.

  • min_samples_leaf (int, optional) – Minimum number of samples to be present within a leaf.

  • max_features (int, optional) – Maximum number of features to consider when deciding how to split a node.

  • random_state (int, optional) – Seed for the random selection of features. When only a random subset of the features is used to split a node, this ensures reproducibility.
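
For reference, the two criteria named above can be sketched with their textbook definitions. The functions gini and entropy below are illustrative helpers, not nnlearn's own code, and the library's implementation may differ in detail (for example in how it handles empty nodes).

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum(-p_k * log2(p_k)) over the class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


mixed = [0, 0, 1, 1]   # evenly mixed node: maximally impure for two classes
pure = [1, 1, 1]       # single-class node: impurity is zero
print(gini(mixed), entropy(mixed))
print(gini(pure), entropy(pure))
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why either can drive the choice of split.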

fit(X, y)

Train the model.

Parameters
  • X (2d array) – Training data.

  • y (1d array) – Ground truth values.

predict(X)

Predict labels for the given records.

Parameters

X (2d array) – Data based on which to predict labels.

Returns

Predicted labels.

Return type

1d array

class nnlearn.tree.Node(X, y, tree, impurity=None, **kwargs)

Bases: object

The Node object is the core element of the decision tree data structure.

Parameters
  • X (2d array) – Records that the given node holds.

  • y (1d array) – Labels for the records.

  • tree (DecisionTree) – Decision tree object.

  • impurity (float, optional) – Impurity of this node.

left

Left child.

Type

Node

right

Right child.

Type

Node

threshold

Value at which to make the split.

Type

float

feature

Index of the feature within X on which to do the split.

Type

int

is_leaf_node()

Return whether the current node is a leaf node.
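
How the attributes left, right, threshold and feature might work together with is_leaf_node() during prediction can be sketched as follows. SketchNode and predict_one are hypothetical stand-ins, not nnlearn's Node class or predict method. The sketch assumes that a node without children is a leaf, that records with a feature value less than or equal to the threshold go to the left child, and that a leaf predicts its majority label.

```python
from collections import Counter


class SketchNode:
    """Hypothetical stand-in for a tree node; not nnlearn's Node class."""

    def __init__(self, y, feature=None, threshold=None, left=None, right=None):
        self.y = y                  # labels of the records held by this node
        self.feature = feature      # index of the feature used for the split
        self.threshold = threshold  # value at which the split is made
        self.left = left
        self.right = right

    def is_leaf_node(self):
        # Assumption: a node without children is a leaf.
        return self.left is None and self.right is None


def predict_one(node, record):
    """Route one record from the given node down to a leaf and return its label."""
    while not node.is_leaf_node():
        # Assumed convention: values <= threshold go left, the rest go right.
        if record[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    # Majority label among the records stored in the leaf.
    return Counter(node.y).most_common(1)[0][0]


root = SketchNode(
    y=[0, 0, 1, 1],
    feature=0,
    threshold=2.5,
    left=SketchNode(y=[0, 0]),
    right=SketchNode(y=[1, 1]),
)
print(predict_one(root, [1.0]), predict_one(root, [3.0]))
```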

split()

Split the node if possible.
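
What "possible" could mean for a split is sketched below, under stated assumptions rather than as nnlearn's actual logic. The hypothetical helper best_split refuses to split when the node holds fewer than min_samples_split records or is already pure. Otherwise it tries the midpoints between consecutive distinct feature values as thresholds and keeps the one with the lowest weighted Gini impurity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, min_samples_split=2):
    """Return (feature, threshold, weighted_impurity), or None if no split is possible."""
    n = len(y)
    if n < min_samples_split or len(set(y)) == 1:
        return None  # too few samples, or the node is already pure
    best = None
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both sides of every candidate split are non-empty.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[feature] <= threshold]
            right = [label for row, label in zip(X, y) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


X = [[1.0, 7.0], [2.0, 3.0], [3.0, 8.0], [4.0, 2.0]]
y = [0, 0, 1, 1]
result = best_split(X, y)
print(result)
```

In this toy data set the first feature separates the two classes cleanly, so the sketch selects feature 0 with a threshold of 2.5.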