Package 'solitude' reference manual

Title:	An Implementation of Isolation Forest
Description:	Isolation forest is anomaly detection method introduced by the paper Isolation based Anomaly Detection (Liu, Ting and Zhou <doi:10.1145/2133360.2133363>).
Authors:	Komala Sheshachala Srikanth [aut, cre], David Zimmermann [ctb]
Maintainer:	Komala Sheshachala Srikanth <[email protected]>
License:	GPL-3
Version:	1.1.3
Built:	2025-03-30 04:41:46 UTC
Source:	https://github.com/talegari/solitude

Check for a single integer

Description

for a single integer

Usage

is_integerish(x)
is_integerish(x)

Arguments

x

input

Value

TRUE or FALSE

Examples

## Not run: is_integerish(1)
## Not run: is_integerish(1)

'solitude' class implements the isolation forest method introduced by paper Isolation based Anomaly Detection (Liu, Ting and Zhou <doi:10.1145/2133360.2133363>). The extremely randomized trees (extratrees) required to build the isolation forest is grown using ranger function from ranger package.

Design

$new() initiates a new 'solitude' object. The possible arguments are:

sample_size: (positive integer, default = 256) Number of observations in the dataset to used to build a tree in the forest
num_trees: (positive integer, default = 100) Number of trees to be built in the forest
replace: (boolean, default = FALSE) Whether the sample of observations should be chosen with replacement when sample_size is less than the number of observations in the dataset
seed: (positive integer, default = 101) Random seed for the forest
nproc: (NULL or a positive integer, default: NULL, means use all resources) Number of parallel threads to be used by ranger
respect_unordered_factors: (string, default: "partition")See respect.unordered.factors argument in ranger
max_depth: (positive number, default: ceiling(log2(sample_size))) See max.depth argument in ranger

$fit() fits a isolation forest for the given dataframe or sparse matrix, computes depths of terminal nodes of each tree and stores the anomaly scores and average depth values in $scores object as a data.table

$predict() returns anomaly scores for a new data as a data.table

Details

Parallelization: ranger is parallelized and by default uses all the resources. This is supported when nproc is set to NULL. The process of obtaining depths of terminal nodes (which is excuted with $fit() is called) may be parallelized separately by setting up a future backend.

Methods

Method `new()`

Usage

isolationForest$new(
  sample_size = 256,
  num_trees = 100,
  replace = FALSE,
  seed = 101,
  nproc = NULL,
  respect_unordered_factors = NULL,
  max_depth = ceiling(log2(sample_size))
)

Method `fit()`

Usage

isolationForest$fit(dataset)

Method `predict()`

Usage

isolationForest$predict(data)

Method `clone()`

The objects of this class are cloneable with this method.

Usage

isolationForest$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

## Not run: 
library("solitude")
library("tidyverse")
library("mlbench")

data(PimaIndiansDiabetes)
PimaIndiansDiabetes = as_tibble(PimaIndiansDiabetes)
PimaIndiansDiabetes

splitter   = PimaIndiansDiabetes %>%
  select(-diabetes) %>%
  rsample::initial_split(prop = 0.5)
pima_train = rsample::training(splitter)
pima_test  = rsample::testing(splitter)

iso = isolationForest$new()
iso$fit(pima_train)

scores_train = pima_train %>%
  iso$predict() %>%
  arrange(desc(anomaly_score))

scores_train

umap_train = pima_train %>%
  scale() %>%
  uwot::umap() %>%
  setNames(c("V1", "V2")) %>%
  as_tibble() %>%
  rowid_to_column() %>%
  left_join(scores_train, by = c("rowid" = "id"))

umap_train

umap_train %>%
  ggplot(aes(V1, V2)) +
  geom_point(aes(size = anomaly_score))

scores_test = pima_test %>%
  iso$predict() %>%
  arrange(desc(anomaly_score))

scores_test

## End(Not run)
## Not run: 
library("solitude")
library("tidyverse")
library("mlbench")

data(PimaIndiansDiabetes)
PimaIndiansDiabetes = as_tibble(PimaIndiansDiabetes)
PimaIndiansDiabetes

splitter   = PimaIndiansDiabetes %>%
  select(-diabetes) %>%
  rsample::initial_split(prop = 0.5)
pima_train = rsample::training(splitter)
pima_test  = rsample::testing(splitter)

iso = isolationForest$new()
iso$fit(pima_train)

scores_train = pima_train %>%
  iso$predict() %>%
  arrange(desc(anomaly_score))

scores_train

umap_train = pima_train %>%
  scale() %>%
  uwot::umap() %>%
  setNames(c("V1", "V2")) %>%
  as_tibble() %>%
  rowid_to_column() %>%
  left_join(scores_train, by = c("rowid" = "id"))

umap_train

umap_train %>%
  ggplot(aes(V1, V2)) +
  geom_point(aes(size = anomaly_score))

scores_test = pima_test %>%
  iso$predict() %>%
  arrange(desc(anomaly_score))

scores_test

## End(Not run)

An Implementation of Isolation Forest

Description

Isolation forest is an anomaly detection method introduced by the paper Isolation based Anomaly Detection (Liu, Ting and Zhou <doi:10.1145/2133360.2133363>)

Author(s)

Srikanth Komala Sheshachala

Depth of each terminal node of all trees in a ranger model

Description

Depth of each terminal node of all trees in a ranger model is returned as a three column tibble with column names: 'id_tree', 'id_node', 'depth'. Note that root node has the node_id = 0.

Usage

terminalNodesDepth(model)
terminalNodesDepth(model)

Arguments

model

A ranger model

Details

This function may be parallelized using a future backend.

Value

A tibble with three columns: 'id_tree', 'id_node', 'depth'.

Examples

rf = ranger::ranger(Species ~ ., data = iris, num.trees = 100)
terminalNodesDepth(rf)
rf = ranger::ranger(Species ~ ., data = iris, num.trees = 100)
terminalNodesDepth(rf)

Depth of each terminal node of a single tree in a ranger model

Description

Depth of each terminal node of a single tree in a ranger model. Note that root node has the id_node = 0.

Usage

terminalNodesDepthPerTree(treelike)
terminalNodesDepthPerTree(treelike)

Arguments

treelike

Output of 'ranger::treeInfo'

Value

data.table with two columns: id_node and depth

Examples

## Not run: 
  rf = ranger::ranger(Species ~ ., data = iris)
  terminalNodesDepthPerTree(ranger::treeInfo(rf, 1))

## End(Not run)
## Not run: 
  rf = ranger::ranger(Species ~ ., data = iris)
  terminalNodesDepthPerTree(ranger::treeInfo(rf, 1))

## End(Not run)

Package 'solitude'

Help Index

Check for a single integer

Description

Usage

Arguments

Value

Examples

Fit an Isolation Forest

Description

Design

Details

Methods

Public methods

Method new()

Usage

Method fit()

Usage

Method predict()

Usage

Method clone()

Usage

Arguments

Examples

An Implementation of Isolation Forest

Description

Author(s)

See Also

Depth of each terminal node of all trees in a ranger model

Description

Usage

Arguments

Details

Value

Examples

Depth of each terminal node of a single tree in a ranger model

Description

Usage

Arguments

Value

Examples

Method `new()`

Method `fit()`

Method `predict()`

Method `clone()`