
Utilize these Four Comprehensible Algorithms for Optimal Results in 2022

New Year's arrival marks the time for fresh commitments, one of which could be enhancing the clarity of decision-making procedures. To assist you in achieving this, I'm offering four interpretable rule-based algorithms. These algorithms employ an ensemble of decision trees as a rule generator,...



In machine learning, four algorithms (RuleFit, Node harvest, SIRUS, and CoveringAlgorithm) aim to provide transparent decision-making models expressed as sets of rules. Despite this shared goal, they differ in how they extract rules, how they present them, and how they balance interpretability against accuracy.

RuleFit, developed in 2008, generates a large ensemble of rules extracted from decision trees and fits a sparse linear model that combines these rules with the original features. The explicit rule weights make the model interpretable, but if many rules survive the sparse fit, transparency suffers and the result reads more like a "rule + linear weight" model than a short rule list.
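To make the "sparse linear model over rules" idea concrete, here is a minimal sketch of that second stage only. It is an illustration under stated assumptions, not the published RuleFit implementation: the three rules in `RULES` are hand-written stand-ins for rules that would normally be extracted from a tree ensemble, the original features are left out of the linear model, and the helper names (`rule_features`, `lasso_fit`) and the toy data are hypothetical.

```python
# Sketch of RuleFit's second stage: a lasso fit over binary rule features.
# ASSUMPTION: the rules are hand-written stand-ins; RuleFit would extract
# them from the nodes of a tree ensemble and also add the raw features.

RULES = {
    "x0 > 0.5": lambda x: x[0] > 0.5,
    "x1 <= 0.3": lambda x: x[1] <= 0.3,
    "x0 > 0.5 and x1 > 0.3": lambda x: x[0] > 0.5 and x[1] > 0.3,
}


def rule_features(X):
    """Turn each sample into a 0/1 vector with one entry per rule."""
    return [[1.0 if rule(x) else 0.0 for rule in RULES.values()] for x in X]


def soft_threshold(value, lam):
    """Shrink value towards zero by lam; values in [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(A, y, lam, n_sweeps=300):
    """Minimise (1/2n)*||y - b - A w||^2 + lam*||w||_1 by coordinate descent."""
    n, p = len(A), len(A[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_sweeps):
        # The intercept is unpenalised: set it to the mean residual.
        b = sum(y[i] - sum(A[i][k] * w[k] for k in range(p)) for i in range(n)) / n
        for j in range(p):
            rho = z = 0.0
            for i in range(n):
                # Residual with rule j's own contribution left out.
                partial = y[i] - b - sum(A[i][k] * w[k] for k in range(p) if k != j)
                rho += A[i][j] * partial
                z += A[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return b, w


# Toy data on a 10 x 10 grid; the response depends only on the first rule.
X = [(i / 9, j / 9) for i in range(10) for j in range(10)]
y = [2.0 if x[0] > 0.5 else 0.0 for x in X]

intercept, weights = lasso_fit(rule_features(X), y, lam=0.05)
for name, weight in zip(RULES, weights):
    print(f"{name:25s} weight = {weight:+.3f}")
```

On this toy data the L1 penalty should drive the weights of the two uninformative rules to zero and keep only the first rule, which is the behaviour that keeps a RuleFit model readable. With a weaker penalty or noisier data more rules would survive, which is the interpretability cost described above.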

Node harvest, on the other hand, harvests "nodes" (rules) from trees and then averages predictions over these nodes using non-negative weights. This yields a small set of weighted rules combined additively, which makes the model highly interpretable. Each rule is directly readable as a conjunction of conditions, and because the weights form a convex combination they can be understood as "strength of membership."
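The prediction step described above can be sketched in a few lines. This is a hedged illustration, not the reference implementation: the nodes, their mean responses, and their weights are made-up values, whereas the actual method selects the weights by solving an optimisation problem over the harvested nodes. The `Node` class and `predict` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Node:
    description: str                            # human-readable rule
    contains: Callable[[Sequence[float]], bool]  # does x fall in the node?
    mean: float                                 # mean response inside the node
    weight: float                               # non-negative node weight


# ASSUMPTION: illustrative nodes and weights, not fitted to any data.
# The root node contains every observation, so a prediction always exists.
NODES = [
    Node("root (all observations)", lambda x: True, 10.0, 0.2),
    Node("x0 > 0.5", lambda x: x[0] > 0.5, 14.0, 0.5),
    Node("x0 <= 0.5 and x1 > 2", lambda x: x[0] <= 0.5 and x[1] > 2, 6.0, 0.8),
]


def predict(x, nodes=NODES):
    """Weighted average of node means over the nodes that contain x."""
    active = [n for n in nodes if n.weight > 0 and n.contains(x)]
    total = sum(n.weight for n in active)
    return sum(n.weight * n.mean for n in active) / total


for point in ([0.9, 0.0], [0.1, 3.0], [0.1, 0.0]):
    rules = [n.description for n in NODES if n.contains(point)]
    print(point, "->", round(predict(point), 3), "via", rules)
```

Because each prediction is a normalised, non-negative mix of a few node means, a reader can trace any output back to the handful of rules the observation satisfies.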

SIRUS (Stable and Interpretable RUle Set) focuses on stability and simple rules: it extracts the rules that are selected most frequently across the trees of a random forest. The result is a very small, stable, and simple rule set that is extremely easy to interpret. SIRUS is designed explicitly to maximize stability (robustness to data perturbations) to aid trust and interpretability, prioritising simplicity and reproducibility of rules over maximal accuracy.
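The frequency-based selection can be sketched as follows. This is a simplified illustration under assumptions: each "tree" is given directly as a hand-written list of rules rather than grown from data, and the threshold name `p0` and the helper functions are hypothetical. It shows only the selection step, not the full SIRUS procedure.

```python
from collections import Counter

# A rule is a tuple of (feature, operator, threshold) conditions.
RULE_A = (("x0", "<=", 0.5),)
RULE_B = (("x0", ">", 0.5), ("x1", "<=", 2.0))
RULE_C = (("x2", "<=", 1.0),)

# ASSUMPTION: hand-written stand-in for the rules found in four trees.
FOREST = [
    [RULE_A, RULE_B],
    [RULE_A, RULE_B],
    [RULE_A, RULE_C],
    [RULE_A],
]


def rule_frequencies(forest):
    """Fraction of trees in which each rule appears."""
    counts = Counter()
    for tree_rules in forest:
        counts.update(set(tree_rules))  # count a rule at most once per tree
    return {rule: count / len(forest) for rule, count in counts.items()}


def select_stable_rules(forest, p0):
    """Keep rules whose frequency exceeds p0, most frequent first."""
    kept = [(r, f) for r, f in rule_frequencies(forest).items() if f > p0]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


for rule, freq in select_stable_rules(FOREST, p0=0.4):
    text = " and ".join(f"{feat} {op} {thr}" for feat, op, thr in rule)
    print(f"{freq:.2f}  {text}")
```

Raising `p0` shortens the rule list and keeps only rules that recur across most trees, which is the trade of accuracy for stability that the paragraph above describes.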

CoveringAlgorithm iteratively finds simple rules that cover subsets of data to explain decisions or class membership. Each rule is simple and explicitly covers part of the data, making the explanation modular. The rules are non-overlapping or minimally overlapping, contributing to the model's clarity.
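One simple way to picture "iteratively finding rules that cover subsets of data" is a greedy covering loop. The sketch below is a generic greedy heuristic offered as an assumption, not the specific CoveringAlgorithm procedure: the candidate rules, the toy data, and the function name `greedy_cover` are all hypothetical.

```python
# ASSUMPTION: hand-written candidate rules over a single feature x[0].
CANDIDATE_RULES = {
    "x0 <= 0.2": lambda x: x[0] <= 0.2,
    "x0 <= 0.3": lambda x: x[0] <= 0.3,
    "x0 > 0.5": lambda x: x[0] > 0.5,
    "x0 > 0.9": lambda x: x[0] > 0.9,
}

DATA = [[0.1], [0.2], [0.3], [0.6], [0.7], [0.95]]


def greedy_cover(X, candidate_rules):
    """Repeatedly pick the rule that covers the most still-uncovered points."""
    uncovered = set(range(len(X)))
    chosen = []
    while uncovered:
        best_name, best_hits = None, set()
        for name, rule in candidate_rules.items():
            hits = {i for i in uncovered if rule(X[i])}
            if len(hits) > len(best_hits):  # ties keep the earlier rule
                best_name, best_hits = name, hits
        if not best_hits:
            break  # no remaining rule covers any uncovered point
        chosen.append(best_name)
        uncovered -= best_hits
    return chosen, uncovered


selected, left_over = greedy_cover(DATA, CANDIDATE_RULES)
print("selected rules:", selected)
print("uncovered points:", sorted(left_over))
```

Each selected rule accounts for its own slice of the data, so the explanation stays modular: the narrower rules here are never chosen because the broader ones already cover their points.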

Comparing the algorithms on rule set size, rule weighting, rule stability, and interpretability, SIRUS stands out for its very small, stable, reproducible set of simple rules, while Node harvest occupies a middle ground with a small set of intuitive, convex-weighted rules. RuleFit may retain more rules, with both positive and negative weights, blending them into a linear model at some cost to interpretability. CoveringAlgorithm prioritises simplicity and direct coverage, but its behaviour depends on the exact method used.

If your priority is human interpretability and trust, SIRUS and Node harvest are the preferable choices. If you want a more flexible, potentially more accurate model that is still transparent, RuleFit might be preferable, though at some interpretability cost. CoveringAlgorithm suits cases where a modular, coverage-based explanation is wanted, bearing in mind that results vary with the exact method used.

As we step into the new year, making decision-making processes more interpretable could be a valuable resolution. European Contract Research Organization (CRO) Advestis, with a deep understanding and practice of statistics and interpretable machine learning techniques, is well-positioned to help in this endeavour.

Among these techniques, SIRUS and Node harvest are especially notable for making machine learning models easier for humans to interpret. Unlike RuleFit, which balances interpretability and accuracy by blending rules into a linear model, SIRUS explicitly focuses on simplicity, stability, and reproducibility, making it better suited to decisions requiring high human trust.
