Introduction

Machine learning (ML) research with classifiers usually emphasizes quantitative evaluation, i.e., measuring accuracy, AUC, or some other performance metric. But it's also useful to visualize what classifier algorithms do with different datasets.

This is the index page of a "machine learning classifier gallery" which shows the results of numerous experiments on ML algorithms applied to two-dimensional patterns. Each row shows a different pattern (or pattern set), described verbally and then illustrated on a 2-D grid. These patterns were randomly generated (for the most part) on a 2-D grid of points in [0:4] x [0:4] with a resolution of 0.05, yielding 81 x 81 = 6561 total points. The points were then labeled based on where they fell in the pattern.
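As a rough illustration of the grid construction and labeling described above, here is a minimal Python sketch. It is not the gallery's actual generator: the function names and the checkerboard rule (class 1 where column plus row parity is even) are assumptions made for illustration. Under that assumption the class counts happen to agree with the CHECKERBOARD row in the table below.

```python
from collections import Counter

RESOLUTION = 0.05
STEPS = 80  # 4 / 0.05 intervals per axis, so 81 points per axis


def make_grid():
    """Return every (x, y) point of the [0, 4] x [0, 4] grid."""
    # Integer indices avoid accumulating floating-point error.
    return [(i * RESOLUTION, j * RESOLUTION)
            for i in range(STEPS + 1)
            for j in range(STEPS + 1)]


def checkerboard_label(x, y, squares_per_side=3):
    """Hypothetical labeling rule: class 1 on even-parity squares, else class 2."""
    side = 4.0 / squares_per_side
    # Clamp so that the edge x = 4.0 (or y = 4.0) falls in the last square.
    col = min(int(x / side), squares_per_side - 1)
    row = min(int(y / side), squares_per_side - 1)
    return 1 if (col + row) % 2 == 0 else 2


points = make_grid()
counts = Counter(checkerboard_label(x, y) for x, y in points)
print(len(points), counts[1], counts[2])
```

Other patterns would swap in a different labeling function (inside or outside a circle, above or below a curve, and so on). The exact boundary conventions used for the gallery are not stated, so counts for those patterns may differ slightly from the table.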

On the right are algorithm classes (instance-space, rule + tree, etc.). Clicking on one of these takes you to a page showing different samples (size 50 to 1000) of the target pattern and the hypotheses generated from them by the various algorithms. Clicking on any diagram will show it at full resolution. Hovering over an algorithm name (e.g., pyriel-2) will briefly describe what the algorithm is and exactly what arguments it was invoked with.
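Drawing training samples of different sizes from a labeled grid might look like the sketch below. Only the range 50 to 1000 comes from this page. The intermediate sizes, sampling without replacement, the fixed seed, and the placeholder labeling rule are all assumptions.

```python
import random


def draw_samples(labeled_points, sizes=(50, 100, 250, 500, 1000), seed=0):
    """Draw one random sample (without replacement) per requested size."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return {n: rng.sample(labeled_points, n) for n in sizes}


# Placeholder labeled grid: 81 x 81 points with an arbitrary diagonal rule.
labeled = [((i * 0.05, j * 0.05), 1 if i >= j else 2)
           for i in range(81)
           for j in range(81)]

samples = draw_samples(labeled)
for size, sample in samples.items():
    print(size, len(sample))
```

Each sample would then be passed to a learning algorithm, and the resulting hypothesis evaluated over the full 6561-point grid to produce a diagram.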

Credits

All figures were created with gnuplot. I used Weka for most learning algorithms (except for CA and Pyriel, which are my own). The "glue code" is Perl with some Python.

Contributions

Construction of these web pages is entirely automated, so adding new algorithms and test patterns is easy. If you have algorithms (or patterns) you'd like to add, send them along to me. Command-line invocations of Weka programs are especially welcome.

The raw data files for all of these patterns are available at http://www.divshare.com/download/8078929-3a0. However, note that since these patterns are generated automatically with some random components (e.g., slope of lines, vertices of polygons), the data may not correspond exactly to what you see here.

-Tom Fawcett (tom.fawcett@gmail.com)


Domain description Diagram Algorithm classes
CHECKERBOARD

X = [0, 4], Y = [0, 4], resolution 0.05
Checkerboard (parity), 9 squares alternating classes
Total of 6561 points (3645 class 1, 2916 class 2)
Skew: 1.25 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
CIRCLE

X = [0, 4], Y = [0, 4], resolution 0.05
Circle radius 1.60 at (2.00, 2.00), balanced classes
Total of 6561 points (3364 class 1, 3197 class 2)
Skew: 1.05 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
SMALL_CIRCLE

X = [0, 4], Y = [0, 4], resolution 0.05
Circle radius 0.50 at (2.00, 2.00)
Total of 6561 points (6236 class 1, 325 class 2)
Skew: 19.19 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
ANNULUS

X = [0, 4], Y = [0, 4], resolution 0.05
Annulus centered at (2.00, 2.00) between radii 1.60 and 1.06
Total of 6561 points (4797 class 1, 1764 class 2)
Skew: 2.72 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
CLOSEDCONCAVE

X = [0, 4], Y = [0, 4], resolution 0.05
Concave polygon of 6 points
Total of 6561 points (862 class 1, 5699 class 2)
Skew: 0.15 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
CLOSEDCONVEX

X = [0, 4], Y = [0, 4], resolution 0.05
Convex polygon of 8 points
Total of 6561 points (914 class 1, 5647 class 2)
Skew: 0.16 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
DISJUNCTIVE

X = [0, 4], Y = [0, 4], resolution 0.05
4 disjoint concave polygons
Total of 6561 points (2945 class 1, 3616 class 2)
Skew: 0.81 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
LINEAR

X = [0, 4], Y = [0, 4], resolution 0.05
Linear: Y = 1.87 * X - 1.74
Total of 6561 points (3281 class 1, 3280 class 2)
Skew: 1.00 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
LINEARAXP

X = [0, 4], Y = [0, 4], resolution 0.05
Line parallel to Y axis at Y = 2
Total of 6561 points (3321 class 1, 3240 class 2)
Skew: 1.02 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
PARABOLIC

X = [0, 4], Y = [0, 4], resolution 0.05
Parabola: Y = (X - 2)^2 / (4* 0.25) + 1
Total of 6561 points (3761 class 1, 2800 class 2)
Skew: 1.34 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
PARITY

X = [0, 4], Y = [0, 4], resolution 0.05
9 parity circles, default class 1
Total of 6561 points (5006 class 1, 1555 class 2)
Skew: 3.22 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
POLYNOMIAL

X = [0, 4], Y = [0, 4], resolution 0.05
Y = 1/2 * (x-2)**3 + 1/2 * (x-2.2)**2 + 2
Total of 6561 points (3989 class 1, 2572 class 2)
Skew: 1.55 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
SINE

X = [0, 4], Y = [0, 4], resolution 0.05
Sine wave freq 1.78 centered at Y = 2.00, amplitude 0.84
Total of 6561 points (3368 class 1, 3193 class 2)
Skew: 1.05 / 1
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
SPIRALS

Intertwined spirals, 3 cycles each, radius 0.67
352 points (176 class 1, 176 class 2)
Bayes Instance-space methods Logistic regression Rule and tree Support vector machines
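
The skew figures in the table above appear to be the ratio of class 1 points to class 2 points, rounded to two decimals. That reading is inferred from the listed counts, not stated by the page. A small Python check, with a hypothetical helper name:

```python
def skew(class1_count, class2_count):
    """Format the class ratio the way the table shows it, e.g. '1.25 / 1'."""
    return f"{class1_count / class2_count:.2f} / 1"


# Counts copied from three rows of the table.
COUNTS = {
    "CHECKERBOARD": (3645, 2916),
    "SMALL_CIRCLE": (6236, 325),
    "CLOSEDCONCAVE": (862, 5699),
}

for name, (c1, c2) in COUNTS.items():
    print(name, skew(c1, c2))
```

A value below 1 (as for CLOSEDCONCAVE) means class 2 is the majority class.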

Comments, questions, new algorithm contributions: tom.fawcett@gmail.com