How to train a binary classifier

A binary classifier sorts items into two groups, based off item attributes.

To get started, prepare a single table with one item per row, and columns as attributes.

Then find the button on the Models page, and follow along below to train an initial model.

Train a New Binary Classifier

Start here

There's a button like this on the Models page. That's the one you want.

Model name

Give your creation a name



Don't worry, you can change it later.

Connection

The name of the connection you set up earlier



This is where the training job will go for data.

Table

The table with training data



Write schema.table here to specify a schema.

Make sure the user specified by the connection has permission to run a SELECT on the table.

Filter

SQL expression used to filter down items



All rows that evaluate to true will be used to build the model.

This is a great place to specify (for example) a date range, to get a repeatable set of items even as the table grows.

Test set

SQL expression specifying if an item is in the test set



This will be used to split items into two groups, one for training the model, and one for evaluating (or "testing") the model. All rows that evaluate to true will be used in the test set.

A larger training set will generally yield better models, while a larger test set will give a more-accurate measure of model accuracy, to a point. When in doubt, start with a 50/50 split.

Response

Column name (or SQL expression) for the item attribute the model should predict

This should be valued 0 and 1, or true and false.

Item id

Column name (or SQL expression) uniquely identifying the item



You'll use this later to inspect predictions for specific items.

Predictors

Column names (or SQL expressions) of item attributes to use when making a prediction

One predictor per line. There are two kinds here:

1. Numeric predictors

Real-valued attributes with many possible values, and real-valued attributes where nearby values are related


The model will make use of the distance between attribute values.

2. Categorical predictors

All other attributes


The model will learn separately for each attribute value.

Train

Start the training job



Your model will be ready in a couple of minutes.