How to train a binary classifier

A binary classifier sorts items into two groups based on their attributes.

To get started, prepare a single table with one item per row and one attribute per column.
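As a rough illustration of that layout, here is a minimal sketch using Python's built-in sqlite3 module. The table name (customers) and every column name are hypothetical, chosen only for this example; your own table will live in whatever database your connection points to.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per item (a customer), one column per attribute.
conn.execute(
    """
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,
        signup_date   TEXT,
        plan          TEXT,
        monthly_spend REAL,
        churned       INTEGER
    )
    """
)

# customer_id identifies the item, churned is the 0/1 attribute to predict,
# and the remaining columns are candidate predictors.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2023-01-15", "basic", 20.0, 0),
        (2, "2023-02-03", "pro", 55.5, 1),
        (3, "2023-03-22", "basic", 18.0, 0),
    ],
)

rows = conn.execute("SELECT * FROM customers ORDER BY customer_id").fetchall()
for row in rows:
    print(row)
```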

Then find the button on the Models page, and follow along below to train an initial model.

Train a New Binary Classifier

Start here

There's a button like this on the Models page. That's the one you want.

Model name

Give your creation a name

Don't worry, you can change it later.


The name of the connection you set up earlier

This is where the training job will go for data.


The table with training data

Write schema.table here to specify a schema.

Make sure the user specified by the connection has permission to run a SELECT on the table.


SQL expression used to filter down items

All rows for which this expression evaluates to true will be used to build the model.

This is a great place to specify (for example) a date range, to get a repeatable set of items even as the table grows.
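A date-range filter might look like the sketch below. It runs the expression against an in-memory SQLite table so you can see which rows it keeps. The table (items), the column (created_at), and the assumption that dates are stored as ISO-formatted text are all made up for this example; the exact date syntax depends on your database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        (1, "2022-12-31"),
        (2, "2023-01-01"),
        (3, "2023-06-30"),
        (4, "2024-01-01"),
    ],
)

# Hypothetical filter expression: keep only items created during 2023.
# Because both bounds are fixed, rows added to the table later with newer
# dates do not change which items are selected.
filter_expression = "created_at >= '2023-01-01' AND created_at < '2024-01-01'"

query = f"SELECT item_id FROM items WHERE {filter_expression} ORDER BY item_id"
selected = [row[0] for row in conn.execute(query)]
print(selected)
```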

Test set

SQL expression specifying if an item is in the test set

This will be used to split items into two groups, one for training the model and one for evaluating (or "testing") it. All rows for which the expression evaluates to true will go into the test set.

A larger training set will generally yield a better model, while a larger test set will give a more reliable estimate of the model's accuracy, up to a point. When in doubt, start with a 50/50 split.
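One way to get a repeatable split of roughly 50/50 is to base the expression on the item id, for example its parity. The sketch below tries this against an in-memory SQLite table; the column name item_id and the use of the % operator are assumptions, so check that your database supports the same syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 11)])

# Hypothetical test-set expression: even ids go to the test set.
# The same item always lands in the same group, so the split is repeatable.
test_expression = "item_id % 2 = 0"

rows = conn.execute(
    f"SELECT item_id, {test_expression} AS in_test FROM items ORDER BY item_id"
).fetchall()

test_ids = [item_id for item_id, in_test in rows if in_test]
train_ids = [item_id for item_id, in_test in rows if not in_test]
print("test:", test_ids)
print("train:", train_ids)
```

A parity split only comes out near 50/50 when ids are spread evenly across even and odd values, which is worth checking on your own table.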


Column name (or SQL expression) for the item attribute the model should predict

Its values should be 0 and 1, or true and false.
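If the attribute you want to predict is not already stored as 0/1 or true/false, a SQL expression can convert it. The sketch below is one possible approach, tried against an in-memory SQLite table; the table (subscriptions), the column (status), and the value 'cancelled' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscriptions (item_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?)",
    [(1, "active"), (2, "cancelled"), (3, "paused")],
)

# Hypothetical target expression: 1 for cancelled subscriptions, 0 otherwise.
target_expression = "CASE WHEN status = 'cancelled' THEN 1 ELSE 0 END"

query = f"SELECT {target_expression} FROM subscriptions ORDER BY item_id"
targets = [row[0] for row in conn.execute(query)]
print(targets)
```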

Item id

Column name (or SQL expression) uniquely identifying the item

You'll use this later to inspect predictions for specific items.
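Before training, it can be worth confirming that the id really is unique. A minimal sketch of such a check, assuming a hypothetical items table with an item_id column, compares the row count with the number of distinct ids:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER, label INTEGER)")
# The id 2 appears twice on purpose, to show what a failed check looks like.
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, 0), (2, 1), (2, 0)])

total, distinct = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT item_id) FROM items"
).fetchone()

# The id is unique only when every row has its own distinct value.
is_unique = total == distinct
print(f"{total} rows, {distinct} distinct ids, unique: {is_unique}")
```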


Column names (or SQL expressions) of item attributes to use when making a prediction

One predictor per line. There are two kinds here:

1. Numeric predictors

Real-valued attributes with many possible values, and real-valued attributes where nearby values are related

The model will make use of the distance between attribute values.

2. Categorical predictors

All other attributes

The model will learn separately for each attribute value.
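To make the difference between the two kinds concrete, here is a small sketch in plain Python. It is a general illustration only: one-hot encoding is a common way to treat categorical values separately, but it is an assumption here and not necessarily how this product represents predictors internally. The attribute names and values are made up.

```python
def one_hot(value, categories):
    """Return a list with a 1 in the position of `value` and 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]


# Numeric predictor: the gap between values carries meaning.
ages = [25, 26, 60]
gap_near = abs(ages[0] - ages[1])  # 25 and 26 are close together
gap_far = abs(ages[0] - ages[2])   # 25 and 60 are far apart

# Categorical predictor: each value gets its own indicator, so no value is
# treated as "closer" to one value than to another.
plans = ["basic", "pro", "enterprise"]
encoded = {plan: one_hot(plan, plans) for plan in plans}

print(gap_near, gap_far)
for plan, vector in encoded.items():
    print(plan, vector)
```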


Start the training job

Your model will be ready in a couple of minutes.