ludwig

Ludwig is a TensorFlow-based toolbox that allows users to train and test deep learning models without the need to write code. By offering a well-defined, codeless deep learning pipeline from beginning to end, Ludwig enables practitioners and researchers alike to quickly train and test their models and obtain strong baselines to compare experiments against. Ludwig offers CLI commands for preprocessing data, training, issuing predictions, and visualizations.

Running Ludwig with Comet

Install Ludwig

Install Ludwig for Python, along with the English spaCy model, which this example needs because it uses text features. The following examples have been tested with Python 3.6 and Ludwig 0.2.

```shell
$ pip install ludwig
$ python -m spacy download en
```

If you run into problems installing gmpy, install libgmp or gmp first. On Debian-based Linux distributions: sudo apt-get install libgmp3-dev. On macOS: brew install gmp.
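Later steps also call curl and unzip, so it can help to confirm every tool is on your PATH before you start. This is a minimal sketch for a POSIX shell; missing_cmds is a hypothetical helper name, not part of Ludwig.

```bash
# Hypothetical helper: print each named command that is not on PATH.
# The exit status is 0 only when every command was found.
missing_cmds() (
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            status=1
        fi
    done
    exit "$status"
)

# Usage: missing_cmds ludwig curl unzip || echo "install the commands listed above"
```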

Install Comet

If you haven't already, install comet_ml.

```bash
$ pip install comet_ml
```

Make sure to set up your Comet credentials. You can get your API key at www.comet.ml.

Make your API key available to Ludwig, and set the Comet project that Ludwig should report experiment details to. Replace each ... below with the appropriate value:

```bash
$ export COMET_API_KEY="..."
$ export COMET_PROJECT_NAME="..."
```
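A typo or a new terminal session can leave either variable empty, which is easy to miss until the run fails to report. This sketch checks only the two variable names shown above; check_comet_env is a hypothetical helper, and it does not validate the key against Comet.

```bash
# Hypothetical helper: fail early if the Comet variables exported above
# are unset or empty. Lists each missing variable on stderr.
check_comet_env() (
    ok=0
    for var in COMET_API_KEY COMET_PROJECT_NAME; do
        # Indirect expansion via eval keeps this POSIX-sh compatible.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            printf 'missing: %s\n' "$var" >&2
            ok=1
        fi
    done
    exit "$ok"
)

# Usage: check_comet_env && echo "Comet environment looks set"
```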

We recommend that you create a new directory for each Ludwig experiment.

```bash
$ mkdir experiment1
$ cd experiment1
```
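If you run many experiments, numbering the directories by hand gets tedious. This sketch assumes the experiment1, experiment2, ... naming used above; next_experiment_dir is a hypothetical helper.

```bash
# Hypothetical helper: create and print the first free directory named
# experiment1, experiment2, ... under the given base directory (default: .).
next_experiment_dir() (
    base=${1:-.}
    n=1
    while [ -e "$base/experiment$n" ]; do
        n=$((n + 1))
    done
    mkdir -p "$base/experiment$n" && printf '%s\n' "$base/experiment$n"
)

# Usage: cd "$(next_experiment_dir)"
```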

Some background: each time you create and train a new model, you use one of two commands:

  • ludwig train, which trains a model
  • ludwig experiment, which trains a model and then evaluates it on the test data

When you run either command with the --comet flag, a .comet.config file is created in the current directory. It is built from the API key and Comet project name in the environment variables you set above, and it stores an experiment key for use in this directory.

To run another experiment, create a new directory; running there creates a separate Comet experiment.
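Because the experiment key lives in the directory's .comet.config, a quick guard before a new run can catch a directory you reused by accident. This sketch assumes only that the file is named .comet.config as described above; assert_fresh_dir is a hypothetical helper.

```bash
# Hypothetical helper: succeed only if the directory (default: current) has
# no .comet.config yet, so a --comet run there should start a new experiment.
assert_fresh_dir() (
    dir=${1:-.}
    if [ -e "$dir/.comet.config" ]; then
        printf '%s already has a .comet.config; use a new directory\n' "$dir" >&2
        exit 1
    fi
)

# Usage: assert_fresh_dir && ludwig experiment --comet ...
```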

Download the dataset

For this example, we will work on a text classification problem using Reuters-21578, a well-known newswire dataset. The full collection contains 21,578 newswire documents; the version used here is grouped into 6 categories. Two are 'big' categories (many positive documents), two are 'medium' categories, and two are 'small' categories (few positive documents).

  • Small categories: heat.csv, housing.csv
  • Medium categories: coffee.csv, gold.csv
  • Big categories: acq.csv, earn.csv

To get the dataset, we use the curl command-line program:

```bash
$ curl http://boston.lti.cs.cmu.edu/classes/95-865-K/HW/HW2/reuters-allcats-6.zip \
    -o reuters-allcats-6.zip
$ unzip reuters-allcats-6.zip
```

You can also just download the file and place it in this directory.

Define the model

Next, define the model's input and output features. Create a file named model_definition.yaml with these contents:

```yaml
input_features:
  - name: text
    type: text
    level: word
    encoder: parallel_cnn

output_features:
  - name: class
    type: category
```
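The feature names above are expected to line up with columns in the data file. This sketch assumes that each feature's name must match a column header in the CSV (text and class here) and only inspects the header line; missing_columns is a hypothetical helper, not a Ludwig command.

```bash
# Hypothetical helper: print each given column name that is missing from the
# header (first line) of a CSV file. Strips double quotes and CR characters so
# that "text","class" headers or Windows line endings still match.
missing_columns() (
    csv=$1
    shift
    header=$(head -n 1 "$csv" | tr -d '"\r' | tr ',' '\n')
    status=0
    for col in "$@"; do
        if ! printf '%s\n' "$header" | grep -qx -- "$col"; then
            printf '%s\n' "$col"
            status=1
        fi
    done
    exit "$status"
)

# Usage: missing_columns reuters-allcats.csv text class
```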

Train the model

Train the model with the new --comet flag:

```bash
$ ludwig experiment --comet --data_csv reuters-allcats.csv \
    --model_definition_file model_definition.yaml
```

Running this creates a Comet experiment. Look for the Comet experiment URL in the command's output and open it in your browser.
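Training output is long, so the URL can scroll out of view. One option is to keep a copy of the output with tee and search it for comet.ml links. This sketch does not depend on the exact wording of Comet's log messages, which may differ between versions; train.log and comet_urls are hypothetical names.

```bash
# Keep a copy of the run's console output while still watching it live.
# (Commented out so the snippet can be sourced without starting a run.)
# ludwig experiment --comet --data_csv reuters-allcats.csv \
#     --model_definition_file model_definition.yaml 2>&1 | tee train.log

# Hypothetical helper: print every comet.ml URL found in a log file.
comet_urls() {
    grep -Eo 'https?://[^[:space:]]*comet\.ml[^[:space:]]*' "$1"
}

# Usage: comet_urls train.log
```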

Analysis

In Comet, even while the experiment is still running, you can see the following, among much else:

  • your live model metrics on the Charts tab
  • the bash command you ran to train your experiment, along with any run arguments, on the Code tab
  • the hyperparameters Ludwig is using (including defaults) on the Hyperparameters tab

If you create visualizations with Ludwig, you can also upload them to Comet's Image tab by adding the --comet flag, for example:

```bash
$ ludwig visualize --comet \
    --visualization learning_curves \
    --training_statistics \
    ./results/experiment_run_0/training_statistics.json
```
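The run directory's suffix can differ from experiment_run_0 after several runs in the same place. This sketch picks the most recently modified training_statistics.json one level below ./results; it assumes that layout and run directory names without newlines, and latest_stats is a hypothetical helper.

```bash
# Hypothetical helper: print the most recently modified
# training_statistics.json under a results directory (default: ./results).
latest_stats() (
    results=${1:-./results}
    # ls -t sorts newest first; the glob covers one level of run directories.
    ls -t "$results"/*/training_statistics.json 2>/dev/null | head -n 1
)

# Usage:
# ludwig visualize --comet \
#     --visualization learning_curves \
#     --training_statistics "$(latest_stats)"
```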

To keep up to date with Ludwig, consider these resources: