Welcome to gokart’s documentation!

Useful links: GitHub | cookiecutter gokart

Gokart is a wrapper around the data pipeline library luigi. It addresses “reproducibility”, “task dependencies”, “constraints of good code”, and “ease of use” for machine learning pipelines.

Good things about gokart

Here are some good things about gokart.

  • The following data for each Task is stored separately in pkl files whose names include a hash value
    • task output data
    • versions of all imported modules
    • task processing time
    • random seed in task
    • displayed log
    • all parameters set as class variables in the task
  • If a parameter of a Task is changed, the Task is rerun automatically.
    • The files above are generated again with a different hash value
    • The hash values of dependent tasks also change, so those tasks are rerun as well
  • Supports GCS and S3
  • Task outputs are passed between tasks as intermediate files, which keeps memory usage low
  • pandas.DataFrame type and column checking during I/O
  • The directory structure of saved files is determined automatically from the structure of the script
  • Seeds for numpy and random are automatically fixed
  • Lets you write code that adheres to SOLID principles as much as possible
  • Tasks are locked via redis even if they run in parallel
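
The parameter-driven rerun behaviour in the list above can be illustrated with a small standard-library sketch. This is not gokart's actual implementation: the function names `task_hash` and `output_path`, the use of MD5 over a JSON payload, the task names and the file name layout are all assumptions made for illustration. It only shows the general idea that a hash derived from a task's parameters and from its upstream hashes changes whenever a parameter changes anywhere upstream.

```python
import hashlib
import json


def task_hash(name, params, upstream_hashes=()):
    """Hash a task's name, parameters, and the hashes of its dependencies.

    Hypothetical helper for illustration only, not a gokart API.
    """
    payload = json.dumps(
        {"name": name, "params": params, "upstream": sorted(upstream_hashes)},
        sort_keys=True,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def output_path(name, params, upstream_hashes=()):
    """Build a hypothetical output file name that embeds the hash."""
    return f"{name}_{task_hash(name, params, upstream_hashes)}.pkl"


# Hypothetical two-step pipeline: Train depends on Preprocess.
pre_a = task_hash("Preprocess", {"min_count": 5})
pre_b = task_hash("Preprocess", {"min_count": 10})  # one parameter changed

train_a = output_path("Train", {"lr": 0.1}, [pre_a])
train_b = output_path("Train", {"lr": 0.1}, [pre_b])

# Compare the upstream hashes and the downstream file names.
print(pre_a != pre_b)
print(train_a != train_b)
```

Because the downstream file name depends on the upstream hash, a cached output from the old parameter set is simply not found under the new name, which is what forces the dependent task to run again in this sketch.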

These features were all designed for building machine learning batch jobs. Together they provide an excellent environment for reproducibility and team development.
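
As a rough illustration of why fixed seeds matter for reproducibility, the sketch below seeds Python's standard `random` module before running a task body twice. The helper `run_with_fixed_seed` and the task body `sample_rows` are hypothetical names, and this is not how gokart wires seeding internally. A numpy-based task would presumably need its own generator seeded in the same spirit.

```python
import random


def run_with_fixed_seed(task_fn, seed):
    """Seed the stdlib RNG, then run the task body (hypothetical helper)."""
    random.seed(seed)
    return task_fn()


def sample_rows():
    """Hypothetical task body that depends on randomness."""
    return random.sample(range(100), 5)


# Running the same task body twice with the same seed.
first = run_with_fixed_seed(sample_rows, seed=42)
second = run_with_fixed_seed(sample_rows, seed=42)

# Compare the two runs.
print(first == second)
```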
