Tutorial

Also please refer to Intro To Gokart section.

1, Make gokart project

Create a project using cookiecutter-gokart.

cookiecutter  https://github.com/m3dev/cookiecutter-gokart
# project_name [project_name]: example
# package_name [package_name]: gokart_example
# python_version [3.7.0]:
# author [your name]: m3dev
# package_description [What's this project?]: gokart example
# license [MIT License]:

You will have a directory tree like following:

tree example/
example/
├── Dockerfile
├── README.md
├── conf
│   ├── logging.ini
│   └── param.ini
├── gokart_example
│   ├── __init__.py
│   ├── model
│   │   ├── __init__.py
│   │   └── sample.py
│   └── utils
│       └── template.py
├── main.py
├── pyproject.toml
└── test
    ├── __init__.py
    └── unit_test
        └── test_sample.py

2, Running sample task

Let’s run the first task.

python main.py gokart_example.Sample --local-scheduler

The results are stored in resources directory.

tree resources
resources/
├── gokart_example
│   └── model
│       └── sample
│           └── Sample_cdf55a3d6c255d8c191f5f472da61f99.pkl
└── log
    ├── module_versions
    │   └── Sample_cdf55a3d6c255d8c191f5f472da61f99.txt
    ├── processing_time
    │   └── Sample_cdf55a3d6c255d8c191f5f472da61f99.pkl
    ├── random_seed
    │   └── Sample_cdf55a3d6c255d8c191f5f472da61f99.pkl
    ├── task_log
    │   └── Sample_cdf55a3d6c255d8c191f5f472da61f99.pkl
    └── task_params
        └── Sample_cdf55a3d6c255d8c191f5f472da61f99.pkl

Please refer to Intro To Gokart for output

Note

It is better to use poetry in terms of the module version. Please refer to poetry document

poetry lock
poetry run python main.py gokart_example.Sample --local-scheduler

If want to stabilize it further, please use docker.

docker build -t sample .
docker run -it sample "python main.py gokart_example.Sample --local-scheduler"

3, Check result

Check the output.

with open('resources/gokart_example/model/sample/Sample_cdf55a3d6c255d8c191f5f472da61f99.pkl', 'rb') as f:
    print(pickle.load(f))  # sample output

4, Run unittest

It is important to run unittest before and after modifying the code.

python -m unittest discover -s ./test/unit_test/
.
----------------------------------------------------------------------
Ran 1 test in 0.001s

OK

5, Create Task

Writing gokart-like tasks. Modify example/gokart_example/model/sample.py as follows:

from logging import getLogger
import gokart
from gokart_example.utils.template import GokartTask
logger = getLogger(__name__)


class Sample(GokartTask):
    def run(self):
        self.dump('sample output')


class StringToSplit(GokartTask):
    """Like the function to divide received data by spaces."""
    task = gokart.TaskInstanceParameter()

    def run(self):
        sample = self.load('task')
        self.dump(sample.split(' '))


class Main(GokartTask):
    """Endpoint task."""
    def requires(self):
        return StringToSplit(task=Sample())

Added Main and StringToSplit. StringToSplit is a function-like task that loads the result of an arbitrary task and splits it by spaces. Main is injecting Sample into StringToSplit. It like Endpoint.

Let’s run the Main task.

python main.py gokart_example.Main --local-scheduler

Please take a look at the logger output at this time.

===== Luigi Execution Summary =====

Scheduled 3 tasks of which:
* 1 complete ones were encountered:
    - 1 gokart_example.Sample(...)
* 2 ran successfully:
    - 1 gokart_example.Main(...)
    - 1 gokart_example.StringToSplit(...)

This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====

As the log shows, Sample has been executed once, so the cache will be used. The only things that worked were Main and StringToSplit.

The output will look like the following, with the result in StringToSplit_b8a0ce6c972acbd77eae30f35da4307e.pkl.

tree resources/
resources/
├── gokart_example
│   └── model
│       └── sample
│           ├── Sample_cdf55a3d6c255d8c191f5f472da61f99.pkl
│           └── StringToSplit_b8a0ce6c972acbd77eae30f35da4307e.pkl
...
with open('resources/gokart_example/model/sample/StringToSplit_b8a0ce6c972acbd77eae30f35da4307e.pkl', 'rb') as f:
    print(pickle.load(f))  # ['sample', 'output']

It was able to move the added task.

6, Rerun Task

Finally, let’s rerun the task. There are two ways to rerun a task. Change the rerun parameter or parameters of the dependent tasks.

gokart.TaskOnKart can set rerun parameter for each task like following:

class Main(GokartTask):
    rerun=True

    def requires(self):
        return StringToSplit(task=Sample(rerun=True), rerun=True)

OR

Add new parameter on dependent tasks like following:

class Sample(GokartTask):
    version = luigi.IntParameter(default=1)

    def run(self):
        self.dump('sample output version {self.version}')

In both cases, all tasks will be rerun. The difference is hash value given to output files. The reurn parameter has no effect on the hash value. So it will be rerun with the same hash value.

In the second method, version parameter is added to the Sample task. This parameter will change the hash value of Sample and generate another output file. And the dependent task, StringToSplit, will also have a different hash value, and rerun.

Please refer to Task Settings for details.

Please try rerunning task at hand:)

Feature

This is the end of the gokart tutorial. The tutorial is an introduction to some of the features. There are still more useful features.

Please See TaskOnKart section, For Pandas section and Task Parameters section for more useful features of the task.

Have a good gokart life.