gokart package

Submodules

gokart.file_processor module

class gokart.file_processor.BinaryFileProcessor

Bases: gokart.file_processor.FileProcessor

Pass bytes to this processor, for example to save a matplotlib figure. Here file is an already-open binary file object, as required by dump(obj, file):

    import io
    import matplotlib.pyplot as plt

    figure_binary = io.BytesIO()
    plt.savefig(figure_binary)
    figure_binary.seek(0)
    BinaryFileProcessor().dump(figure_binary.read(), file)

dump(obj, file)
format()
load(file)
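
The dump(obj, file) / load(file) pair listed above can be illustrated without importing gokart. The following is only a minimal sketch: BytesProcessorSketch is a hypothetical stand-in, not gokart's implementation, and an in-memory buffer plays the role of the file object that a target would normally open.

```python
import io


class BytesProcessorSketch:
    """Hypothetical stand-in mirroring the dump(obj, file) / load(file) shape."""

    def dump(self, obj: bytes, file) -> None:
        # Write the raw bytes to an already-open binary file object.
        file.write(obj)

    def load(self, file) -> bytes:
        # Read everything back from a binary file object.
        return file.read()


processor = BytesProcessorSketch()
buffer = io.BytesIO()  # assumption: stands in for the file a target would open
processor.dump(b"fake image bytes", buffer)
buffer.seek(0)  # rewind before reading, as in the figure example
restored = processor.load(buffer)
```

The other processors listed below expose the same three methods, so a custom format would presumably follow the same shape.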

class gokart.file_processor.CsvFileProcessor(sep=', ')

Bases: gokart.file_processor.FileProcessor

dump(obj, file)
format()
load(file)

class gokart.file_processor.FeatherFileProcessor(store_index_in_feather: bool)

Bases: gokart.file_processor.FileProcessor

dump(obj, file)
format()
load(file)

class gokart.file_processor.FileProcessor

Bases: object

dump(obj, file)
format()
load(file)

class gokart.file_processor.GzipFileProcessor

Bases: gokart.file_processor.FileProcessor

dump(obj, file)
format()
load(file)

class gokart.file_processor.JsonFileProcessor

Bases: gokart.file_processor.FileProcessor

dump(obj, file)
format()
load(file)

class gokart.file_processor.NpzFileProcessor

Bases: gokart.file_processor.FileProcessor

dump(obj, file)
format()
load(file)

class gokart.file_processor.ParquetFileProcessor(engine='pyarrow', compression=None)

Bases: gokart.file_processor.FileProcessor

dump(obj, file)
format()
load(file)

class gokart.file_processor.PickleFileProcessor

Bases: gokart.file_processor.FileProcessor

dump(obj, file)
format()
load(file)

class gokart.file_processor.TextFileProcessor

Bases: gokart.file_processor.FileProcessor

dump(obj, file)
format()
load(file)

class gokart.file_processor.XmlFileProcessor

Bases: gokart.file_processor.FileProcessor

dump(obj, file)
format()
load(file)

gokart.file_processor.make_file_processor(file_path: str, store_index_in_feather: bool) → gokart.file_processor.FileProcessor

gokart.info module

gokart.info.make_tree_info(task: gokart.task.TaskOnKart, indent: str = '', last: bool = True, details: bool = False, abbr: bool = True, visited_tasks: Optional[Set[str]] = None, ignore_task_names: Optional[List[str]] = None) → str

Return a string representation of the tasks and their statuses/parameters in a dependency tree format.

This function has moved to gokart.tree.task_info.make_task_info_as_tree_str. This code remains for backward compatibility.

Parameters:
  • task: TaskOnKart
    Root task.
  • details: bool
    Whether or not to output details.
  • abbr: bool
    Whether or not to abbreviate information for tasks that have already appeared.
  • ignore_task_names: Optional[List[str]]
    List of task names to ignore.
Returns:
  • tree_info: str
    Formatted task dependency tree.
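
To show what a dependency tree string built with indent and last arguments can look like, here is a rough sketch that does not use gokart. NodeSketch, render_tree, and the branch characters are all assumptions made for illustration; the details, abbr, and ignore_task_names options are not modelled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeSketch:
    """Hypothetical stand-in for a task: a name, a status, and its dependencies."""
    name: str
    status: str
    children: List["NodeSketch"] = field(default_factory=list)


def render_tree(node: NodeSketch, indent: str = "", last: bool = True) -> str:
    # Draw the branch for this node, then recurse with a deeper indent.
    branch = "└─" if last else "├─"
    text = f"{indent}{branch}({node.status}) {node.name}\n"
    child_indent = indent + ("   " if last else "│  ")
    for i, child in enumerate(node.children):
        is_last_child = i == len(node.children) - 1
        text += render_tree(child, child_indent, is_last_child)
    return text


root = NodeSketch("Train", "PENDING", [
    NodeSketch("LoadData", "COMPLETE"),
    NodeSketch("MakeFeatures", "COMPLETE", [NodeSketch("LoadData", "COMPLETE")]),
])
tree = render_tree(root)
print(tree)
```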

class gokart.info.tree_info(*args, **kwargs)

Bases: gokart.task.TaskOnKart

mode = <luigi.parameter.Parameter object>
output()
output_path = <luigi.parameter.Parameter object>

gokart.parameter module

class gokart.parameter.ExplicitBoolParameter(*args, **kwargs)

Bases: luigi.parameter.BoolParameter

class gokart.parameter.ListTaskInstanceParameter(default=<object object>, is_global=False, significant=True, description=None, config_path=None, positional=True, always_in_help=False, batch_method=None, visibility=<ParameterVisibility.PUBLIC: 0>)

Bases: luigi.parameter.Parameter

parse(s)

Parse an individual value from the input.

The default implementation is the identity function, but subclasses should override this method for specialized parsing.

Parameters: s (str) – the value to parse.
Returns: the parsed value.

serialize(x)

Opposite of parse().

Converts the value x to a string.

Parameters: x – the value to serialize.
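
The parse()/serialize() contract described above is a string round trip. As a minimal sketch, assuming a JSON encoding chosen only for illustration, the hypothetical IntListParameterSketch below (not a luigi or gokart class) shows the two methods undoing each other.

```python
import json
from typing import List


class IntListParameterSketch:
    """Hypothetical parameter that stores a list of ints as a JSON string."""

    def parse(self, s: str) -> List[int]:
        # Turn the string form back into a Python value.
        return [int(v) for v in json.loads(s)]

    def serialize(self, x: List[int]) -> str:
        # Opposite of parse(): turn the value into a string.
        return json.dumps(list(x))


param = IntListParameterSketch()
text = param.serialize([1, 2, 3])
value = param.parse(text)
```

A parameter holding task instances would need a richer encoding, but the same round-trip property is what makes the value recoverable from its string form.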

class gokart.parameter.TaskInstanceParameter(default=<object object>, is_global=False, significant=True, description=None, config_path=None, positional=True, always_in_help=False, batch_method=None, visibility=<ParameterVisibility.PUBLIC: 0>)

Bases: luigi.parameter.Parameter

parse(s)

Parse an individual value from the input.

The default implementation is the identity function, but subclasses should override this method for specialized parsing.

Parameters: s (str) – the value to parse.
Returns: the parsed value.

serialize(x)

Opposite of parse().

Converts the value x to a string.

Parameters: x – the value to serialize.

gokart.run module

gokart.run.run(cmdline_args=None, set_retcode=True)

gokart.s3_config module

class gokart.s3_config.S3Config(*args, **kwargs)

Bases: luigi.task.Config

aws_access_key_id_name = <luigi.parameter.Parameter object>
aws_secret_access_key_name = <luigi.parameter.Parameter object>
get_s3_client() → luigi.contrib.s3.S3Client

gokart.target module

class gokart.target.LargeDataFrameProcessor(max_byte: int)

Bases: object

static load(file_path: str) → pandas.core.frame.DataFrame
save(df: pandas.core.frame.DataFrame, file_path: str)

class gokart.target.ModelTarget(file_path: str, temporary_directory: str, load_function, save_function, redis_params: gokart.redis_lock.RedisParams)

Bases: gokart.target.TargetOnKart

class gokart.target.SingleFileTarget(target: luigi.target.FileSystemTarget, processor: gokart.file_processor.FileProcessor, redis_params: gokart.redis_lock.RedisParams)

Bases: gokart.target.TargetOnKart

class gokart.target.TargetOnKart

Bases: luigi.target.Target

dump(obj, lock_at_dump: bool = True) → None
exists() → bool

Returns True if the Target exists and False otherwise.

last_modification_time() → datetime.datetime
load() → Any
path() → str
remove() → None
wrap_with_lock(func)

gokart.target.make_model_target(file_path: str, temporary_directory: str, save_function, load_function, unique_id: Optional[str] = None, redis_params: gokart.redis_lock.RedisParams = None) → gokart.target.TargetOnKart
gokart.target.make_target(file_path: str, unique_id: Optional[str] = None, processor: Optional[gokart.file_processor.FileProcessor] = None, redis_params: gokart.redis_lock.RedisParams = None, store_index_in_feather: bool = True) → gokart.target.TargetOnKart

gokart.task module

class gokart.task.TaskOnKart(*args, **kwargs)

Bases: luigi.task.Task

This is a wrapper class of luigi.Task.

The key methods of a TaskOnKart are:

  • make_target() - makes an output target from a relative file path.
  • make_model_target() - makes an output target for models that are saved as multiple files.
  • load() - loads the input files of this task.
  • dump() - saves an object as the output of this task.

FIX_RANDOM_SEED_VALUE_NONE_MAGIC_NUMBER = -42497368
cache_unique_id = <gokart.parameter.ExplicitBoolParameter object>
clone(cls=None, **kwargs)

Creates a new instance from an existing instance where some of the args have changed.

There are at least two scenarios where this is useful (see test/clone_test.py):

  • removing a lot of boilerplate when you have recursive dependencies and lots of args
  • there is task inheritance and some logic is on the base class
Parameters:
  • cls
  • kwargs
Returns:

complete() → bool

If the task has any outputs, return True if all outputs exist. Otherwise, return False.

However, you may freely override this method with custom logic.
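
The default rule stated above (complete only if there is at least one output and every output exists) can be sketched as follows. FileTargetSketch and is_complete are hypothetical names used for illustration only; this is not the luigi or gokart implementation.

```python
import os
import tempfile
from typing import List


class FileTargetSketch:
    """Hypothetical target that exists when its file is present on disk."""

    def __init__(self, path: str) -> None:
        self.path = path

    def exists(self) -> bool:
        return os.path.exists(self.path)


def is_complete(outputs: List[FileTargetSketch]) -> bool:
    # No outputs: treated as not complete, following the description above.
    if not outputs:
        return False
    return all(target.exists() for target in outputs)


with tempfile.TemporaryDirectory() as tmp:
    done = FileTargetSketch(os.path.join(tmp, "done.txt"))
    missing = FileTargetSketch(os.path.join(tmp, "missing.txt"))
    with open(done.path, "w") as f:
        f.write("ok")
    only_done = is_complete([done])
    with_missing = is_complete([done, missing])

no_outputs = is_complete([])
```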

delete_unnecessary_output_files = <luigi.parameter.BoolParameter object>
dump(obj, target: Union[None, str, gokart.target.TargetOnKart] = None) → None
fail_on_empty_dump = <gokart.parameter.ExplicitBoolParameter object>
fix_random_seed_methods = <luigi.parameter.ListParameter object>
fix_random_seed_value = <luigi.parameter.IntParameter object>
static get_code(target_class) → Set[str]
get_info(only_significant=False)
get_own_code()
get_processing_time() → str
get_task_log() → Dict[KT, VT]
get_task_params() → Dict[KT, VT]
static is_task_on_kart(value)
load(target: Union[None, str, gokart.target.TargetOnKart] = None) → Any
load_data_frame(target: Union[None, str, gokart.target.TargetOnKart] = None, required_columns: Optional[Set[str]] = None, drop_columns: bool = False) → pandas.core.frame.DataFrame
load_generator(target: Union[None, str, gokart.target.TargetOnKart] = None) → Any
local_temporary_directory = <luigi.parameter.Parameter object>
make_large_data_frame_target(relative_file_path: str = None, use_unique_id: bool = True, max_byte=67108864) → gokart.target.TargetOnKart
make_model_target(relative_file_path: str, save_function: Callable[[Any, str], None], load_function: Callable[[str], Any], use_unique_id: bool = True)

Make a target for models that are saved as multiple files, e.g. gensim.Word2Vec or TensorFlow models.

Parameters:
  • relative_file_path – A file path to save to.
  • save_function – A function to save a model. This takes a model object and a file path.
  • load_function – A function to load a model. This takes a file path and returns a model object.
  • use_unique_id – If true, add a unique id to the file base name.
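
The save_function and load_function arguments only need to match the shapes Callable[[Any, str], None] and Callable[[str], Any] from the signature. A minimal sketch of such a pair, assuming pickle as the storage format and using the hypothetical names save_model and load_model:

```python
import os
import pickle
import tempfile
from typing import Any


def save_model(model: Any, file_path: str) -> None:
    # Takes a model object and a file path, as save_function requires.
    with open(file_path, "wb") as f:
        pickle.dump(model, f)


def load_model(file_path: str) -> Any:
    # Takes a file path and returns a model object, as load_function requires.
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Round trip outside of any task, using a plain dict as a stand-in model.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    save_model({"weights": [0.1, 0.2]}, path)
    restored = load_model(path)
```

Inside a task, functions of this shape would be passed as save_function=save_model and load_function=load_model to make_model_target().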

make_target(relative_file_path: str = None, use_unique_id: bool = True, processor: Optional[gokart.file_processor.FileProcessor] = None) → gokart.target.TargetOnKart
make_task_instance_dictionary() → Dict[str, gokart.task.TaskOnKart]
make_unique_id()
modification_time_check = <luigi.parameter.BoolParameter object>
output()
redis_fail_on_collision = <luigi.parameter.BoolParameter object>
redis_host = <luigi.parameter.OptionalParameter object>
redis_port = <luigi.parameter.OptionalParameter object>
redis_timeout = <luigi.parameter.IntParameter object>
requires()
rerun = <luigi.parameter.BoolParameter object>
classmethod restore(unique_id)
serialized_task_definition_check = <luigi.parameter.BoolParameter object>
should_dump_supplementary_log_files = <gokart.parameter.ExplicitBoolParameter object>
significant = <luigi.parameter.BoolParameter object>
store_index_in_feather = <gokart.parameter.ExplicitBoolParameter object>
strict_check = <luigi.parameter.BoolParameter object>
static try_set_seed(methods: List[str], random_seed: int) → List[str]
workspace_directory = <luigi.parameter.Parameter object>

gokart.workspace_management module

gokart.workspace_management.delete_local_unnecessary_outputs(task: gokart.task.TaskOnKart)

gokart.zip_client module

class gokart.zip_client.LocalZipClient(file_path: str, temporary_directory: str)

Bases: gokart.zip_client.ZipClient

exists() → bool
make_archive() → None
path
remove() → None
unpack_archive() → None

class gokart.zip_client.ZipClient

Bases: object

exists() → bool
make_archive() → None
path
remove() → None
unpack_archive() → None

Module contents