gokart package
Submodules
gokart.file_processor module
File processor module with support for multiple DataFrame backends.
- class gokart.file_processor.BinaryFileProcessor[source]
Bases:
FileProcessor
Pass bytes to this processor.
    import io
    import matplotlib.pyplot as plt
    from gokart.file_processor import BinaryFileProcessor

    figure_binary = io.BytesIO()
    plt.savefig(figure_binary)
    figure_binary.seek(0)
    BinaryFileProcessor().dump(figure_binary.read())
- class gokart.file_processor.CsvFileProcessor(sep: str = ',', encoding: str = 'utf-8', dataframe_type: Literal['pandas', 'polars', 'polars-lazy'] = 'pandas')[source]
Bases:
FileProcessor
CSV file processor with automatic backend selection based on dataframe_type.
- class gokart.file_processor.FeatherFileProcessor(store_index_in_feather: bool, dataframe_type: Literal['pandas', 'polars', 'polars-lazy'] = 'pandas')[source]
Bases:
FileProcessor
Feather file processor with automatic backend selection based on dataframe_type.
- class gokart.file_processor.GzipFileProcessor[source]
Bases:
FileProcessor
- class gokart.file_processor.JsonFileProcessor(orient: Literal['split', 'records', 'index', 'table', 'columns', 'values'] | None = None, dataframe_type: Literal['pandas', 'polars', 'polars-lazy'] = 'pandas')[source]
Bases:
FileProcessor
JSON file processor with automatic backend selection based on dataframe_type.
- class gokart.file_processor.NpzFileProcessor[source]
Bases:
FileProcessor
- class gokart.file_processor.ParquetFileProcessor(engine: Any = 'pyarrow', compression: Any = None, dataframe_type: Literal['pandas', 'polars', 'polars-lazy'] = 'pandas')[source]
Bases:
FileProcessor
Parquet file processor with automatic backend selection based on dataframe_type.
- class gokart.file_processor.PickleFileProcessor[source]
Bases:
FileProcessor
- class gokart.file_processor.TextFileProcessor[source]
Bases:
FileProcessor
- class gokart.file_processor.XmlFileProcessor[source]
Bases:
FileProcessor
- gokart.file_processor.make_file_processor(file_path: str, store_index_in_feather: bool = True, *, dataframe_type: Literal['pandas', 'polars', 'polars-lazy'] = 'pandas') FileProcessor[source]
Create a file processor based on file extension with default parameters.
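Dispatching on the file extension can be pictured roughly as follows. This is a standalone sketch, not gokart's implementation: the function name `pick_processor_name` and the extension table are assumptions made for illustration, and only the processor class names are taken from this page.

```python
import os

# Hypothetical extension table; the mapping gokart actually uses may differ.
_PROCESSOR_BY_EXTENSION = {
    '.txt': 'TextFileProcessor',
    '.csv': 'CsvFileProcessor',
    '.json': 'JsonFileProcessor',
    '.pkl': 'PickleFileProcessor',
    '.parquet': 'ParquetFileProcessor',
    '.feather': 'FeatherFileProcessor',
}


def pick_processor_name(file_path: str) -> str:
    """Return the processor class name registered for the file's extension."""
    extension = os.path.splitext(file_path)[1].lower()
    if extension not in _PROCESSOR_BY_EXTENSION:
        raise ValueError(f'no processor registered for extension {extension!r}')
    return _PROCESSOR_BY_EXTENSION[extension]
```

Keeping the table separate from the lookup makes it easy to see which extensions are supported and to fail loudly on anything else.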
gokart.info module
- gokart.info.make_tree_info(task: TaskOnKart[Any], indent: str = '', last: bool = True, details: bool = False, abbr: bool = True, visited_tasks: set[str] | None = None, ignore_task_names: list[str] | None = None) str[source]
Return a string representation of the tasks and their statuses/parameters in a dependency tree format.
This function has moved to gokart.tree.task_info.make_task_info_as_tree_str. This code remains for backward compatibility.
Parameters
- task: TaskOnKart
Root task.
- details: bool
Whether or not to output details.
- abbr: bool
Whether or not to simplify tasks information that has already appeared.
- ignore_task_names: list[str] | None
List of task names to ignore.
Returns
- tree_info: str
Formatted task dependency tree.
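The roles of indent, last, abbr and visited_tasks can be illustrated with a small recursive renderer. This is a sketch under assumptions: `Node`, `render_tree` and the `+--` / `|--` connectors are hypothetical, and the text that gokart actually prints may look different.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for a task: a name plus the tasks it requires."""
    name: str
    children: List['Node'] = field(default_factory=list)


def render_tree(node: Node, indent: str = '', last: bool = True,
                abbr: bool = True, visited: Optional[Set[str]] = None) -> str:
    """Render `node` and its dependencies as an indented tree string."""
    visited = set() if visited is None else visited
    connector = '+--' if last else '|--'
    line = f'{indent}{connector}{node.name}'
    if abbr and node.name in visited:
        # Already printed once: emit a stub instead of repeating the subtree.
        return line + ' ...\n'
    visited.add(node.name)
    result = line + '\n'
    # Children of a non-last node keep a vertical bar so siblings stay connected.
    child_indent = indent + ('   ' if last else '|  ')
    for i, child in enumerate(node.children):
        is_last = i == len(node.children) - 1
        result += render_tree(child, child_indent, is_last, abbr, visited)
    return result


shared = Node('LoadData')
root = Node('Train', [Node('Preprocess', [shared]), shared])
print(render_tree(root), end='')
```

The second appearance of `LoadData` is abbreviated because its name is already in the visited set, which is the behaviour the abbr flag describes.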
gokart.parameter module
- class gokart.parameter.ListTaskInstanceParameter(expected_elements_type: type[~gokart.parameter.TASK_ON_KART_TYPE] | None = None, default: list[~gokart.parameter.TASK_ON_KART_TYPE] | ~luigi.parameter._NoValueType = <no_value>, **kwargs: ~typing.Unpack[~gokart.parameter.ParameterKwargs])[source]
Bases:
Parameter[list[TASK_ON_KART_TYPE]],Generic[TASK_ON_KART_TYPE]
- class gokart.parameter.ParameterKwargs[source]
Bases:
TypedDict
- always_in_help: bool
- batch_method: Callable[[Iterable[Any]], Any] | None
- config_path: ConfigPath | None
- description: str | None
- positional: bool
- significant: bool
- visibility: ParameterVisibility
- class gokart.parameter.Serializable(*args, **kwargs)[source]
Bases:
Protocol
- class gokart.parameter.SerializableParameter(object_type: type[S], *args: Any, **kwargs: Any)[source]
Bases:
Parameter[S],Generic[S]
- class gokart.parameter.TaskInstanceParameter(expected_type: type[~gokart.parameter.TASK_ON_KART_TYPE] | None = None, default: ~gokart.parameter.TASK_ON_KART_TYPE | ~luigi.parameter._NoValueType = <no_value>, **kwargs: ~typing.Unpack[~gokart.parameter.ParameterKwargs])[source]
Bases:
Parameter[TASK_ON_KART_TYPE], Generic[TASK_ON_KART_TYPE]
- expected_type: type
- class gokart.parameter.ZonedDateSecondParameter(**kwargs)[source]
Bases:
Parameter[datetime]
ZonedDateSecondParameter supports a datetime.datetime object with timezone information.
A ZonedDateSecondParameter is an ISO 8601 formatted date and time, specified to the second and including a timezone. For example, 2013-07-10T19:07:38+09:00 specifies July 10, 2013 at 19:07:38 +09:00. The separator : can be omitted for Python 3.11 and later.
- normalize(x)[source]
Given a parsed parameter value, normalizes it.
The value can either be the result of parse(), the default value or arguments passed into the task’s constructor by instantiation.
This is very implementation defined, but can be used to validate/clamp valid values. For example, if you wanted to only accept even integers, and “correct” odd values to the nearest integer, you can implement normalize as x // 2 * 2.
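The timestamp format and the normalize example above can be tried with the standard library alone. This is a minimal sketch: `parse_zoned_second` and `normalize_to_even` are hypothetical helper names, and the parameter class itself may parse and validate differently.

```python
from datetime import datetime, timedelta


def parse_zoned_second(text: str) -> datetime:
    """Parse an ISO 8601 timestamp and require timezone information."""
    value = datetime.fromisoformat(text)
    if value.tzinfo is None:
        raise ValueError(f'timezone offset is required: {text!r}')
    return value


def normalize_to_even(x: int) -> int:
    """Map an integer to the even number at or just below it."""
    return x // 2 * 2


parsed = parse_zoned_second('2013-07-10T19:07:38+09:00')
assert parsed.utcoffset() == timedelta(hours=9)
```

Note that `x // 2 * 2` uses floor division, so odd values are always moved down to the next even integer, including negative ones.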
gokart.run module
gokart.s3_config module
- class gokart.s3_config.S3Config(*args, **kwargs)[source]
Bases:
Config
- aws_access_key_id_name: luigi.Parameter[str]
Parameter whose value is a str, and a base class for other parameter types.
Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:
    class MyTask(luigi.Task):
        foo = luigi.Parameter()

    class RequiringTask(luigi.Task):
        def requires(self):
            return MyTask(foo="hello")

        def run(self):
            print(self.requires().foo)  # prints "hello"
This makes it possible to instantiate multiple tasks, e.g. MyTask(foo='bar') and MyTask(foo='baz'). The task will then have the foo attribute set appropriately.
When a task is instantiated, it will first use any argument as the value of the parameter, e.g. if you instantiate a = TaskA(x=44) then a.x == 44. When the value is not provided, the value will be resolved in this order of falling priority:
- Any value provided on the command line:
  - To the root task (e.g. --param xyz)
  - Then to the class, using the qualified task name syntax (e.g. --TaskA-param xyz).
- With [TASK_NAME]>PARAM_NAME: <serialized value> syntax. See ParamConfigIngestion
- Any default value set using the default flag.
Parameter objects may be reused, but you must then set the positional=False flag.
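The falling-priority lookup described above can be sketched as a chain of dictionary checks. This is an illustration only: `resolve_parameter` and the dictionary layout are assumptions, and luigi's own resolution logic is more involved.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel so that None can still be a legitimate default


def resolve_parameter(task_name: str, param_name: str,
                      root_cli: Dict[str, Any],
                      qualified_cli: Dict[str, Any],
                      config: Dict[str, Dict[str, Any]],
                      default: Any = _MISSING) -> Any:
    """Return the first value found, checking sources in falling priority."""
    if param_name in root_cli:                    # e.g. --param xyz
        return root_cli[param_name]
    qualified_key = f'{task_name}-{param_name}'   # e.g. --TaskA-param xyz
    if qualified_key in qualified_cli:
        return qualified_cli[qualified_key]
    if param_name in config.get(task_name, {}):   # configuration section entry
        return config[task_name][param_name]
    if default is not _MISSING:                   # default given on the parameter
        return default
    raise KeyError(f'no value for {task_name}.{param_name}')
```

For example, `resolve_parameter('TaskA', 'param', {}, {'TaskA-param': 'q'}, {'TaskA': {'param': 'c'}}, 'd')` returns `'q'`, because the qualified command-line value outranks both the configuration entry and the default.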
- aws_secret_access_key_name: luigi.Parameter[str]
gokart.target module
- class gokart.target.ModelTarget(file_path: str, temporary_directory: str, load_function: Any, save_function: Any, task_lock_params: TaskLockParams)[source]
Bases:
TargetOnKart
- class gokart.target.SingleFileTarget(target: FileSystemTarget, processor: FileProcessor, task_lock_params: TaskLockParams)[source]
Bases:
TargetOnKart
- class gokart.target.TargetOnKart[source]
Bases:
Target
- gokart.target.make_model_target(file_path: str, temporary_directory: str, save_function: Any, load_function: Any, unique_id: str | None = None, task_lock_params: TaskLockParams | None = None) TargetOnKart[source]
- gokart.target.make_target(file_path: str, unique_id: str | None = None, processor: FileProcessor | None = None, task_lock_params: TaskLockParams | None = None, store_index_in_feather: bool = True) TargetOnKart[source]
gokart.task module
- exception gokart.task.EmptyDumpError[source]
Bases:
AssertionError
Raised when the task attempts to dump an empty DataFrame even though it is prohibited (fail_on_empty_dump is set to True).
- class gokart.task.TaskOnKart(*args, **kwargs)[source]
Bases:
Task, Generic[T]
This is a wrapper class of luigi.Task.
The key methods of a TaskOnKart are:
- make_target() - this makes an output target with a relative file path.
- make_model_target() - this makes an output target for models which generate multiple files to save.
- load() - this loads input files of this task.
- dump() - this saves an object as output of this task.
- FIX_RANDOM_SEED_VALUE_NONE_MAGIC_NUMBER = -42497368
- cache_unique_id: Parameter[bool]
- clone(cls=None, **kwargs)[source]
Creates a new instance from an existing instance where some of the args have changed.
There are at least two scenarios where this is useful (see test/clone_test.py):
- removing a lot of boilerplate when you have recursive dependencies and lots of args
- there’s task inheritance and some logic is on the base class
- Parameters:
cls
kwargs
- Returns:
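The idea of copying an instance while changing only some arguments is similar to `dataclasses.replace` in the standard library. The sketch below uses that as an analogy; `TrainingArgs` is a hypothetical class, and unlike clone() this analogy cannot switch to a different class via cls.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainingArgs:
    """Hypothetical stand-in for a task with several parameters."""
    dataset: str
    learning_rate: float
    epochs: int


base = TrainingArgs(dataset='reviews', learning_rate=0.1, epochs=10)
# Copy every field from `base`, overriding only the ones passed as keywords.
longer_run = replace(base, epochs=50)
```

The original instance is left untouched, so both configurations can be used side by side without restating the shared arguments.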
- complete() bool[source]
If the task has any outputs, return True if all outputs exist. Otherwise, return False.
However, you may freely override this method with custom logic.
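The default rule and one possible custom rule can be written as plain functions over file paths. This is a sketch under assumptions: both function names are hypothetical and operate on local paths, whereas a real override would work with the task's own output targets.

```python
import os
from typing import List


def all_outputs_exist(output_paths: List[str]) -> bool:
    """Default-style check: there is at least one output and every one exists."""
    return bool(output_paths) and all(os.path.exists(p) for p in output_paths)


def all_outputs_non_empty(output_paths: List[str]) -> bool:
    """Stricter custom check: every output exists and contains at least one byte."""
    return all_outputs_exist(output_paths) and all(
        os.path.getsize(p) > 0 for p in output_paths
    )
```

A stricter check like the second one would treat a zero-byte file left behind by an interrupted run as incomplete.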
- complete_check_at_run: Parameter[bool]
- delete_unnecessary_output_files: BoolParameter
A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.
You can toggle between the two parsing modes on a per-parameter basis via
    class MyTask(luigi.Task):
        implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
        explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)
or globally by
    luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING
for all bool parameters instantiated after this line.
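The difference between the two parsing modes can be reproduced with argparse from the standard library. This is an analogy, not luigi's parser: the option names and the `parse_bool` helper are assumptions chosen for the sketch.

```python
import argparse


def parse_bool(text: str) -> bool:
    """Convert 'true'/'false' (any case) to a bool."""
    lowered = text.lower()
    if lowered not in ('true', 'false'):
        raise argparse.ArgumentTypeError(f'expected true or false, got {text!r}')
    return lowered == 'true'


parser = argparse.ArgumentParser()
# Implicit parsing: the bare flag means True, its absence means False.
parser.add_argument('--implicit-bool', action='store_true')
# Explicit parsing: the flag takes true|false; a bare flag still means True.
# The default is True here, the case where an explicit "false" is needed.
parser.add_argument('--explicit-bool', nargs='?', const=True, default=True,
                    type=parse_bool)

args = parser.parse_args(['--explicit-bool', 'false'])
assert args.explicit_bool is False and args.implicit_bool is False
```

With an optional value, whatever token follows the flag is taken as its value, which is the kind of ambiguity the advice about placing bool parameters behind the task family is meant to avoid.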
- dump(obj: T, target: None = None, custom_labels: dict[Any, Any] | None = None) None[source]
- dump(obj: Any, target: str | TargetOnKart, custom_labels: dict[Any, Any] | None = None) None
- fail_on_empty_dump: Parameter[bool]
- fix_random_seed_methods: Parameter[tuple[str, ...]]
Parameter whose value is a list.
In the task definition, use
    class MyTask(luigi.Task):
        grades = luigi.ListParameter()

        def run(self):
            sum = 0
            for element in self.grades:
                sum += element
            avg = sum / len(self.grades)
At the command line, use
    $ luigi --module my_tasks MyTask --grades <JSON string>
Simple example with two grades:
    $ luigi --module my_tasks MyTask --grades '[100,70]'
It is possible to provide a JSON schema against which the given value is validated:
    class MyTask(luigi.Task):
        grades = luigi.ListParameter(
            schema={
                "type": "array",
                "items": {"type": "number", "minimum": 0, "maximum": 10},
                "minItems": 1,
            }
        )

        def run(self):
            sum = 0
            for element in self.grades:
                sum += element
            avg = sum / len(self.grades)
Using this schema, the following command will work:
    $ luigi --module my_tasks MyTask --grades '[1, 8.7, 6]'
while these commands will fail because the parameter is not valid:
    $ luigi --module my_tasks MyTask --grades '[]'           # must have at least 1 element
    $ luigi --module my_tasks MyTask --grades '[-999, 999]'  # elements must be in [0, 10]
Finally, the provided schema can be a custom validator:
    custom_validator = jsonschema.Draft4Validator(
        schema={
            "type": "array",
            "items": {"type": "number", "minimum": 0, "maximum": 10},
            "minItems": 1,
        }
    )

    class MyTask(luigi.Task):
        grades = luigi.ListParameter(schema=custom_validator)

        def run(self):
            sum = 0
            for element in self.grades:
                sum += element
            avg = sum / len(self.grades)
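The effect of the schema in the examples above can be approximated by hand with the standard library. This sketch does not use jsonschema: `parse_grades` is a hypothetical helper that hard-codes the same three constraints (array, numeric items within bounds, minimum length).

```python
import json
from typing import List


def parse_grades(serialized: str, minimum: float = 0, maximum: float = 10,
                 min_items: int = 1) -> List[float]:
    """Parse a JSON array and check it against simple schema-like bounds."""
    values = json.loads(serialized)
    if not isinstance(values, list):
        raise ValueError('expected a JSON array')
    if len(values) < min_items:
        raise ValueError(f'must have at least {min_items} element(s)')
    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f'not a number: {value!r}')
        if not minimum <= value <= maximum:
            raise ValueError(f'{value!r} is outside [{minimum}, {maximum}]')
    return values
```

Calling `parse_grades('[1, 8.7, 6]')` succeeds, while `'[]'` and `'[-999, 999]'` raise `ValueError`, mirroring the accepted and rejected command lines.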
- fix_random_seed_value: Parameter[int]
Parameter whose value is an int.
- input() TargetOnKart | list[T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]]] | tuple[T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]], ...] | dict[str, T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]]][source]
Returns the outputs of the Tasks returned by requires().
See Task.input
- Returns:
a list of Target objects which are specified as outputs of all required Tasks.
- load(target: None | str | TargetOnKart = None) Any[source]
- load(target: TaskOnKart[K]) K
- load(target: list[TaskOnKart[K]]) list[K]
- load_generator(target: None | str | TargetOnKart = None) Generator[Any, None, None][source]
- load_generator(target: list[TaskOnKart[K]]) Generator[K, None, None]
- local_temporary_directory: Parameter[str]
- make_large_data_frame_target(relative_file_path: str | None = None, use_unique_id: bool = True, max_byte: int = 67108864) TargetOnKart[source]
- make_model_target(relative_file_path: str, save_function: Callable[[Any, str], None], load_function: Callable[[str], Any], use_unique_id: bool = True) TargetOnKart[source]
Make a target for models which generate multiple files when saved, e.g. gensim.Word2Vec, Tensorflow, and so on.
- Parameters:
relative_file_path – A file path to save.
save_function – A function to save a model. This takes a model object and a file path.
load_function – A function to load a model. This takes a file path and returns a model object.
use_unique_id – If this is true, add a unique id to a file base name.
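A pair of callables matching the documented shapes, `save_function(model, path)` and `load_function(path)`, might look like the following. This is a sketch under assumptions: the toy dictionary model, the `.vocab` sidecar file and both function names are hypothetical, and the code only demonstrates the signatures with a local round trip rather than calling gokart.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_model(model: Dict[str, Any], path: str) -> None:
    """Write a toy model as two sibling files derived from `path`."""
    with open(path, 'w') as f:
        json.dump(model['weights'], f)
    with open(path + '.vocab', 'w') as f:
        json.dump(model['vocab'], f)


def load_model(path: str) -> Dict[str, Any]:
    """Read back the two files written by save_model."""
    with open(path) as f:
        weights = json.load(f)
    with open(path + '.vocab') as f:
        vocab = json.load(f)
    return {'weights': weights, 'vocab': vocab}


# Round trip in a scratch directory to check the pair is consistent.
with tempfile.TemporaryDirectory() as scratch:
    model_path = os.path.join(scratch, 'model.json')
    original = {'weights': [0.1, 0.2], 'vocab': ['a', 'b']}
    save_model(original, model_path)
    restored = load_model(model_path)
```

Functions shaped like these could be passed as save_function and load_function, since each takes or returns the model together with a single file path.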
- make_target(relative_file_path: str | None = None, use_unique_id: bool = True, processor: FileProcessor | None = None) TargetOnKart[source]
- make_task_instance_dictionary() dict[str, TaskOnKart[Any]][source]
- modification_time_check: BoolParameter
- output() TargetOnKart | list[T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]]] | tuple[T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]], ...] | dict[str, T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]]][source]
- property priority
int([x]) -> integer
int(x, base=10) -> integer
Convert a number or string to an integer, or return 0 if no arguments are given. If x is a number, return x.__int__(). For floating-point numbers, this truncates towards zero.
If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in the given base. The literal can be preceded by ‘+’ or ‘-’ and be surrounded by whitespace. The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to interpret the base from the string as an integer literal.
    >>> int('0b100', base=0)
    4
- redis_host: Parameter[str | None]
Class to parse optional parameters.
- redis_port: OptionalIntParameter
Class to parse optional int parameters.
- redis_timeout: IntParameter
Parameter whose value is an int.
- requires() TaskOnKart[Any] | list[T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]]] | tuple[T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]], ...] | dict[str, T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]]][source]
- rerun: BoolParameter
- serialized_task_definition_check: BoolParameter
- should_dump_supplementary_log_files: Parameter[bool]
- should_lock_run: Parameter[bool]
- significant: BoolParameter
- store_index_in_feather: Parameter[bool]
- strict_check: BoolParameter
- workspace_directory: Parameter[str]
gokart.workspace_management module
- gokart.workspace_management.delete_local_unnecessary_outputs(task: TaskOnKart[Any]) None[source]
gokart.zip_client module
Module contents
- class gokart.ListTaskInstanceParameter(expected_elements_type: type[~gokart.parameter.TASK_ON_KART_TYPE] | None = None, default: list[~gokart.parameter.TASK_ON_KART_TYPE] | ~luigi.parameter._NoValueType = <no_value>, **kwargs: ~typing.Unpack[~gokart.parameter.ParameterKwargs])[source]
Bases:
Parameter[list[TASK_ON_KART_TYPE]], Generic[TASK_ON_KART_TYPE]
- expected_elements_type: type
- class gokart.SerializableParameter(object_type: type[S], *args: Any, **kwargs: Any)[source]
Bases:
Parameter[S],Generic[S]
- class gokart.TaskInstanceParameter(expected_type: type[~gokart.parameter.TASK_ON_KART_TYPE] | None = None, default: ~gokart.parameter.TASK_ON_KART_TYPE | ~luigi.parameter._NoValueType = <no_value>, **kwargs: ~typing.Unpack[~gokart.parameter.ParameterKwargs])[source]
Bases:
Parameter[TASK_ON_KART_TYPE], Generic[TASK_ON_KART_TYPE]
- expected_type: type
- class gokart.TaskOnKart(*args, **kwargs)[source]
Bases:
Task, Generic[T]
This is a wrapper class of luigi.Task.
The key methods of a TaskOnKart are:
make_target()- this makes output target with a relative file path.make_model_target()- this makes output target for models which generate multiple files to save.load()- this loads input files of this task.dump()- this save a object as output of this task.
- FIX_RANDOM_SEED_VALUE_NONE_MAGIC_NUMBER = -42497368
- cache_unique_id: Parameter[bool]
- clone(cls=None, **kwargs)[source]
Creates a new instance from an existing instance where some of the args have changed.
There’s at least two scenarios where this is useful (see test/clone_test.py):
remove a lot of boiler plate when you have recursive dependencies and lots of args
there’s task inheritance and some logic is on the base class
- Parameters:
cls
kwargs
- Returns:
- complete() bool[source]
If the task has any outputs, return
Trueif all outputs exist. Otherwise, returnFalse.However, you may freely override this method with custom logic.
- complete_check_at_run: Parameter[bool]
- delete_unnecessary_output_files: BoolParameter
A Parameter whose value is a
bool. This parameter has an implicit default value ofFalse. For the command line interface this means that the value isFalseunless you add"--the-bool-parameter"to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to beTrue. This is called explicit parsing. When omitting the parameter value, it is still consideredTruebut to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.You can toggle between the two parsing modes on a per-parameter base via
class MyTask(luigi.Task): implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING) explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)
or globally by
luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING
for all bool parameters instantiated after this line.
- dump(obj: T, target: None = None, custom_labels: dict[Any, Any] | None = None) None[source]
- dump(obj: Any, target: str | TargetOnKart, custom_labels: dict[Any, Any] | None = None) None
- fail_on_empty_dump: Parameter[bool]
- fix_random_seed_methods: Parameter[tuple[str, ...]]
Parameter whose value is a
list.In the task definition, use
class MyTask(luigi.Task): grades = luigi.ListParameter() def run(self): sum = 0 for element in self.grades: sum += element avg = sum / len(self.grades)
At the command line, use
$ luigi --module my_tasks MyTask --grades <JSON string>
Simple example with two grades:
$ luigi --module my_tasks MyTask --grades '[100,70]'
It is possible to provide a JSON schema that should be validated by the given value:
class MyTask(luigi.Task): grades = luigi.ListParameter( schema={ "type": "array", "items": { "type": "number", "minimum": 0, "maximum": 10 }, "minItems": 1 } ) def run(self): sum = 0 for element in self.grades: sum += element avg = sum / len(self.grades)
Using this schema, the following command will work:
$ luigi --module my_tasks MyTask --numbers '[1, 8.7, 6]'
while these commands will fail because the parameter is not valid:
$ luigi --module my_tasks MyTask --numbers '[]' # must have at least 1 element $ luigi --module my_tasks MyTask --numbers '[-999, 999]' # elements must be in [0, 10]
Finally, the provided schema can be a custom validator:
custom_validator = jsonschema.Draft4Validator( schema={ "type": "array", "items": { "type": "number", "minimum": 0, "maximum": 10 }, "minItems": 1 } ) class MyTask(luigi.Task): grades = luigi.ListParameter(schema=custom_validator) def run(self): sum = 0 for element in self.grades: sum += element avg = sum / len(self.grades)
- fix_random_seed_value: Parameter[int]
Parameter whose value is an
int.
- input() TargetOnKart | list[T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]]] | tuple[T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]], ...] | dict[str, T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]]][source]
Returns the outputs of the Tasks returned by
requires()See Task.input
- Returns:
a list of
Targetobjects which are specified as outputs of all required Tasks.
- load(target: None | str | TargetOnKart = None) Any[source]
- load(target: TaskOnKart[K]) K
- load(target: list[TaskOnKart[K]]) list[K]
- load_generator(target: None | str | TargetOnKart = None) Generator[Any, None, None][source]
- load_generator(target: list[TaskOnKart[K]]) Generator[K, None, None]
- local_temporary_directory: Parameter[str]
Parameter whose value is a
str, and a base class for other parameter types.Parameters are objects set on the Task class level to make it possible to parameterize tasks. For instance:
class MyTask(luigi.Task): foo = luigi.Parameter() class RequiringTask(luigi.Task): def requires(self): return MyTask(foo="hello") def run(self): print(self.requires().foo) # prints "hello"
This makes it possible to instantiate multiple tasks, eg
MyTask(foo='bar')andMyTask(foo='baz'). The task will then have thefooattribute set appropriately.When a task is instantiated, it will first use any argument as the value of the parameter, eg. if you instantiate
a = TaskA(x=44)thena.x == 44. When the value is not provided, the value will be resolved in this order of falling priority:Any value provided on the command line:
To the root task (eg.
--param xyz)Then to the class, using the qualified task name syntax (eg.
--TaskA-param xyz).
With
[TASK_NAME]>PARAM_NAME: <serialized value>syntax. See ParamConfigIngestionAny default value set using the
defaultflag.
Parameter objects may be reused, but you must then set the
positional=Falseflag.
- make_large_data_frame_target(relative_file_path: str | None = None, use_unique_id: bool = True, max_byte: int = 67108864) TargetOnKart[source]
- make_model_target(relative_file_path: str, save_function: Callable[[Any, str], None], load_function: Callable[[str], Any], use_unique_id: bool = True) TargetOnKart[source]
Make target for models which generate multiple files in saving, e.g. gensim.Word2Vec, Tensorflow, and so on.
- Parameters:
relative_file_path – A file path to save.
save_function – A function to save a model. This takes a model object and a file path.
load_function – A function to load a model. This takes a file path and returns a model object.
use_unique_id – If this is true, add an unique id to a file base name.
- make_target(relative_file_path: str | None = None, use_unique_id: bool = True, processor: FileProcessor | None = None) TargetOnKart[source]
- make_task_instance_dictionary() dict[str, TaskOnKart[Any]][source]
- modification_time_check: BoolParameter
A Parameter whose value is a bool. This parameter has an implicit default value of False. For the command line interface this means that the value is False unless you add "--the-bool-parameter" to your command without giving a parameter value. This is considered implicit parsing (the default). However, in some situations one might want to give the explicit bool value ("--the-bool-parameter true|false"), e.g. when you configure the default value to be True. This is called explicit parsing. When omitting the parameter value, it is still considered True, but to avoid ambiguities during argument parsing, make sure to always place bool parameters behind the task family on the command line when using explicit parsing.
You can toggle between the two parsing modes on a per-parameter basis via
    class MyTask(luigi.Task):
        implicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.IMPLICIT_PARSING)
        explicit_bool = luigi.BoolParameter(parsing=luigi.BoolParameter.EXPLICIT_PARSING)
or globally by
    luigi.BoolParameter.parsing = luigi.BoolParameter.EXPLICIT_PARSING
for all bool parameters instantiated after this line.
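The difference between the two parsing modes can be mimicked with the standard argparse module. This is only a sketch of the behaviour described above, not luigi's implementation; the flag names --implicit-bool and --explicit-bool and the helper parse_bool are made up for the example.

```python
import argparse


def parse_bool(text: str) -> bool:
    """Convert the strings 'true'/'false' (case-insensitive) to a bool."""
    lowered = text.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {text!r}")


parser = argparse.ArgumentParser()
# Implicit parsing: the flag takes no value; its presence means True.
parser.add_argument("--implicit-bool", action="store_true")
# Explicit parsing: an optional value; a bare flag still counts as True.
# The default is set to True here, the case where an explicit "false" is needed.
parser.add_argument("--explicit-bool", nargs="?", const=True, default=True, type=parse_bool)

absent = parser.parse_args([])
bare = parser.parse_args(["--implicit-bool", "--explicit-bool"])
explicit = parser.parse_args(["--explicit-bool", "false"])
```

With a default of True, only the explicit form can switch the value off, which is the situation the text gives as the motivation for explicit parsing.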
- output() TargetOnKart | list[T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]]] | tuple[T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]], ...] | dict[str, T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]]][source]
- property priority
int([x]) -> integer
int(x, base=10) -> integer
Convert a number or string to an integer, or return 0 if no arguments are given. If x is a number, return x.__int__(). For floating-point numbers, this truncates towards zero.
If x is not a number or if base is given, then x must be a string, bytes, or bytearray instance representing an integer literal in the given base. The literal can be preceded by '+' or '-' and be surrounded by whitespace. The base defaults to 10. Valid bases are 0 and 2-36. Base 0 means to interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4
- redis_host: Parameter[str | None]
Class to parse optional parameters.
- redis_port: OptionalIntParameter
Class to parse optional int parameters.
- redis_timeout: IntParameter
Parameter whose value is an int.
- requires() TaskOnKart[Any] | list[T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]]] | tuple[T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]], ...] | dict[str, T | list[FlattenableItems[T]] | tuple[FlattenableItems[T], ...] | dict[str, FlattenableItems[T]]][source]
- rerun: BoolParameter
A Parameter whose value is a bool.
- serialized_task_definition_check: BoolParameter
A Parameter whose value is a bool.
- should_dump_supplementary_log_files: Parameter[bool]
- should_lock_run: Parameter[bool]
- significant: BoolParameter
A Parameter whose value is a bool.
- store_index_in_feather: Parameter[bool]
- strict_check: BoolParameter
A Parameter whose value is a bool.
- workspace_directory: Parameter[str]
Parameter whose value is a str, and a base class for other parameter types.
- class gokart.ZonedDateSecondParameter(**kwargs)[source]
Bases:
Parameter[datetime]
ZonedDateSecondParameter supports a datetime.datetime object with timezone information.
A ZonedDateSecondParameter is an ISO 8601 formatted date and time, specified to the second, with a timezone. For example, 2013-07-10T19:07:38+09:00 specifies July 10, 2013 at 19:07:38 +09:00. The separator : can be omitted for Python 3.11 and later.
- normalize(x)[source]
Given a parsed parameter value, normalizes it.
The value can either be the result of parse(), the default value or arguments passed into the task’s constructor by instantiation.
This is very implementation defined, but can be used to validate/clamp valid values. For example, if you wanted to only accept even integers, and “correct” odd values to the nearest integer, you can implement normalize as x // 2 * 2.
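The value format and the normalize example can both be tried with the standard library alone. The sketch below is not gokart's implementation: it parses the sample timestamp with datetime.fromisoformat and wraps the x // 2 * 2 expression in a hypothetical helper named normalize_even.

```python
from datetime import datetime, timedelta

# Parse the example value from the text; the +09:00 offset makes the result timezone-aware.
parsed = datetime.fromisoformat("2013-07-10T19:07:38+09:00")
offset = parsed.utcoffset()  # timedelta of nine hours for this example


def normalize_even(x: int) -> int:
    """Map any integer to an even one using floor division, as in x // 2 * 2."""
    return x // 2 * 2
```

Because // is floor division, odd inputs move down to the next even integer, so normalize_even(7) gives 6 and normalize_even(-3) gives -4.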
- gokart.build(task: TaskOnKart[T], return_value: Literal[True] = True, reset_register: bool = True, log_level: int = logging.ERROR, task_lock_exception_max_tries: int = 10, task_lock_exception_max_wait_seconds: int = 600, **env_params: Any) T[source]
- gokart.build(task: TaskOnKart[T], return_value: Literal[False], reset_register: bool = True, log_level: int = logging.ERROR, task_lock_exception_max_tries: int = 10, task_lock_exception_max_wait_seconds: int = 600, **env_params: Any) None
Run a gokart task in the local interpreter. It shares most of its parameters with luigi.build (see https://luigi.readthedocs.io/en/stable/api/luigi.html?highlight=build#luigi.build).
- gokart.delete_local_unnecessary_outputs(task: TaskOnKart[Any]) None[source]
- gokart.make_task_info_as_tree_str(task: TaskOnKart[Any], details: bool = False, abbr: bool = True, ignore_task_names: list[str] | None = None) str[source]
Return a string representation of the tasks, their statuses/parameters in a dependency tree format
Parameters
- task: TaskOnKart
Root task.
- details: bool
Whether or not to output details.
- abbr: bool
Whether or not to simplify tasks information that has already appeared.
- ignore_task_names: list[str] | None
List of task names to ignore.
Returns
- tree_info: str
Formatted task dependency tree.
- gokart.make_tree_info(task: TaskOnKart[Any], indent: str = '', last: bool = True, details: bool = False, abbr: bool = True, visited_tasks: set[str] | None = None, ignore_task_names: list[str] | None = None) str[source]
Return a string representation of the tasks, their statuses/parameters in a dependency tree format
This function has moved to gokart.tree.task_info.make_task_info_as_tree_str. It remains here for backward compatibility.
Parameters
- task: TaskOnKart
Root task.
- details: bool
Whether or not to output details.
- abbr: bool
Whether or not to simplify tasks information that has already appeared.
- ignore_task_names: list[str] | None
List of task names to ignore.
Returns
- tree_info: str
Formatted task dependency tree.
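To make the roles of indent, last, abbr, and visited_tasks more concrete, here is a small stand-alone sketch of a recursive tree renderer. It is not gokart's code and its output format is invented: the Node class, the render_tree function, and the "+-" / "|-" markers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for a task: a name plus the tasks it requires."""
    name: str
    children: List["Node"] = field(default_factory=list)


def render_tree(node: Node, indent: str = "", last: bool = True,
                abbr: bool = True, visited: Optional[Set[str]] = None) -> str:
    """Render a dependency tree, abbreviating subtrees that already appeared."""
    visited = set() if visited is None else visited
    line = f"{indent}{'+-' if last else '|-'}{node.name}"
    if abbr and node.name in visited:
        # Already printed once: show the name only and skip its children.
        return line + " ...\n"
    visited.add(node.name)
    result = line + "\n"
    # Children of a last sibling get blank padding; others keep a vertical bar.
    child_indent = indent + ("   " if last else "|  ")
    for i, child in enumerate(node.children):
        is_last_child = i == len(node.children) - 1
        result += render_tree(child, child_indent, is_last_child, abbr, visited)
    return result


load = Node("LoadData")
root = Node("Train", [Node("Features", [load]), load])
tree = render_tree(root)
```

In this sketch the second appearance of LoadData is printed as a single abbreviated line; passing abbr=False would print it in full each time.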
- class gokart.test_run(*args, **kwargs)[source]
Bases:
TaskOnKart[Any]
- namespace: OptionalStrParameter
Class to parse optional str parameters.
- pandas: BoolParameter
A Parameter whose value is a bool.