CoresetTreeServicePCA ¤

CoresetTreeServicePCA(*, data_manager=None, data_params=None, n_instances=None, max_memory_gb=None, optimized_for, chunk_size=None, coreset_size=None, coreset_params=None, working_directory=None, cache_dir=None, node_train_function=None, node_train_function_params=None, node_metadata_func=None, save_all=None)

Bases: CoresetTreeServiceUnsupervisedMixin, CoresetTreeService

Subclass of CoresetTreeService for PCA. A service class for creating a coreset tree and working with it. optimized_for is a required parameter defining the main usage of the service: 'training' or 'cleaning'. In the 'cleaning' case, a single Coreset is built over the entire dataset. In the 'training' case, the service decides whether to build an actual Coreset Tree or a single Coreset over the entire dataset, based on the quadruplet n_instances, n_classes, max_memory_gb and the number of features (deduced from the dataset). The chunk_size and coreset_size are deduced from the same quadruplet. If chunk_size and coreset_size are provided, they override all of the above-mentioned parameters (less recommended).

When building the Coreset, samples are selected and weights are assigned to them, so it is important to use functions that accept sample_weight. Sklearn's PCA implementation does not accept sample_weight; it is therefore highly recommended to use the built-in fit or fit_transform functions of the CoresetTreeServicePCA class, which were extended to accept sample_weight.
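To make the role of sample_weight concrete, here is a minimal NumPy sketch of a weighted PCA. It is an illustration only, not the library's implementation: the helper name weighted_pca and the toy data are assumptions. It shows that a small weighted sample can give the same principal direction as the expanded, unweighted data it represents.

```python
import numpy as np

def weighted_pca(X, sample_weight, n_components):
    """Hypothetical helper: PCA computed from the weighted covariance matrix."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(sample_weight, dtype=float)
    mean = np.average(X, axis=0, weights=w)      # weighted mean of each feature
    Xc = X - mean
    cov = (Xc * w[:, None]).T @ Xc / w.sum()     # weighted covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order].T, eigvals[order]

# Three distinct points; the first one stands for three identical original rows.
X_small = np.array([[0.0, 0.0], [2.0, 1.0], [4.0, 4.0]])
weights = np.array([3.0, 1.0, 1.0])
X_full = np.repeat(X_small, weights.astype(int), axis=0)

components_w, var_w = weighted_pca(X_small, weights, n_components=1)
components_full, var_full = weighted_pca(X_full, np.ones(len(X_full)), n_components=1)
```

Ignoring the weights on X_small would treat the three points as equally frequent and would generally give a different direction.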

Parameters:

Name Type Description Default
data_manager DataManagerT

DataManagerBase subclass, optional. The class used to interact with the provided data and store it locally. By default, only the sampled data is stored, in HDF5 file format.

None
data_params Union[DataParams, dict]

DataParams, optional. Preprocessing information.

For Example: data_params = { 'index': {'name': 'index_column'} }

None
n_instances int

int. The total number of instances that are going to be processed (can be an estimate). This parameter is required and is the only one of the above-mentioned quadruplet that isn't deduced from the data.

None
max_memory_gb int

int, optional. The maximum memory in GB that should be used. When not provided, the server's total memory is used. In any case only 80% of the provided memory or the server's total memory is considered.

None
optimized_for str

str, either 'training' or 'cleaning'. The main usage of the service.

required
chunk_size int

int, optional. The number of instances to be used when creating a coreset node in the tree. When defined, it overrides the parameters optimized_for, n_instances, n_classes and max_memory_gb. chunk_size=0: nodes are created based on input chunks.

None
coreset_size Union[int, dict]

int, optional. Represents the coreset size of each node in the coreset tree. The coreset is constructed by sampling data instances from the dataset based on their calculated importance. Since each instance may be sampled more than once, in practice, the actual size of the coreset is mostly smaller than coreset_size.

None
coreset_params Union[CoresetParams, dict]

CoresetParams or dict, optional. Coreset algorithm specific parameters.

None
node_train_function Callable[[np.ndarray, np.ndarray, np.ndarray], Any]

Callable, optional. A method for training a model at the tree node level.

None
node_train_function_params dict

dict, optional. kwargs to be used when calling node_train_function.

None
node_metadata_func Callable[[Tuple[np.ndarray], np.ndarray, Union[list, None]], Union[list, dict, None]]

callable, optional. A method for storing user metadata on each node.

None
working_directory Union[str, os.PathLike]

str, path, optional. Local directory where intermediate data is stored.

None
cache_dir Union[str, os.PathLike]

str, path, optional. For internal use when loading a saved service.

None
save_all bool

bool, optional. When set to True, the entire dataset would be saved and not only the selected samples. When optimized_for='cleaning' the default is True. When optimized_for='training' the default is False.

None
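The coreset_size description above notes that an instance may be sampled more than once, so the number of distinct instances is usually below coreset_size. The following standard-library sketch illustrates that effect with importance-proportional sampling with replacement. The importance scores are random stand-ins and the numbers are illustrative assumptions, not values used by the service.

```python
import random

rng = random.Random(0)                        # fixed seed for repeatability
n_instances, coreset_size = 1_000, 200        # illustrative numbers only
importance = [rng.random() for _ in range(n_instances)]  # stand-in importance scores

# Draw coreset_size indices with replacement, proportionally to importance.
drawn = rng.choices(range(n_instances), weights=importance, k=coreset_size)
unique_indices = set(drawn)

print(len(drawn), len(unique_indices))        # distinct count never exceeds coreset_size
```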

build ¤

build(X, y=None, indices=None, props=None, *, chunk_size=None, chunk_by=None, copy=False)

Create a coreset tree from the parameters X, y, indices and props (properties). build functions may be called only once. To add more data to the coreset tree use one of the partial_build functions. All features must be numeric. The target will be ignored when the Coreset is built.

Parameters:

Name Type Description Default
X Union[Iterable, Iterable[Iterable]]

array like or iterator of arrays like. An array or an iterator of features. All features must be numeric.

required
y Union[Iterable[Any], Iterable[Iterable[Any]]]

array like or iterator of arrays like, optional. An array or an iterator of targets. The target will be ignored when the Coreset is built.

None
indices Union[Iterable[Any], Iterable[Iterable[Any]]]

array like or iterator of arrays like, optional. An array or an iterator with indices of X.

None
props Union[Iterable[Any], Iterable[Iterable[Any]]]

array like or iterator of arrays like, optional. An array or an iterator of properties. Properties won’t be used to compute the Coreset or train the model, but it is possible to filter_out_samples on them or to pass them in the select_from_function of get_important_samples.

None
chunk_size int

int, optional. The number of instances used when creating a coreset node in the tree. chunk_size=0: nodes are created based on input chunks.

None
chunk_by Union[Callable, str, list]

function, label, or list of labels, optional. Split the data according to the provided key. When provided, chunk_size input is ignored.

None
copy bool

boolean, default False. False (default) - Input data might be updated as a result of functions such as update_targets or update_features. True - Data is copied before processing (impacts memory).

False

Returns:

Type Description
CoresetTreeService

self
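As a rough picture of what chunk_by means when a function is passed, the sketch below groups rows by the value a key function returns. The helper split_by_key and the row layout are hypothetical. How the service turns such groups into tree nodes is not described here and is not modeled.

```python
from collections import defaultdict

def split_by_key(rows, key):
    """Hypothetical helper: group rows into chunks that share the same key value."""
    chunks = defaultdict(list)
    for row in rows:
        chunks[key(row)].append(row)
    return dict(chunks)

rows = [
    {"month": "2023-01", "x": 1.0},
    {"month": "2023-01", "x": 2.0},
    {"month": "2023-02", "x": 3.0},
]

# One chunk per distinct month, in order of first appearance.
chunks = split_by_key(rows, key=lambda row: row["month"])
```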

build_from_df ¤

build_from_df(datasets, target_datasets=None, *, chunk_size=None, chunk_by=None, copy=False)

Create a coreset tree from pandas DataFrame(s). build functions may be called only once. To add more data to the coreset tree use one of the partial_build functions. All features must be numeric. The target will be ignored when the Coreset is built.

Parameters:

Name Type Description Default
datasets Union[Iterator[pd.DataFrame], pd.DataFrame]

pandas DataFrame or a DataFrame iterator. Data includes features, may include labels and may include indices.

required
target_datasets Union[Iterator[pd.DataFrame], pd.DataFrame]

pandas DataFrame or a DataFrame iterator, optional. Use when data is split to features and target. Should include only one column.

None
chunk_size int

int, optional. The number of instances used when creating a coreset node in the tree. chunk_size=0: nodes are created based on input chunks.

None
chunk_by Union[Callable, str, list]

function, label, or list of labels, optional. Split the data according to the provided key. When provided, chunk_size input is ignored.

None
copy bool

boolean, default False. False (default) - Input data might be updated as a result of functions such as update_targets or update_features. True - Data is copied before processing (impacts memory).

False

Returns:

Type Description
CoresetTreeService

self

build_from_file ¤

build_from_file(file_path, target_file_path=None, *, reader_f=pd.read_csv, reader_kwargs=None, reader_chunk_size_param_name=None, chunk_size=None, chunk_by=None)

Create a coreset tree based on data taken from local storage. build functions may be called only once. To add more data to the coreset tree use one of the partial_build functions. All features must be numeric. The target will be ignored when the Coreset is built.

Parameters:

Name Type Description Default
file_path Union[Union[str, os.PathLike], Iterable[Union[str, os.PathLike]]]

file, list of files, directory, list of directories. Path(s) to the place where data is stored. Data includes features, may include targets and may include indices.

required
target_file_path Union[Union[str, os.PathLike], Iterable[Union[str, os.PathLike]]]

file, list of files, directory, list of directories, optional. Use when the dataset files are split to features and target. Each file should include only one column.

None
reader_f Callable

pandas like read method, optional, default pandas read_csv. For example, to read excel files use pandas read_excel.

pd.read_csv
reader_kwargs dict

dict, optional. Keyword arguments used when calling reader_f method.

None
reader_chunk_size_param_name str

str, optional. The name of the reader_f input parameter used for reading a file in chunks. When not provided, we'll try to figure it out ourselves. Based on the data, we decide on the optimal chunk size to read and use this parameter as input when calling reader_f. Use "ignore" to skip the automatic chunk reading logic.

None
chunk_size int

int, optional. The number of instances used when creating a coreset node in the tree. chunk_size=0: nodes are created based on input chunks.

None
chunk_by Union[Callable, str, list]

function, label, or list of labels, optional. Split the data according to the provided key. When provided, chunk_size input is ignored.

None

Returns:

Type Description
CoresetTreeService

self
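The interplay of reader_f, reader_kwargs and reader_chunk_size_param_name can be pictured as forwarding a chunk size under whatever keyword the reader expects (for pandas read_csv that keyword is chunksize). The sketch below uses a standard-library stand-in reader. Both read_rows and call_reader are hypothetical names, and the automatic detection used when the parameter name is omitted is not modeled.

```python
import csv
import io

def read_rows(source, chunksize=None, **kwargs):
    """Hypothetical stand-in for a pandas-like reader that can yield row chunks."""
    rows = list(csv.reader(source, **kwargs))
    if chunksize is None:
        return rows
    return (rows[i:i + chunksize] for i in range(0, len(rows), chunksize))

def call_reader(reader_f, source, reader_kwargs=None, chunk_param_name=None, chunk_rows=None):
    """Forward the chunk size under the keyword name the reader expects."""
    kwargs = dict(reader_kwargs or {})
    if chunk_param_name not in (None, "ignore"):   # "ignore" disables chunked reading
        kwargs[chunk_param_name] = chunk_rows
    return reader_f(source, **kwargs)

data = io.StringIO("1;2\n3;4\n5;6\n")
chunks = list(call_reader(read_rows, data, {"delimiter": ";"}, "chunksize", 2))
```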

filter_out_samples ¤

filter_out_samples(filter_function, force_resample_all=None, force_sensitivity_recalc=None, force_do_nothing=False)

Remove samples from the coreset tree, based on the provided filter function. The coreset tree is automatically updated to accommodate the changes.

Parameters:

Name Type Description Default
filter_function Callable[[Iterable, Iterable, Union[Iterable, None], Union[Iterable, None]], Iterable[Any]]

function. A function that returns a list of indices to be removed from the tree. The function should accept 4 parameters as input: indices, X, y, props and return a list (or iterator) of indices to be removed from the coreset tree. For example, in order to remove all instances with a target equal to 6, use the following function: filter_function = lambda indices, X, y, props: indices[y == 6].

required
force_resample_all int

int, optional. Force full resampling of the affected nodes in the coreset tree, starting from level=force_resample_all. None - Do not force_resample_all (default), 0 - The head of the tree, 1 - The level below the head of the tree, len(tree)-1 = leaf level, -1 - same as leaf level.

None
force_sensitivity_recalc int

int, optional. Force the recalculation of the sensitivity and partial resampling of the affected nodes, based on the coreset's quality, starting from level=force_sensitivity_recalc. None - If self.save_all=False, one level above the leaf node level; if self.save_all=True, the leaf level. 0 - The head of the tree, 1 - The level below the head of the tree, len(tree)-1 = leaf level, -1 - same as leaf level.

None
force_do_nothing bool

bool, optional, default False. When set to True, suppresses any update to the coreset tree until update_dirty is called.

False
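A filter function with the four-parameter shape described above might look like the sketch below. It assumes that indices and y arrive as NumPy arrays, which is an assumption of this example. The name drop_target_six and the toy data are hypothetical.

```python
import numpy as np

def drop_target_six(indices, X, y, props):
    """Return the indices of all instances whose target equals 6."""
    return indices[y == 6]          # boolean mask; note the comparison operator ==

# Toy inputs standing in for what the service would pass to the function.
indices = np.array([10, 11, 12, 13])
X = np.zeros((4, 2))
y = np.array([6, 1, 6, 3])

to_remove = drop_target_six(indices, X, y, props=None)
```

Such a function would be passed as filter_function; X and props are accepted even when unused so that the signature matches.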

fit ¤

fit(level=0, model=None, **model_params)

Fit a model on the coreset tree.

Parameters:

Name Type Description Default
level int

Defines the depth level of the tree from which the coreset is extracted. Level 0 returns the coreset from the head of the tree with up to coreset_size samples. Level 1 returns the coreset from the level below the head of the tree with up to 2*coreset_size samples, etc.

0
model Any

An ML model instance, optional. When provided, model_params are not relevant. Default: instantiate the service model class using input model_params.

None
model_params

Model hyperparameters kwargs. Input when instantiating default model class.

{}

Returns:

Type Description

Fitted estimator.

get_coreset ¤

get_coreset(level=0, as_orig=False, with_index=False)

Get tree's coreset data either in a processed format or in the original format. Use the level parameter to control the level of the tree from which samples will be returned.

Parameters:

Name Type Description Default
level int

int, optional, default 0. Defines the depth level of the tree from which the coreset is extracted. Level 0 returns the coreset from the head of the tree with up to coreset_size samples. Level 1 returns the coreset from the level below the head of the tree with up to 2*coreset_size samples, etc.

0
as_orig bool

boolean, optional, default False. Should the data be returned in its original format or as a tuple of indices, X, and optionally y. True: data is returned as a pandas DataFrame. False: return a tuple of (indices, X, y) if target was used and (indices, X) when there is no target.

False
with_index bool

boolean, optional, default False. Relevant only when as_orig=True. Should the returned data include the index column.

False

Returns:

Name Type Description
Dict dict

data: numpy arrays tuple (indices, X, optional y) or a pandas DataFrame. w: A numpy array of sample weights. n_represents: number of instances represented by the coreset.
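The level parameter trades coreset size for detail. The sketch below turns the description above into an upper bound on the number of samples per level. It assumes the bound keeps doubling at every level, which extrapolates the "etc." in the parameter description. The helper max_coreset_samples is hypothetical.

```python
def max_coreset_samples(level, coreset_size):
    """Upper bound on samples at a level, assuming the count doubles per level."""
    if level < 0:
        raise ValueError("level must be non-negative in this sketch")
    return (2 ** level) * coreset_size

for level in range(3):
    print(level, max_coreset_samples(level, coreset_size=1000))
```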

get_important_samples ¤

get_important_samples(size=None, ignore_indices=None, select_from_indices=None, select_from_function=None, ignore_seen_samples=True)

Returns indices of samples in descending order of importance. Useful for identifying mislabeled instances. Either class_size (recommended) or size must be provided. Must be called after build.

Parameters:

Name Type Description Default
size int

int, optional. Number of samples to return. When class_size is provided, remaining samples are taken from classes not appearing in class_size dictionary.

None
ignore_indices Iterable

array-like, optional. An array of indices to ignore when selecting important samples.

None
select_from_indices Iterable

array-like, optional. An array of indices to consider when selecting important samples.

None
select_from_function Callable[[Iterable, Iterable, Union[Iterable, None], Union[Iterable, None]], Iterable[Any]]

function, optional. Pass a function in order to limit the selection of the important samples accordingly. The function should accept 4 parameters as input: indices, X, y, props. and return a list(iterator) of the desired indices.

None
ignore_seen_samples bool

bool, optional, default True. Exclude already seen samples and set the seen flag on any indices returned by the function.

True

Returns:

Name Type Description
Dict Union[ValueError, dict]

indices: array-like[int]. Important samples indices. X: array-like[int]. X array. y: array-like[int]. y array. importance: array-like[float]. The importance property. Instances that receive a high importance in the Coreset computation require attention, as they usually indicate a labeling error, anomaly, out-of-distribution problem or other data-related issue.

Examples¤
Input

size=100, class_size={"class A": 10, "class B": 50, "class C": "all"}

Output

10 of "class A", 50 of "class B", 12 of "class C" (all), 28 of "class D/E"
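The selection logic of size, ignore_indices and select_from_indices can be pictured with the following plain-Python sketch. It is not the service's algorithm: the helper top_important and the importance values are hypothetical, and class-based quotas are not modeled.

```python
def top_important(importance, size, ignore_indices=(), select_from_indices=None):
    """Hypothetical helper: indices sorted by descending importance, then filtered."""
    ignored = set(ignore_indices)
    allowed = None if select_from_indices is None else set(select_from_indices)
    ranked = sorted(importance, key=importance.get, reverse=True)
    kept = [i for i in ranked if i not in ignored and (allowed is None or i in allowed)]
    return kept[:size]

# Mapping of sample index to a made-up importance score.
importance = {101: 0.2, 102: 0.9, 103: 0.5, 104: 0.7}
result = top_important(importance, size=2, ignore_indices=[102])
```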

is_dirty ¤

is_dirty()

Returns:

Type Description
bool

Indicates whether the coreset tree has nodes marked as dirty, meaning they were affected by any of the methods: remove_samples, update_targets, update_features or filter_out_samples, when they were called with force_do_nothing.

load classmethod ¤

load(dir_path, name=None, *, data_manager=None, load_buffer=True, working_directory=None)

Restore a service object from a local directory.

Parameters:

Name Type Description Default
dir_path Union[str, os.PathLike]

str, path. Local directory where service data is stored.

required
name str

string, optional, default service class name (lower case). The name prefix of the subdirectory to load. When several subdirectories having the same name prefix are found, the last one, ordered by name, is selected. For example when saving with override=False, the chosen subdirectory is the last saved.

None
data_manager DataManagerT

DataManagerBase subclass, optional. When specified, the input data manager will be used instead of restoring it from the saved configuration.

None
load_buffer bool

boolean, optional, default True. If set, load saved buffer (a partial node of the tree) from disk and add it to the tree.

True
working_directory Union[str, os.PathLike]

str, path, optional, default use working_directory from saved configuration. Local directory where intermediate data is stored.

None

Returns:

Type Description
CoresetTreeService

CoresetTreeService object
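The rule for choosing among several saved subdirectories with the same name prefix (the last one, ordered by name) can be sketched as follows. The helper pick_saved_dir and the timestamp-like suffixes are hypothetical. The actual suffix format written by save is not specified here.

```python
def pick_saved_dir(subdir_names, name):
    """Hypothetical helper: choose the last subdirectory, by name, with the prefix."""
    matches = sorted(d for d in subdir_names if d.startswith(name))
    if not matches:
        raise FileNotFoundError(f"no saved service starting with {name!r}")
    return matches[-1]

saved = ["mysvc_20240101_120000", "mysvc_20240315_093000", "other_20240401_000000"]
chosen = pick_saved_dir(saved, "mysvc")
```

With suffixes that sort chronologically, the last name in sorted order is also the most recent save.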

partial_build ¤

partial_build(X, y=None, indices=None, props=None, *, chunk_size=None, chunk_by=None, copy=False)

Add new samples to a coreset tree from parameters X, y, indices and props (properties). All features must be numeric. The target will be ignored when the Coreset is built.

Parameters:

Name Type Description Default
X Union[Iterable, Iterable[Iterable]]

array like or iterator of arrays like. An array or an iterator of features. All features must be numeric.

required
y Union[Iterable[Any], Iterable[Iterable[Any]]]

array like or iterator of arrays like, optional. An array or an iterator of targets. The target will be ignored when the Coreset is built.

None
indices Union[Iterable[Any], Iterable[Iterable[Any]]]

array like or iterator of arrays like, optional. An array or an iterator with indices of X.

None
props Union[Iterable[Any], Iterable[Iterable[Any]]]

array like or iterator of arrays like, optional. An array or an iterator of properties. Properties won’t be used to compute the Coreset or train the model, but it is possible to filter_out_samples on them or to pass them in the select_from_function of get_important_samples.

None
chunk_size int

int, optional. The number of instances used when creating a coreset node in the tree. chunk_size=0: nodes are created based on input chunks.

None
chunk_by Union[Callable, str, list]

function, label, or list of labels, optional. Split the data according to the provided key. When provided, chunk_size input is ignored.

None
copy bool

boolean, default False. False (default) - Input data might be updated as a result of functions such as update_targets or update_features. True - Data is copied before processing (impacts memory).

False

Returns:

Type Description
CoresetTreeService

self

partial_build_from_df ¤

partial_build_from_df(datasets, target_datasets=None, *, chunk_size=None, chunk_by=None, copy=False)

Add new samples to a coreset tree based on the pandas DataFrame iterator. All features must be numeric. The target will be ignored when the Coreset is built.

Parameters:

Name Type Description Default
datasets Union[Iterator[pd.DataFrame], pd.DataFrame]

pandas DataFrame or a DataFrame iterator. Data includes features, may include targets and may include indices.

required
target_datasets Union[Iterator[pd.DataFrame], pd.DataFrame]

pandas DataFrame or a DataFrame iterator, optional. Use when data is split to features and target. Should include only one column.

None
chunk_size int

int, optional, default previously used chunk_size. The number of instances used when creating a coreset node in the tree. chunk_size=0: nodes are created based on input chunks.

None
chunk_by Union[Callable, str, list]

function, label, or list of labels, optional. Split the data according to the provided key. When provided, chunk_size input is ignored.

None
copy bool

boolean, default False. False (default) - Input data might be updated as a result of functions such as update_targets or update_features. True - Data is copied before processing (impacts memory).

False

Returns:

Type Description
CoresetTreeService

self

partial_build_from_file ¤

partial_build_from_file(file_path, target_file_path=None, *, reader_f=pd.read_csv, reader_kwargs=None, reader_chunk_size_param_name=None, chunk_size=None, chunk_by=None)

Add new samples to a coreset tree based on data taken from local storage. All features must be numeric. The target will be ignored when the Coreset is built.

Parameters:

Name Type Description Default
file_path Union[Union[str, os.PathLike], Iterable[Union[str, os.PathLike]]]

file, list of files, directory, list of directories. Path(s) to the place where data is stored. Data includes features, may include targets and may include indices.

required
target_file_path Union[Union[str, os.PathLike], Iterable[Union[str, os.PathLike]]]

file, list of files, directory, list of directories, optional. Use when files are split to features and target. Each file should include only one column.

None
reader_f Callable

pandas like read method, optional, default pandas read_csv. For example, to read excel files use pandas read_excel.

pd.read_csv
reader_kwargs dict

dict, optional. Keyword arguments used when calling reader_f method.

None
reader_chunk_size_param_name str

str, optional. The name of the reader_f input parameter used for reading a file in chunks. When not provided, we'll try to figure it out ourselves. Based on the data, we decide on the optimal chunk size to read and use this parameter as input when calling reader_f. Use "ignore" to skip the automatic chunk reading logic.

None
chunk_size int

int, optional, default previously used chunk_size. The number of instances used when creating a coreset node in the tree. chunk_size=0: nodes are created based on input chunks.

None
chunk_by Union[Callable, str, list]

function, label, or list of labels, optional. Split the data according to the provided key. When provided, chunk_size input is ignored.

None

Returns:

Type Description
CoresetTreeService

self

plot ¤

plot(dir_path=None, name=None)

Produce a tree graph plot and save figure as a local png file.

Parameters:

Name Type Description Default
dir_path Union[str, os.PathLike]

string or PathLike. Path to save the plot figure in. If it is not provided, is invalid, or doesn't exist, the figure is saved in the current directory (from which this method is called).

None
name str

string, optional. Name of the image file.

None

Returns:

Type Description
pathlib.Path

Image file path

predict ¤

predict(X)

Run prediction on the trained model.

Parameters:

Name Type Description Default
X Union[Iterable, Iterable[Iterable]]

An array of features.

required

Returns:

Type Description

Model prediction results.

predict_proba ¤

predict_proba(X)

Run prediction on the trained model.

Parameters:

Name Type Description Default
X Union[Iterable, Iterable[Iterable]]

An array of features.

required

Returns:

Type Description

Returns the probability of the sample for each class in the model.

print ¤

print()

Print the tree's string representation.

remove_samples ¤

remove_samples(indices, force_resample_all=None, force_sensitivity_recalc=None, force_do_nothing=False)

Remove samples from the coreset tree. The coreset tree is automatically updated to accommodate the changes.

Parameters:

Name Type Description Default
indices Iterable

array-like. An array of indices to be removed from the coreset tree.

required
force_resample_all int

int, optional. Force full resampling of the affected nodes in the coreset tree, starting from level=force_resample_all. None - Do not force_resample_all (default), 0 - The head of the tree, 1 - The level below the head of the tree, len(tree)-1 = leaf level, -1 - same as leaf level.

None
force_sensitivity_recalc int

int, optional. Force the recalculation of the sensitivity and partial resampling of the affected nodes, based on the coreset's quality, starting from level=force_sensitivity_recalc. None - If self.save_all=False, one level above the leaf node level; if self.save_all=True, the leaf level. 0 - The head of the tree, 1 - The level below the head of the tree, len(tree)-1 = leaf level, -1 - same as leaf level.

None
force_do_nothing bool

bool, optional, default False. When set to True, suppresses any update to the coreset tree until update_dirty is called.

False
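The level convention shared by force_resample_all and force_sensitivity_recalc (0 is the head, len(tree)-1 is the leaf level, -1 is an alias for the leaf level) can be expressed as a small normalization step. The helper resolve_level is hypothetical, and tree_depth stands in for len(tree).

```python
def resolve_level(level, tree_depth):
    """Map a level argument to an absolute level; -1 is treated as the leaf level."""
    if level == -1:
        return tree_depth - 1
    if not 0 <= level < tree_depth:
        raise ValueError(f"level must be -1 or in [0, {tree_depth - 1}]")
    return level

tree_depth = 4                                   # illustrative depth only
head_level = resolve_level(0, tree_depth)        # head of the tree
leaf_level = resolve_level(-1, tree_depth)       # same as tree_depth - 1
```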

save ¤

save(dir_path=None, name=None, save_buffer=True, override=False, allow_pickle=True)

Save service configuration and relevant data to a local directory. Use this method when the service needs to be restored.

Parameters:

Name Type Description Default
dir_path Union[str, os.PathLike]

string or PathLike, optional, default self.working_directory. A local directory for saving service's files.

None
name str

string, optional, default service class name (lower case). Name of the subdirectory where the data will be stored.

None
save_buffer bool

boolean, default True. Save also the data in the buffer (a partial node of the tree) along with the rest of the saved data.

True
override bool

bool, optional, default False. False: add a timestamp suffix so each save won’t override the previous ones. True: The existing subdirectory with the provided name is overridden.

False
allow_pickle bool

bool, optional, default True. True: Saves the Coreset tree in pickle format (much faster). False: Saves the Coreset tree in JSON format.

True

Returns:

Type Description
pathlib.Path

Save directory path.

save_coreset ¤

save_coreset(file_path, level=0, as_orig=False, with_index=False)

Get coreset from the tree and save to a file along with coreset weights. Use the level parameter to control the level of the tree from which samples will be returned.

Parameters:

Name Type Description Default
file_path Union[str, os.PathLike]

string or PathLike. Local file path to store the coreset.

required
level int

int, optional, default 0. Defines the depth level of the tree from which the coreset is extracted. Level 0 returns the coreset from the head of the tree with up to coreset_size samples. Level 1 returns the coreset from the level below the head of the tree with up to 2*coreset_size samples, etc.

0
as_orig bool

boolean, optional, default False. True: save in the original format. False: save in a processed format (indices, X, y, weight).

False
with_index bool

boolean, optional, default False. Relevant only when as_orig=True. Save also index column.

False

set_seen_indication ¤

set_seen_indication(seen_flag=True, indices=None)

Set samples as 'seen' or 'unseen'. Not providing an indices list defaults to setting the flag on all samples.

Parameters:

Name Type Description Default
seen_flag bool

bool, optional, default True. Set 'seen' or 'unseen' flag

True
indices Iterable

array like, optional. Set flag only for the provided list of indices. Defaults to all indices.

None

update_dirty ¤

update_dirty(force_resample_all=None, force_sensitivity_recalc=None)

Calculate the sensitivity and resample the nodes that were marked as dirty, meaning they were affected by any of the methods: remove_samples, update_targets, update_features or filter_out_samples, when they were called with force_do_nothing.

Parameters:

Name Type Description Default
force_resample_all int

int, optional. Force full resampling of the affected nodes in the coreset tree, starting from level=force_resample_all. None - Do not force_resample_all (default), 0 - The head of the tree, 1 - The level below the head of the tree, len(tree)-1 = leaf level, -1 - same as leaf level.

None
force_sensitivity_recalc int

int, optional. Force the recalculation of the sensitivity and partial resampling of the affected nodes, based on the coreset's quality, starting from level=force_sensitivity_recalc. None - If self.save_all=False, one level above the leaf node level; if self.save_all=True, the leaf level. 0 - The head of the tree, 1 - The level below the head of the tree, len(tree)-1 = leaf level, -1 - same as leaf level.

None
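The relationship between force_do_nothing, is_dirty and update_dirty follows a deferred-update pattern: changes are recorded immediately, while the costly recomputation is postponed and done once. The toy class below illustrates only that pattern. ToyTree is hypothetical and does not reflect the library's data structures or its resampling logic.

```python
class ToyTree:
    """Toy model of deferred updates; not the library's data structure."""

    def __init__(self, samples):
        self.samples = set(samples)
        self.summary = len(self.samples)   # stands in for the resampled coreset
        self._dirty = False

    def remove_samples(self, indices, force_do_nothing=False):
        self.samples -= set(indices)
        if force_do_nothing:
            self._dirty = True             # postpone the costly recomputation
        else:
            self._recompute()

    def is_dirty(self):
        return self._dirty

    def update_dirty(self):
        if self._dirty:
            self._recompute()

    def _recompute(self):
        self.summary = len(self.samples)
        self._dirty = False

tree = ToyTree(range(10))
tree.remove_samples([1, 2], force_do_nothing=True)
tree.remove_samples([3], force_do_nothing=True)
stale = (tree.is_dirty(), tree.summary)    # summary is still out of date here
tree.update_dirty()                        # one recomputation covers both removals
fresh = (tree.is_dirty(), tree.summary)
```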

update_features ¤

update_features(indices, X, feature_names=None, force_resample_all=None, force_sensitivity_recalc=None, force_do_nothing=False)

Update the features for selected samples on the coreset tree. The coreset tree is automatically updated to accommodate the changes.

Parameters:

Name Type Description Default
indices Iterable

array-like. An array of indices to be updated.

required
X Iterable

array-like. An array of features. Should have the same length as indices.

required
feature_names Iterable[str]

If the number of features in X differs from the number of features in the original coreset, this parameter should contain the list of names of the features passed in X.

None
force_resample_all int

int, optional. Force full resampling of the affected nodes in the coreset tree, starting from level=force_resample_all. None - Do not force_resample_all (default), 0 - The head of the tree, 1 - The level below the head of the tree, len(tree)-1 = leaf level, -1 - same as leaf level.

None
force_sensitivity_recalc int

int, optional. Force the recalculation of the sensitivity and partial resampling of the affected nodes, based on the coreset's quality, starting from level=force_sensitivity_recalc. None - If self.save_all=False, one level above the leaf node level; if self.save_all=True, the leaf level. 0 - The head of the tree, 1 - The level below the head of the tree, len(tree)-1 = leaf level, -1 - same as leaf level.

None
force_do_nothing bool

bool, optional, default False. When set to True, suppresses any update to the coreset tree until update_dirty is called.

False
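The role of feature_names when only some features are updated can be sketched with a plain-Python table. The helper update_rows, the column names and the values are hypothetical. The service's own storage and its subsequent resampling are not modeled.

```python
def update_rows(table, columns, indices, X, feature_names=None):
    """Hypothetical helper: overwrite selected features of the rows in `indices`."""
    names = list(feature_names) if feature_names is not None else list(columns)
    for idx, new_values in zip(indices, X):
        if len(new_values) != len(names):
            raise ValueError("each row of X must match the number of feature names")
        for name, value in zip(names, new_values):
            table[idx][columns.index(name)] = value
    return table

columns = ["age", "height", "weight"]
table = {7: [30.0, 170.0, 65.0], 8: [41.0, 180.0, 80.0]}

# X carries a single feature, so feature_names says which column it belongs to.
update_rows(table, columns, indices=[8], X=[[82.5]], feature_names=["weight"])
```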

update_targets ¤

update_targets(indices, y, force_resample_all=None, force_sensitivity_recalc=None, force_do_nothing=False)

Update the targets for selected samples on the coreset tree. The coreset tree is automatically updated to accommodate the changes.

Parameters:

Name Type Description Default
indices Iterable

array-like. An array of indices to be updated.

required
y Iterable

array-like. An array of classes/labels. Should have the same length as indices.

required
force_resample_all int

int, optional. Force full resampling of the affected nodes in the coreset tree, starting from level=force_resample_all. None - Do not force_resample_all (default), 0 - The head of the tree, 1 - The level below the head of the tree, len(tree)-1 = leaf level, -1 - same as leaf level.

None
force_sensitivity_recalc int

int, optional. Force the recalculation of the sensitivity and partial resampling of the affected nodes, based on the coreset's quality, starting from level=force_sensitivity_recalc. None - If self.save_all=False, one level above the leaf node level; if self.save_all=True, the leaf level. 0 - The head of the tree, 1 - The level below the head of the tree, len(tree)-1 = leaf level, -1 - same as leaf level.

None
force_do_nothing bool

bool, optional, default False. When set to True, suppresses any update to the coreset tree until update_dirty is called.

False