CoresetTreeServiceLR
CoresetTreeServiceLR(*, data_manager=None, data_params=None, n_instances=None, max_memory_gb=None, optimized_for, chunk_size=None, coreset_size=None, coreset_params=None, working_directory=None, cache_dir=None, node_train_function=None, node_train_function_params=None, node_metadata_func=None, save_all=None)
Bases: CoresetTreeServiceSupervisedMixin, CoresetTreeService
Subclass of CoresetTreeService for Linear Regression. A service class for creating a coreset tree and working with it. optimized_for is a required parameter defining the main usage of the service: 'training' or 'cleaning'. In the 'cleaning' case, a single Coreset is built over the entire dataset. In the 'training' case, the service decides whether to build an actual Coreset Tree or a single Coreset over the entire dataset, based on the triplet n_instances, max_memory_gb and the number of features (deduced from the dataset). The chunk_size and coreset_size are also deduced from this triplet. If chunk_size and coreset_size are provided, they override all of the above parameters (less recommended).
If you use Lasso or Ridge and pass alpha to your linear regressor to control the regularization strength, adjust it as follows when training on the Coreset: alpha = alpha * float(len(X) / np.sum(weights)). When calling the class's fit function, alpha is already adjusted accordingly and no further action is required.
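As a minimal sketch of this alpha adjustment, assume you train your own Lasso or Ridge model on the output of get_coreset, whose returned dict holds the sampled data under 'data' and the sample weights under 'w'. The helper name `adjust_alpha_for_coreset` and the synthetic dict below are illustrative only; fit already performs this adjustment for you.

```python
import numpy as np


def adjust_alpha_for_coreset(alpha, X, weights):
    """Rescale a Lasso/Ridge alpha for training on a weighted coreset.

    Implements alpha * len(X) / sum(weights), as described above.
    """
    weights = np.asarray(weights, dtype=float)
    return alpha * float(len(X) / np.sum(weights))


# Hypothetical stand-in for the dict returned by get_coreset():
# 4 sampled rows whose weights sum to 100.
rng = np.random.default_rng(0)
coreset = {
    "data": (np.arange(4), rng.normal(size=(4, 3)), np.array([1.0, 2.0, 3.0, 4.0])),
    "w": np.array([10.0, 20.0, 30.0, 40.0]),
}
indices, X, y = coreset["data"]
adjusted_alpha = adjust_alpha_for_coreset(1.0, X, coreset["w"])
print(adjusted_alpha)  # 0.04
```

The adjusted value would then be passed to the regressor together with the weights, for example `Ridge(alpha=adjusted_alpha).fit(X, y, sample_weight=coreset["w"])` with scikit-learn's `Ridge`.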
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_manager | DataManagerT | DataManagerBase subclass, optional. The class used to interact with the provided data and store it locally. By default, only the sampled data is stored in HDF5 file format. | None |
| data_params | Union[DataParams, dict] | DataParams, optional. Preprocessing information. For example: data_params = { 'target': {'name': 'interest_rate'}, 'index': {'name': 'index_column'} } | None |
| n_instances | int | int. The total number of instances that are going to be processed (can be an estimate). This parameter is required and is the only one of the above-mentioned triplet that isn't deduced from the data. | None |
| max_memory_gb | int | int, optional. The maximum memory in GB that should be used. When not provided, the server's total memory is used. In either case, only 80% of the provided memory (or of the server's total memory) is considered. | None |
| optimized_for | str | str, either 'training' or 'cleaning'. The main usage of the service. | required |
| chunk_size | int | int, optional. The number of instances to be used when creating a coreset node in the tree. When defined, it overrides the parameters optimized_for, n_instances and max_memory_gb. chunk_size=0: nodes are created based on input chunks. | None |
| coreset_size | Union[int, dict] | int, optional. The coreset size of each node in the coreset tree. The coreset is constructed by sampling data instances from the dataset based on their calculated importance. Since each instance may be sampled more than once, the actual size of the coreset is, in practice, usually smaller than coreset_size. | None |
| coreset_params | Union[CoresetParams, dict] | CoresetParams or dict, optional. Coreset algorithm specific parameters. | None |
| node_train_function | Callable[[np.ndarray, np.ndarray, np.ndarray], Any] | Callable, optional. A method for training a model at the tree node level. | None |
| node_train_function_params | dict | dict, optional. kwargs to be used when calling node_train_function. | None |
| node_metadata_func | Callable[[Tuple[np.ndarray], np.ndarray, Union[list, None]], Union[list, dict, None]] | Callable, optional. A method for storing user metadata on each node. | None |
| working_directory | Union[str, os.PathLike] | str, path, optional. Local directory where intermediate data is stored. | None |
| cache_dir | Union[str, os.PathLike] | str, path, optional. For internal use when loading a saved service. | None |
| save_all | bool | bool, optional. When set to True, the entire dataset is saved, not only the selected samples. When optimized_for='cleaning' the default is True; when optimized_for='training' the default is False. | None |

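The note on coreset_size (instances can be sampled more than once, so the coreset usually ends up smaller than coreset_size) can be illustrated with a minimal NumPy sketch. It draws coreset_size samples with replacement, with probabilities proportional to a made-up importance score, and counts the distinct instances. This is a generic importance-sampling illustration under stated assumptions, not the library's actual sensitivity computation; the helper name is hypothetical.

```python
import numpy as np


def sample_coreset_indices(importance, coreset_size, seed=0):
    """Draw coreset_size indices with replacement, proportionally to importance.

    Returns the distinct sampled indices and how often each was drawn.
    """
    importance = np.asarray(importance, dtype=float)
    probs = importance / importance.sum()
    rng = np.random.default_rng(seed)
    draws = rng.choice(len(importance), size=coreset_size, replace=True, p=probs)
    unique_idx, counts = np.unique(draws, return_counts=True)
    return unique_idx, counts


# Hypothetical importance scores: a few instances dominate the distribution.
importance = np.r_[np.full(5, 50.0), np.ones(995)]
unique_idx, counts = sample_coreset_indices(importance, coreset_size=200)
print(len(unique_idx), counts.sum())  # distinct instances vs. 200 draws
```

The draw counts play the role of multiplicities: an instance drawn several times appears once, which is why the number of distinct rows is bounded by, and typically below, coreset_size.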
build
build(X, y, indices=None, props=None, *, chunk_size=None, chunk_by=None, copy=False)
Create a coreset tree from the parameters X, y, indices and props (properties). The build functions may be called only once; to add more data to the coreset tree, use one of the partial_build functions. All features must be numeric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | Union[Iterable, Iterable[Iterable]] | array-like or iterator of array-likes. An array or an iterator of features. All features must be numeric. | required |
| y | Union[Iterable[Any], Iterable[Iterable[Any]]] | array-like or iterator of array-likes. An array or an iterator of targets. | required |
| indices | Union[Iterable[Any], Iterable[Iterable[Any]]] | array-like or iterator of array-likes, optional. An array or an iterator with indices of X. | None |
| props | Union[Iterable[Any], Iterable[Iterable[Any]]] | array-like or iterator of array-likes, optional. An array or an iterator of properties. Properties won't be used to compute the Coreset or train the model, but it is possible to filter_out_samples on them or to pass them to the select_from_function of get_important_samples. | None |
| chunk_size | int | int, optional. The number of instances used when creating a coreset node in the tree. chunk_size=0: nodes are created based on input chunks. | None |
| chunk_by | Union[Callable, str, list] | function, label, or list of labels, optional. Split the data according to the provided key. When provided, the chunk_size input is ignored. | None |
| copy | bool | boolean, default False. False (default): input data might be updated as a result of functions such as update_targets or update_features. True: data is copied before processing (impacts memory). | False |

Returns:
| Type | Description |
|---|---|
| CoresetTreeService | self |

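Since X and y may be passed as iterators of arrays rather than single arrays, here is a minimal sketch of producing such paired iterators from data that arrives in chunks. The generator name `iter_chunks` and the chunk length are illustrative assumptions; the actual build call appears only as a comment because it requires the library and an existing service object.

```python
import numpy as np


def iter_chunks(array, chunk_len):
    """Yield consecutive slices of `array` with at most chunk_len rows each."""
    for start in range(0, len(array), chunk_len):
        yield array[start:start + chunk_len]


rng = np.random.default_rng(0)
X_all = rng.normal(size=(10, 3))
y_all = X_all @ np.array([1.0, -2.0, 0.5])

# Two parallel iterators: the i-th X chunk lines up with the i-th y chunk.
X_chunks = iter_chunks(X_all, chunk_len=4)
y_chunks = iter_chunks(y_all, chunk_len=4)
# service.build(X_chunks, y_chunks)  # hypothetical call on an existing service

shapes = [x.shape for x in iter_chunks(X_all, 4)]
print(shapes)  # [(4, 3), (4, 3), (2, 3)]
```

Per the chunk_size description above, passing chunk_size=0 together with such iterators would make each yielded chunk the basis of a node.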
build_from_df
build_from_df(datasets, target_datasets=None, *, chunk_size=None, chunk_by=None, copy=False)
Create a coreset tree from pandas DataFrame(s). The build functions may be called only once; to add more data to the coreset tree, use one of the partial_build functions. All features must be numeric. The target will be ignored when the Coreset is built.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| datasets | Union[Iterator[pd.DataFrame], pd.DataFrame] | pandas DataFrame or a DataFrame iterator. Data includes features, may include targets and may include indices. | required |
| target_datasets | Union[Iterator[pd.DataFrame], pd.DataFrame] | pandas DataFrame or a DataFrame iterator, optional. Use when data is split into features and target. Should include only one column. | None |
| chunk_size | int | int, optional. The number of instances used when creating a coreset node in the tree. chunk_size=0: nodes are created based on input chunks. | None |
| chunk_by | Union[Callable, str, list] | function, label, or list of labels, optional. Split the data according to the provided key. When provided, the chunk_size input is ignored. | None |
| copy | bool | boolean, default False. False (default): input data might be updated as a result of functions such as update_targets or update_features. True: data is copied before processing (impacts memory). | False |

Returns:
| Type | Description |
|---|---|
| CoresetTreeService | self |

build_from_file
build_from_file(file_path, target_file_path=None, *, reader_f=pd.read_csv, reader_kwargs=None, reader_chunk_size_param_name=None, chunk_size=None, chunk_by=None)
Create a coreset tree based on data taken from local storage. The build functions may be called only once; to add more data to the coreset tree, use one of the partial_build functions. All features must be numeric. The target will be ignored when the Coreset is built.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | Union[Union[str, os.PathLike], Iterable[Union[str, os.PathLike]]] | file, list of files, directory, list of directories. Path(s) to the place where data is stored. Data includes features, may include targets and may include indices. | required |
| target_file_path | Union[Union[str, os.PathLike], Iterable[Union[str, os.PathLike]]] | file, list of files, directory, list of directories, optional. Use when the dataset files are split into features and target. Each file should include only one column. | None |
| reader_f | Callable | pandas-like read method, optional, default pandas read_csv. For example, to read Excel files use pandas read_excel. | pd.read_csv |
| reader_kwargs | dict | dict, optional. Keyword arguments used when calling the reader_f method. | None |
| reader_chunk_size_param_name | str | str, optional. The reader_f input parameter name for reading a file in chunks. When not provided, the service tries to infer it. Based on the data, the service decides on the optimal chunk size to read and passes it through this parameter when calling reader_f. Use "ignore" to skip the automatic chunk-reading logic. | None |
| chunk_size | int | int, optional. The number of instances used when creating a coreset node in the tree. chunk_size=0: nodes are created based on input chunks. | None |
| chunk_by | Union[Callable, str, list] | function, label, or list of labels, optional. Split the data according to the provided key. When provided, the chunk_size input is ignored. | None |

Returns:
| Type | Description |
|---|---|
| CoresetTreeService | self |

filter_out_samples
filter_out_samples(filter_function, force_resample_all=None, force_sensitivity_recalc=None, force_do_nothing=False)
Remove samples from the coreset tree based on the provided filter function. The coreset tree is automatically updated to accommodate the changes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filter_function | Callable[[Iterable, Iterable, Union[Iterable, None], Union[Iterable, None]], Iterable[Any]] | function. A function that returns a list of indices to be removed from the tree. The function should accept 4 parameters as input: indices, X, y, props, and return a list (or iterator) of indices to be removed from the coreset tree. For example, to remove all instances with a target equal to 6, use the following function: filter_function = lambda indices, X, y, props: indices[y == 6]. | required |
| force_resample_all | int | int, optional. Force full resampling of the affected nodes in the coreset tree, starting from level=force_resample_all. None: do not force full resampling (default); 0: the head of the tree; 1: the level below the head of the tree; len(tree)-1: the leaf level; -1: same as the leaf level. | None |
| force_sensitivity_recalc | int | int, optional. Force the recalculation of the sensitivity and partial resampling of the affected nodes, based on the coreset's quality, starting from level=force_sensitivity_recalc. None: if self.save_all=False, one level above the leaf level; if self.save_all=True, the leaf level. 0: the head of the tree; 1: the level below the head of the tree; len(tree)-1: the leaf level; -1: same as the leaf level. | None |
| force_do_nothing | bool | bool, optional, default False. When set to True, suppresses any update to the coreset tree until update_dirty is called. | False |

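The filter_function example above can be written out as a runnable sketch. It assumes indices, X and y arrive as NumPy arrays (so boolean masking works) and that props may be None; the sample data is made up.

```python
import numpy as np


def filter_function(indices, X, y, props):
    """Return the indices of all instances whose target equals 6."""
    return np.asarray(indices)[np.asarray(y) == 6]


# Made-up node data: five instances with their original indices and targets.
indices = np.array([100, 101, 102, 103, 104])
X = np.zeros((5, 2))
y = np.array([6, 3, 6, 1, 2])

to_remove = filter_function(indices, X, y, props=None)
print(to_remove.tolist())  # [100, 102]
# service.filter_out_samples(filter_function)  # hypothetical call on a built service
```

Note the comparison operator `==`: a single `=` inside the brackets would be a syntax error.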
fit
fit(level=0, model=None, **model_params)
Fit a model on the coreset tree.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| level | int | int, optional, default 0. Defines the depth level of the tree from which the coreset is extracted. Level 0 returns the coreset from the head of the tree with up to coreset_size samples. Level 1 returns the coreset from the level below the head of the tree with up to 2*coreset_size samples, etc. | 0 |
| model | Any | An ML model instance, optional. When provided, model_params are not relevant. Default: instantiate the service model class using the input model_params. | None |
| model_params | | Model hyperparameter kwargs, used when instantiating the default model class. | {} |

Returns:
| Type | Description |
|---|---|
| | Fitted estimator. |

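The level description implies that each step down the tree doubles the maximum number of returned samples. Assuming that pattern continues beyond level 1 (as in a binary tree, where level k holds 2**k nodes of up to coreset_size samples each), the upper bound can be computed as below. The helper name is hypothetical, and the result is only an upper bound.

```python
def max_coreset_samples(level, coreset_size):
    """Upper bound on samples returned from a given tree level.

    Assumes the doubling pattern from the docs continues: 2**level * coreset_size.
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    return (2 ** level) * coreset_size


for level in range(3):
    print(level, max_coreset_samples(level, coreset_size=1000))
# 0 1000
# 1 2000
# 2 4000
```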
get_coreset
get_coreset(level=0, as_orig=False, with_index=False)
Get tree's coreset data either in a processed format or in the original format. Use the level parameter to control the level of the tree from which samples will be returned.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| level | int | int, optional, default 0. Defines the depth level of the tree from which the coreset is extracted. Level 0 returns the coreset from the head of the tree with up to coreset_size samples. Level 1 returns the coreset from the level below the head of the tree with up to 2*coreset_size samples, etc. | 0 |
| as_orig | bool | boolean, optional, default False. Whether to return the data in its original format or as a tuple of indices, X and optionally y. True: data is returned as a pandas DataFrame. False: returns a tuple of (indices, X, y) if a target was used, and (indices, X) when there is no target. | False |
| with_index | bool | boolean, optional, default False. Relevant only when as_orig=True. Whether the returned data should include the index column. | False |

Returns:
| Name | Type | Description |
|---|---|---|
| Dict | dict | data: numpy arrays tuple (indices, X, optional y) or a pandas DataFrame. w: a numpy array of sample weights. n_represents: the number of instances represented by the coreset. |

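To show how the pieces of the returned dict fit together, here is a minimal sketch assuming as_orig=False and that a target was used, so data is (indices, X, y). It fits a linear regression on the coreset by weighted least squares, scaling each row by the square root of its weight from w. The dict is synthetic, and weighted least squares is a standard technique chosen for illustration, not necessarily what fit does internally.

```python
import numpy as np


def weighted_linear_fit(X, y, w):
    """Weighted least squares with intercept: minimizes sum_i w_i * (y_i - c - x_i.b)^2."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    sw = np.sqrt(np.asarray(w, dtype=float))
    coef, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]  # intercept, slopes


# Synthetic stand-in for get_coreset(level=0): noiseless data from y = 3 + 2*x0 - x1.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 3.0 + X @ np.array([2.0, -1.0])
result = {
    "data": (np.arange(50), X, y),
    "w": rng.uniform(1, 20, size=50),
    "n_represents": 500,  # arbitrary value for this sketch
}

indices, Xc, yc = result["data"]
intercept, slopes = weighted_linear_fit(Xc, yc, result["w"])
print(round(intercept, 6), np.round(slopes, 6))
```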
get_important_samples
get_important_samples(size=None, ignore_indices=None, select_from_indices=None, select_from_function=None, ignore_seen_samples=True)
Returns indices of samples in descending order of importance. Useful for identifying mislabeled instances. size must be provided. Must be called after build.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| size | int | int, optional. Number of samples to return. | None |
| ignore_indices | Iterable | array-like, optional. An array of indices to ignore when selecting important samples. | None |
| select_from_indices | Iterable | array-like, optional. An array of indices to consider when selecting important samples. | None |
| select_from_function | Callable[[Iterable, Iterable, Union[Iterable, None], Union[Iterable, None]], Iterable[Any]] | function, optional. Pass a function to limit the selection of the important samples accordingly. The function should accept 4 parameters as input: indices, X, y, props, and return a list (or iterator) of the desired indices. | None |
| ignore_seen_samples | bool | bool, optional, default True. Exclude already seen samples and set the seen flag on any indices returned by the function. | True |

Returns:
| Name | Type | Description |
|---|---|---|
| Dict | Union[ValueError, dict] | indices: array-like[int]. Important samples indices. X: array-like[int]. X array. y: array-like[int]. y array. importance: array-like[float]. The importance property. Instances that receive a high importance in the Coreset computation require attention, as they usually indicate a labeling error, anomaly, out-of-distribution problem or other data-related issue. |

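select_from_function uses the same four-argument signature as filter_function. A minimal sketch, assuming NumPy inputs and that props is an array holding one made-up 'source' tag per instance; the second part mimics, on synthetic importance values, the documented ordering of the returned indices (most important first).

```python
import numpy as np


def select_from_function(indices, X, y, props):
    """Consider only instances whose (hypothetical) source property is 'web'."""
    return np.asarray(indices)[np.asarray(props) == "web"]


indices = np.array([10, 11, 12, 13])
props = np.array(["web", "store", "web", "web"])
candidates = select_from_function(indices, np.zeros((4, 2)), np.zeros(4), props)
print(candidates.tolist())  # [10, 12, 13]

# Synthetic importance values for those candidates; sort highest first.
importance = np.array([0.2, 0.9, 0.5])
order = np.argsort(importance)[::-1]
print(candidates[order].tolist())  # [12, 13, 10]
```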
is_dirty
is_dirty()
Returns:
| Type | Description |
|---|---|
| bool | Indicates whether the coreset tree has nodes marked as dirty, meaning they were affected by any of the methods remove_samples, update_targets, update_features or filter_out_samples when they were called with force_do_nothing=True. |

load (classmethod)
load(dir_path, name=None, *, data_manager=None, load_buffer=True, working_directory=None)
Restore a service object from a local directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dir_path | Union[str, os.PathLike] | str, path. Local directory where service data is stored. | required |
| name | str | string, optional, default: the service class name (lower case). The name prefix of the subdirectory to load. When several subdirectories with the same name prefix are found, the last one, ordered by name, is selected. For example, when saving with override=False, the most recently saved subdirectory is chosen. | None |
| data_manager | DataManagerT | DataManagerBase subclass, optional. When specified, the input data manager is used instead of restoring it from the saved configuration. | None |
| load_buffer | bool | boolean, optional, default True. If set, load the saved buffer (a partial node of the tree) from disk and add it to the tree. | True |
| working_directory | Union[str, os.PathLike] | str, path, optional, default: the working_directory from the saved configuration. Local directory where intermediate data is stored. | None |

Returns:
| Type | Description |
|---|---|
| CoresetTreeService | CoresetTreeService object |

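Choosing the last subdirectory by name yields the most recent save only if the timestamp suffix sorts chronologically. A minimal sketch, assuming a hypothetical `<name>_<YYYYMMDD_HHMMSS>` suffix format; the actual suffix written by save is not specified in this reference, and the helper name is illustrative.

```python
def pick_subdirectory(subdir_names, prefix):
    """Return the last subdirectory name (in sorted order) that starts with prefix."""
    matches = sorted(n for n in subdir_names if n.startswith(prefix))
    if not matches:
        raise FileNotFoundError(f"no subdirectory starts with {prefix!r}")
    return matches[-1]


# Hypothetical names produced by repeated saves with override=False.
saved = [
    "coresettreeservicelr_20240102_093000",
    "coresettreeservicelr_20231231_235959",
    "coresettreeservicelr_20240102_101500",
    "other_service_20250101_000000",
]
print(pick_subdirectory(saved, "coresettreeservicelr"))
# coresettreeservicelr_20240102_101500
```

With a zero-padded year-first timestamp, lexicographic order matches chronological order, which is what makes "last by name" equivalent to "last saved".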
partial_build
partial_build(X, y, indices=None, props=None, *, chunk_size=None, chunk_by=None, copy=False)
Add new samples to a coreset tree from parameters X, y, indices and props (properties). All features must be numeric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | Union[Iterable, Iterable[Iterable]] | array-like or iterator of array-likes. An array or an iterator of features. All features must be numeric. | required |
| y | Union[Iterable[Any], Iterable[Iterable[Any]]] | array-like or iterator of array-likes. An array or an iterator of targets. | required |
| indices | Union[Iterable[Any], Iterable[Iterable[Any]]] | array-like or iterator of array-likes, optional. An array or an iterator with indices of X. | None |
| props | Union[Iterable[Any], Iterable[Iterable[Any]]] | array-like or iterator of array-likes, optional. An array or an iterator of properties. Properties won't be used to compute the Coreset or train the model, but it is possible to filter_out_samples on them or to pass them to the select_from_function of get_important_samples. | None |
| chunk_size | int | int, optional. The number of instances used when creating a coreset node in the tree. chunk_size=0: nodes are created based on input chunks. | None |
| chunk_by | Union[Callable, str, list] | function, label, or list of labels, optional. Split the data according to the provided key. When provided, the chunk_size input is ignored. | None |
| copy | bool | boolean, default False. False (default): input data might be updated as a result of functions such as update_targets or update_features. True: data is copied before processing (impacts memory). | False |

Returns:
| Type | Description |
|---|---|
| CoresetTreeService | self |

partial_build_from_df
partial_build_from_df(datasets, target_datasets=None, *, chunk_size=None, chunk_by=None, copy=False)
Add new samples to a coreset tree based on the pandas DataFrame iterator. All features must be numeric. The target will be ignored when the Coreset is built.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| datasets | Union[Iterator[pd.DataFrame], pd.DataFrame] | pandas DataFrame or a DataFrame iterator. Data includes features, may include targets and may include indices. | required |
| target_datasets | Union[Iterator[pd.DataFrame], pd.DataFrame] | pandas DataFrame or a DataFrame iterator, optional. Use when data is split into features and target. Should include only one column. | None |
| chunk_size | int | int, optional, default: the previously used chunk_size. The number of instances used when creating a coreset node in the tree. chunk_size=0: nodes are created based on input chunks. | None |
| chunk_by | Union[Callable, str, list] | function, label, or list of labels, optional. Split the data according to the provided key. When provided, the chunk_size input is ignored. | None |
| copy | bool | boolean, default False. False (default): input data might be updated as a result of functions such as update_targets or update_features. True: data is copied before processing (impacts memory). | False |

Returns:
| Type | Description |
|---|---|
| CoresetTreeService | self |

partial_build_from_file
partial_build_from_file(file_path, target_file_path=None, *, reader_f=pd.read_csv, reader_kwargs=None, reader_chunk_size_param_name=None, chunk_size=None, chunk_by=None)
Add new samples to a coreset tree based on data taken from local storage. All features must be numeric. The target will be ignored when the Coreset is built.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | Union[Union[str, os.PathLike], Iterable[Union[str, os.PathLike]]] | file, list of files, directory, list of directories. Path(s) to the place where data is stored. Data includes features, may include targets and may include indices. | required |
| target_file_path | Union[Union[str, os.PathLike], Iterable[Union[str, os.PathLike]]] | file, list of files, directory, list of directories, optional. Use when files are split into features and target. Each file should include only one column. | None |
| reader_f | Callable | pandas-like read method, optional, default pandas read_csv. For example, to read Excel files use pandas read_excel. | pd.read_csv |
| reader_kwargs | dict | dict, optional. Keyword arguments used when calling the reader_f method. | None |
| reader_chunk_size_param_name | str | str, optional. The reader_f input parameter name for reading a file in chunks. When not provided, the service tries to infer it. Based on the data, the service decides on the optimal chunk size to read and passes it through this parameter when calling reader_f. Use "ignore" to skip the automatic chunk-reading logic. | None |
| chunk_size | int | int, optional, default: the previously used chunk_size. The number of instances used when creating a coreset node in the tree. chunk_size=0: nodes are created based on input chunks. | None |
| chunk_by | Union[Callable, str, list] | function, label, or list of labels, optional. Split the data according to the provided key. When provided, the chunk_size input is ignored. | None |

Returns:
| Type | Description |
|---|---|
| CoresetTreeService | self |

plot
plot(dir_path=None, name=None)
Produce a tree graph plot and save the figure as a local PNG file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dir_path | Union[str, os.PathLike] | string or PathLike, optional. Path in which to save the plot figure. If it is not provided, is invalid or does not exist, the figure is saved in the current directory (from which this method is called). | None |
| name | str | string, optional. Name of the image file. | None |

Returns:
| Type | Description |
|---|---|
| pathlib.Path | Image file path |

predict
predict(X)
Run prediction on the trained model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | Union[Iterable, Iterable[Iterable]] | An array of features. | required |

Returns:
| Type | Description |
|---|---|
| | Model prediction results. |

predict_proba
predict_proba(X)
Run probability prediction on the trained model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | Union[Iterable, Iterable[Iterable]] | An array of features. | required |

Returns:
| Type | Description |
|---|---|
| | The probability of the sample for each class in the model. |

remove_samples
remove_samples(indices, force_resample_all=None, force_sensitivity_recalc=None, force_do_nothing=False)
Remove samples from the coreset tree. The coreset tree is automatically updated to accommodate the changes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| indices | Iterable | array-like. An array of indices to be removed from the coreset tree. | required |
| force_resample_all | int | int, optional. Force full resampling of the affected nodes in the coreset tree, starting from level=force_resample_all. None: do not force full resampling (default); 0: the head of the tree; 1: the level below the head of the tree; len(tree)-1: the leaf level; -1: same as the leaf level. | None |
| force_sensitivity_recalc | int | int, optional. Force the recalculation of the sensitivity and partial resampling of the affected nodes, based on the coreset's quality, starting from level=force_sensitivity_recalc. None: if self.save_all=False, one level above the leaf level; if self.save_all=True, the leaf level. 0: the head of the tree; 1: the level below the head of the tree; len(tree)-1: the leaf level; -1: same as the leaf level. | None |
| force_do_nothing | bool | bool, optional, default False. When set to True, suppresses any update to the coreset tree until update_dirty is called. | False |

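The level conventions shared by force_resample_all and force_sensitivity_recalc can be summarized in two small helpers. This is a hypothetical sketch of the documented rules only (0 is the head, len(tree)-1 and -1 both mean the leaf level, and the None default of force_sensitivity_recalc depends on save_all); tree_depth stands in for len(tree), and the clamp for a single-level tree is an assumption.

```python
def resolve_level(level, tree_depth):
    """Map a documented level value to a concrete level index (0 = head of the tree)."""
    leaf = tree_depth - 1
    if level == -1:
        return leaf
    if not 0 <= level <= leaf:
        raise ValueError(f"level must be -1 or in [0, {leaf}]")
    return level


def default_sensitivity_recalc_level(save_all, tree_depth):
    """Documented default level when force_sensitivity_recalc is None."""
    leaf = tree_depth - 1
    # save_all=True: the leaf level; save_all=False: one level above the leaf level,
    # clamped at 0 (assumption for a single-level tree).
    return leaf if save_all else max(leaf - 1, 0)


print(resolve_level(-1, tree_depth=4))                        # 3
print(default_sensitivity_recalc_level(False, tree_depth=4))  # 2
print(default_sensitivity_recalc_level(True, tree_depth=4))   # 3
```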
save
save(dir_path=None, name=None, save_buffer=True, override=False, allow_pickle=True)
Save service configuration and relevant data to a local directory. Use this method when the service needs to be restored.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dir_path | Union[str, os.PathLike] | string or PathLike, optional, default self.working_directory. A local directory for saving the service's files. | None |
| name | str | string, optional, default: the service class name (lower case). Name of the subdirectory where the data will be stored. | None |
| save_buffer | bool | boolean, default True. Also save the data in the buffer (a partial node of the tree) along with the rest of the saved data. | True |
| override | bool | bool, optional, default False. False: add a timestamp suffix so that each save won't override the previous ones. True: the existing subdirectory with the provided name is overridden. | False |
| allow_pickle | bool | bool, optional, default True. True: save the Coreset tree in pickle format (much faster). False: save the Coreset tree in JSON format. | True |

Returns:
| Type | Description |
|---|---|
| pathlib.Path | Save directory path. |

save_coreset
save_coreset(file_path, level=0, as_orig=False, with_index=False)
Get the coreset from the tree and save it to a file along with the coreset weights. Use the level parameter to control the level of the tree from which samples are returned.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | Union[str, os.PathLike] | string or PathLike. Local file path to store the coreset. | required |
| level | int | int, optional, default 0. Defines the depth level of the tree from which the coreset is extracted. Level 0 returns the coreset from the head of the tree with up to coreset_size samples. Level 1 returns the coreset from the level below the head of the tree with up to 2*coreset_size samples, etc. | 0 |
| as_orig | bool | boolean, optional, default False. True: save in the original format. False: save in a processed format (indices, X, y, weight). | False |
| with_index | bool | boolean, optional, default False. Relevant only when as_orig=True. Also save the index column. | False |

set_seen_indication
set_seen_indication(seen_flag=True, indices=None)
Set samples as 'seen' or 'unseen'. If no indices are provided, the flag is set on all samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seen_flag | bool | bool, optional, default True. Set the 'seen' or 'unseen' flag. | True |
| indices | Iterable | array-like, optional. Set the flag only for the provided list of indices. Defaults to all indices. | None |

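The interaction between set_seen_indication and the ignore_seen_samples flag of get_important_samples can be simulated with a plain Python class. This is a hypothetical toy model of the documented behavior (with ignore_seen_samples=True, returned samples are flagged as seen and skipped on later calls; set_seen_indication(False) clears the flag), not the library's implementation. What happens to the flags when ignore_seen_samples=False is not documented here, so the toy leaves them unchanged.

```python
class SeenTracker:
    """Toy model of 'seen' flags over a fixed ranking of sample indices."""

    def __init__(self, ranked_indices):
        self.ranked = list(ranked_indices)  # most important first
        self.seen = set()

    def get_important(self, size, ignore_seen_samples=True):
        pool = [i for i in self.ranked if not (ignore_seen_samples and i in self.seen)]
        picked = pool[:size]
        if ignore_seen_samples:
            self.seen.update(picked)  # returned samples are flagged as seen
        return picked

    def set_seen_indication(self, seen_flag=True, indices=None):
        targets = self.ranked if indices is None else indices
        if seen_flag:
            self.seen.update(targets)
        else:
            self.seen.difference_update(targets)


t = SeenTracker([7, 3, 9, 1, 5])
print(t.get_important(2))  # [7, 3]
print(t.get_important(2))  # [9, 1]
t.set_seen_indication(False)  # no indices: clear the flag on all samples
print(t.get_important(2))  # [7, 3]
```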
update_dirty
update_dirty(force_resample_all=None, force_sensitivity_recalc=None)
Calculate the sensitivity and resample the nodes that were marked as dirty, meaning they were affected by any of the methods remove_samples, update_targets, update_features or filter_out_samples when they were called with force_do_nothing=True.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| force_resample_all | int | int, optional. Force full resampling of the affected nodes in the coreset tree, starting from level=force_resample_all. None: do not force full resampling (default); 0: the head of the tree; 1: the level below the head of the tree; len(tree)-1: the leaf level; -1: same as the leaf level. | None |
| force_sensitivity_recalc | int | int, optional. Force the recalculation of the sensitivity and partial resampling of the affected nodes, based on the coreset's quality, starting from level=force_sensitivity_recalc. None: if self.save_all=False, one level above the leaf level; if self.save_all=True, the leaf level. 0: the head of the tree; 1: the level below the head of the tree; len(tree)-1: the leaf level; -1: same as the leaf level. | None |

|
update_features
update_features(indices, X, feature_names=None, force_resample_all=None, force_sensitivity_recalc=None, force_do_nothing=False)
Update the features for selected samples on the coreset tree. The coreset tree is automatically updated to accommodate the changes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| indices | Iterable | array-like. An array of indices to be updated. | required |
| X | Iterable | array-like. An array of features. Should have the same length as indices. | required |
| feature_names | Iterable[str] | If the number of features in X differs from the number of features in the original coreset, this parameter should contain the names of the passed features. | None |
| force_resample_all | int | int, optional. Force full resampling of the affected nodes in the coreset tree, starting from level=force_resample_all. None: do not force full resampling (default); 0: the head of the tree; 1: the level below the head of the tree; len(tree)-1: the leaf level; -1: same as the leaf level. | None |
| force_sensitivity_recalc | int | int, optional. Force the recalculation of the sensitivity and partial resampling of the affected nodes, based on the coreset's quality, starting from level=force_sensitivity_recalc. None: if self.save_all=False, one level above the leaf level; if self.save_all=True, the leaf level. 0: the head of the tree; 1: the level below the head of the tree; len(tree)-1: the leaf level; -1: same as the leaf level. | None |
| force_do_nothing | bool | bool, optional, default False. When set to True, suppresses any update to the coreset tree until update_dirty is called. | False |

update_targets
update_targets(indices, y, force_resample_all=None, force_sensitivity_recalc=None, force_do_nothing=False)
Update the targets for selected samples on the coreset tree. The coreset tree is automatically updated to accommodate the changes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| indices | Iterable | array-like. An array of indices to be updated. | required |
| y | Iterable | array-like. An array of target values. Should have the same length as indices. | required |
| force_resample_all | int | int, optional. Force full resampling of the affected nodes in the coreset tree, starting from level=force_resample_all. None: do not force full resampling (default); 0: the head of the tree; 1: the level below the head of the tree; len(tree)-1: the leaf level; -1: same as the leaf level. | None |
| force_sensitivity_recalc | int | int, optional. Force the recalculation of the sensitivity and partial resampling of the affected nodes, based on the coreset's quality, starting from level=force_sensitivity_recalc. None: if self.save_all=False, one level above the leaf level; if self.save_all=True, the leaf level. 0: the head of the tree; 1: the level below the head of the tree; len(tree)-1: the leaf level; -1: same as the leaf level. | None |
| force_do_nothing | bool | bool, optional, default False. When set to True, suppresses any update to the coreset tree until update_dirty is called. | False |