DataTuningParamsClassification
Bases: DataTuningParams
A class containing all the information required to tune the data parameters for classification Coreset trees: CoresetTreeServiceDTC and CoresetTreeServiceLG.
The parameters of the class are treated as a param_grid: a Coreset tree will be built for each combination of parameter values.
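To illustrate what "each combination of parameters" means, here is a minimal sketch that expands a param_grid-style dictionary into its individual combinations using only the standard library. The helper name `expand_grid` is hypothetical and is not part of the library; how the library enumerates combinations internally is not described here.

```python
from itertools import product

# A reduced grid, using parameter names from this class.
data_tuning_params = {
    "coreset_size": [500, 2000, 5000],
    "deterministic_size": [0.1, 0.3, None],
    "fair": [True, False],
}


def expand_grid(param_grid):
    """Hypothetical helper: return one dict per combination of grid values."""
    names = list(param_grid)
    value_lists = (param_grid[name] for name in names)
    return [dict(zip(names, values)) for values in product(*value_lists)]


combinations = expand_grid(data_tuning_params)
print(len(combinations))   # number of Coreset trees this grid would imply
print(combinations[0])     # the first combination
```

The number of combinations is the product of the list lengths (here 3 × 3 × 2), so adding values to several parameters at once grows the number of trees quickly.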
Parameter name | Type | Description |
---|---|---|
General Parameters | ||
coreset_size | List[Optional[Union[int, float]]] |
Represents the coreset size of each node in the coreset tree. If None, the coreset size is not specified.
If provided as a float, it represents the size of the resulting coreset as a fraction of each chunk.
If provided as an int, it is the number of samples.
In either case, the coreset_size is limited to 60% of the chunk_size.
The coreset is constructed by sampling data instances from the dataset based on their calculated importance.
Since each instance may be sampled more than once, the actual size of the coreset is, in practice, usually smaller than coreset_size.
Example: 'coreset_size': [1000, 5000, 10000] |
fair | List[Optional[Union[str, bool]]] |
Automatically determines the number of samples to sample from each class. If set to True, small classes will be sampled in a
higher proportion than their proportion in the full dataset. If set to False, the classes will be
sampled according to their proportion in the full dataset, unless the class_size parameter is defined.
Example: 'fair': [True, False] |
class_size | List[Optional[Dict[Any, Union[int, float]]]] |
Determines the number of samples to sample from each class. If provided as a float, it represents a fraction of the coreset_size.
If provided as an int, it is the number of samples. If None, the number of samples per class will be
automatically determined based on the fair parameter. The entries in class_size should not sum to more than the provided coreset_size.
Example: 'class_size': [{0: 5000, 1: 1000}, None] |
deterministic_size | List[Optional[Union[int, float]]] |
The fraction of the coreset_size that is selected deterministically, based on the calculated importance.
If None, the deterministic size is not specified and the Coreset selects all of its samples probabilistically.
Example: 'deterministic_size': [0.1, 0.2, None] |
det_weights_behaviour | List[Optional[str]] |
Determines how the weights of the Coreset samples will be calculated. The default is auto, which defaults to keep.
Example: 'det_weights_behaviour': ['keep', 'inv'] |
sample_all | List[Optional[List[Any]]] |
A list of classes for which all data instances should be selected into the Coreset, instead of applying sampling.
If None, sample_all will apply to no class. The total number of instances in the classes listed in sample_all should not exceed the provided coreset_size.
sample_all should only be used with highly imbalanced datasets, to ensure that rare classes are sampled. A similar effect can be achieved by providing a suitable class_size.
Example: 'sample_all': [None, [1]] |
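The coreset_size rules in the table (float as a fraction of the chunk, int as a sample count, both limited to 60% of chunk_size) can be sketched as follows. This is an illustration only: the function `resolve_coreset_size` is hypothetical, and the rounding of fractional results is an assumption, since the table does not say how the library rounds.

```python
def resolve_coreset_size(coreset_size, chunk_size):
    """Hypothetical helper: turn one coreset_size grid value into a sample count.

    Returns None when the size is not specified.
    """
    if coreset_size is None:
        return None
    if isinstance(coreset_size, float):
        # Float: interpreted as a fraction of the chunk (rounding is an assumption).
        requested = round(coreset_size * chunk_size)
    else:
        # Int: interpreted as a number of samples.
        requested = coreset_size
    # In either case the size is limited to 60% of the chunk size.
    cap = chunk_size * 60 // 100
    return min(requested, cap)


for value in [0.2, 1000, 8000, None]:
    print(value, "->", resolve_coreset_size(value, chunk_size=10_000))
```

With a chunk_size of 10,000, a request of 8000 samples is reduced to the 60% limit, while smaller requests pass through unchanged.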
Code example:
from dataheroes import CoresetTreeServiceDTC

# Each key maps to the list of values to try; one Coreset tree is built per combination.
data_tuning_params = {
    'coreset_size': [500, 2000, 5000],
    'deterministic_size': [0.1, 0.3, None],
    'det_weights_behaviour': ['keep', 'inv'],
    'sample_all': [None, [1]],
    'class_size': [{0: 5000, 1: 1000}, None],
    'fair': [True, False]
}
service = CoresetTreeServiceDTC(data_tuning_params=data_tuning_params, ...)
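Because every class_size entry is combined with every coreset_size, some grid combinations can violate the guideline that class_size entries should not sum to more than coreset_size. With the values in the example above, the class_size dictionary sums to 6000, which exceeds each listed coreset_size. The sketch below flags such pairs before building the service. Both helper functions are hypothetical, and how the library itself handles these combinations is not stated here.

```python
from itertools import product


def class_size_total(class_size, coreset_size):
    """Hypothetical helper: total samples requested by a class_size dict.

    Float entries are read as fractions of coreset_size, int entries as counts.
    """
    total = 0
    for value in class_size.values():
        total += value * coreset_size if isinstance(value, float) else value
    return total


def oversized_pairs(coreset_sizes, class_sizes):
    """Return (coreset_size, class_size) pairs whose class total exceeds coreset_size."""
    flagged = []
    for coreset_size, class_size in product(coreset_sizes, class_sizes):
        # Only integer coreset sizes are checked in this sketch.
        if class_size is None or not isinstance(coreset_size, int):
            continue
        if class_size_total(class_size, coreset_size) > coreset_size:
            flagged.append((coreset_size, class_size))
    return flagged


flagged = oversized_pairs([500, 2000, 5000], [{0: 5000, 1: 1000}, None])
for coreset_size, class_size in flagged:
    print(f"class_size {class_size} exceeds coreset_size {coreset_size}")
```

A check like this can be run on the grid first, so that the coreset_size and class_size lists are adjusted to be compatible before any trees are built.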