DataTuningParams
A class containing all the information required to tune the data parameters for unsupervised and regression Coreset trees: CoresetTreeServiceDTR, CoresetTreeServiceKMeans, CoresetTreeServiceLR, CoresetTreeServicePCA, and CoresetTreeServiceSVD.
The parameters of the class are treated as a param_grid: a Coreset tree is built for each combination of parameter values.
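To illustrate the param_grid semantics, the sketch below expands a grid into its combinations (the Cartesian product of the value lists) using itertools.product, similar in spirit to scikit-learn's ParameterGrid. The helper expand_param_grid is hypothetical and is not part of the DataHeroes API; it only shows which combinations a grid produces, not how the library iterates over them internally.

```python
from itertools import product
from typing import Any, Dict, List


def expand_param_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: return one dict per combination of the grid's values."""
    keys = list(grid)
    # product() yields every tuple that takes one value from each list, in key order.
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


grid = {
    'coreset_size': [1000, 5000],
    'deterministic_size': [0.1, None],
}
for combo in expand_param_grid(grid):
    print(combo)
```

For this two-by-two grid the helper yields four combinations, which under the param_grid rule would correspond to four Coreset trees.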
Parameter name | Type | Description |
---|---|---|
General Parameters | | |
coreset_size | List[Optional[Union[int, float]]] | The coreset size of each node in the Coreset tree. If None, the coreset size is not specified. If provided as an int, it is the number of samples; if provided as a float, it is the ratio of the resulting coreset to each chunk. In either case, coreset_size is limited to 60% of chunk_size. The coreset is constructed by sampling data instances from the dataset based on their calculated importance. Because each instance may be sampled more than once, the actual size of the coreset is usually smaller than coreset_size. Example: 'coreset_size': [1000, 5000, 10000] |
deterministic_size | List[Optional[Union[int, float]]] | The fraction of coreset_size that is selected deterministically, based on the calculated importance. If None, the deterministic size is not specified and the Coreset samples all of its instances probabilistically. Example: 'deterministic_size': [0.1, 0.2, None] |
det_weights_behaviour | List[Optional[str]] | Determines how the weights of the Coreset samples are calculated. The default is auto, which defaults to keep. Example: 'det_weights_behaviour': ['keep', 'inv'] |
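The arithmetic implied by coreset_size and deterministic_size can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: resolve_coreset_size, split_budget, and unique_after_sampling are hypothetical helpers, a float coreset_size is assumed to be multiplied by chunk_size and truncated to an int, and the deterministic share is assumed to be truncated the same way. The last helper draws importance-weighted samples with replacement to show why the number of distinct instances usually ends up below the budget.

```python
import random
from typing import List, Optional, Tuple, Union

# Documented cap: coreset_size never exceeds 60% of chunk_size.
MAX_CORESET_RATIO = 0.6


def resolve_coreset_size(coreset_size: Union[int, float], chunk_size: int) -> int:
    """Hypothetical: turn a coreset_size value into a per-chunk sample budget."""
    if isinstance(coreset_size, float):
        budget = int(coreset_size * chunk_size)  # float: fraction of the chunk (truncation assumed)
    else:
        budget = coreset_size  # int: absolute number of samples
    return min(budget, int(MAX_CORESET_RATIO * chunk_size))


def split_budget(budget: int, deterministic_size: Optional[float]) -> Tuple[int, int]:
    """Hypothetical: split a budget into (deterministic, probabilistic) sample counts."""
    if deterministic_size is None:
        return 0, budget  # no deterministic part: every sample is drawn probabilistically
    n_det = int(deterministic_size * budget)
    return n_det, budget - n_det


def unique_after_sampling(importance: List[float], budget: int, seed: int = 0) -> int:
    """Draw `budget` indices with replacement, weighted by importance; count distinct ones."""
    rng = random.Random(seed)
    picks = rng.choices(range(len(importance)), weights=importance, k=budget)
    return len(set(picks))


chunk_size = 10_000
budget = resolve_coreset_size(8000, chunk_size)  # 8000 exceeds the cap, so 6000 is used
print(budget, split_budget(budget, 0.25))  # 6000 (1500, 4500)

# Synthetic importance scores, for illustration only.
importance_rng = random.Random(1)
importance = [importance_rng.random() for _ in range(chunk_size)]
print(unique_after_sampling(importance, budget))  # fewer distinct instances than the budget
```

Sampling with replacement means repeated draws of high-importance instances collapse into one sample, which is why the table describes the actual coreset as usually smaller than coreset_size.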
Code example:

```python
from dataheroes import CoresetTreeServiceDTR

data_tuning_params = {
    'coreset_size': [500, 2000, 5000],
    'deterministic_size': [0.1, 0.3, None],
    'det_weights_behaviour': ['keep', 'inv'],
}
# The remaining constructor arguments are elided.
service = CoresetTreeServiceDTR(data_tuning_params=data_tuning_params, ...)
```
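For the example above, the number of Coreset trees follows from the param_grid rule: multiply the number of values listed for each parameter. This short check uses only the standard library (math.prod, Python 3.8+) and does not call DataHeroes.

```python
from math import prod

data_tuning_params = {
    'coreset_size': [500, 2000, 5000],
    'deterministic_size': [0.1, 0.3, None],
    'det_weights_behaviour': ['keep', 'inv'],
}

# One Coreset tree per combination: the product of the list lengths (3 * 3 * 2).
n_trees = prod(len(values) for values in data_tuning_params.values())
print(n_trees)  # 18
```

Each added value multiplies the number of trees to build, so it is worth keeping the grid small when training time matters.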