datacleanbot¶
Welcome to the documentation of the datacleanbot Python API.
datacleanbot offers automated, data-driven support to help users clean data
effectively and smoothly. Given an arbitrary raw dataset representing a machine
learning problem, the tool automatically identifies
potential issues and reports the results and recommendations to the
end user in an effective way. datacleanbot is designed with a strong connection
to OpenML, a platform where people can
easily share data, experiments and machine learning models. Users can easily
acquire datasets from OpenML by dataset ID and clean them with datacleanbot.
User’s Guide¶
Usage¶
Acquire Data¶
The first step is to acquire data from OpenML.
import openml as oml
import datacleanbot.dataclean as dc
import numpy as np
data = oml.datasets.get_dataset(id) # id: openml dataset id
X, y, categorical_indicator, features = data.get_data(target=data.default_target_attribute, dataset_format='array')
Xy = np.concatenate((X,y.reshape((y.shape[0],1))), axis=1)
Show Important Features¶
datacleanbot computes the most important features of
the given dataset using a random forest and presents the
15 most useful features to the user.
dc.show_important_features(X, y, data.name, features)
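The ranking comes from a random forest's impurity-based feature importances. A minimal sketch of the idea using scikit-learn (datacleanbot's exact estimator and settings may differ):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Fit a random forest and rank features by impurity-based importance
iris = load_iris()
X, y = iris.data, iris.target
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]
for i in ranking:
    print(f"{iris.feature_names[i]}: {forest.feature_importances_[i]:.3f}")
```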
Unify Column Names¶
Inconsistent capitalization of column names can be detected and reported to the user. Users can decide whether to unify them or not. The capitalization can be unified to either upper case or lower case.
features = dc.unify_name_consistency(features)
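Conceptually, the check compares every column name against a single capitalization style; a rough sketch of the idea (not datacleanbot's actual implementation):

```python
def is_capitalization_consistent(names):
    """True if all column names share one capitalization style."""
    all_lower = all(n == n.lower() for n in names)
    all_upper = all(n == n.upper() for n in names)
    return all_lower or all_upper

print(is_capitalization_consistent(['age', 'sex', 'chol']))  # True
print(is_capitalization_consistent(['Age', 'sex', 'chol']))  # False
```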
Show Statistical Information¶
datacleanbot presents statistical information to
help users gain a better understanding of the data
distribution.
dc.show_statistical_info(Xy)
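The report is essentially a per-column descriptive summary; the same numbers can be reproduced with pandas, assuming Xy is a NumPy array as built above (a toy array is used here for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Xy array; NaNs are excluded from the counts
Xy = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, np.nan]])
summary = pd.DataFrame(Xy).describe()  # count, mean, std, min, quartiles, max
print(summary)
```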
Discover Data Types¶
datacleanbot can discover feature data types.
Basic data types discovered are ‘datetime’, ‘float’, ‘integer’,
‘bool’ and ‘string’.
datacleanbot can also discover statistical data types (real, positive real,
categorical and count) using the Bayesian model abda.
dc.discover_types(Xy)
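Basic type discovery behaves much like pandas' per-column dtype inference (the statistical types come from the separate abda model); a small illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],            # integer
    'b': [0.5, 1.2, 3.3],      # float
    'c': [True, False, True],  # bool
    'd': ['x', 'y', 'z'],      # string (object dtype)
})
print([str(t) for t in df.dtypes])  # e.g. ['int64', 'float64', 'bool', 'object']
```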
Clean Duplicated Rows¶
datacleanbot detects duplicated records and reports them to users.
dc.clean_duplicated_rows(Xy)
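Under the hood this is equivalent to pandas' duplicate handling; a minimal sketch:

```python
import numpy as np
import pandas as pd

Xy = np.array([[1, 2], [3, 4], [1, 2]])    # last row duplicates the first
df = pd.DataFrame(Xy)
dupes = df[df.duplicated(keep=False)]      # keep=False reports every copy
print(dupes)
deduped = df.drop_duplicates().to_numpy()  # what dropping the duplicates leaves
```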
Handle Missing Values¶
datacleanbot identifies the characters ‘n/a’, ‘na’, ‘--’ and ‘?’ as missing values.
Users can add extra characters to be considered as missing. After the missing
values are detected, datacleanbot presents them in effective
visualizations to help users identify the missing mechanism. Afterwards,
datacleanbot recommends an appropriate approach to clean the missing values
according to the missing mechanism.
features, Xy = dc.handle_missing(features, Xy)
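Detection amounts to mapping the missing characters to np.nan and counting per column; a rough sketch of the idea:

```python
import numpy as np
import pandas as pd

missing_chars = ['n/a', 'na', '--', '?']  # the default character set
df = pd.DataFrame({'a': ['1', '?', '3'], 'b': ['na', '5', '6']})
df = df.replace(missing_chars, np.nan)
print(df.isnull().sum())  # number of missing values in each feature
```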
Outlier Detection¶
A meta-learner is trained beforehand to recommend an outlier detection algorithm according to the meta-features of the given dataset. Users can apply the recommended algorithm or any other available algorithm to detect outliers. After detection, outliers are presented to users in effective visualizations and users can choose whether to drop them.
Xy = dc.handle_outlier(features, Xy)
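When isolation forest is recommended (as in the example below), detection resembles this scikit-learn sketch, where lower scores mean more anomalous:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),
               [[8.0, 8.0]]])            # one obvious outlier at index 100
iso = IsolationForest(random_state=0).fit(X)
scores = iso.score_samples(X)            # lower score = more anomalous
print("most anomalous row:", np.argmin(scores))
```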
Example¶
Example_autoclean¶
[4]:
import datacleanbot.dataclean as dc
import openml as oml
import numpy as np
[5]:
# acquire data
data = oml.datasets.get_dataset(51)
X, y, categorical_indicator, features = data.get_data(target=data.default_target_attribute, dataset_format='array')
Xy = np.concatenate((X,y.reshape((y.shape[0],1))), axis=1)
[6]:
# input openml dataset id
Xy = dc.autoclean(Xy, data.name, features)
Important Features
[figure: feature importance plot]
Statistical Information
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 294.000000 | 3.0 | 294.000000 | 271.000000 | 293.000000 | 286.000000 | 294.000000 | 293.000000 | 294.000000 | 104.000000 | 28.000000 | 293.000000 | 293.000000 | 294.000000 |
mean | 47.826531 | 0.0 | 1.867347 | 250.848708 | 0.303754 | 0.930070 | 0.586054 | 1.156997 | 0.724490 | 1.105769 | 1.035714 | 139.129693 | 132.583618 | 0.360544 |
std | 7.811812 | 0.0 | 0.956077 | 67.657711 | 0.460665 | 0.255476 | 0.908648 | 0.417011 | 0.447533 | 0.338995 | 0.881167 | 23.589749 | 17.626568 | 0.480977 |
min | 28.000000 | 0.0 | 0.000000 | 85.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 82.000000 | 92.000000 | 0.000000 |
25% | 42.000000 | 0.0 | 1.000000 | 209.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 0.000000 | 122.000000 | 120.000000 | 0.000000 |
50% | 49.000000 | 0.0 | 2.000000 | 243.000000 | 0.000000 | 1.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 140.000000 | 130.000000 | 0.000000 |
75% | 54.000000 | 0.0 | 3.000000 | 282.500000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 2.000000 | 155.000000 | 140.000000 | 1.000000 |
max | 66.000000 | 0.0 | 3.000000 | 603.000000 | 1.000000 | 1.000000 | 5.000000 | 2.000000 | 1.000000 | 2.000000 | 2.000000 | 190.000000 | 200.000000 | 1.000000 |
Discover Data Types
Simple Data Types
['int64', 'int64', 'int64', 'int64', 'int64', 'int64', 'int64', 'int64', 'bool', 'float64', 'float64', 'int64', 'int64', 'bool']
Statistical Data Types
['Type.POSITIVE', 'Type.CATEGORICAL', 'Type.CATEGORICAL', 'Type.POSITIVE', 'Type.COUNT', 'Type.CATEGORICAL', 'Type.POSITIVE', 'Type.COUNT', 'Type.COUNT', 'Type.CATEGORICAL', 'Type.CATEGORICAL', 'Type.POSITIVE', 'Type.POSITIVE', 'Type.CATEGORICAL']
Duplicated Rows
Identifying Duplicated Rows ...
0 1 2 3 4 5 6 7 8 9 10 11 12 13
101 49.0 NaN 3.0 NaN 0.0 1.0 0.0 1.0 0.0 NaN NaN 160.0 110.0 0.0
102 49.0 NaN 3.0 NaN 0.0 1.0 0.0 1.0 0.0 NaN NaN 160.0 110.0 0.0
Do you want to drop the duplicated rows? [y/n]y
Duplicated rows are dropped.
Inconsitent Column Names
Column names
============
['age', 'sex', 'chest_pain', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']
Column names are consistent
Missing values
Identify Missing Data ...
The default setting of missing characters is ['n/a', 'na', '--', '?']
Do you want to add extra character? [y/n]n
Number of missing in each feature
0 0
1 290
2 0
3 22
4 1
5 8
6 0
7 1
8 0
9 189
10 265
11 1
12 1
13 0
dtype: int64
Records containing missing values:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 28.0 | NaN | 3.0 | 132.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | NaN | NaN | 185.0 | 130.0 | 0.0 |
1 | 29.0 | NaN | 3.0 | 243.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | NaN | NaN | 160.0 | 120.0 | 0.0 |
2 | 29.0 | NaN | 3.0 | NaN | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | NaN | NaN | 170.0 | 140.0 | 0.0 |
3 | 30.0 | NaN | 0.0 | 237.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | NaN | 0.0 | 170.0 | 170.0 | 0.0 |
4 | 31.0 | NaN | 3.0 | 219.0 | 0.0 | 1.0 | 0.0 | 2.0 | 0.0 | NaN | NaN | 150.0 | 100.0 | 0.0 |
Missing correlation between features containing missing values and other features
1 | 3 | 4 | 5 | 7 | 9 | 10 | 11 | 12 | |
---|---|---|---|---|---|---|---|---|---|
0 | -0.054393 | 0.019737 | 0.001330 | -0.014961 | 0.053771 | -0.234171 | 0.001532 | 0.001330 | 0.001330 |
1 | 1.000000 | -0.099671 | 0.005952 | 0.017041 | 0.005952 | -0.004595 | 0.082259 | 0.005952 | 0.005952 |
2 | 0.020988 | 0.067940 | 0.069733 | 0.023981 | -0.114337 | 0.342524 | 0.075190 | 0.069733 | 0.069733 |
3 | -0.099671 | 1.000000 | -0.016674 | -0.047736 | -0.016674 | 0.076025 | 0.048563 | -0.016674 | -0.016674 |
4 | 0.005952 | -0.016674 | 1.000000 | -0.009805 | -0.003425 | -0.078890 | 0.019022 | 1.000000 | 1.000000 |
5 | 0.017041 | -0.047736 | -0.009805 | 1.000000 | -0.009805 | -0.007021 | -0.088011 | -0.009805 | -0.009805 |
6 | 0.009863 | -0.077552 | 0.091000 | -0.039312 | -0.037900 | -0.841642 | 0.005952 | 0.091000 | 0.091000 |
7 | 0.005952 | -0.016674 | -0.003425 | -0.009805 | 1.000000 | 0.043410 | 0.019022 | -0.003425 | -0.003425 |
8 | -0.062333 | 0.000198 | -0.095489 | -0.085351 | 0.035864 | -0.038358 | -0.016808 | -0.095489 | -0.095489 |
9 | -0.004595 | 0.076025 | -0.078890 | -0.007021 | 0.043410 | 1.000000 | 0.025752 | -0.078890 | -0.078890 |
10 | 0.082259 | 0.048563 | 0.019022 | -0.088011 | 0.019022 | 0.025752 | 1.000000 | 0.019022 | 0.019022 |
11 | 0.005952 | -0.016674 | 1.000000 | -0.009805 | -0.003425 | -0.078890 | 0.019022 | 1.000000 | 1.000000 |
12 | 0.005952 | -0.016674 | 1.000000 | -0.009805 | -0.003425 | -0.078890 | 0.019022 | 1.000000 | 1.000000 |
Visualize Missing Data ...
[figures: missing data visualizations]
Clean Missing Data ...
Feature [1, 10] has extreme large proportion of missing data
Do you want to delete the above features? [y/n]y
Choose the missing mechanism [a/b/c/d]:
a.MCAR b.MAR c.MNAR d.Skip
b
Imputation score of knn is 0.7567397233586597
Imputation score of matrix factorization is 0.7567397233586597
Imputation score of multiple imputation is 0.8122681667640756
Imputation method with the highest socre is multiple imputation
The recommended approach is multiple imputation
Do you want to apply the recommended approach? [y/n]y
Applying multiple imputation ...
Missing values cleaned!
Outliers
Recommend Algorithm ...
The recommended approach is isolation forest.
Do you want to apply the recommended outlier detection approach? [y/n]y
Visualize Outliers ...
[figure: outlier visualization]
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | anomaly_score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
232 | 48 | 1 | 275 | 1 | 0 | 2 | 2 | 1 | 0 | 150 | 122 | 1 | -0.119937 |
254 | 46 | 0 | 272 | 0 | 0 | 2 | 1 | 1 | 1 | 175 | 140 | 1 | -0.0797964 |
275 | 59 | 1 | 264 | 1 | 0 | 0 | 0 | 1 | 0.944132 | 119 | 140 | 1 | -0.0689509 |
90 | 48 | 3 | 308 | 0.141883 | 1 | 2 | 2 | 0 | 2 | 147.257 | 139.446 | 0 | -0.068584 |
220 | 59 | 1 | 338 | 1 | 0 | 1.5 | 2 | 0 | 1 | 130 | 130 | 1 | -0.0657597 |
291 | 58 | 3 | 393 | 1 | 1 | 1 | 1 | 0 | 1 | 110 | 180 | 1 | -0.0580726 |
248 | 58 | 2 | 211 | 0 | 0 | 0 | 2 | 1 | 1.15535 | 92 | 160 | 1 | -0.0565477 |
3 | 30 | 0 | 237 | 0 | 1 | 0 | 2 | 0 | 1.32164 | 170 | 170 | 0 | -0.0521636 |
117 | 51 | 2 | 220 | 1 | 1 | 2 | 1 | 0 | 2 | 160 | 130 | 0 | -0.0464553 |
223 | 65 | 1 | 306 | 1 | 0 | 1.5 | 1 | 1 | 1 | 87 | 140 | 1 | -0.0440714 |
94 | 48 | 1 | 163 | 0 | 1 | 2 | 1 | 0 | 2 | 175 | 108 | 0 | -0.0405723 |
268 | 55 | 3 | 292 | 1 | 0 | 2 | 1 | 1 | 1 | 143 | 160 | 1 | -0.0397377 |
171 | 57 | 1 | 347 | 1 | 1 | 0.8 | 2 | 0 | 1 | 126 | 180 | 0 | -0.0360816 |
242 | 54 | 1 | 603 | 1 | 0 | 1 | 1 | 1 | 1 | 125 | 130 | 1 | -0.0355075 |
146 | 54 | 0 | 171 | 0 | 1 | 2 | 1 | 1 | 2 | 137 | 120 | 0 | -0.0348489 |
276 | 65 | 1 | 263 | 1 | 0 | 2 | 1 | 1 | 1 | 112 | 170 | 1 | -0.0329883 |
273 | 58 | 3 | 164 | 1 | 1 | 2 | 2 | 1 | 1 | 99 | 136 | 1 | -0.0309242 |
12 | 35 | 0 | 160 | 0 | 1 | 0 | 2 | 0 | 1.31709 | 185 | 120 | 0 | -0.0297764 |
154 | 54 | 1 | 365 | 0 | 1 | 1 | 2 | 1 | 2 | 134 | 150 | 0 | -0.0297346 |
157 | 55 | 3 | 394 | 0 | 1 | 0 | 0 | 0 | 1.43866 | 150 | 130 | 0 | -0.0280174 |
289 | 54 | 2 | 294 | 1 | 1 | 0 | 2 | 0 | 1 | 100 | 130 | 1 | -0.026562 |
0 | 28 | 3 | 132 | 0 | 1 | 0 | 0 | 1 | 1.42619 | 185 | 130 | 0 | -0.0263437 |
31 | 39 | 3 | 224.323 | 0 | 1 | 2 | 2 | 1 | 2 | 146 | 120 | 0 | -0.0256593 |
263 | 52 | 1 | 246 | 1 | 1 | 4 | 2 | 1 | 1 | 82 | 160 | 1 | -0.0254428 |
185 | 62 | 0 | 193 | 0 | 1 | 0 | 1 | 0 | 1.36576 | 116 | 160 | 0 | -0.0246483 |
170 | 57 | 0 | 308 | 0 | 1 | 1 | 1 | 0 | 1 | 98 | 130 | 0 | -0.0235655 |
227 | 40 | 1 | 392 | 0 | 1 | 2 | 1 | 0 | 1 | 130 | 150 | 1 | -0.0210837 |
290 | 56 | 1 | 342 | 1 | 0 | 3 | 1 | 1 | 1 | 150 | 155 | 1 | -0.0182647 |
130 | 53 | 3 | 468 | 0 | 0.864092 | 0 | 1 | 0 | 1.37019 | 127 | 113 | 0 | -0.0181947 |
91 | 48 | 3 | 256.72 | 0 | 0 | 0 | 2 | 0 | 1.3819 | 148 | 120 | 0 | -0.0151125 |
285 | 50 | 1 | 231 | 1 | 1 | 5 | 2 | 1 | 1 | 140 | 140 | 1 | -0.0115976 |
183 | 61 | 1 | 294 | 1 | 1 | 1 | 2 | 0 | 1 | 120 | 130 | 0 | -0.0100509 |
255 | 47 | 2 | 248 | 0 | 0 | 0 | 1 | 0 | 1.16329 | 170 | 135 | 1 | -0.00722342 |
37 | 39 | 2 | 147 | 0 | 0 | 0 | 1 | 1 | 1.36338 | 160 | 160 | 0 | -0.00674078 |
89 | 47 | 1 | 276 | 1 | 0 | 0 | 1 | 1 | 1.11308 | 125 | 140 | 0 | -0.00241484 |
195 | 38 | 1 | 117 | 1 | 1 | 2.5 | 1 | 1 | 1 | 134 | 92 | 1 | -0.00238595 |
168 | 56 | 2 | 276 | 1 | 1 | 1 | 1 | 1 | 2 | 128 | 130 | 0 | -0.00109016 |
150 | 54 | 3 | 195 | 0 | 1 | 1 | 2 | 1 | 2 | 130 | 160 | 0 | -0.000929664 |
205 | 48 | 1 | 263 | 0 | 0 | 0 | 1 | 1 | 1.00107 | 110 | 106 | 1 | 0.00114704 |
250 | 41 | 1 | 172 | 0 | 1 | 2 | 2 | 1 | 1 | 130 | 130 | 1 | 0.00212158 |
196 | 40 | 1 | 466 | 1 | 0.86451 | 1 | 1 | 1 | 1 | 152 | 120 | 1 | 0.00309686 |
188 | 33 | 1 | 246 | 1 | 1 | 1 | 1 | 0 | 1 | 150 | 100 | 1 | 0.00348012 |
118 | 51 | 2 | 200 | 0 | 1 | 0.5 | 1 | 0 | 2 | 120 | 150 | 0 | 0.0043416 |
131 | 53 | 3 | 216 | 1 | 1 | 2 | 1 | 0 | 1 | 142 | 140 | 0 | 0.00484593 |
224 | 32 | 1 | 529 | 0 | 1 | 0 | 1 | 1 | 0.99009 | 130 | 118 | 1 | 0.00797082 |
282 | 47 | 1 | 291 | 1 | 1 | 3 | 2 | 1 | 1 | 158 | 160 | 1 | 0.010827 |
246 | 56 | 1 | 213 | 1 | 0 | 1 | 1 | 1 | 1 | 125 | 150 | 1 | 0.011012 |
74 | 45 | 3 | 224 | 0 | 0 | 0 | 1 | 1 | 1.35054 | 122 | 140 | 0 | 0.0124018 |
281 | 47 | 1 | 205 | 1 | 1 | 2 | 1 | 0 | 1 | 98 | 120 | 1 | 0.013517 |
140 | 54 | 3 | 230 | 0 | 0 | 0 | 1 | 0 | 1.38866 | 140 | 120 | 0 | 0.013915 |
252 | 44 | 3 | 288 | 1 | 1 | 3 | 1 | 1 | 1 | 150 | 150 | 1 | 0.0143166 |
59 | 43 | 0 | 223 | 0 | 1 | 0 | 1 | 0 | 1.24638 | 142 | 100 | 0 | 0.0148435 |
228 | 43 | 0 | 291 | 0 | 1 | 0 | 2 | 1 | 1.07937 | 155 | 120 | 1 | 0.0149449 |
84 | 46 | 1 | 280 | 0 | 1 | 0 | 2 | 1 | 1.36972 | 120 | 180 | 0 | 0.0158468 |
22 | 37 | 1 | 173 | 0 | 1 | 0 | 2 | 0 | 1.38258 | 184 | 130 | 0 | 0.0158852 |
14 | 35 | 3 | 308 | 0 | 1 | 0 | 0 | 1 | 1.39788 | 180 | 120 | 0 | 0.0173538 |
4 | 31 | 3 | 219 | 0 | 1 | 0 | 2 | 0 | 1.3727 | 150 | 100 | 0 | 0.0187209 |
272 | 56 | 1 | 388 | 1 | 1 | 2 | 2 | 1 | 1 | 122 | 170 | 1 | 0.0187329 |
184 | 61 | 1 | 292 | 1 | 1 | 0 | 2 | 1 | 1.21704 | 115 | 125 | 0 | 0.0191592 |
218 | 57 | 3 | 265 | 1 | 1 | 1 | 2 | 1 | 1 | 145 | 140 | 1 | 0.0224086 |
186 | 62 | 3 | 271 | 0 | 1 | 1 | 1 | 1 | 2 | 152 | 140 | 0 | 0.0224566 |
265 | 53 | 1 | 285 | 1 | 1 | 1.5 | 2 | 1 | 1 | 120 | 180 | 1 | 0.0235483 |
158 | 55 | 3 | 256 | 0 | 0 | 0 | 1 | 1 | 1.37734 | 137 | 120 | 0 | 0.0238808 |
35 | 39 | 3 | 241 | 0 | 1 | 0 | 1 | 1 | 1.43285 | 106 | 190 | 0 | 0.0244046 |
172 | 57 | 3 | 260 | 0 | 0 | 0 | 1 | 1 | 1.41142 | 140 | 140 | 0 | 0.0244428 |
17 | 36 | 2 | 340 | 0 | 1 | 1 | 1 | 1 | 1 | 184 | 112 | 0 | 0.0256382 |
260 | 52 | 1 | 342 | 1 | 1 | 1 | 2 | 1 | 1 | 96 | 112 | 1 | 0.0256705 |
213 | 51 | 1 | 303 | 1 | 1 | 1 | 1 | 0 | 1 | 150 | 160 | 1 | 0.0263497 |
191 | 36 | 3 | 267 | 0 | 1 | 3 | 1 | 1 | 1 | 160 | 120 | 1 | 0.0272168 |
244 | 54 | 1 | 198 | 1 | 1 | 2 | 1 | 1 | 1 | 142 | 200 | 1 | 0.0278133 |
165 | 56 | 2 | 219 | 0 | 0.904268 | 0 | 2 | 0 | 1.46569 | 164 | 130 | 0 | 0.0278579 |
112 | 50 | 3 | 209 | 0 | 1 | 0 | 2 | 1 | 1.48383 | 116 | 170 | 0 | 0.0279521 |
109 | 50 | 1 | 328 | 1 | 1 | 1 | 1 | 0 | 1 | 110 | 120 | 0 | 0.0285039 |
125 | 52 | 1 | 180 | 1 | 1 | 1.5 | 1 | 0 | 1 | 140 | 130 | 0 | 0.0286704 |
143 | 54 | 3 | 309 | 0 | 0.889135 | 0 | 2 | 0 | 1.47009 | 140 | 140 | 0 | 0.0291233 |
23 | 37 | 3 | 283 | 0 | 1 | 0 | 2 | 1 | 1.34845 | 98 | 130 | 0 | 0.0291719 |
127 | 52 | 3 | 100 | 1 | 1 | 0 | 1 | 1 | 1.356 | 138 | 140 | 0 | 0.0303691 |
85 | 47 | 3 | 257 | 0 | 1 | 1 | 1 | 0 | 2 | 135 | 140 | 0 | 0.030758 |
72 | 45 | 3 | 244.979 | 0 | 1 | 0 | 1 | 0 | 1.53949 | 180 | 180 | 0 | 0.0310886 |
67 | 44 | 1 | 218 | 0 | 1 | 0 | 2 | 0 | 1.30464 | 115 | 120 | 0 | 0.0312676 |
141 | 54 | 3 | 273 | 0 | 1 | 1.5 | 1 | 0 | 1 | 150 | 120 | 0 | 0.0313078 |
95 | 48 | 1 | 254 | 0 | 1 | 0 | 2 | 0 | 1.30643 | 110 | 120 | 0 | 0.0314005 |
155 | 55 | 3 | 344 | 0 | 1 | 0 | 2 | 0 | 1.46327 | 160 | 110 | 0 | 0.0316206 |
189 | 34 | 0 | 156 | 0 | 1 | 0 | 1 | 1 | 1.1145 | 180 | 140 | 1 | 0.0317022 |
256 | 48 | 1 | 214 | 1 | 1 | 1.5 | 1 | 0 | 1 | 108 | 138 | 1 | 0.0319747 |
264 | 53 | 2 | 518 | 0 | 1 | 0 | 1 | 1 | 1.15593 | 130 | 145 | 1 | 0.0321085 |
288 | 52 | 1 | 331 | 1 | 1 | 2.5 | 1 | 1 | 0.96057 | 94 | 160 | 1 | 0.0322196 |
30 | 39 | 2 | 182 | 0 | 1 | 0 | 2 | 0 | 1.41003 | 180 | 110 | 0 | 0.0322735 |
211 | 50 | 2 | 288 | 1 | 1 | 0 | 1 | 0 | 1.06802 | 140 | 140 | 1 | 0.0325019 |
292 | 65 | 1 | 275 | 1 | 1 | 1 | 2 | 1 | 1 | 115 | 130 | 1 | 0.032828 |
136 | 53 | 1 | 260 | 1 | 1 | 3 | 2 | 1 | 1 | 112 | 124 | 0 | 0.0339107 |
247 | 57 | 1 | 255 | 1 | 1 | 3 | 1 | 1 | 1 | 92 | 150 | 1 | 0.0339626 |
139 | 54 | 3 | 221 | 0 | 1 | 1 | 1 | 0 | 2 | 138 | 120 | 0 | 0.0344146 |
32 | 39 | 3 | 200 | 1 | 1 | 1 | 1 | 1 | 1 | 160 | 120 | 0 | 0.0366313 |
177 | 59 | 3 | 188 | 0 | 1 | 1 | 1 | 0 | 1 | 124 | 130 | 0 | 0.0368223 |
86 | 47 | 2 | 241.057 | 0 | 1 | 2 | 1 | 0 | 1 | 145 | 130 | 0 | 0.0369351 |
208 | 49 | 2 | 180 | 0 | 1 | 1 | 1 | 0 | 1 | 156 | 160 | 1 | 0.0377091 |
78 | 46 | 1 | 238 | 0 | 1 | 0 | 1 | 0 | 1.2769 | 90 | 130 | 0 | 0.0380135 |
278 | 41 | 1 | 336 | 1 | 1 | 3 | 1 | 1 | 1 | 118 | 120 | 1 | 0.0391171 |
180 | 59 | 2 | 213 | 0 | 1 | 0 | 1 | 1 | 1.44776 | 100 | 180 | 0 | 0.0392088 |
182 | 60 | 2 | 246 | 0 | 1 | 0 | 0 | 1 | 1.40395 | 135 | 120 | 0 | 0.0408451 |
79 | 46 | 3 | 275 | 1 | 1 | 0 | 1 | 1 | 1.32789 | 165 | 140 | 0 | 0.0412702 |
233 | 48 | 1 | 193 | 1 | 1 | 3 | 1 | 1 | 1 | 102 | 160 | 1 | 0.04237 |
96 | 48 | 1 | 227 | 1 | 1 | 1 | 1 | 0 | 1 | 130 | 150 | 0 | 0.0457815 |
286 | 50 | 1 | 341 | 1 | 1 | 2.5 | 2 | 1 | 1 | 125 | 140 | 1 | 0.0464844 |
277 | 66 | 1 | 276.836 | 1 | 1 | 1 | 1 | 1 | 1 | 94 | 140 | 1 | 0.0468952 |
267 | 55 | 0 | 295 | 0 | 1 | 0 | 1.11432 | 1 | 1.1145 | 136 | 140 | 1 | 0.0469683 |
61 | 43 | 3 | 215 | 0 | 1 | 0 | 2 | 0 | 1.47417 | 175 | 120 | 0 | 0.0490634 |
9 | 34 | 3 | 161 | 0 | 1 | 0 | 1 | 0 | 1.46804 | 190 | 130 | 0 | 0.0499133 |
234 | 48 | 1 | 329 | 1 | 1 | 1.5 | 1 | 1 | 1 | 92 | 160 | 1 | 0.049914 |
10 | 34 | 3 | 214 | 0 | 1 | 0 | 2 | 1 | 1.46008 | 168 | 150 | 0 | 0.050185 |
82 | 46 | 1 | 238 | 1 | 1 | 1 | 2 | 1 | 1 | 140 | 110 | 0 | 0.0513488 |
270 | 56 | 3 | 279 | 0 | 1 | 1 | 1 | 0 | 1 | 150 | 120 | 1 | 0.0514012 |
221 | 60 | 1 | 248 | 0 | 1 | 1 | 1 | 1 | 1 | 125 | 100 | 1 | 0.0528763 |
280 | 44 | 1 | 491 | 0 | 1 | 0 | 1 | 1 | 1.07103 | 135 | 135 | 1 | 0.0529694 |
235 | 48 | 1 | 355 | 1 | 1 | 2 | 1 | 1 | 1 | 99 | 160 | 1 | 0.0532494 |
271 | 56 | 1 | 230 | 1 | 1 | 1.5 | 2 | 1 | 1 | 124 | 150 | 1 | 0.053556 |
162 | 55 | 2 | 220 | 0 | 1 | 0 | 0 | 1 | 1.38884 | 134 | 120 | 0 | 0.0541677 |
62 | 43 | 3 | 249 | 0 | 1 | 0 | 2 | 0 | 1.46809 | 176 | 120 | 0 | 0.0545522 |
103 | 49 | 2 | 207 | 0 | 1 | 0 | 2 | 0 | 1.41247 | 135 | 130 | 0 | 0.0554622 |
229 | 45 | 1 | 219 | 1 | 1 | 1 | 2 | 1 | 1 | 130 | 130 | 1 | 0.057132 |
52 | 42 | 2 | 211 | 0 | 1 | 0 | 2 | 0 | 1.36915 | 137 | 115 | 0 | 0.05732 |
45 | 41 | 3 | 250 | 0 | 1 | 0 | 2 | 0 | 1.40704 | 142 | 110 | 0 | 0.0591437 |
106 | 49 | 1 | 297 | 0 | 0.93087 | 1 | 1 | 1 | 1 | 132 | 120 | 0 | 0.0593337 |
70 | 44 | 1 | 412 | 0 | 1 | 0 | 1 | 1 | 1.34646 | 170 | 150 | 0 | 0.0608182 |
187 | 31 | 1 | 270 | 1 | 1 | 1.5 | 1 | 1 | 1 | 153 | 120 | 1 | 0.0609306 |
284 | 49 | 1 | 222 | 0 | 1 | 2 | 1 | 1 | 1 | 122 | 150 | 1 | 0.0609557 |
142 | 54 | 3 | 253 | 0 | 1 | 0 | 2 | 0 | 1.49633 | 155 | 130 | 0 | 0.0612764 |
266 | 54 | 1 | 216 | 0 | 1 | 1.5 | 1 | 1 | 1 | 105 | 140 | 1 | 0.0616474 |
145 | 54 | 3 | 312 | 0 | 1 | 0 | 1 | 0 | 1.47594 | 130 | 160 | 0 | 0.0620174 |
115 | 51 | 3 | 194 | 0 | 1 | 0 | 1 | 0 | 1.53814 | 170 | 160 | 0 | 0.0643856 |
219 | 58 | 2 | 213 | 0 | 1 | 0 | 2 | 1 | 1.24797 | 140 | 130 | 1 | 0.0646421 |
259 | 51 | 2 | 160 | 0 | 1 | 2 | 1 | 1 | 1 | 150 | 135 | 1 | 0.0650465 |
56 | 42 | 2 | 228 | 1 | 1 | 1.5 | 1 | 1 | 1 | 152 | 120 | 0 | 0.0666529 |
217 | 55 | 1 | 201 | 1 | 1 | 3 | 1 | 1 | 1 | 130 | 140 | 1 | 0.0689589 |
151 | 54 | 3 | 305 | 0 | 1 | 0 | 1 | 1 | 1.5261 | 175 | 160 | 0 | 0.0689688 |
13 | 35 | 1 | 167 | 0 | 1 | 0 | 1 | 0 | 1.33391 | 150 | 140 | 0 | 0.0695931 |
97 | 48 | 3 | 240.484 | 0 | 1 | 0 | 1 | 1 | 1.35496 | 100 | 100 | 0 | 0.0698431 |
179 | 59 | 2 | 318 | 1 | 1 | 1 | 1 | 1 | 1 | 120 | 130 | 0 | 0.0702111 |
240 | 54 | 2 | 237 | 1 | 1 | 1.5 | 1 | 1 | 1.06656 | 150 | 120 | 1 | 0.070729 |
241 | 54 | 1 | 242 | 1 | 1 | 1 | 1 | 1 | 1 | 91 | 130 | 1 | 0.0725249 |
129 | 52 | 2 | 259 | 0 | 1 | 0 | 2 | 1 | 1.46127 | 170 | 140 | 0 | 0.0725429 |
198 | 41 | 1 | 237 | 1 | 0.939522 | 1 | 1 | 1 | 1 | 138 | 120 | 1 | 0.0725942 |
283 | 49 | 1 | 212 | 1 | 1 | 0 | 1 | 1 | 0.956925 | 96 | 128 | 1 | 0.0733429 |
48 | 41 | 3 | 291 | 0 | 1 | 0 | 2 | 1 | 1.42586 | 160 | 120 | 0 | 0.073357 |
206 | 48 | 1 | 260 | 0 | 1 | 2 | 1 | 1 | 1 | 115 | 120 | 1 | 0.0735847 |
251 | 43 | 1 | 175 | 1 | 1 | 1 | 1 | 1 | 1 | 120 | 120 | 1 | 0.0736604 |
216 | 54 | 1 | 224 | 0 | 1 | 2 | 1 | 1 | 1 | 122 | 125 | 1 | 0.0743512 |
27 | 38 | 3 | 275 | 0 | 1.00804 | 0 | 1 | 0 | 1.37389 | 129 | 120 | 0 | 0.0744339 |
83 | 46 | 1 | 240 | 0 | 1 | 0 | 2 | 1 | 1.32033 | 140 | 110 | 0 | 0.0748799 |
253 | 44 | 1 | 290 | 1 | 1 | 2 | 1 | 1 | 1 | 100 | 130 | 1 | 0.0750852 |
65 | 43 | 2 | 240.056 | 0 | 1 | 0 | 1 | 0 | 1.44147 | 175 | 150 | 0 | 0.0752259 |
5 | 32 | 3 | 198 | 0 | 1 | 0 | 1 | 0 | 1.39257 | 165 | 105 | 0 | 0.0754935 |
64 | 43 | 3 | 186 | 0 | 1 | 0 | 1 | 0 | 1.47751 | 154 | 150 | 0 | 0.0760975 |
116 | 51 | 2 | 190 | 0 | 1 | 0 | 1 | 0 | 1.36956 | 120 | 110 | 0 | 0.0768654 |
68 | 44 | 3 | 184 | 0 | 1 | 1 | 1 | 1 | 1 | 142 | 120 | 0 | 0.077039 |
269 | 55 | 1 | 248 | 1 | 1 | 2 | 1 | 1 | 1 | 96 | 145 | 1 | 0.0784453 |
222 | 63 | 1 | 223 | 0 | 1 | 0 | 1 | 1 | 1.19586 | 115 | 150 | 1 | 0.0786586 |
144 | 54 | 3 | 230 | 0 | 1 | 0 | 1 | 0 | 1.48177 | 130 | 150 | 0 | 0.0788217 |
204 | 47 | 1 | 226 | 1 | 1 | 1.5 | 1 | 1 | 1 | 98 | 150 | 1 | 0.078888 |
262 | 52 | 1 | 404 | 1 | 1 | 2 | 1 | 1 | 1 | 124 | 140 | 1 | 0.0790452 |
46 | 41 | 3 | 184 | 0 | 1 | 0 | 1 | 0 | 1.47235 | 180 | 125 | 0 | 0.0791359 |
73 | 45 | 1 | 297 | 0 | 1 | 0 | 1 | 0 | 1.32829 | 144 | 132 | 0 | 0.0798444 |
238 | 52 | 1 | 273.523 | 1 | 1 | 1.5 | 1 | 1 | 1 | 126 | 170 | 1 | 0.0799551 |
58 | 42 | 1 | 358 | 0 | 1 | 0 | 1 | 1 | 1.3385 | 170 | 140 | 0 | 0.0804819 |
114 | 50 | 1 | 215 | 1 | 1 | 0 | 1 | 1 | 1.23782 | 140 | 150 | 0 | 0.0805382 |
192 | 37 | 1 | 207 | 1 | 1 | 1.5 | 1 | 1 | 1 | 130 | 140 | 1 | 0.0807155 |
169 | 56 | 1 | 85 | 0 | 1 | 0 | 1 | 1 | 1.39164 | 140 | 120 | 0 | 0.0808652 |
203 | 47 | 2 | 193 | 1 | 1 | 1 | 1 | 1 | 1 | 145 | 140 | 1 | 0.0824245 |
207 | 48 | 1 | 268 | 1 | 1 | 1 | 1 | 1 | 1 | 103 | 160 | 1 | 0.0824746 |
8 | 33 | 2 | 298 | 0 | 1 | 0 | 1 | 1 | 1.36098 | 185 | 120 | 0 | 0.0830886 |
6 | 32 | 3 | 225 | 0 | 1 | 0 | 1 | 1 | 1.40973 | 184 | 110 | 0 | 0.0831834 |
156 | 55 | 3 | 320 | 0 | 1 | 0 | 1 | 0 | 1.46382 | 155 | 122 | 0 | 0.0851178 |
87 | 47 | 0 | 249 | 0 | 1 | 0 | 1 | 1 | 1.27185 | 150 | 110 | 0 | 0.0854966 |
63 | 43 | 3 | 266 | 0 | 1 | 0 | 1 | 0 | 1.38138 | 118 | 120 | 0 | 0.0855935 |
11 | 34 | 3 | 220 | 0 | 1 | 0 | 1 | 1 | 1.3632 | 150 | 98 | 0 | 0.0861795 |
199 | 43 | 1 | 247 | 1 | 1 | 2 | 1 | 1 | 1 | 130 | 150 | 1 | 0.0869482 |
230 | 46 | 1 | 231 | 1 | 1 | 0 | 1 | 1 | 0.954839 | 115 | 120 | 1 | 0.0870761 |
200 | 46 | 1 | 202 | 1 | 1 | 0 | 1 | 1 | 0.991815 | 150 | 110 | 1 | 0.0879442 |
132 | 53 | 2 | 274 | 0 | 1 | 0 | 1 | 0 | 1.38323 | 130 | 120 | 0 | 0.0884486 |
93 | 48 | 2 | 195 | 0 | 1 | 0 | 1 | 0 | 1.37463 | 125 | 120 | 0 | 0.0886647 |
81 | 46 | 2 | 163 | 0 | 0.995578 | 0 | 1 | 1 | 1.39172 | 116 | 150 | 0 | 0.0887371 |
88 | 47 | 3 | 263 | 0 | 1 | 0 | 1 | 1 | 1.50662 | 174 | 160 | 0 | 0.0891336 |
225 | 38 | 1 | 258.901 | 1 | 1 | 1 | 1 | 1 | 1 | 150 | 110 | 1 | 0.0905199 |
166 | 56 | 3 | 184 | 0 | 1 | 0 | 1 | 1 | 1.43349 | 100 | 130 | 0 | 0.090794 |
16 | 36 | 3 | 166 | 0 | 1 | 0 | 1 | 1 | 1.44486 | 180 | 120 | 0 | 0.0917647 |
104 | 49 | 3 | 253 | 0 | 1 | 0 | 1 | 1 | 1.44605 | 174 | 100 | 0 | 0.0918137 |
128 | 52 | 3 | 196 | 0 | 1 | 0 | 1 | 1 | 1.52954 | 165 | 160 | 0 | 0.0920144 |
239 | 53 | 1 | 246 | 1 | 1 | 0 | 1 | 1 | 0.980103 | 116 | 120 | 1 | 0.0920738 |
124 | 52 | 2 | 272 | 0 | 1 | 0 | 1 | 0 | 1.39657 | 139 | 125 | 0 | 0.0924291 |
249 | 58 | 1 | 263 | 1 | 1 | 2 | 1 | 1 | 1 | 140 | 130 | 1 | 0.0925944 |
164 | 55 | 1 | 229 | 1 | 1 | 0.5 | 1 | 1 | 1 | 110 | 140 | 0 | 0.0933395 |
92 | 48 | 3 | 284 | 0 | 1 | 0 | 1 | 0 | 1.39942 | 120 | 120 | 0 | 0.0933896 |
176 | 58 | 1 | 222 | 0 | 1 | 0 | 1 | 1 | 1.3391 | 100 | 135 | 0 | 0.0943583 |
279 | 43 | 1 | 288 | 1 | 1 | 2 | 1 | 1 | 1 | 135 | 140 | 1 | 0.0953903 |
19 | 36 | 2 | 160 | 0 | 1 | 0 | 1 | 1 | 1.42173 | 172 | 150 | 0 | 0.0962968 |
121 | 51 | 1 | 179 | 0 | 1 | 0 | 1 | 1 | 1.31518 | 100 | 130 | 0 | 0.0964982 |
160 | 55 | 3 | 326 | 0 | 1 | 0 | 1 | 1 | 1.48357 | 155 | 145 | 0 | 0.0965331 |
20 | 37 | 3 | 260 | 0 | 1 | 0 | 1 | 0 | 1.37387 | 130 | 120 | 0 | 0.0966762 |
174 | 58 | 3 | 251 | 0 | 1 | 0 | 1 | 1 | 1.43906 | 110 | 130 | 0 | 0.0969323 |
243 | 54 | 1 | 274.224 | 1 | 1 | 0 | 1 | 1 | 1.00388 | 118 | 140 | 1 | 0.0969578 |
44 | 40 | 2 | 233.377 | 0 | 1 | 0 | 1 | 1 | 1.42926 | 188 | 140 | 0 | 0.0970642 |
36 | 39 | 2 | 339 | 0 | 1 | 0 | 1 | 1 | 1.35734 | 170 | 120 | 0 | 0.097233 |
257 | 49 | 1 | 341 | 1 | 1 | 1 | 1 | 1 | 1 | 120 | 130 | 1 | 0.0981915 |
102 | 49 | 3 | 201 | 0 | 1 | 0 | 1 | 0 | 1.47926 | 164 | 124 | 0 | 0.0998013 |
274 | 59 | 1 | 263.489 | 0 | 1 | 0 | 1 | 1 | 1.16023 | 125 | 130 | 1 | 0.100086 |
123 | 52 | 3 | 245.244 | 0 | 1 | 0 | 1 | 0 | 1.47111 | 140 | 140 | 0 | 0.100332 |
21 | 37 | 2 | 211 | 0 | 1 | 0 | 1 | 0 | 1.36075 | 142 | 130 | 0 | 0.101185 |
201 | 46 | 1 | 186 | 0 | 1 | 0 | 1 | 1 | 1.1109 | 124 | 118 | 1 | 0.101386 |
261 | 52 | 1 | 298 | 1 | 1 | 1 | 1 | 1 | 1 | 110 | 130 | 1 | 0.101688 |
26 | 37 | 1 | 315 | 0 | 1 | 0 | 1 | 1 | 1.30192 | 158 | 130 | 0 | 0.1017 |
202 | 46 | 1 | 277 | 1 | 1 | 1 | 1 | 1 | 1 | 125 | 120 | 1 | 0.101707 |
190 | 35 | 3 | 257 | 0 | 1 | 0 | 1 | 1 | 1.16276 | 140 | 110 | 1 | 0.102072 |
60 | 43 | 3 | 201 | 0 | 1 | 0 | 1 | 0 | 1.4524 | 165 | 120 | 0 | 0.10245 |
245 | 55 | 1 | 268 | 1 | 1 | 1.5 | 1 | 1 | 1 | 128 | 140 | 1 | 0.102462 |
71 | 45 | 3 | 237 | 0 | 1 | 0 | 1 | 0 | 1.47029 | 170 | 130 | 0 | 0.102634 |
236 | 50 | 1 | 233 | 1 | 1 | 2 | 1 | 1 | 1 | 121 | 130 | 1 | 0.102873 |
113 | 50 | 1 | 129 | 0 | 1 | 0 | 1 | 1 | 1.37627 | 135 | 140 | 0 | 0.103391 |
29 | 38 | 2 | 292 | 0 | 1 | 0 | 1 | 1 | 1.34433 | 130 | 145 | 0 | 0.103463 |
101 | 49 | 3 | 237.575 | 0 | 1 | 0 | 1 | 0 | 1.4501 | 160 | 110 | 0 | 0.103521 |
214 | 52 | 1 | 225 | 1 | 1 | 2 | 1 | 1 | 1 | 120 | 130 | 1 | 0.103535 |
209 | 49 | 2 | 265 | 0 | 1 | 0 | 1 | 1 | 1.21401 | 175 | 115 | 1 | 0.103773 |
193 | 38 | 1 | 196 | 0 | 1 | 0 | 1 | 1 | 1.1192 | 166 | 110 | 1 | 0.103921 |
57 | 42 | 2 | 147 | 0 | 1 | 0 | 1 | 1 | 1.42807 | 146 | 160 | 0 | 0.103971 |
2 | 29 | 3 | 234.165 | 0 | 1 | 0 | 1 | 1 | 1.41433 | 170 | 140 | 0 | 0.103982 |
134 | 53 | 3 | 320 | 0 | 1 | 0 | 1 | 1 | 1.47969 | 162 | 140 | 0 | 0.104802 |
197 | 41 | 1 | 289 | 0 | 1 | 0 | 1 | 1 | 1.1158 | 170 | 110 | 1 | 0.104835 |
38 | 39 | 1 | 273 | 0 | 1 | 0 | 1 | 1 | 1.26364 | 132 | 110 | 0 | 0.104857 |
175 | 58 | 2 | 179 | 0 | 1 | 0 | 1 | 1 | 1.47702 | 160 | 140 | 0 | 0.104863 |
108 | 50 | 3 | 202 | 0 | 1 | 0 | 1 | 0 | 1.44341 | 145 | 110 | 0 | 0.104991 |
122 | 52 | 3 | 210 | 0 | 1 | 0 | 1 | 0 | 1.46488 | 148 | 120 | 0 | 0.105036 |
15 | 35 | 3 | 264 | 0 | 1 | 0 | 1 | 1 | 1.44063 | 168 | 150 | 0 | 0.105069 |
287 | 52 | 1 | 266 | 1 | 1 | 2 | 1 | 1 | 1 | 134 | 140 | 1 | 0.10621 |
178 | 59 | 3 | 287 | 0 | 1 | 0 | 1 | 1 | 1.49556 | 150 | 140 | 0 | 0.106245 |
39 | 39 | 1 | 307 | 0 | 1 | 0 | 1 | 1 | 1.28957 | 140 | 130 | 0 | 0.106678 |
194 | 38 | 1 | 282 | 0 | 1 | 0 | 1 | 1 | 1.11737 | 170 | 120 | 1 | 0.107229 |
110 | 50 | 3 | 168 | 0 | 1 | 0 | 1 | 1 | 1.47467 | 160 | 120 | 0 | 0.107247 |
149 | 54 | 3 | 246 | 0 | 1 | 0 | 1 | 1 | 1.4128 | 110 | 120 | 0 | 0.108111 |
258 | 49 | 1 | 234 | 1 | 1 | 1 | 1 | 1 | 1 | 140 | 140 | 1 | 0.108172 |
1 | 29 | 3 | 243 | 0 | 1 | 0 | 1 | 1 | 1.37679 | 160 | 120 | 0 | 0.108396 |
231 | 46 | 1 | 222 | 0 | 1 | 0 | 1 | 1 | 1.1027 | 112 | 130 | 1 | 0.108427 |
47 | 41 | 3 | 245 | 0 | 1 | 0 | 1 | 0 | 1.42871 | 150 | 130 | 0 | 0.108429 |
18 | 36 | 2 | 209 | 0 | 1 | 0 | 1 | 1 | 1.39501 | 178 | 130 | 0 | 0.108539 |
226 | 39 | 1 | 280 | 0 | 1 | 0 | 1 | 1 | 1.08565 | 150 | 110 | 1 | 0.10873 |
34 | 39 | 3 | 240.837 | 0 | 1 | 0 | 1 | 1 | 1.37937 | 120 | 130 | 0 | 0.110684 |
75 | 45 | 2 | 243.4 | 0 | 1 | 0 | 1 | 1 | 1.34597 | 110 | 135 | 0 | 0.110873 |
105 | 49 | 2 | 187 | 0 | 1 | 0 | 1 | 1 | 1.45483 | 172 | 140 | 0 | 0.11151 |
153 | 54 | 2 | 245.877 | 0 | 1 | 0 | 1 | 1 | 1.4127 | 122 | 150 | 0 | 0.111691 |
126 | 52 | 3 | 284 | 0 | 1 | 0 | 1 | 1 | 1.40657 | 118 | 120 | 0 | 0.111722 |
41 | 40 | 3 | 289 | 0 | 1 | 0 | 1 | 1 | 1.44785 | 172 | 140 | 0 | 0.112222 |
181 | 59 | 1 | 242.428 | 0 | 1 | 0 | 1 | 1 | 1.39307 | 140 | 140 | 0 | 0.112398 |
167 | 56 | 2 | 244.212 | 0 | 1 | 0 | 1 | 1 | 1.38763 | 114 | 130 | 0 | 0.112951 |
111 | 50 | 3 | 216 | 0 | 1 | 0 | 1 | 1 | 1.50003 | 170 | 140 | 0 | 0.112987 |
55 | 42 | 3 | 268 | 0 | 1 | 0 | 1 | 1 | 1.42817 | 136 | 150 | 0 | 0.114312 |
99 | 48 | 3 | 238 | 0 | 1 | 0 | 1 | 1 | 1.42436 | 118 | 140 | 0 | 0.114949 |
51 | 41 | 1 | 250 | 0 | 1 | 0 | 1 | 1 | 1.29086 | 142 | 112 | 0 | 0.115035 |
210 | 49 | 1 | 206 | 0 | 1 | 0 | 1 | 1 | 1.18827 | 170 | 130 | 1 | 0.115295 |
25 | 37 | 1 | 223 | 0 | 1 | 0 | 1 | 1 | 1.32205 | 168 | 120 | 0 | 0.115366 |
173 | 58 | 3 | 230 | 0 | 1 | 0 | 1 | 1 | 1.49214 | 150 | 130 | 0 | 0.11884 |
49 | 41 | 3 | 295 | 0 | 1 | 0 | 1 | 1 | 1.42452 | 170 | 120 | 0 | 0.119412 |
212 | 50 | 1 | 264 | 0 | 1 | 0 | 1 | 1 | 1.17306 | 150 | 145 | 1 | 0.120193 |
159 | 55 | 3 | 196 | 0 | 1 | 0 | 1 | 1 | 1.4995 | 150 | 140 | 0 | 0.120452 |
237 | 52 | 1 | 182 | 0 | 1 | 0 | 1 | 1 | 1.16905 | 150 | 120 | 1 | 0.120626 |
215 | 54 | 1 | 216 | 0 | 1 | 0 | 1 | 1 | 1.16328 | 140 | 125 | 1 | 0.121607 |
66 | 43 | 3 | 207 | 0 | 1 | 0 | 1 | 1 | 1.43818 | 138 | 142 | 0 | 0.121936 |
161 | 55 | 2 | 277 | 0 | 1 | 0 | 1 | 1 | 1.40906 | 160 | 110 | 0 | 0.122253 |
28 | 38 | 3 | 297 | 0 | 1 | 0 | 1 | 1 | 1.41162 | 150 | 140 | 0 | 0.123202 |
7 | 32 | 3 | 254 | 0 | 1 | 0 | 1 | 1 | 1.38592 | 155 | 125 | 0 | 0.124091 |
163 | 55 | 1 | 270 | 0 | 1 | 0 | 1 | 1 | 1.34807 | 140 | 120 | 0 | 0.124804 |
43 | 40 | 2 | 281 | 0 | 1 | 0 | 1 | 1 | 1.38178 | 167 | 130 | 0 | 0.125651 |
147 | 54 | 3 | 208 | 0 | 1 | 0 | 1 | 1 | 1.44806 | 142 | 110 | 0 | 0.126332 |
107 | 49 | 1 | 241.799 | 0 | 1 | 0 | 1 | 1 | 1.34211 | 130 | 140 | 0 | 0.126964 |
138 | 53 | 1 | 243 | 0 | 1 | 0 | 1 | 1 | 1.3878 | 155 | 140 | 0 | 0.127054 |
148 | 54 | 3 | 238 | 0 | 1 | 0 | 1 | 1 | 1.46795 | 154 | 120 | 0 | 0.127242 |
100 | 48 | 2 | 211 | 0 | 1 | 0 | 1 | 1 | 1.36923 | 138 | 110 | 0 | 0.127324 |
137 | 53 | 1 | 182 | 0 | 1 | 0 | 1 | 1 | 1.38062 | 148 | 130 | 0 | 0.128076 |
50 | 41 | 3 | 269 | 0 | 1 | 0 | 1 | 1 | 1.4044 | 144 | 125 | 0 | 0.128387 |
133 | 53 | 3 | 240.445 | 0 | 1 | 0 | 1 | 1 | 1.43681 | 132 | 120 | 0 | 0.12858 |
42 | 40 | 2 | 215 | 0 | 1 | 0 | 1 | 1 | 1.36072 | 138 | 130 | 0 | 0.128797 |
119 | 51 | 3 | 188 | 0 | 1 | 0 | 1 | 1 | 1.46193 | 145 | 125 | 0 | 0.129163 |
54 | 42 | 3 | 198 | 0 | 1 | 0 | 1 | 1 | 1.431 | 155 | 120 | 0 | 0.130047 |
135 | 53 | 2 | 195 | 0 | 1 | 0 | 1 | 1 | 1.40632 | 140 | 120 | 0 | 0.130304 |
77 | 45 | 1 | 224 | 0 | 1 | 0 | 1 | 1 | 1.34735 | 144 | 140 | 0 | 0.130991 |
33 | 39 | 3 | 204 | 0 | 1 | 0 | 1 | 1 | 1.40589 | 145 | 120 | 0 | 0.131109 |
24 | 37 | 2 | 194 | 0 | 1 | 0 | 1 | 1 | 1.36811 | 150 | 130 | 0 | 0.13112 |
40 | 40 | 3 | 275 | 0 | 1 | 0 | 1 | 1 | 1.41238 | 150 | 130 | 0 | 0.131227 |
152 | 54 | 2 | 217 | 0 | 1 | 0 | 1 | 1 | 1.40185 | 137 | 120 | 0 | 0.131401 |
76 | 45 | 1 | 225 | 0 | 1 | 0 | 1 | 1 | 1.31877 | 140 | 120 | 0 | 0.131506 |
120 | 51 | 3 | 224 | 0 | 1 | 0 | 1 | 1 | 1.46616 | 150 | 130 | 0 | 0.131625 |
69 | 44 | 3 | 215 | 0 | 1 | 0 | 1 | 1 | 1.42261 | 135 | 130 | 0 | 0.131691 |
53 | 42 | 3 | 196 | 0 | 1 | 0 | 1 | 1 | 1.42536 | 150 | 120 | 0 | 0.132573 |
98 | 48 | 3 | 245 | 0 | 1 | 0 | 1 | 1 | 1.46212 | 160 | 130 | 0 | 0.133118 |
80 | 46 | 2 | 230 | 0 | 1 | 0 | 1 | 1 | 1.38369 | 150 | 120 | 0 | 0.136404 |


Drop Outliers ...
Do you want to drop outliers? [y/n]y
Outliers are dropped.
Example_tasks¶
[2]:
# import datacleanbot and openml
import datacleanbot.dataclean as dc
import openml as oml
import numpy as np
Preparation: Acquire Data¶
The first step is to acquire data from OpenML. The dataset ID can be found in the address bar of the dataset’s OpenML page.
[3]:
# acquire dataset with dataset ID 4
data = oml.datasets.get_dataset(4)
X, y, categorical_indicator, features = data.get_data(target=data.default_target_attribute, dataset_format='array')
Xy = np.concatenate((X,y.reshape((y.shape[0],1))), axis=1)
Task 1: Show Important Features¶
[4]:
dc.show_important_features(X, y, data.name, features)
Important Features

Task 2: Unify Column Names¶
[5]:
features = dc.unify_name_consistency(features)
Inconsistent Column Names
Column names
============
['duration', 'wage-increase-first-year', 'wage-increase-second-year', 'wage-increase-third-year', 'cost-of-living-adjustment', 'working-hours', 'pension', 'standby-pay', 'shift-differential', 'education-allowance', 'statutory-holidays', 'vacation', 'longterm-disability-assistance', 'contribution-to-dental-plan', 'bereavement-assistance', 'contribution-to-health-plan']
Column names are consistent
Task 3: Show Statistical Information¶
[6]:
dc.show_statistical_info(Xy)
Statistical Information
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 30.000000 | 37.000000 | 37.000000 | 37.000000 | 56.000000 | 22.000000 | 28.000000 | 27.000000 | 31.000000 | 9.000000 | 53.000000 | 51.000000 | 56.000000 | 46.000000 | 15.000000 | 51.000000 | 57.000000 |
mean | 0.100000 | 1.108108 | 1.324324 | 0.594595 | 2.160714 | 0.545455 | 0.285714 | 1.037037 | 4.870968 | 7.444444 | 11.094340 | 0.960784 | 3.803571 | 3.971739 | 3.913333 | 38.039216 | 0.649123 |
std | 0.305129 | 0.774015 | 0.818333 | 0.797895 | 0.707795 | 0.509647 | 0.460044 | 0.939782 | 4.544168 | 5.027701 | 1.259795 | 0.823669 | 1.370596 | 1.164028 | 1.304315 | 2.505680 | 0.481487 |
min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.000000 | 9.000000 | 0.000000 | 2.000000 | 2.000000 | 2.000000 | 27.000000 | 0.000000 |
25% | 0.000000 | 1.000000 | 1.000000 | 0.000000 | 2.000000 | 0.000000 | 0.000000 | 0.000000 | 3.000000 | 2.000000 | 10.000000 | 0.000000 | 2.500000 | 3.000000 | 2.400000 | 37.000000 | 0.000000 |
50% | 0.000000 | 1.000000 | 2.000000 | 0.000000 | 2.000000 | 1.000000 | 0.000000 | 1.000000 | 4.000000 | 8.000000 | 11.000000 | 1.000000 | 4.000000 | 4.000000 | 4.600000 | 38.000000 | 1.000000 |
75% | 0.000000 | 2.000000 | 2.000000 | 1.000000 | 3.000000 | 1.000000 | 1.000000 | 2.000000 | 5.000000 | 12.000000 | 12.000000 | 2.000000 | 4.500000 | 4.500000 | 5.000000 | 40.000000 | 1.000000 |
max | 1.000000 | 2.000000 | 2.000000 | 2.000000 | 3.000000 | 1.000000 | 1.000000 | 2.000000 | 25.000000 | 14.000000 | 15.000000 | 2.000000 | 7.000000 | 7.000000 | 5.100000 | 40.000000 | 1.000000 |
Task 4: Discover Data Types¶
[7]:
# input can be Xy or X
dc.discover_types(Xy)
Discover Data Types
Simple Data Types
['float64', 'int64', 'float64', 'int64', 'int64', 'int64', 'float64', 'int64', 'int64', 'float64', 'int64', 'int64', 'float64', 'float64', 'float64', 'int64', 'bool']
Statistical Data Types
['Type.POSITIVE', 'Type.REAL', 'Type.REAL', 'Type.POSITIVE', 'Type.POSITIVE', 'Type.POSITIVE', 'Type.POSITIVE', 'Type.POSITIVE', 'Type.POSITIVE', 'Type.POSITIVE', 'Type.POSITIVE', 'Type.REAL', 'Type.POSITIVE', 'Type.POSITIVE', 'Type.REAL', 'Type.POSITIVE', 'Type.COUNT']
Task 5: Clean Duplicated Rows¶
[8]:
Xy = dc.clean_duplicated_rows(Xy)
Duplicated Rows
Identifying Duplicated Rows ...
Task 6: Handle Missing Values¶
[9]:
features, Xy = dc.handle_missing(features, Xy)
Missing values
Identify Missing Data ...
The default setting of missing characters is ['n/a', 'na', '--', '?']
Do you want to add extra character? [y/n]n
Number of missing in each feature
0 27
1 20
2 20
3 20
4 1
5 35
6 29
7 30
8 26
9 48
10 4
11 6
12 1
13 11
14 42
15 6
16 0
dtype: int64
Records containing missing values:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 2.0 | NaN | 11.0 | 1.0 | 5.0 | NaN | NaN | 40.0 | 1.0 |
1 | NaN | 2.0 | 2.0 | NaN | 2.0 | 0.0 | NaN | 1.0 | NaN | NaN | 11.0 | 0.0 | 4.5 | 5.8 | NaN | 35.0 | 1.0 |
2 | 0.0 | 1.0 | 1.0 | NaN | NaN | NaN | 0.0 | 2.0 | 5.0 | NaN | 11.0 | 2.0 | NaN | NaN | NaN | 38.0 | 1.0 |
3 | 0.0 | NaN | NaN | 2.0 | 3.0 | 0.0 | NaN | NaN | NaN | NaN | NaN | NaN | 3.7 | 4.0 | 5.0 | NaN | 1.0 |
4 | 0.0 | 1.0 | 1.0 | NaN | 3.0 | NaN | NaN | NaN | NaN | NaN | 12.0 | 1.0 | 4.5 | 4.5 | 5.0 | 40.0 | 1.0 |
Missing correlation between features containing missing values and other features
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.000000 | 0.112373 | 0.259620 | 0.185996 | -0.126773 | 0.030389 | 0.018496 | 0.196296 | -0.304456 | -0.167360 | 0.014479 | 0.132569 | -0.126773 | 0.070290 | 0.327569 | -0.096414 |
1 | 0.112373 | 1.000000 | 0.460811 | 0.152703 | -0.098247 | 0.280850 | 0.575362 | 0.182121 | 0.064742 | -0.084895 | 0.085841 | -0.012609 | -0.098247 | 0.013074 | -0.061512 | -0.132393 |
2 | 0.259620 | 0.460811 | 1.000000 | 0.229730 | -0.098247 | 0.129827 | 0.428296 | 0.182121 | 0.064742 | 0.015918 | 0.229751 | 0.346743 | -0.098247 | 0.106224 | 0.105450 | 0.107175 |
3 | 0.185996 | 0.152703 | 0.229730 | 1.000000 | 0.181757 | 0.280850 | 0.354763 | 0.255745 | -0.082870 | 0.116731 | -0.201979 | -0.012609 | 0.181757 | 0.013074 | 0.021969 | -0.132393 |
4 | -0.126773 | -0.098247 | -0.098247 | 0.181757 | 1.000000 | 0.105946 | -0.135996 | -0.140859 | -0.122380 | 0.057864 | -0.036711 | -0.045835 | 1.000000 | 0.273268 | 0.079860 | -0.045835 |
5 | 0.030389 | 0.280850 | 0.129827 | 0.280850 | 0.105946 | 1.000000 | 0.374342 | 0.113961 | 0.002539 | 0.150845 | -0.064352 | 0.037082 | 0.105946 | -0.160206 | -0.146448 | -0.197772 |
6 | 0.018496 | 0.575362 | 0.428296 | 0.354763 | -0.135996 | 0.374342 | 1.000000 | 0.332923 | 0.195304 | 0.248198 | -0.004820 | -0.006018 | -0.135996 | -0.141967 | -0.268444 | -0.234718 |
7 | 0.196296 | 0.182121 | 0.182121 | 0.255745 | -0.140859 | 0.113961 | 0.332923 | 1.000000 | -0.259902 | 0.071001 | 0.123072 | 0.210905 | -0.140859 | -0.159324 | -0.167984 | -0.018078 |
8 | -0.304456 | 0.064742 | 0.064742 | -0.082870 | -0.122380 | 0.002539 | 0.195304 | -0.259902 | 1.000000 | 0.396558 | 0.299976 | 0.259754 | -0.122380 | -0.269331 | -0.172611 | 0.374528 |
9 | -0.167360 | -0.084895 | 0.015918 | 0.116731 | 0.057864 | 0.150845 | 0.248198 | 0.071001 | 0.396558 | 1.000000 | 0.118958 | 0.148522 | 0.057864 | -0.275913 | -0.149514 | 0.148522 |
10 | 0.014479 | 0.085841 | 0.229751 | -0.201979 | -0.036711 | -0.064352 | -0.004820 | 0.123072 | 0.299976 | 0.118958 | 1.000000 | 0.800943 | -0.036711 | -0.134341 | -0.147760 | 0.353357 |
11 | 0.132569 | -0.012609 | 0.346743 | -0.012609 | -0.045835 | 0.037082 | -0.006018 | 0.210905 | 0.259754 | 0.148522 | 0.800943 | 1.000000 | -0.045835 | -0.167729 | -0.054661 | 0.441176 |
12 | -0.126773 | -0.098247 | -0.098247 | 0.181757 | 1.000000 | 0.105946 | -0.135996 | -0.140859 | -0.122380 | 0.057864 | -0.036711 | -0.045835 | 1.000000 | 0.273268 | 0.079860 | -0.045835 |
13 | 0.070290 | 0.013074 | 0.106224 | 0.013074 | 0.273268 | -0.160206 | -0.141967 | -0.159324 | -0.269331 | -0.275913 | -0.134341 | -0.167729 | 0.273268 | 1.000000 | 0.292239 | -0.022872 |
14 | 0.327569 | -0.061512 | 0.105450 | 0.021969 | 0.079860 | -0.146448 | -0.268444 | -0.167984 | -0.172611 | -0.149514 | -0.147760 | -0.054661 | 0.079860 | 0.292239 | 1.000000 | -0.054661 |
15 | -0.096414 | -0.132393 | 0.107175 | -0.132393 | -0.045835 | -0.197772 | -0.234718 | -0.018078 | 0.374528 | 0.148522 | 0.353357 | 0.441176 | -0.045835 | -0.022872 | -0.054661 | 1.000000 |
Visualize Missing Data ...



Clean Missing Data ...
Choose the missing mechanism [a/b/c/d]:
a.MCAR b.MAR c.MNAR d.Skip
a
Missing percentage is 0.9824561403508771
Imputation score of mean is 0.8515151515151516
Imputation score of mode is 0.8674242424242424
Imputation score of knn is 0.9299242424242424
Imputation score of matrix factorization is 0.9299242424242424
Imputation score of multiple imputation is 0.9291666666666667
Imputation method with the highest score is knn
The recommended approach is knn
Do you want to apply the recommended approach? [y/n]n
Choose the approach you want to apply [a/b/c/d/e/skip]:
a.Mean b.Mode c.K Nearest Neighbor d.Matrix Factorization e. Multiple Imputation
a
Applying mean imputation ...
Missing values cleaned!
Task 7: Handle Outliers¶
[10]:
Xy = dc.handle_outlier(features,Xy)
Outliers
Recommend Algorithm ...
The recommended approach is isolation forest.
Do you want to apply the recommended outlier detection approach? [y/n]y
Visualize Outliers ...

0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | anomaly_score | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
36 | 1 | 0 | 0 | 2 | 1 | 1 | 1 | 1 | 0 | 4 | 11 | 2 | 2 | 3.97174 | 3.91333 | 40 | 0 | -0.0883411 |
34 | 0 | 1 | 2 | 2 | 3 | 1 | 1 | 0 | 1 | 2 | 10 | 0 | 2 | 2.5 | 2.1 | 40 | 0 | -0.0724028 |
40 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 4.87097 | 7.44444 | 11 | 1 | 4 | 3.97174 | 3.91333 | 38.0392 | 0 | -0.0318737 |
18 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 4.87097 | 7.44444 | 11 | 1 | 2 | 3.97174 | 3.91333 | 38 | 0 | -0.0313807 |
56 | 0 | 2 | 2 | 0.594595 | 3 | 0.545455 | 0 | 1.03704 | 14 | 7.44444 | 9 | 2 | 6 | 6 | 4 | 35 | 1 | -0.0218354 |
8 | 0 | 1 | 1.32432 | 0.594595 | 2 | 0 | 0 | 1.03704 | 25 | 12 | 11 | 0 | 3 | 7 | 3.91333 | 38 | 1 | -0.0199344 |
17 | 0.1 | 1 | 0 | 2 | 1 | 1 | 0 | 1 | 3 | 2 | 9 | 0 | 2.1 | 3.97174 | 3.91333 | 40 | 0 | -0.0124747 |
31 | 0 | 1 | 2 | 2 | 3 | 1 | 0 | 0 | 5 | 7.44444 | 10 | 0 | 3 | 2 | 2.5 | 40 | 0 | -0.00911494 |
6 | 0 | 0 | 1 | 2 | 3 | 0.545455 | 0 | 2 | 4.87097 | 7.44444 | 12 | 2 | 4 | 5 | 5 | 38.0392 | 1 | -0.00885507 |
37 | 0.1 | 1 | 0 | 0 | 1 | 1 | 0 | 2 | 3 | 2 | 9 | 0 | 2.8 | 3.97174 | 3.91333 | 38 | 0 | -0.00362979 |
38 | 0 | 1.10811 | 0 | 0.594595 | 3 | 0.545455 | 0.285714 | 2 | 4.87097 | 7.44444 | 10 | 1 | 2 | 2.5 | 2 | 37 | 0 | 0.00190097 |
25 | 0 | 1 | 2 | 0 | 3 | 0.545455 | 0.285714 | 0 | 4.87097 | 7.44444 | 10 | 0 | 2 | 2 | 2 | 40 | 0 | 0.00266917 |
33 | 0.1 | 0 | 0 | 0 | 2 | 1 | 1 | 0 | 3 | 7.44444 | 10 | 0 | 4 | 5 | 3.91333 | 40 | 0 | 0.00353423 |
41 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 2 | 4.87097 | 7.44444 | 12 | 2 | 2 | 3 | 3.91333 | 38 | 0 | 0.0133322 |
35 | 0 | 0 | 2 | 0 | 2 | 1 | 0 | 0 | 4.87097 | 7.44444 | 11 | 1 | 2 | 2 | 3.91333 | 40 | 0 | 0.0134714 |
19 | 0.1 | 1.10811 | 1.32432 | 1 | 2 | 0.545455 | 0.285714 | 1.03704 | 5 | 13 | 15 | 2 | 4 | 5 | 3.91333 | 35 | 1 | 0.0161718 |
7 | 0.1 | 1.10811 | 1.32432 | 0.594595 | 3 | 0.545455 | 0.285714 | 1.03704 | 3 | 7.44444 | 12 | 0 | 6.9 | 4.8 | 2.3 | 40 | 1 | 0.0208748 |
11 | 0.1 | 2 | 1.32432 | 0.594595 | 2 | 0.545455 | 0.285714 | 1.03704 | 4 | 7.44444 | 15 | 0.960784 | 6.4 | 6.4 | 3.91333 | 38 | 1 | 0.0243395 |
14 | 0.1 | 1.10811 | 1.32432 | 0 | 1 | 1 | 0.285714 | 1.03704 | 10 | 7.44444 | 11 | 2 | 3 | 3.97174 | 3.91333 | 36 | 1 | 0.0247281 |
9 | 0.1 | 2 | 1.32432 | 0 | 1 | 0.545455 | 0 | 2 | 4 | 7.44444 | 11 | 2 | 5.7 | 3.97174 | 3.91333 | 40 | 1 | 0.0269993 |
44 | 0.1 | 0 | 0 | 0 | 2 | 0.545455 | 1 | 0 | 3 | 7.44444 | 10 | 0 | 4 | 4 | 3.91333 | 40 | 0 | 0.0270557 |
27 | 0 | 1.10811 | 2 | 0 | 2 | 0 | 0.285714 | 1.03704 | 4.87097 | 7.44444 | 12 | 2 | 3 | 3 | 3.91333 | 33 | 1 | 0.0310571 |
13 | 0 | 2 | 2 | 1 | 3 | 0.545455 | 0.285714 | 1.03704 | 4 | 7.44444 | 13 | 2 | 3.5 | 4 | 5.1 | 37 | 1 | 0.0398261 |
26 | 0.1 | 0 | 1 | 1 | 2 | 0 | 0 | 1.03704 | 4.87097 | 7.44444 | 10 | 0 | 4.5 | 4.5 | 3.91333 | 38.0392 | 1 | 0.0399781 |
29 | 0 | 1.10811 | 2 | 0.594595 | 3 | 0.545455 | 0.285714 | 0 | 4.87097 | 7.44444 | 10 | 1 | 2 | 2.5 | 3.91333 | 35 | 0 | 0.0402457 |
53 | 0.1 | 2 | 2 | 0 | 3 | 0.545455 | 0 | 2 | 6 | 7.44444 | 11 | 1 | 4 | 3.5 | 3.91333 | 40 | 1 | 0.0404225 |
42 | 0 | 1.10811 | 1.32432 | 2 | 2 | 0.545455 | 0.285714 | 2 | 4.87097 | 7.44444 | 12 | 1 | 2.5 | 2.5 | 3.91333 | 39 | 0 | 0.0435786 |
54 | 0.1 | 1.10811 | 2 | 0 | 3 | 0.545455 | 0 | 2 | 6 | 10 | 11 | 2 | 5 | 4.4 | 3.91333 | 38 | 1 | 0.0459069 |
1 | 0.1 | 2 | 2 | 0.594595 | 2 | 0 | 0.285714 | 1 | 4.87097 | 7.44444 | 11 | 0 | 4.5 | 5.8 | 3.91333 | 35 | 1 | 0.0467527 |
24 | 0.1 | 1.10811 | 1.32432 | 0.594595 | 1 | 0.545455 | 0.285714 | 1.03704 | 3 | 8 | 9 | 2 | 6 | 3.97174 | 3.91333 | 38 | 1 | 0.0475632 |
51 | 0 | 1 | 1.32432 | 1 | 3 | 0 | 0 | 2 | 4.87097 | 7.44444 | 11.0943 | 0.960784 | 2 | 3 | 3.91333 | 38.0392 | 1 | 0.0477415 |
28 | 0 | 2 | 2 | 0 | 2 | 1 | 0 | 1.03704 | 5 | 7.44444 | 11 | 0 | 5 | 4 | 3.91333 | 37 | 1 | 0.0485515 |
22 | 0.1 | 1.10811 | 1.32432 | 1 | 3 | 0.545455 | 0.285714 | 1.03704 | 4.87097 | 7.44444 | 11.0943 | 0.960784 | 3.5 | 4 | 4.6 | 27 | 1 | 0.0497246 |
45 | 0.1 | 1 | 1 | 0.594595 | 2 | 1 | 1 | 1.03704 | 2 | 7.44444 | 10 | 0 | 4.5 | 4 | 3.91333 | 40 | 0 | 0.0502415 |
3 | 0 | 1.10811 | 1.32432 | 2 | 3 | 0 | 0.285714 | 1.03704 | 4.87097 | 7.44444 | 11.0943 | 0.960784 | 3.7 | 4 | 5 | 38.0392 | 1 | 0.0507025 |
5 | 0.1 | 1.10811 | 1.32432 | 0.594595 | 2 | 0 | 0.285714 | 1.03704 | 6 | 7.44444 | 12 | 1 | 2 | 2.5 | 3.91333 | 35 | 1 | 0.0529418 |
12 | 0.1 | 1 | 1 | 0 | 2 | 1 | 1 | 1.03704 | 2 | 7.44444 | 10 | 0 | 3.5 | 4 | 3.91333 | 40 | 0 | 0.0536357 |
10 | 0 | 1.10811 | 2 | 0 | 3 | 0.545455 | 0.285714 | 1.03704 | 3 | 7.44444 | 13 | 2 | 3.5 | 4 | 4.6 | 36 | 1 | 0.0586344 |
52 | 0 | 1.10811 | 2 | 1 | 3 | 0.545455 | 0.285714 | 1.03704 | 4.87097 | 7.44444 | 13 | 2 | 3.5 | 4 | 4.5 | 35 | 1 | 0.0599152 |
49 | 0 | 2 | 2 | 0 | 2 | 0.545455 | 0 | 1 | 4.87097 | 7.44444 | 11 | 1 | 5.7 | 4.5 | 3.91333 | 40 | 1 | 0.0640585 |
16 | 0.1 | 1.10811 | 1.32432 | 0.594595 | 1 | 0.545455 | 0.285714 | 1.03704 | 2 | 7.44444 | 12 | 0 | 2.8 | 3.97174 | 3.91333 | 35 | 1 | 0.0649904 |
48 | 0.1 | 1.10811 | 2 | 0 | 2 | 0.545455 | 0 | 1.03704 | 5 | 14 | 11 | 0 | 5 | 4.5 | 3.91333 | 38 | 1 | 0.0669109 |
2 | 0 | 1 | 1 | 0.594595 | 2.16071 | 0.545455 | 0 | 2 | 5 | 7.44444 | 11 | 2 | 3.80357 | 3.97174 | 3.91333 | 38 | 1 | 0.067712 |
43 | 0 | 1.10811 | 1.32432 | 1 | 2 | 0.545455 | 0.285714 | 0 | 4.87097 | 7.44444 | 11 | 0 | 2.5 | 3 | 3.91333 | 40 | 0 | 0.0685044 |
15 | 0 | 2 | 1.32432 | 0 | 2 | 0.545455 | 0.285714 | 2 | 4.87097 | 7.44444 | 11 | 1 | 4.5 | 4 | 3.91333 | 37 | 1 | 0.0695864 |
50 | 0.1 | 2 | 1.32432 | 0.594595 | 2 | 0.545455 | 0 | 1.03704 | 4.87097 | 7.44444 | 11 | 0.960784 | 7 | 5.3 | 3.91333 | 38.0392 | 1 | 0.0718371 |
39 | 0 | 2 | 1 | 0 | 2 | 0.545455 | 0 | 1.03704 | 4 | 7.44444 | 12 | 1 | 4.5 | 4 | 3.91333 | 40 | 1 | 0.0732847 |
55 | 0 | 1 | 1 | 0.594595 | 3 | 0.545455 | 0.285714 | 1.03704 | 4.87097 | 7.44444 | 12 | 1 | 5 | 5 | 5 | 40 | 1 | 0.0735484 |
21 | 0.1 | 1.10811 | 1.32432 | 0.594595 | 2 | 0.545455 | 0.285714 | 0 | 4.87097 | 7.44444 | 11 | 0 | 2.5 | 3 | 3.91333 | 40 | 0 | 0.0737297 |
30 | 0.1 | 1 | 1.32432 | 0 | 3 | 1 | 0.285714 | 1.03704 | 4.87097 | 7.44444 | 11 | 1 | 4.5 | 4.5 | 5 | 40 | 1 | 0.0739734 |
32 | 0.1 | 1.10811 | 1.32432 | 0.594595 | 2 | 0.545455 | 0.285714 | 2 | 4.87097 | 7.44444 | 10 | 1 | 2.5 | 2.5 | 3.91333 | 38 | 0 | 0.0760219 |
20 | 0.1 | 2 | 2 | 0.594595 | 2 | 0.545455 | 0.285714 | 1.03704 | 4 | 7.44444 | 12 | 2 | 4.3 | 4.4 | 3.91333 | 38 | 1 | 0.0796261 |
4 | 0 | 1 | 1 | 0.594595 | 3 | 0.545455 | 0.285714 | 1.03704 | 4.87097 | 7.44444 | 12 | 1 | 4.5 | 4.5 | 5 | 40 | 1 | 0.0797176 |
0 | 0 | 1.10811 | 1.32432 | 0.594595 | 1 | 0.545455 | 0.285714 | 1.03704 | 2 | 7.44444 | 11 | 1 | 5 | 3.97174 | 3.91333 | 40 | 1 | 0.0805313 |
46 | 0 | 2 | 2 | 0 | 2 | 0.545455 | 0.285714 | 1.03704 | 5 | 7.44444 | 11 | 1 | 4.5 | 4 | 3.91333 | 40 | 1 | 0.0913668 |
23 | 0.1 | 1 | 2 | 0.594595 | 2 | 0.545455 | 0.285714 | 1.03704 | 4 | 7.44444 | 10 | 2 | 4.5 | 4 | 3.91333 | 40 | 1 | 0.0932795 |
47 | 0.1 | 1 | 1 | 1 | 2 | 0.545455 | 0 | 1.03704 | 4.87097 | 7.44444 | 11.0943 | 0.960784 | 4.6 | 4.6 | 3.91333 | 38 | 1 | 0.101976 |


Drop Outliers ...
Do you want to drop outliers? [y/n]n
Outliers are kept.
API Reference¶
API¶
A data cleaning Python tool.
-
dataclean.autoclean(Xy, dataset_name, features)¶
Auto-cleans data.
The following aspects are automatically cleaned: show important features; show statistical information; discover the data type for each feature; identify the duplicated rows; unify the inconsistent column names; handle missing values; handle outliers.
Parameters: Xy : array-like
Complete data.
dataset_name : string
features : list
List of feature names.
Returns: Xy_cleaned : array-like
Cleaned data.
-
dataclean.build_forest(X, y)¶
Build a random forest model from the dataset and compute the feature importances.
Parameters: X : array-like, shape (n_samples, n_features)
Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape (n_samples,)
Target values (class labels in classification, real numbers in regression).
Returns: importances : array, shape = [n_features]
The feature importances (the higher, the more important the feature).
indices : array, shape = [n_features]
Indices of the features sorted by decreasing importance.
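The importances/indices pair returned here can be sketched with scikit-learn's random forest; the estimator settings below (number of trees, random seed) are illustrative assumptions, not datacleanbot's actual configuration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Fit a random forest; impurity-based importances sum to 1.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
importances = forest.feature_importances_

# Indices that order the features from most to least important.
indices = np.argsort(importances)[::-1]
```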
-
dataclean.clean_duplicated_rows(Xy)¶
Clean duplicated rows.
Parameters: Xy : array-like
Complete numpy array (target required) of the dataset.
Returns: Xy : array-like
Original data.
Xy_no_duplicate : array-like
Cleaned data without duplicated rows if user wants to drop the duplicated rows.
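The duplicate check can be sketched with pandas; this is a minimal illustration under the stated behavior (drop repeats, keep the first occurrence), not datacleanbot's actual implementation.

```python
import numpy as np
import pandas as pd

Xy = np.array([[1.0, 2.0, 0.0],
               [1.0, 2.0, 0.0],   # exact duplicate of the first row
               [3.0, 4.0, 1.0]])

df = pd.DataFrame(Xy)
n_duplicates = int(df.duplicated().sum())   # rows repeating an earlier row

# Keep the first occurrence of each row; drop the rest.
Xy_no_duplicate = df.drop_duplicates().to_numpy()
```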
-
dataclean.clean_missing(df, features)¶
Clean missing values in the dataset.
Parameters: df : DataFrame
features : List
List of feature names.
Returns: features_new : List
List of feature names after cleaning.
Xy_filled : array-like
Numpy array where missing values have been cleaned.
-
dataclean.compute_clustering_metafeatures(X)¶
Computes clustering meta features.
The following 3 clustering meta features are adopted: Silhouette Coefficient; Calinski_Harabasz Index; Davies_Bouldin Index.
-
dataclean.compute_imputation_score(Xy)¶
Computes the score of an imputation by applying simple classifiers.
The following simple learners are evaluated: Naive Bayes Learner; Linear Discriminant Learner; One Nearest Neighbor Learner; Decision Node Learner.
Parameters: Xy : array-like
Complete numpy array of the dataset. The training array X has to be imputed already, and the target y is required here and not optional in order to predict the performance of the imputation method.
Returns: imputation_score : float
Predicted score of the imputation method.
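The scoring idea can be sketched as follows: train the simple learners named above on the imputed data and average their cross-validated accuracies. The specific estimators, the 5-fold split, and the plain average are assumptions for illustration; the Linear Discriminant learner is omitted here for brevity.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in for an already-imputed dataset: X is the imputed training
# array and y the (required, not optional) target.
X, y = load_iris(return_X_y=True)

# Evaluate simple learners on the imputed data and average their
# cross-validated accuracies into a single imputation score.
learners = [GaussianNB(),                                         # Naive Bayes
            KNeighborsClassifier(n_neighbors=1),                  # 1-NN
            DecisionTreeClassifier(max_depth=1, random_state=0)]  # decision node
imputation_score = float(np.mean(
    [cross_val_score(clf, X, y, cv=5).mean() for clf in learners]))
```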
-
dataclean.compute_metafeatures(X, y)¶
Computes landmarking meta features.
The following landmarking features are computed: Naive Bayes Learner; Linear Discriminant Learner; One Nearest Neighbor Learner; Decision Node Learner; Randomly Chosen Node Learner.
-
dataclean.deal_mar(df)¶
Deal with missing data with the missing at random pattern.
-
dataclean.deal_mcar(df)¶
Deal with missing data with the missing completely at random pattern.
-
dataclean.deal_mnar(df)¶
Deal with missing data with the missing not at random pattern.
-
dataclean.discover_type_heuristic(data)¶
Infer data types for each feature using simple logic.
Parameters: data : numpy array or dataframe
Numeric data needs to be 64 bit.
Returns: result : list
List of data types.
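A heuristic of this kind can be sketched in a few lines of numpy. This is an illustrative guess at the logic, not datacleanbot's implementation; date/datetime detection is omitted.

```python
import numpy as np

def heuristic_type(values):
    """Guess a simple type for one feature vector (illustration only;
    date/datetime detection is omitted)."""
    arr = np.asarray(values)
    # Values drawn only from {0, 1} are treated as boolean.
    if arr.dtype == bool or set(np.unique(arr).tolist()) <= {0, 1}:
        return 'bool'
    if np.issubdtype(arr.dtype, np.integer):
        return 'int64'
    if np.issubdtype(arr.dtype, np.floating):
        return 'float64'
    return 'string'
```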
-
dataclean.discover_types(Xy)¶
Discover types for a numpy array.
Both simple logic rules and Bayesian methods are applied. Bayesian methods can only be applied if Xy are numeric.
Parameters: Xy : numpy array or DataFrame
Xy can only be numeric in order to run the Bayesian model.
-
dataclean.drop_duplicated_rows(dataframe)¶
Drop duplicated rows.
-
dataclean.drop_outliers(df, df_outliers)¶
Drops the detected outliers.
-
dataclean.handle_missing(features, Xy)¶
Handle missing values.
Recommend the appropriate approach to the user given the missing mechanism of the dataset. The user can choose to adopt the recommended approach or take another available approach.
For MCAR, the following methods are evaluated: ‘list deletion’, ‘mean’, ‘mode’, ‘k nearest neighbors’, ‘matrix factorization’, ‘multiple imputation’.
For MAR, the following methods are evaluated: ‘k nearest neighbors’, ‘matrix factorization’, ‘multiple imputation’.
For MNAR, ‘multiple imputation’ is adopted.
Parameters: features : list
List of feature names.
Xy : array-like
Complete numpy array (target required and not optional).
Returns: features_new : List
List of feature names after cleaning.
Xy_filled : array-like
Numpy array where missing values have been cleaned.
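One of the evaluated methods, mean imputation, can be sketched with numpy: every NaN is replaced by the mean of its column. A minimal sketch of that single method, not of handle_missing's full recommendation flow.

```python
import numpy as np

Xy = np.array([[1.0,    2.0,    0.0],
               [np.nan, 4.0,    1.0],
               [3.0,    np.nan, 1.0]])

# Mean imputation: replace each NaN with its column mean
# (NaNs are ignored when averaging).
col_means = np.nanmean(Xy, axis=0)
rows, cols = np.where(np.isnan(Xy))
Xy_filled = Xy.copy()
Xy_filled[rows, cols] = col_means[cols]
```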
-
dataclean.handle_outlier(features, Xy)¶
Cleans the outliers.
Recommends an algorithm to the user to detect the outliers and presents the outliers to the user in effective visualizations. The user can decide whether or not to keep the outliers.
Parameters: features : list
List of feature names.
Xy : array-like
Numpy array. Both training vectors and target are required.
Returns: Xy_no_outliers : array-like
Cleaned data where outliers are dropped.
Xy : array-like
Original data where outliers are not found or kept.
-
dataclean.highlight_outlier(data)¶
Highlight the maximum in a Series yellow.
-
dataclean.identify_missing(df=None)¶
Detect missing values.
Identify the common missing characters such as ‘n/a’, ‘na’, ‘–’ and ‘?’ as missing. User can also customize the characters to be identified as missing.
Parameters: df : DataFrame
Raw data formatted in DataFrame.
Returns: flag : bool
Indicates whether missing values are detected. If true, missing values are detected. Otherwise not.
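The detection step can be sketched with pandas: map the documented default missing characters to NaN and count what remains. The example data and variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['1', 'n/a', '3'],
                   'b': ['?', '5', '--']})

# The documented default missing characters, mapped to NaN.
missing_chars = ['n/a', 'na', '--', '?']
df = df.replace(missing_chars, np.nan)

n_missing = df.isnull().sum()        # missing count per feature
flag = bool(n_missing.sum() > 0)     # True when missing values are detected
```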
-
dataclean.identify_missing_mechanism(df=None)¶
Tries to guess the missing mechanism of the dataset.
The missing mechanism is not really testable. There may be reasons to suspect that the dataset belongs to one missing mechanism based on the missing correlation between features, but the result is not definite. Relevant information is provided to help the user make the decision. Three missing mechanisms can be guessed: MCAR (missing completely at random), MAR (missing at random) and MNAR (missing not at random; not available here, as it normally involves a field expert).
Parameters: df : DataFrame
Raw data formatted in DataFrame.
-
dataclean.identify_outliers(df, algorithm=0, detailed=False)¶
Identifies outliers in multiple dimensions.
Dataset has to be parsed as numeric beforehand.
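Isolation forest, the algorithm recommended in the example run above, can be sketched with scikit-learn. The synthetic data and estimator settings are assumptions; datacleanbot's actual detector configuration may differ.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),   # inlier cluster
               [[8.0, 8.0]]])                     # one obvious outlier

iso = IsolationForest(random_state=0).fit(X)
labels = iso.predict(X)                   # -1 = outlier, 1 = inlier
anomaly_score = iso.decision_function(X)  # lower means more anomalous
```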
-
dataclean.infer_feature_type(feature)¶
Infer the data type for the given feature using simple logic.
Possible data types to infer: boolean, date, float, integer, string. A feature that is not a boolean, a date, a float or an integer is classified as a string.
Parameters: feature : array-like
A feature/attribute vector.
Returns: data_type : string
The data type of the given feature/attribute.
-
dataclean.missing_preprocess(features, df=None)¶
Drops the redundant information.
Redundant information is dropped before imputation. Detects and drops empty rows. Detects features and instances with an extremely large proportion of missing data and reports them to the user.
Parameters: features : list
List of feature names.
df : DataFrame
Returns: df : DataFrame
New DataFrame where redundant information may have been deleted.
features_new: list
List of feature names after preprocessing.
-
dataclean.plot_feature_importances(dataset_name, features, importances, indices)¶
Plot the 15 most important features.
-
dataclean.predict_best_anomaly_algorithm(X, y)¶
Predicts the best anomaly detection algorithm.
Recommends the best anomaly detection algorithm to the user given the characteristics of the dataset. The following algorithms are considered: 0: isolation forest; 1: local outlier factor; 2: one class support vector machine.
-
dataclean.show_important_features(X, y, data_name, features)¶
Show the most important features of the given dataset.
Computes the most important features of the given dataset using random forest, and presents the 15 most useful features to the user in a bar chart.
Parameters: X : array-like, shape (n_samples, n_features)
Training vectors, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape (n_samples,)
Target values (class labels in classification, real numbers in regression).
data_name : string
Dataset name.
features : list
List of feature names.
-
dataclean.show_statistical_info(Xy)¶
Show statistical information of the given dataset.
Parameters: Xy : array-like
-
dataclean.train_metalearner()¶
Train the metalearner.
-
dataclean.unify_name_consistency(names)¶
Unify inconsistent column names.
Parameters: names : list
List of original column names.
Returns: names : list
Unified column names.
-
dataclean.visualize_missing(df=None)¶
Visualize missing values.
The missingness of the dataset is visualized in bar chart, matrix and heatmap.
-
dataclean.visualize_outliers_parallel_coordinates(df_scaled, df_pred)¶
Visualizes high-dimensional outliers with a parallel coordinates plot.
-
dataclean.visualize_outliers_scatter(df, df_pred)¶
Visualizes high-dimensional outliers with a scatter plot.
Selects out the two features most likely to have outliers and shows them in a scatter plot.