Dataset statistics
Number of variables | 6 |
---|---|
Number of observations | 1400 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 65.8 KiB |
Average record size in memory | 48.1 B |
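The average record size can be recovered from the two figures above it. A minimal check, assuming 1 KiB = 1024 bytes and that the reported total size is rounded to one decimal place:

```python
# Values copied from the "Dataset statistics" table.
total_size_kib = 65.8
n_observations = 1400

# Assumption: 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```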
Variable types
Numeric | 5 |
---|---|
Categorical | 1 |
Alerts
humidity is highly correlated with temperature and 4 other fields | High correlation |
---|---|
water availability is highly correlated with temperature and 3 other fields | High correlation |
label is highly correlated with temperature and 4 other fields | High correlation |
season is highly correlated with temperature and 4 other fields | High correlation |
temperature is highly correlated with humidity and 3 other fields | High correlation |
ph is highly correlated with humidity and 2 other fields | High correlation |
label has 100 (7.1%) zeros | Zeros |
Reproduction
Analysis started | 2023-03-03 12:16:08.202051 |
---|---|
Analysis finished | 2023-03-03 12:16:15.347807 |
Duration | 7.15 seconds |
Software version | pandas-profiling v3.4.0 |
Download configuration | config.json |
temperature
Distinct | 1300 |
---|---|
Distinct (%) | 92.9% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 24.97162055 |
Minimum | 15.33042636 |
---|---|
Maximum | 36.97794384 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 11.1 KiB |
Quantile statistics
Minimum | 15.33042636 |
---|---|
5-th percentile | 18.25405352 |
Q1 | 22.17823907 |
median | 25.14024451 |
Q3 | 27.96322684 |
95-th percentile | 31.22032229 |
Maximum | 36.97794384 |
Range | 21.64751748 |
Interquartile range (IQR) | 5.784987767 |
Descriptive statistics
Standard deviation | 4.081622446 |
---|---|
Coefficient of variation (CV) | 0.1634504432 |
Kurtosis | -0.3263094809 |
Mean | 24.97162055 |
Median Absolute Deviation (MAD) | 2.865742815 |
Skewness | 0.01907108973 |
Sum | 34960.26877 |
Variance | 16.65964179 |
Monotonicity | Not monotonic |
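Several of the entries above are derived from others in the same tables. The following sketch recomputes them from the reported values as a consistency check; it assumes the usual definitions (range = maximum - minimum, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared), and small differences in the last digits are expected because the inputs are already rounded.

```python
# Values copied from the statistics tables for this variable.
n = 1400
minimum, maximum = 15.33042636, 36.97794384
q1, q3 = 22.17823907, 27.96322684
mean, std, total = 24.97162055, 4.081622446, 34960.26877

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance
mean_from_sum = total / n         # Mean recomputed from Sum

for name, value in [
    ("Range", value_range),
    ("IQR", iqr),
    ("CV", cv),
    ("Variance", variance),
    ("Mean", mean_from_sum),
]:
    print(f"{name}: {value:.6f}")
```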
Value | Count | Frequency (%) |
25.33797709 | 2 | 0.1% |
21.869274 | 2 | 0.1% |
23.39128187 | 2 | 0.1% |
18.41932981 | 2 | 0.1% |
20.27317074 | 2 | 0.1% |
24.71417533 | 2 | 0.1% |
22.61359953 | 2 | 0.1% |
26.10018422 | 2 | 0.1% |
23.55882094 | 2 | 0.1% |
19.97215954 | 2 | 0.1% |
Other values (1290) | 1380 | 98.6% |
Value | Count | Frequency (%) |
15.33042636 | 1 | |
15.43546065 | 1 | |
15.46789263 | 1 | |
15.53834801 | 1 | |
15.77370214 | 1 | |
15.78601387 | 1 | |
16.03768615 | 1 | |
16.06522754 | 1 | |
16.24469193 | 1 | |
16.43340342 | 1 |
Value | Count | Frequency (%) |
36.97794384 | 1 | |
36.89163721 | 1 | |
36.75087487 | 1 | |
36.51268371 | 1 | |
36.30049702 | 1 | |
36.20970524 | 1 | |
36.04353699 | 1 | |
36.00415838 | 1 | |
35.95176642 | 1 | |
35.45790488 | 1 |
humidity
Distinct | 1300 |
---|---|
Distinct (%) | 92.9% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 64.61106202 |
Minimum | 14.25803981 |
---|---|
Maximum | 94.96218673 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 11.1 KiB |
Quantile statistics
Minimum | 14.25803981 |
---|---|
5-th percentile | 17.92929023 |
Q1 | 56.82421692 |
median | 68.28832112 |
Q3 | 82.71040882 |
95-th percentile | 91.21936824 |
Maximum | 94.96218673 |
Range | 80.70414692 |
Interquartile range (IQR) | 25.88619191 |
Descriptive statistics
Standard deviation | 22.75378493 |
---|---|
Coefficient of variation (CV) | 0.3521654686 |
Kurtosis | -0.2400944603 |
Mean | 64.61106202 |
Median Absolute Deviation (MAD) | 13.76174229 |
Skewness | -0.8991506833 |
Sum | 90455.48682 |
Variance | 517.7347287 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
68.49835977 | 2 | 0.1% |
61.91044947 | 2 | 0.1% |
61.74427165 | 2 | 0.1% |
64.23580251 | 2 | 0.1% |
63.91281869 | 2 | 0.1% |
56.73426469 | 2 | 0.1% |
63.69070564 | 2 | 0.1% |
71.57476937 | 2 | 0.1% |
71.59351368 | 2 | 0.1% |
57.68272924 | 2 | 0.1% |
Other values (1290) | 1380 | 98.6% |
Value | Count | Frequency (%) |
14.25803981 | 1 | |
14.27327988 | 1 | |
14.2804191 | 1 | |
14.32313811 | 1 | |
14.33847406 | 1 | |
14.42457525 | 1 | |
14.44008871 | 1 | |
14.44228303 | 1 | |
14.69765308 | 1 | |
14.70085967 | 1 |
Value | Count | Frequency (%) |
94.96218673 | 1 | |
94.87679041 | 1 | |
94.86907886 | 1 | |
94.81637388 | 1 | |
94.79453182 | 1 | |
94.78993038 | 1 | |
94.72981338 | 1 | |
94.65343534 | 1 | |
94.57459443 | 1 | |
94.55695552 | 1 |
ph
Distinct | 1300 |
---|---|
Distinct (%) | 92.9% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 6.565245964 |
Minimum | 3.504752314 |
---|---|
Maximum | 9.93509073 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 11.1 KiB |
Quantile statistics
Minimum | 3.504752314 |
---|---|
5-th percentile | 5.386097545 |
Q1 | 6.06879526 |
median | 6.524478032 |
Q3 | 7.042342972 |
95-th percentile | 7.881354651 |
Maximum | 9.93509073 |
Range | 6.430338416 |
Interquartile range (IQR) | 0.9735477118 |
Descriptive statistics
Standard deviation | 0.8351014206 |
---|---|
Coefficient of variation (CV) | 0.127200325 |
Kurtosis | 1.662209163 |
Mean | 6.565245964 |
Median Absolute Deviation (MAD) | 0.4874527745 |
Skewness | 0.1729917217 |
Sum | 9191.34435 |
Variance | 0.6973943826 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
6.586244581 | 2 | 0.1% |
5.850439831 | 2 | 0.1% |
5.871647806 | 2 | 0.1% |
6.474476516 | 2 | 0.1% |
6.439071996 | 2 | 0.1% |
6.648725327 | 2 | 0.1% |
5.749914421 | 2 | 0.1% |
6.931756558 | 2 | 0.1% |
6.657964753 | 2 | 0.1% |
6.596060648 | 2 | 0.1% |
Other values (1290) | 1380 | 98.6% |
Value | Count | Frequency (%) |
3.504752314 | 1 | |
3.510404312 | 1 | |
3.5253661 | 1 | |
3.532008668 | 1 | |
3.558822825 | 1 | |
3.692863601 | 1 | |
3.71105919 | 1 | |
3.793575185 | 1 | |
3.808429173 | 1 | |
3.828031463 | 1 |
Value | Count | Frequency (%) |
9.93509073 | 1 | |
9.926212291 | 1 | |
9.679240873 | 1 | |
9.45949344 | 1 | |
9.416003106 | 1 | |
9.406887533 | 1 | |
9.392694614 | 1 | |
9.254089438 | 1 | |
9.160691747 | 1 | |
9.112771682 | 1 |
water availability
Distinct | 1300 |
---|---|
Distinct (%) | 92.9% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 91.7846509 |
Minimum | 20.21126747 |
---|---|
Maximum | 298.5601175 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 11.1 KiB |
Quantile statistics
Minimum | 20.21126747 |
---|---|
5-th percentile | 26.51740556 |
Q1 | 51.54654173 |
median | 72.37918315 |
Q3 | 107.4283336 |
95-th percentile | 209.9458982 |
Maximum | 298.5601175 |
Range | 278.34885 |
Interquartile range (IQR) | 55.88179184 |
Descriptive statistics
Standard deviation | 58.68225767 |
---|---|
Coefficient of variation (CV) | 0.6393471795 |
Kurtosis | 1.312178796 |
Mean | 91.7846509 |
Median Absolute Deviation (MAD) | 24.40939669 |
Skewness | 1.364459853 |
Sum | 128498.5113 |
Variance | 3443.607365 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
96.46380213 | 2 | 0.1% |
107.2681929 | 2 | 0.1% |
107.3198135 | 2 | 0.1% |
76.41312437 | 2 | 0.1% |
62.50351892 | 2 | 0.1% |
88.45361858 | 2 | 0.1% |
87.75953857 | 2 | 0.1% |
102.2662445 | 2 | 0.1% |
66.71995467 | 2 | 0.1% |
60.65171481 | 2 | 0.1% |
Other values (1290) | 1380 | 98.6% |
Value | Count | Frequency (%) |
20.21126747 | 1 | |
20.36001144 | 1 | |
20.39020503 | 1 | |
20.49035619 | 1 | |
20.66127836 | 1 | |
20.76212031 | 1 | |
20.76223014 | 1 | |
20.76582087 | 1 | |
20.88620369 | 1 | |
21.0000988 | 1 |
Value | Count | Frequency (%) |
298.5601175 | 1 | |
298.4018471 | 1 | |
295.9248796 | 1 | |
295.6094492 | 1 | |
291.2986618 | 1 | |
290.6793783 | 1 | |
287.5766935 | 1 | |
286.5083725 | 1 | |
285.2493645 | 1 | |
284.4364567 | 1 |
season
Distinct | 4 |
---|---|
Distinct (%) | 0.3% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 11.1 KiB |
Common Values
Value | Count | Frequency (%) |
0 | 600 | 42.9% |
1 | 400 | 28.6% |
3 | 300 | 21.4% |
2 | 100 | 7.1% |
Most occurring characters
Value | Count | Frequency (%) |
0 | 600 | |
1 | 400 | |
3 | 300 | |
2 | 100 | 7.1% |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 1400 |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 600 | |
1 | 400 | |
3 | 300 | |
2 | 100 | 7.1% |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 1400 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 600 | |
1 | 400 | |
3 | 300 | |
2 | 100 | 7.1% |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 1400 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 600 | |
1 | 400 | |
3 | 300 | |
2 | 100 | 7.1% |
label
Distinct | 13 |
---|---|
Distinct (%) | 0.9% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 5.642857143 |
Minimum | 0 |
---|---|
Maximum | 12 |
Zeros | 100 |
Zeros (%) | 7.1% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 11.1 KiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 2 |
median | 5.5 |
Q3 | 9 |
95-th percentile | 12 |
Maximum | 12 |
Range | 12 |
Interquartile range (IQR) | 7 |
Descriptive statistics
Standard deviation | 3.82996617 |
---|---|
Coefficient of variation (CV) | 0.6787281821 |
Kurtosis | -1.286860175 |
Mean | 5.642857143 |
Median Absolute Deviation (MAD) | 3.5 |
Skewness | 0.1216937038 |
Sum | 7900 |
Variance | 14.66864087 |
Monotonicity | Not monotonic |
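Because this variable takes only 13 distinct values, the whole column can be rebuilt from its value counts and the summary statistics recomputed. The sketch below assumes the counts shown in the frequency tables for this variable (value 1 occurs 200 times, every other value from 0 to 12 occurs 100 times) and that the reported variance is the sample variance (divisor n - 1).

```python
import statistics

# Value counts taken from the frequency tables for this variable:
# value 1 occurs 200 times, all other values in 0..12 occur 100 times.
counts = {value: 100 for value in range(13)}
counts[1] = 200

# Rebuild the column from its counts.
values = [value for value, count in counts.items() for _ in range(count)]

n = len(values)
mean = statistics.mean(values)
median = statistics.median(values)
variance = statistics.variance(values)  # sample variance, divisor n - 1
# Median absolute deviation: median of |x - median|.
mad = statistics.median(abs(v - median) for v in values)
zeros_pct = 100 * counts[0] / n

print(n, sum(values), mean, median, variance, mad, f"{zeros_pct:.1f}%")
```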
Value | Count | Frequency (%) |
1 | 200 | 14.3% |
0 | 100 | 7.1% |
2 | 100 | 7.1% |
3 | 100 | 7.1% |
4 | 100 | 7.1% |
5 | 100 | 7.1% |
6 | 100 | 7.1% |
7 | 100 | 7.1% |
8 | 100 | 7.1% |
9 | 100 | 7.1% |
Other values (3) | 300 | 21.4% |
Value | Count | Frequency (%) |
0 | 100 | |
1 | 200 | |
2 | 100 | |
3 | 100 | |
4 | 100 | |
5 | 100 | |
6 | 100 | |
7 | 100 | |
8 | 100 | |
9 | 100 |
Value | Count | Frequency (%) |
12 | 100 | |
11 | 100 | |
10 | 100 | |
9 | 100 | |
8 | 100 | |
7 | 100 | |
6 | 100 | |
5 | 100 | |
4 | 100 | |
3 | 100 |
Auto
The auto setting is an easily interpretable pairwise column metric with the following mapping (vartype-vartype : method): categorical-categorical : Cramér's V; numerical-categorical : Cramér's V (using a discretized numerical column); numerical-numerical : Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.
To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
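The definition above (Pearson's formula applied to ranks) can be sketched in plain Python. This is an illustrative implementation, not the code the report itself runs; the helper names `average_ranks`, `pearson` and `spearman` are hypothetical, and the sample data are the temperature and humidity values from the first five rows listed under "First rows".

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations (the 1/n factors cancel)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# First five rows of the dataset (temperature, humidity).
temperature = [20.879744, 21.770462, 23.004459, 26.491096, 20.130175]
humidity = [82.002744, 80.319644, 82.320763, 80.158363, 81.604873]

rho = spearman(temperature, humidity)
print(round(rho, 6))

# A nonlinear but strictly increasing relationship still gives rho = 1.
print(round(spearman([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]), 6))
```

Five rows are far too few to say anything about the full dataset; the point is only to show the rank-then-correlate mechanics.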
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope determines whether r is +1 or -1.
To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
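The invariance under changes of location and scale can be checked numerically. The sketch below is illustrative only: `pearson` is a hypothetical helper, the data are the first five rows listed under "First rows", and the unit conversions (Celsius to Fahrenheit, percent to fraction) are assumptions made purely to have a realistic shift and rescale, since the report does not state units.

```python
import math

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# First five rows of the dataset (temperature, humidity).
temperature = [20.879744, 21.770462, 23.004459, 26.491096, 20.130175]
humidity = [82.002744, 80.319644, 82.320763, 80.158363, 81.604873]

r = pearson(temperature, humidity)

# Assumed unit changes: a shift plus positive rescale of x, a positive rescale of y.
temperature_f = [t * 9 / 5 + 32 for t in temperature]
humidity_fraction = [h / 100 for h in humidity]
r_rescaled = pearson(temperature_f, humidity_fraction)

print(math.isclose(r, r_rescaled, rel_tol=1e-9))

# Exactly linear data give r = 1 regardless of slope or intercept.
print(round(pearson([1, 2, 3], [5, 7, 9]), 6))
```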
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.
To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
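The pair-counting definition above corresponds to the variant usually called τ-a. A minimal sketch, assuming no tie correction (statistical libraries often apply one, so their results can differ when ties are present); `kendall_tau_a` is a hypothetical helper name and the data are the first five rows listed under "First rows".

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
        # sign == 0 means a tie in x or y; the pair counts toward neither total
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs

# First five rows of the dataset (temperature, humidity).
temperature = [20.879744, 21.770462, 23.004459, 26.491096, 20.130175]
humidity = [82.002744, 80.319644, 82.320763, 80.158363, 81.604873]

tau = kendall_tau_a(temperature, humidity)
print(round(tau, 6))
```

For these five rows there are 4 concordant and 6 discordant pairs out of 10, so τ = (4 - 6) / 10.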
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
First rows
index | temperature | humidity | ph | water availability | season | label |
---|---|---|---|---|---|---|
0 | 20.879744 | 82.002744 | 6.502985 | 202.935536 | 0 | 0 |
1 | 21.770462 | 80.319644 | 7.038096 | 226.655537 | 0 | 0 |
2 | 23.004459 | 82.320763 | 7.840207 | 263.964248 | 0 | 0 |
3 | 26.491096 | 80.158363 | 6.980401 | 242.864034 | 0 | 0 |
4 | 20.130175 | 81.604873 | 7.628473 | 262.717340 | 0 | 0 |
5 | 23.058049 | 83.370118 | 7.073454 | 251.055000 | 0 | 0 |
6 | 22.708838 | 82.639414 | 5.700806 | 271.324860 | 0 | 0 |
7 | 20.277744 | 82.894086 | 5.718627 | 241.974195 | 0 | 0 |
8 | 24.515881 | 83.535216 | 6.685346 | 230.446236 | 0 | 0 |
9 | 23.223974 | 83.033227 | 6.336254 | 221.209196 | 0 | 0 |
Last rows
index | temperature | humidity | ph | water availability | season | label |
---|---|---|---|---|---|---|
1390 | 23.787560 | 74.367941 | 6.014572 | 172.644265 | 0 | 12 |
1391 | 25.499417 | 75.999876 | 6.663559 | 193.714183 | 0 | 12 |
1392 | 23.249256 | 73.653468 | 6.434611 | 184.767486 | 0 | 12 |
1393 | 26.985822 | 89.055879 | 7.432768 | 193.877871 | 0 | 12 |
1394 | 23.614753 | 86.142903 | 6.987333 | 150.235524 | 0 | 12 |
1395 | 23.874845 | 86.792613 | 6.718725 | 177.514731 | 0 | 12 |
1396 | 23.928879 | 88.071123 | 6.880205 | 154.660874 | 0 | 12 |
1397 | 24.814412 | 81.686889 | 6.861069 | 190.788639 | 0 | 12 |
1398 | 24.447439 | 82.286484 | 6.769346 | 190.968489 | 0 | 12 |
1399 | 26.574217 | 73.819949 | 7.261581 | 159.322307 | 0 | 12 |