Overview

Dataset statistics

Number of variables 6
Number of observations 1400
Missing cells 0
Missing cells (%) 0.0%
Duplicate rows 0
Duplicate rows (%) 0.0%
Total size in memory 65.8 KiB
Average record size in memory 48.1 B

Variable types

Numeric 5
Categorical 1

Alerts

humidity is highly correlated with temperature and 4 other fields (High correlation)
water availability is highly correlated with temperature and 3 other fields (High correlation)
label is highly correlated with temperature and 4 other fields (High correlation)
season is highly correlated with temperature and 4 other fields (High correlation)
temperature is highly correlated with humidity and 3 other fields (High correlation)
ph is highly correlated with humidity and 2 other fields (High correlation)
label has 100 (7.1%) zeros (Zeros)

Reproduction

Analysis started 2023-03-03 12:16:08.202051
Analysis finished 2023-03-03 12:16:15.347807
Duration 7.15 seconds
Software version pandas-profiling v3.4.0
Download configuration config.json

Variables

temperature
Real number (ℝ≥0)

HIGH CORRELATION

Distinct 1300
Distinct (%) 92.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 24.97162055
Minimum 15.33042636
Maximum 36.97794384
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 11.1 KiB

Quantile statistics

Minimum 15.33042636
5-th percentile 18.25405352
Q1 22.17823907
median 25.14024451
Q3 27.96322684
95-th percentile 31.22032229
Maximum 36.97794384
Range 21.64751748
Interquartile range (IQR) 5.784987767

Descriptive statistics

Standard deviation 4.081622446
Coefficient of variation (CV) 0.1634504432
Kurtosis -0.3263094809
Mean 24.97162055
Median Absolute Deviation (MAD) 2.865742815
Skewness 0.01907108973
Sum 34960.26877
Variance 16.65964179
Monotonicity Not monotonic
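
Several of the figures above follow from the others by simple arithmetic. The snippet below is a minimal sketch, not part of the original report: it recomputes the range, IQR, variance, CV, and sum from the reported minimum, maximum, quartiles, mean, standard deviation, and observation count, then compares them with the reported values. All variable names are illustrative, and the tolerance is an assumption that allows for the report's rounding.

```python
from math import isclose

# Figures copied from the temperature summary of this report.
n_observations = 1400
minimum, maximum = 15.33042636, 36.97794384
q1, q3 = 22.17823907, 27.96322684
mean, std = 24.97162055, 4.081622446

# Quantities derived from the basic figures above.
derived = {
    "Range": maximum - minimum,
    "Interquartile range (IQR)": q3 - q1,
    "Variance": std ** 2,
    "Coefficient of variation (CV)": std / mean,
    "Sum": mean * n_observations,
}

# The same quantities as printed in the report.
reported = {
    "Range": 21.64751748,
    "Interquartile range (IQR)": 5.784987767,
    "Variance": 16.65964179,
    "Coefficient of variation (CV)": 0.1634504432,
    "Sum": 34960.26877,
}

for name, value in derived.items():
    # The report rounds to about 10 significant digits, so compare with a tolerance.
    agrees = isclose(value, reported[name], rel_tol=1e-7)
    print(f"{name}: derived={value:.10g} reported={reported[name]:.10g} agrees={agrees}")
```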
Histogram with fixed size bins (bins=50)

Common Values

Value                Count  Frequency (%)
25.33797709          2      0.1%
21.869274            2      0.1%
23.39128187          2      0.1%
18.41932981          2      0.1%
20.27317074          2      0.1%
24.71417533          2      0.1%
22.61359953          2      0.1%
26.10018422          2      0.1%
23.55882094          2      0.1%
19.97215954          2      0.1%
Other values (1290)  1380   98.6%

Minimum 10 values

Value        Count  Frequency (%)
15.33042636  1      0.1%
15.43546065  1      0.1%
15.46789263  1      0.1%
15.53834801  1      0.1%
15.77370214  1      0.1%
15.78601387  1      0.1%
16.03768615  1      0.1%
16.06522754  1      0.1%
16.24469193  1      0.1%
16.43340342  1      0.1%

Maximum 10 values

Value        Count  Frequency (%)
36.97794384  1      0.1%
36.89163721  1      0.1%
36.75087487  1      0.1%
36.51268371  1      0.1%
36.30049702  1      0.1%
36.20970524  1      0.1%
36.04353699  1      0.1%
36.00415838  1      0.1%
35.95176642  1      0.1%
35.45790488  1      0.1%

humidity
Real number (ℝ≥0)

HIGH CORRELATION

Distinct 1300
Distinct (%) 92.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 64.61106202
Minimum 14.25803981
Maximum 94.96218673
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 11.1 KiB

Quantile statistics

Minimum 14.25803981
5-th percentile 17.92929023
Q1 56.82421692
median 68.28832112
Q3 82.71040882
95-th percentile 91.21936824
Maximum 94.96218673
Range 80.70414692
Interquartile range (IQR) 25.88619191

Descriptive statistics

Standard deviation 22.75378493
Coefficient of variation (CV) 0.3521654686
Kurtosis -0.2400944603
Mean 64.61106202
Median Absolute Deviation (MAD) 13.76174229
Skewness -0.8991506833
Sum 90455.48682
Variance 517.7347287
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                Count  Frequency (%)
68.49835977          2      0.1%
61.91044947          2      0.1%
61.74427165          2      0.1%
64.23580251          2      0.1%
63.91281869          2      0.1%
56.73426469          2      0.1%
63.69070564          2      0.1%
71.57476937          2      0.1%
71.59351368          2      0.1%
57.68272924          2      0.1%
Other values (1290)  1380   98.6%

Minimum 10 values

Value        Count  Frequency (%)
14.25803981  1      0.1%
14.27327988  1      0.1%
14.2804191   1      0.1%
14.32313811  1      0.1%
14.33847406  1      0.1%
14.42457525  1      0.1%
14.44008871  1      0.1%
14.44228303  1      0.1%
14.69765308  1      0.1%
14.70085967  1      0.1%

Maximum 10 values

Value        Count  Frequency (%)
94.96218673  1      0.1%
94.87679041  1      0.1%
94.86907886  1      0.1%
94.81637388  1      0.1%
94.79453182  1      0.1%
94.78993038  1      0.1%
94.72981338  1      0.1%
94.65343534  1      0.1%
94.57459443  1      0.1%
94.55695552  1      0.1%

ph
Real number (ℝ≥0)

HIGH CORRELATION

Distinct 1300
Distinct (%) 92.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 6.565245964
Minimum 3.504752314
Maximum 9.93509073
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 11.1 KiB

Quantile statistics

Minimum 3.504752314
5-th percentile 5.386097545
Q1 6.06879526
median 6.524478032
Q3 7.042342972
95-th percentile 7.881354651
Maximum 9.93509073
Range 6.430338416
Interquartile range (IQR) 0.9735477118

Descriptive statistics

Standard deviation 0.8351014206
Coefficient of variation (CV) 0.127200325
Kurtosis 1.662209163
Mean 6.565245964
Median Absolute Deviation (MAD) 0.4874527745
Skewness 0.1729917217
Sum 9191.34435
Variance 0.6973943826
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                Count  Frequency (%)
6.586244581          2      0.1%
5.850439831          2      0.1%
5.871647806          2      0.1%
6.474476516          2      0.1%
6.439071996          2      0.1%
6.648725327          2      0.1%
5.749914421          2      0.1%
6.931756558          2      0.1%
6.657964753          2      0.1%
6.596060648          2      0.1%
Other values (1290)  1380   98.6%

Minimum 10 values

Value        Count  Frequency (%)
3.504752314  1      0.1%
3.510404312  1      0.1%
3.5253661    1      0.1%
3.532008668  1      0.1%
3.558822825  1      0.1%
3.692863601  1      0.1%
3.71105919   1      0.1%
3.793575185  1      0.1%
3.808429173  1      0.1%
3.828031463  1      0.1%

Maximum 10 values

Value        Count  Frequency (%)
9.93509073   1      0.1%
9.926212291  1      0.1%
9.679240873  1      0.1%
9.45949344   1      0.1%
9.416003106  1      0.1%
9.406887533  1      0.1%
9.392694614  1      0.1%
9.254089438  1      0.1%
9.160691747  1      0.1%
9.112771682  1      0.1%

water availability
Real number (ℝ≥0)

HIGH CORRELATION

Distinct 1300
Distinct (%) 92.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 91.7846509
Minimum 20.21126747
Maximum 298.5601175
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 11.1 KiB

Quantile statistics

Minimum 20.21126747
5-th percentile 26.51740556
Q1 51.54654173
median 72.37918315
Q3 107.4283336
95-th percentile 209.9458982
Maximum 298.5601175
Range 278.34885
Interquartile range (IQR) 55.88179184

Descriptive statistics

Standard deviation 58.68225767
Coefficient of variation (CV) 0.6393471795
Kurtosis 1.312178796
Mean 91.7846509
Median Absolute Deviation (MAD) 24.40939669
Skewness 1.364459853
Sum 128498.5113
Variance 3443.607365
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value                Count  Frequency (%)
96.46380213          2      0.1%
107.2681929          2      0.1%
107.3198135          2      0.1%
76.41312437          2      0.1%
62.50351892          2      0.1%
88.45361858          2      0.1%
87.75953857          2      0.1%
102.2662445          2      0.1%
66.71995467          2      0.1%
60.65171481          2      0.1%
Other values (1290)  1380   98.6%

Minimum 10 values

Value        Count  Frequency (%)
20.21126747  1      0.1%
20.36001144  1      0.1%
20.39020503  1      0.1%
20.49035619  1      0.1%
20.66127836  1      0.1%
20.76212031  1      0.1%
20.76223014  1      0.1%
20.76582087  1      0.1%
20.88620369  1      0.1%
21.0000988   1      0.1%

Maximum 10 values

Value        Count  Frequency (%)
298.5601175  1      0.1%
298.4018471  1      0.1%
295.9248796  1      0.1%
295.6094492  1      0.1%
291.2986618  1      0.1%
290.6793783  1      0.1%
287.5766935  1      0.1%
286.5083725  1      0.1%
285.2493645  1      0.1%
284.4364567  1      0.1%

season
Categorical

HIGH CORRELATION

Distinct 4
Distinct (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory size 11.1 KiB
Value  Count
0      600
1      400
3      300
2      100

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 1400
Distinct characters 4
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Common Values

Value  Count  Frequency (%)
0      600    42.9%
1      400    28.6%
3      300    21.4%
2      100    7.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0      600    42.9%
1      400    28.6%
3      300    21.4%
2      100    7.1%

Most occurring characters

Value  Count  Frequency (%)
0      600    42.9%
1      400    28.6%
3      300    21.4%
2      100    7.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1400   100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      600    42.9%
1      400    28.6%
3      300    21.4%
2      100    7.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  1400   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      600    42.9%
1      400    28.6%
3      300    21.4%
2      100    7.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1400   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      600    42.9%
1      400    28.6%
3      300    21.4%
2      100    7.1%

label
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct 13
Distinct (%) 0.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 5.642857143
Minimum 0
Maximum 12
Zeros 100
Zeros (%) 7.1%
Negative 0
Negative (%) 0.0%
Memory size 11.1 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 2
median 5.5
Q3 9
95-th percentile 12
Maximum 12
Range 12
Interquartile range (IQR) 7

Descriptive statistics

Standard deviation 3.82996617
Coefficient of variation (CV) 0.6787281821
Kurtosis -1.286860175
Mean 5.642857143
Median Absolute Deviation (MAD) 3.5
Skewness 0.1216937038
Sum 7900
Variance 14.66864087
Monotonicity Not monotonic
Histogram with fixed size bins (bins=13)

Common Values

Value             Count  Frequency (%)
1                 200    14.3%
0                 100    7.1%
2                 100    7.1%
3                 100    7.1%
4                 100    7.1%
5                 100    7.1%
6                 100    7.1%
7                 100    7.1%
8                 100    7.1%
9                 100    7.1%
Other values (3)  300    21.4%

Minimum 10 values

Value  Count  Frequency (%)
0      100    7.1%
1      200    14.3%
2      100    7.1%
3      100    7.1%
4      100    7.1%
5      100    7.1%
6      100    7.1%
7      100    7.1%
8      100    7.1%
9      100    7.1%

Maximum 10 values

Value  Count  Frequency (%)
12     100    7.1%
11     100    7.1%
10     100    7.1%
9      100    7.1%
8      100    7.1%
7      100    7.1%
6      100    7.1%
5      100    7.1%
4      100    7.1%
3      100    7.1%

Interactions

[25 pairwise interaction plots of the numeric variables]

Correlations

Auto

The auto setting selects an easily interpretable pairwise metric for each pair of columns according to their types (vartype-vartype : method): categorical-categorical : Cramér's V, numerical-categorical : Cramér's V (using a discretized numerical column), numerical-numerical : Spearman's ρ. This configuration uses the most suitable method for each pair of columns.
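
The type-to-method mapping described above can be written as a small lookup. This is only an illustrative sketch of the mapping as stated in the text, not the library's actual code; the function name `auto_method` and the type labels are assumptions made for the example. The column types are taken from this report (5 numeric variables, 1 categorical).

```python
def auto_method(type_a, type_b):
    """Return (method name, needs_discretization) for a pair of column types.

    Hypothetical helper: mirrors the mapping described in the text only.
    """
    allowed = {"numerical", "categorical"}
    if type_a not in allowed or type_b not in allowed:
        raise ValueError(f"unsupported column types: {type_a!r}, {type_b!r}")
    if type_a == type_b == "numerical":
        return ("Spearman's rho", False)
    # Mixed pairs use Cramer's V after discretizing the numerical column.
    needs_discretization = type_a != type_b
    return ("Cramer's V", needs_discretization)


# Column types as listed in this report.
column_types = {
    "temperature": "numerical",
    "humidity": "numerical",
    "ph": "numerical",
    "water availability": "numerical",
    "label": "numerical",
    "season": "categorical",
}

for other in ("humidity", "season"):
    method, discretize = auto_method(column_types["temperature"], column_types[other])
    print(f"temperature vs {other}: {method} (discretize numerical column: {discretize})")
```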

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
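
As a minimal sketch of that calculation, the pure-Python code below ranks both variables and then divides the covariance of the ranks by the product of their standard deviations. It assumes ties receive the average of their positions; the function names are illustrative, and the input is the first five rows of the sample shown in this report.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the ranks divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in rx) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in ry) / n)
    return cov / (std_x * std_y)  # undefined (division by zero) for constant input


# First five sample rows of this report: temperature and humidity.
temperature = [20.879744, 21.770462, 23.004459, 26.491096, 20.130175]
humidity = [82.002744, 80.319644, 82.320763, 80.158363, 81.604873]
rho = spearman_rho(temperature, humidity)
print(f"Spearman's rho on five sample rows: {rho:.4f}")
```

Five rows are far too few to say anything about the full dataset; the example only illustrates the mechanics.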

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
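
The formula and the invariance property can be checked with a short pure-Python sketch. The function name `pearson_r` and the rescaling `2 * t + 10` are illustrative assumptions; the input values are the first five sample rows of this report.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)  # undefined (division by zero) for constant input


# First five sample rows of this report: temperature and ph.
temperature = [20.879744, 21.770462, 23.004459, 26.491096, 20.130175]
ph = [6.502985, 7.038096, 7.840207, 6.980401, 7.628473]

r_original = pearson_r(temperature, ph)
# Shift and rescale one variable (an arbitrary positive scale): r should not change.
rescaled = [2 * t + 10 for t in temperature]
r_rescaled = pearson_r(rescaled, ph)
print(f"r original: {r_original:.6f}, r after rescaling: {r_rescaled:.6f}")
```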

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
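
The pair-counting definition above can be sketched directly in Python. This implements the formula exactly as stated, which corresponds to the tau-a variant. Library implementations often apply a tie correction (tau-b), so results on data with ties may differ from the report. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y order the two observations the same way.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in x or y count as neither concordant nor discordant.
    return (concordant - discordant) / len(pairs)


# First five sample rows of this report: temperature and humidity.
temperature = [20.879744, 21.770462, 23.004459, 26.491096, 20.130175]
humidity = [82.002744, 80.319644, 82.320763, 80.158363, 81.604873]
tau = kendall_tau_a(temperature, humidity)
print(f"Kendall's tau-a on five sample rows: {tau:.4f}")
```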

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

temperature humidity ph water availability season label
0 20.879744 82.002744 6.502985 202.935536 0 0
1 21.770462 80.319644 7.038096 226.655537 0 0
2 23.004459 82.320763 7.840207 263.964248 0 0
3 26.491096 80.158363 6.980401 242.864034 0 0
4 20.130175 81.604873 7.628473 262.717340 0 0
5 23.058049 83.370118 7.073454 251.055000 0 0
6 22.708838 82.639414 5.700806 271.324860 0 0
7 20.277744 82.894086 5.718627 241.974195 0 0
8 24.515881 83.535216 6.685346 230.446236 0 0
9 23.223974 83.033227 6.336254 221.209196 0 0

Last rows

temperature humidity ph water availability season label
1390 23.787560 74.367941 6.014572 172.644265 0 12
1391 25.499417 75.999876 6.663559 193.714183 0 12
1392 23.249256 73.653468 6.434611 184.767486 0 12
1393 26.985822 89.055879 7.432768 193.877871 0 12
1394 23.614753 86.142903 6.987333 150.235524 0 12
1395 23.874845 86.792613 6.718725 177.514731 0 12
1396 23.928879 88.071123 6.880205 154.660874 0 12
1397 24.814412 81.686889 6.861069 190.788639 0 12
1398 24.447439 82.286484 6.769346 190.968489 0 12
1399 26.574217 73.819949 7.261581 159.322307 0 12