I am trying to calculate the z-scores of a pandas DataFrame using scipy's zscore function. The calculation succeeds, but the type of the returned object differs depending on which host the program runs on.
I am guessing this is related to the different versions of the packages involved.
Still, I haven't found the actual reason for the difference.
- Why does the returned type differ between the two hosts?
Host 1 | Host 2 |
---|---|
python 3.6.8 | python 3.7.3 |
pandas 1.1.5 | pandas 1.3.1 |
numpy 1.19.5 | numpy 1.19.2 |
scipy 1.5.4 | scipy 1.7.3 |
Example:
Host 1
import numpy as np
import pandas as pd
from scipy.stats import zscore
df = pd.DataFrame(np.random.randint(100, 200, size=(5, 3)), columns=['A', 'B', 'C'])
# --------------------------------
In [5]: df
Out[5]:
     A    B    C
0  166  135  141
1  156  110  167
2  104  159  114
3  150  156  157
4  163  113  180
In [10]: zscore(df)
Out[10]:
array([[ 0.80546745,  0.01940194, -0.47372066],
       [ 0.36290292, -1.19321913,  0.66671797],
       [-1.93843265,  1.18351816, -1.65802232],
       [ 0.0973642 ,  1.03800363,  0.22808773],
       [ 0.67269809, -1.0477046 ,  1.23693729]])
In [11]: zscore(df, ddof=0)
Out[11]:
array([[ 0.80546745,  0.01940194, -0.47372066],
       [ 0.36290292, -1.19321913,  0.66671797],
       [-1.93843265,  1.18351816, -1.65802232],
       [ 0.0973642 ,  1.03800363,  0.22808773],
       [ 0.67269809, -1.0477046 ,  1.23693729]])
In [12]: type(zscore(df))
Out[12]: numpy.ndarray
Host 2
import numpy as np
import pandas as pd
from scipy.stats import zscore
df = pd.DataFrame(np.random.randint(100, 200, size=(5, 3)), columns=['A', 'B', 'C'])
# --------------------------------
In [77]: df
Out[77]:
     A    B    C
0  151  188  190
1  195  199  103
2  130  174  188
3  168  194  146
4  171  138  129
In [78]: zscore(df)
Out[78]:
          A         B         C
0 -0.553990  0.428052  1.148875
1  1.477308  0.928963 -1.427210
2 -1.523474 -0.209472  1.089654
3  0.230829  0.701276 -0.153973
4  0.369327 -1.848819 -0.657346
In [79]: zscore(df, ddof=0)
Out[79]:
          A         B         C
0 -0.553990  0.428052  1.148875
1  1.477308  0.928963 -1.427210
2 -1.523474 -0.209472  1.089654
3  0.230829  0.701276 -0.153973
4  0.369327 -1.848819 -0.657346
In [80]: type(zscore(df))
Out[80]: pandas.core.frame.DataFrame
Answer:
If we look at the source code of scipy's zscore in version 1.5.4 (as on Host 1), we can see that the passed input is converted to a NumPy array with np.asanyarray(a), which is then processed further and returned. In version 1.7.3 (as on Host 2), on the other hand, the code uses the zmap function, which calculates the z-scores of the passed array or DataFrame while preserving its type.
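The difference between the two code paths can be sketched roughly as follows. This is a simplified illustration, not scipy's actual source: the names zscore_via_array and zscore_keep_input are made up for this example, and NaN handling is left out. The only point is where the final arithmetic happens, on the converted array or on the original object.

```python
import numpy as np
import pandas as pd


def zscore_via_array(a, axis=0, ddof=0):
    """Illustrative only: convert the input first, so the result is an ndarray."""
    a = np.asanyarray(a)  # a DataFrame becomes a plain ndarray here
    mn = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, ddof=ddof, keepdims=True)
    return (a - mn) / std  # ndarray arithmetic -> ndarray


def zscore_keep_input(scores, axis=0, ddof=0):
    """Illustrative only: statistics from an array, arithmetic on the original object."""
    a = np.asanyarray(scores)
    mn = a.mean(axis=axis, keepdims=True)
    std = a.std(axis=axis, ddof=ddof, keepdims=True)
    return (scores - mn) / std  # DataFrame arithmetic -> DataFrame


df = pd.DataFrame(np.random.randint(100, 200, size=(5, 3)), columns=['A', 'B', 'C'])
print(type(zscore_via_array(df)))   # numpy.ndarray
print(type(zscore_keep_input(df)))  # pandas DataFrame
```

Both functions produce the same numbers; only the container differs, which matches what the two hosts show.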
In conclusion, the cause of this behavior is the newer scipy version on Host 2. Hope this helps!
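If the same code has to run on both hosts, one option is to normalize the return type yourself instead of relying on what zscore returns. The sketch below assumes that np.asarray accepts either kind of result (it does for both an ndarray and a DataFrame); the helper names zscore_frame and zscore_array are hypothetical and not part of scipy or pandas.

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore


def zscore_frame(df, ddof=0):
    """Hypothetical helper: column-wise z-scores, always returned as a DataFrame."""
    result = zscore(df, axis=0, ddof=ddof)
    # np.asarray works whether scipy returned an ndarray or a DataFrame
    return pd.DataFrame(np.asarray(result), index=df.index, columns=df.columns)


def zscore_array(df, ddof=0):
    """Hypothetical helper: column-wise z-scores, always returned as an ndarray."""
    return np.asarray(zscore(df, axis=0, ddof=ddof))


df = pd.DataFrame(np.random.randint(100, 200, size=(5, 3)), columns=['A', 'B', 'C'])
print(type(zscore_frame(df)))  # DataFrame on either host
print(type(zscore_array(df)))  # ndarray on either host
```

Rebuilding the DataFrame from the original index and columns keeps the labels intact even when scipy hands back a bare array.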