I use the following code:
df11 = df_curr_obj.apply(lambda x: [float(b) for a, b in (x.value_counts() / n_new).head(3).items()])
df22 = df_old_obj.apply(lambda x: [float(b) for a, b in (x.value_counts() / n_old).head(3).items()])
df_final = pd.concat([df22, df11], axis=1, keys=('df_old_obj', 'df_curr_obj'))
to get the following dataframe (cropped rows):
                     df_old_obj              df_curr_obj
_rev                 [79.5, 0.25]            [92.0, 0.5]
team                 [22.75, 10.25, 10.25]   [25.5, 17.0, 12.0]
entitytype           [0.25, 0.25, 0.25]      [0.5, 0.5, 0.5]
lie                  [26.25, 1.25, 0.5]      [36.0, 1.5, 0.5]
presentation         [26.25, 1.5]            [36.0, 2.0]
fetalheartbeat       [79.25]                 [91.5]
liquordescription    [66.0, 1.75, 0.5]       [77.0, 2.5, 1.0]
Firstly, the data type of both columns above shows as object, even though I used float(b) to convert b.
Secondly, how do I get the standard deviation of each list? For example:
df_old_obj df_curr_obj
_rev show St.dev of [79.5, 0.25] show St.dev of [92.0, 0.5]
and so on for all rows.
I know that in the case of wanting to find standard deviation for each column, I must do
df['column'].std()
but my case is not as simple as this. Please help!
CodePudding user response:
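The dtype of both columns shows up as object because each cell holds a Python list, and pandas stores lists as generic Python objects. float(b) only converts the numbers inside each list, not the cell itself.
You can compute the standard deviation of each cell with df_final.applymap(lambda x: np.std(x)). Note that np.std defaults to the population standard deviation (ddof=0), whereas df['column'].std() uses ddof=1; pass ddof=1 to np.std if you want the two to agree. In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.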
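Here is a minimal sketch of the per-cell standard deviation. It rebuilds only two rows of the question's output by hand (the real df_final comes from the pd.concat call above), and the names std_pop and std_sample are made up for illustration. It uses Series.map inside apply rather than applymap, on the assumption that you want something that behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Two rows copied from the question's output; the real frame has more rows.
df_final = pd.DataFrame(
    {
        "df_old_obj": [[79.5, 0.25], [22.75, 10.25, 10.25]],
        "df_curr_obj": [[92.0, 0.5], [25.5, 17.0, 12.0]],
    },
    index=["_rev", "team"],
)

# Columns holding lists are stored with dtype object.
print(df_final.dtypes)

# Series.map calls the function once per cell, i.e. once per list.
# np.std defaults to ddof=0 (population standard deviation).
std_pop = df_final.apply(lambda col: col.map(np.std))

# ddof=1 gives the sample standard deviation, the default of Series.std().
std_sample = df_final.apply(lambda col: col.map(lambda x: np.std(x, ddof=1)))

# The results are plain numbers, so these columns are float64, not object.
print(std_pop.dtypes)
print(std_pop)
print(std_sample)
```

Keep in mind that a one-element list such as [79.25] gives 0.0 with ddof=0 and NaN (with a runtime warning) with ddof=1, so choose the variant that matches what you want to report for those rows.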