I'm trying to get the average, cardinality (number of unique values) and standard deviation across the columns for each row of a dataframe, ideally in a single line. I've been stuck on this question for ages. Thanks
CodePudding user response:
example df:
import numpy as np
import pandas as pd
df = pd.DataFrame(
{
'a': [1, 2, np.nan],  # np.nan rather than np.NaN, which was removed in NumPy 2.0
'b': [5, 2, 2],
'c': [3, 2, np.nan],
'd': [1, 1, 1]
}
)
     a  b    c  d
0  1.0  5  3.0  1
1  2.0  2  2.0  1
2  NaN  2  NaN  1
You can calculate the values like this using the axis=1 parameter:
df.mean(axis=1) # average
df.nunique(axis=1) # cardinality
df.std(axis=1) # standard deviation
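Since the question asks for a single line, the three calls above can be collected into one DataFrame. This is only a sketch: the name summary and the column labels are illustrative choices, not something from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan],
    'b': [5, 2, 2],
    'c': [3, 2, np.nan],
    'd': [1, 1, 1],
})

# One expression: each statistic is computed row-wise (axis=1)
# and becomes a column of the result. `summary` is an illustrative name.
summary = pd.DataFrame({
    'mean': df.mean(axis=1),
    'nunique': df.nunique(axis=1),
    'std': df.std(axis=1),
})
print(summary)
```

Each row of summary lines up with the row of df that has the same index label.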
example:
df.mean()
gives you:
a 1.5
b 3.0
c 2.5
d 1.0
which is for column a: (1 + 2) / 2 = 1.5 (the NaN is skipped)
and df.mean(axis=1)
gives you:
0 2.50
1 1.75
2 1.50
which is for the first row: (1 + 5 + 3 + 1) / 4 = 2.5
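The mean of the last row is 1.50, i.e. (2 + 1) / 2, because pandas skips NaN values by default. A small sketch of the two parameters that control this, skipna and ddof, on the same example data (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan],
    'b': [5, 2, 2],
    'c': [3, 2, np.nan],
    'd': [1, 1, 1],
})

# Default: NaN values are ignored, so row 2 averages only 2 and 1.
default_mean = df.mean(axis=1)

# skipna=False: any NaN in a row makes that row's result NaN.
strict_mean = df.mean(axis=1, skipna=False)

# std() uses the sample standard deviation (ddof=1) by default;
# ddof=0 gives the population standard deviation instead.
population_std = df.std(axis=1, ddof=0)

print(default_mean, strict_mean, population_std, sep='\n\n')
```

Whether to skip NaN or to propagate it depends on what a missing value means in your data, so neither setting is the right one in general.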
CodePudding user response:
You could transpose and aggregate like this (using @bitflip's data as an example, thanks):
import numpy as np
import pandas as pd
df = pd.DataFrame(
{'a': [1, 2, np.nan],
'b': [5, 2, 2],
'c': [3, 2, np.nan],
'd': [1, 1, 1]})
df.T.agg(['mean', 'nunique', 'std'])  # transpose, then aggregate each original row
                0     1         2
mean     2.500000  1.75  1.500000
nunique  3.000000  2.00  2.000000
std      1.914854  0.50  0.707107
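In the output above the statistics are the rows and the original row labels are the columns. If you would rather have one row per original row, one option is to transpose the result back. A minimal sketch, assuming the same example data (the names result and combined are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan],
    'b': [5, 2, 2],
    'c': [3, 2, np.nan],
    'd': [1, 1, 1],
})

# Transpose, aggregate, then transpose back so that the index matches
# df's index and the columns are the statistics.
result = df.T.agg(['mean', 'nunique', 'std']).T

# Optionally attach the statistics to the original frame by index.
combined = df.join(result)
print(combined)
```

Note that nunique comes back as a float here because all three statistics share one frame while it is transposed.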