Apply function across pandas dataframe (Kruskal-Wallis)

Time:12-16

I have a DataFrame df1 and can't get this function to apply across the columns of each row (one Kruskal-Wallis test per row).

import pandas as pd
import scipy.stats as stats

d = {'sample1':[2,3,1,0,5], 'sample2':[3,0,0,2,3], 'sample3':[0,0,0,3,4]}
df1 = pd.DataFrame(d)

# create new column containing p_value from kruskal test.
df1['p_val'] = df1.apply(lambda x: stats.kruskal(x.sample1, x.sample2, x.sample3), axis=1)

I keep getting:

TypeError: len() of unsized object

df1:

   sample1  sample2  sample3
0        2        3        0
1        3        0        0
2        1        0        0
3        0        2        3
4        5        3        4

Desired output (obviously I made the p_values up):

   sample1  sample2  sample3  p_val
0        2        3        0  0.07
1        3        0        0  0.2
2        1        0        0  0.001
3        0        2        3  0.5
4        5        3        4  0.02
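For context on the error: with axis=1, DataFrame.apply calls the function once per row and passes that row as a pandas Series indexed by column name, so inside the lambda x.sample1 is a single number, not the whole sample1 column. A small sketch checking this on the question's df1:

```python
import numpy as np
import pandas as pd

d = {'sample1': [2, 3, 1, 0, 5], 'sample2': [3, 0, 0, 2, 3], 'sample3': [0, 0, 0, 3, 4]}
df1 = pd.DataFrame(d)

# With axis=1, each call receives one row as a Series indexed by column name
first_row = df1.iloc[0]
print(first_row.sample1)  # 2

# np.ndim(...) == 0 means the value is a scalar, not a 1-D array of observations
dims = df1.apply(lambda x: np.ndim(x.sample1), axis=1)
print(dims.tolist())  # [0, 0, 0, 0, 0]
```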

Answer:

It looks like the scipy.stats.kruskal function expects an array-like sample of observations for each of the 3 arguments you pass in, whereas x.sample1, x.sample2 and x.sample3 are single numbers.
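The error message itself points at this: NumPy raises exactly "len() of unsized object" when len() is called on a 0-dimensional array, which is what a single number becomes after np.asarray. A minimal sketch reproducing the message without SciPy, assuming (as the traceback suggests) that the SciPy version in use measures each sample's length after converting it to an array:

```python
import numpy as np

scalar_sample = np.asarray(2)   # one value from a row -> 0-d array
list_sample = np.asarray([2])   # the same value wrapped in a list -> 1-D array

print(list_sample.shape)  # (1,)
print(len(list_sample))   # 1

try:
    len(scalar_sample)    # a 0-d array has no length
except TypeError as exc:
    message = str(exc)
    print(message)        # len() of unsized object
```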

Hence, modifying the lambda function to wrap each value from the row in a list (so kruskal receives three one-element samples) does the trick:

df1['p_val'] = df1.apply(lambda x: stats.kruskal([x.sample1], [x.sample2], [x.sample3]).pvalue, axis=1)
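A runnable sketch of this approach, with two notes. First, kruskal returns a result holding both the statistic and the p-value, so taking .pvalue stores only the number in p_val. Second, with one observation per group the test has almost nothing to work with: for this df1 every row gives H = 2 on 2 degrees of freedom (the tie correction in rows 1 and 2 brings H back to 2), so every p-value is exp(-1), about 0.368. The code runs, but the per-row p-values are not informative.

```python
import math

import pandas as pd
import scipy.stats as stats

d = {'sample1': [2, 3, 1, 0, 5], 'sample2': [3, 0, 0, 2, 3], 'sample3': [0, 0, 0, 3, 4]}
df1 = pd.DataFrame(d)

# Wrap each row value in a list so kruskal receives three 1-D samples,
# then keep only the p-value from the returned (statistic, pvalue) result.
df1['p_val'] = df1.apply(
    lambda x: stats.kruskal([x.sample1], [x.sample2], [x.sample3]).pvalue,
    axis=1,
)
print(df1)

# One observation per group: each row has H = 2 with 2 degrees of freedom,
# so each p-value equals the chi-square survival function at 2, i.e. exp(-1).
expected_p = math.exp(-1)
```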

Answer:

It looks like scipy.stats.kruskal takes one-dimensional (array-like) arguments, so write

df1['p_val'] = df1.apply(lambda x: stats.kruskal([x.sample1], [x.sample2], [x.sample3]), axis=1)

This solves the error. However, the Kruskal-Wallis H test returns both the test statistic and the p-value, so you can modify your code like this:

df1['test_stat'], df1['p_val'] = zip(*df1.apply(lambda x: stats.kruskal([x.sample1], [x.sample2], [x.sample3]), axis=1))

This stores the two values in separate columns: apply returns one (statistic, p-value) pair per row, and zip(*...) splits those pairs into two sequences.
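Another common pandas idiom for getting both numbers is to have the row function return a pd.Series: apply(axis=1) then expands the Series into named columns, which can be joined back onto df1. A sketch, where kruskal_row is a hypothetical helper name; with only one observation per sample, every row of this df1 gets the same statistic (H = 2), so this shows the mechanics rather than a meaningful analysis.

```python
import pandas as pd
import scipy.stats as stats

d = {'sample1': [2, 3, 1, 0, 5], 'sample2': [3, 0, 0, 2, 3], 'sample3': [0, 0, 0, 3, 4]}
df1 = pd.DataFrame(d)


def kruskal_row(x):
    # Hypothetical helper: run the test on one row (one observation per sample)
    # and return both numbers as a Series so apply(axis=1) expands them into columns.
    result = stats.kruskal([x.sample1], [x.sample2], [x.sample3])
    return pd.Series({'test_stat': result.statistic, 'p_val': result.pvalue})


stats_df = df1.apply(kruskal_row, axis=1)  # DataFrame with test_stat and p_val columns
df1 = df1.join(stats_df)                   # aligned on the shared row index
print(df1)
```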
