T-test on multiple dataframes and output into dataframe

Time:07-11

I have multiple pandas data frames that I run t-tests on.

Each data frame contains a column named count, which is the column I test.

import inspect
from scipy.stats import ttest_ind

def retrieve_name(var):
    # Look up the variable's name in the caller's global namespace
    callers_local_vars = inspect.currentframe().f_back.f_globals.items()
    return [var_name for var_name, var_val in callers_local_vars if var_val is var][0]

list1, list2 = [a_apple, b_apple, c_apple, d_apple], [a_orange, b_orange, c_orange, d_orange]
for x in range(len(list1)):
    for y in range(len(list2)):
        print('---------')
        print(retrieve_name(list1[x]) + ' and ' + retrieve_name(list2[y]))
        print(ttest_ind(list1[x]['count'], list2[y]['count']))

Currently, the code above gives me what I want, as shown by the sample output below:

---------
a_apple and a_orange
Ttest_indResult(statistic=0.17363567462699322, pvalue=0.8625959461018068)
---------
a_apple and b_orange
Ttest_indResult(statistic=-2.8910258868131904, pvalue=0.004956889487716552)
---------
a_apple and c_orange
Ttest_indResult(statistic=-1.2417412995214525, pvalue=0.21800606685014645)
---------
a_apple and d_orange
Ttest_indResult(statistic=-3.2827337601654727, pvalue=0.0015326335802097308)

I would like the output to be in a data frame instead, like the one below. How can I accomplish this?

    group1      group2          statistics              pvalue
0   a_apple     a_orange    0.17363567462699322     0.8625959461018068
1   a_apple     b_orange    -2.8910258868131904     0.004956889487716552 
2   a_apple     c_orange    -1.2417412995214525     0.21800606685014645
3   a_apple     d_orange    -3.2827337601654727     0.0015326335802097308


Answer:

You can append the group names and the t-test results to a list inside the loop, then convert that list to a data frame once the loop has finished.

Expanding on your code, try this:

...
# Assumes pandas is imported as pd
results = []
for x in range(len(list1)):
    for y in range(len(list2)):
        name_list1 = retrieve_name(list1[x])
        name_list2 = retrieve_name(list2[y])
        ttest_result = ttest_ind(list1[x]['count'], list2[y]['count'])
        # Collect one row per pair: both names, the t statistic, and the p-value
        results.append([name_list1, name_list2, ttest_result[0], ttest_result[1]])

        print('---------')
        print(name_list1 + ' and ' + name_list2)
        print(ttest_result)

# Build the data frame once, after all pairs have been tested
df = pd.DataFrame(results, columns=["group1", "group2", "statistics", "pvalue"])
print(df.head())
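
As a possible variation, here is a minimal self-contained sketch of the same idea that keeps the names as dictionary keys instead of recovering them with retrieve_name. The name lookup only searches global variables and returns the first match when two names point to the same object, so storing the names explicitly may be more robust. The data here is synthetic and the helper make_frame is hypothetical, used only so the example runs on its own; substitute your own data frames.

```python
from itertools import product

import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical stand-in data: each frame only needs a 'count' column.
rng = np.random.default_rng(0)


def make_frame(mean):
    """Build a small synthetic frame (illustration only)."""
    return pd.DataFrame({"count": rng.normal(loc=mean, scale=1.0, size=30)})


# The dictionary keys carry the group names, so no name lookup is needed.
apples = {"a_apple": make_frame(5.0), "b_apple": make_frame(5.5)}
oranges = {"a_orange": make_frame(5.0), "b_orange": make_frame(6.0)}

rows = []
# product() yields every (apple, orange) pair, replacing the nested loops.
for (name1, df1), (name2, df2) in product(apples.items(), oranges.items()):
    result = ttest_ind(df1["count"], df2["count"])
    rows.append(
        {
            "group1": name1,
            "group2": name2,
            "statistics": result.statistic,
            "pvalue": result.pvalue,
        }
    )

# One row per pair, with the same column names as the requested output.
df = pd.DataFrame(rows)
print(df)
```

Building the rows as dictionaries means the column names come from the keys, so the columns argument to pd.DataFrame is not needed.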