I have multiple pandas data frames, each containing a column named count.
I run t-tests on that column for every pair of data frames.
import inspect
from scipy.stats import ttest_ind

def retrieve_name(var):
    # Look up the variable's name among the caller's global variables
    callers_local_vars = inspect.currentframe().f_back.f_globals.items()
    return [var_name for var_name, var_val in callers_local_vars if var_val is var][0]

list1, list2 = [a_apple, b_apple, c_apple, d_apple], [a_orange, b_orange, c_orange, d_orange]
for x in range(len(list1)):
    for y in range(len(list2)):
        print('---------')
        print(retrieve_name(list1[x]) + ' and ' + retrieve_name(list2[y]))
        print(ttest_ind(list1[x]['count'], list2[y]['count']))
Currently the code above gives me the results I want, as shown by the sample output below:
---------
a_apple and a_orange
Ttest_indResult(statistic=0.17363567462699322, pvalue=0.8625959461018068)
---------
a_apple and b_orange
Ttest_indResult(statistic=-2.8910258868131904, pvalue=0.004956889487716552)
---------
a_apple and c_orange
Ttest_indResult(statistic=-1.2417412995214525, pvalue=0.21800606685014645)
---------
a_apple and d_orange
Ttest_indResult(statistic=-3.2827337601654727, pvalue=0.0015326335802097308)
However, I would like the output to be a data frame like the one below. How can I accomplish this?
    group1    group2           statistics                 pvalue
0  a_apple  a_orange  0.17363567462699322     0.8625959461018068
1  a_apple  b_orange  -2.8910258868131904   0.004956889487716552
2  a_apple  c_orange  -1.2417412995214525    0.21800606685014645
3  a_apple  d_orange  -3.2827337601654727  0.0015326335802097308
CodePudding user response:
You can simply append the group names and the t-test results to a list, then convert that list to a data frame.
Expanding on your code, try this:
...
results = []
for x in range(len(list1)):
    for y in range(len(list2)):
        name_list1 = retrieve_name(list1[x])
        name_list2 = retrieve_name(list2[y])
        ttest_result = ttest_ind(list1[x]['count'], list2[y]['count'])
        # Store one row per pair: both names, the statistic, and the p-value
        results.append([name_list1, name_list2, ttest_result[0], ttest_result[1]])
        print('---------')
        print(name_list1 + ' and ' + name_list2)
        print(ttest_result)

# Build the data frame once, after all pairs have been tested
df = pd.DataFrame(results, columns=["group1", "group2", "statistics", "pvalue"])
print(df.head())
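If you would rather not recover variable names through retrieve_name, one possible alternative is to keep the data frames in dictionaries keyed by name and loop over every pair with itertools.product. The snippet below is only a sketch: the apples and oranges dictionaries and the random data in them are made-up stand-ins for your real data frames, and results_df is a hypothetical name.

```python
from itertools import product

import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical stand-in data; replace these with your real data frames.
rng = np.random.default_rng(0)
apples = {f"{k}_apple": pd.DataFrame({"count": rng.normal(10, 2, 30)}) for k in "abcd"}
oranges = {f"{k}_orange": pd.DataFrame({"count": rng.normal(11, 2, 30)}) for k in "abcd"}

rows = []
# product() yields every (apple, orange) pair, replacing the nested index loops
for (name1, df1), (name2, df2) in product(apples.items(), oranges.items()):
    result = ttest_ind(df1["count"], df2["count"])
    rows.append({
        "group1": name1,
        "group2": name2,
        "statistics": result.statistic,
        "pvalue": result.pvalue,
    })

# One row per pair, with the same columns as in the answer above
results_df = pd.DataFrame(rows)
print(results_df.head())
```

Because the names are the dictionary keys, this does not depend on the data frames being global variables, and two names bound to the same object cannot be confused.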