I have the following data:
data = {'treatment_1': [80, 0, 0, 8],
        'treatment_2': [78, 62],
        'treatment_3': [85, 62, 10, 3, 18, 18, 98, 71, 78, 12, 52, 39, 24, 13],
        'treatment_4': [78, 33, 78, 40, 47, 32]
        }
I am trying to run an ANOVA comparing these four treatments. As you can see, each treatment has a different number of data points. In theory this shouldn't be a problem, because ANOVA does not assume equal sample sizes. First, I tried to create a DataFrame. The code:
import pandas as pd
df = pd.DataFrame(data)
Gives me the error message:
ValueError: All arrays must be of the same length
So, this tells me that a DataFrame will not work, at least not when built this way. But no matter how I search for "ANOVA with unequal sample sizes", all I find are examples that use lists (whose code does not work with dictionaries) and/or equal sample sizes (which do not explain how to adjust for unequal ones). How should I approach an ANOVA with a dictionary whose lists have different lengths? Or am I going about this wrong by using a dictionary in the first place?
Answer:
Wrap each list in a pd.Series, so that pandas aligns the columns on the index and pads the shorter ones with NaN:
import pandas as pd

data = {'treatment_1': [80, 0, 0, 8],
        'treatment_2': [78, 62],
        'treatment_3': [85, 62, 10, 3, 18, 18, 98, 71, 78, 12, 52, 39, 24, 13],
        'treatment_4': [78, 33, 78, 40, 47, 32]
        }
# Each list becomes a Series; shorter columns are padded with NaN
df = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})
print(df)
Prints:
    treatment_1  treatment_2  treatment_3  treatment_4
0          80.0         78.0           85         78.0
1           0.0         62.0           62         33.0
2           0.0          NaN           10         78.0
3           8.0          NaN            3         40.0
4           NaN          NaN           18         47.0
5           NaN          NaN           18         32.0
6           NaN          NaN           98          NaN
7           NaN          NaN           71          NaN
8           NaN          NaN           78          NaN
9           NaN          NaN           12          NaN
10          NaN          NaN           52          NaN
11          NaN          NaN           39          NaN
12          NaN          NaN           24          NaN
13          NaN          NaN           13          NaN
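
For the ANOVA itself, here is a minimal sketch, assuming SciPy is installed (the original post does not name a stats library, so using scipy.stats.f_oneway is my assumption). It takes one sequence per group and the groups may have different lengths, so the dictionary values can be passed directly. If you start from the NaN-padded DataFrame instead, drop the padding in each column first, since the NaN values are only filler and not observations.

```python
import pandas as pd
from scipy import stats

data = {'treatment_1': [80, 0, 0, 8],
        'treatment_2': [78, 62],
        'treatment_3': [85, 62, 10, 3, 18, 18, 98, 71, 78, 12, 52, 39, 24, 13],
        'treatment_4': [78, 33, 78, 40, 47, 32]
        }

# Route 1: unpack the dictionary values, one argument per treatment group.
# The groups do not need to be the same length.
f_stat, p_value = stats.f_oneway(*data.values())
print(f"From the dict:      F = {f_stat:.3f}, p = {p_value:.3f}")

# Route 2: start from the NaN-padded DataFrame and drop the padding
# from each column before running the test.
df = pd.DataFrame({k: pd.Series(v) for k, v in data.items()})
groups = [df[col].dropna() for col in df.columns]
f_df, p_df = stats.f_oneway(*groups)
print(f"From the DataFrame: F = {f_df:.3f}, p = {p_df:.3f}")
```

Both routes should report the same statistic, because they test the same observations. Keep in mind that with groups as small as 2 and 4 points the test has little power, and the usual ANOVA assumptions (roughly normal residuals, similar variances) are hard to check.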