Let's say I have people chew gum while they read and answer test questions. On some questions they chew orange gum, on others peppermint. Not every participant answered every question.
Let's say I have my data laid out like this:
ID | Gum Type | Test (1 = correct, 0 = incorrect) |
---|---|---|
1 | Orange | 1 |
1 | Orange | 0 |
1 | Peppermint | 0 |
1 | Peppermint | 1 |
2 | Orange | 0 |
2 | Peppermint | 1 |
I want to create a violin plot with Gum Type on the x-axis and the proportion of test questions answered correctly on the y-axis. Each participant should contribute exactly one data point per gum type. For example, participant 1 would show up on the "Orange" violin as a single point in the middle, at 0.5 (50% of the orange questions correct), and as a single point on the "Peppermint" violin.
CodePudding user response:
Use:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data from the question: one row per answered question (1 = correct, 0 = incorrect)
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2],
    "Gum Type": ["Orange", "Orange", "Peppermint", "Peppermint", "Orange", "Peppermint"],
    "Test": [1, 0, 0, 1, 0, 1],
})

# The mean of the 0/1 scores per participant and gum type is the proportion correct
df1 = (
    df.groupby(["ID", "Gum Type"])["Test"]
    .mean()
    .reset_index(name="Proportion correct")
)

# One violin per gum type, built from one value per participant
ax = sns.violinplot(x="Gum Type", y="Proportion correct", data=df1)
plt.show()
Output: a violin plot with one violin per gum type (the image is not included here).
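Since the question asks for each participant to show up as a single data point, it may help to inspect the aggregated table and overlay the individual points on the violins. The following is only a sketch under stated assumptions: the short column names `Gum` and `Test`, the `n_answered` column, and the helper `plot_with_points` are illustrative names, not part of the original data or answer.

```python
import pandas as pd

# Sample rows from the question: one row per answered question (1 = correct, 0 = incorrect).
# The short column names "Gum" and "Test" are illustrative, not the asker's exact headers.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2],
    "Gum": ["Orange", "Orange", "Peppermint", "Peppermint", "Orange", "Peppermint"],
    "Test": [1, 0, 0, 1, 0, 1],
})

# One row per participant and gum type: proportion correct and number of questions answered.
per_participant = (
    df.groupby(["ID", "Gum"])["Test"]
    .agg(proportion="mean", n_answered="count")
    .reset_index()
)
print(per_participant)


def plot_with_points(data):
    """Draw one violin per gum type and overlay each participant as a dot."""
    import seaborn as sns  # imported here so the table above works without seaborn

    # inner=None hides the default inner box; cut=0 stops the violin at the observed range
    ax = sns.violinplot(x="Gum", y="proportion", data=data, inner=None, cut=0)
    sns.stripplot(x="Gum", y="proportion", data=data, color="black", ax=ax)
    ax.set_ylabel("Proportion correct")
    return ax
```

Calling `plot_with_points(per_participant)` followed by `matplotlib.pyplot.show()` should draw the violins with one dot per participant and gum type. Because the grouping is done per participant, people who answered different numbers of questions still contribute exactly one point each. The violin itself is a kernel density estimate, so with only a handful of participants its shape says little and the overlaid points carry most of the information.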