My dataset is like this, where the data points in each row or column are pandas objects (strings of comma-separated values).
I have tried something like this, and I can get a plot for each individual Pi (i=1...14), but not only is it laborious, it also does not look the way I want:
fig, ax = plt.subplots(2, 3, figsize=(8,2))
sns.stripplot(data=df.loc[7]['92A11'].split(','), dodge=True, linewidth=1, ax=ax[0,0], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[7]['92A11'].split(','), ax=ax[0,0], color='orange', orient='h')
sns.stripplot(data=df.loc[7]['92B11'].split(','), dodge=True, linewidth=1, ax=ax[0,1], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[7]['92B11'].split(','), ax=ax[0,1], color='orange', orient='h')
sns.stripplot(data=df.loc[7]['82B11'].split(','), dodge=True, linewidth=1, ax=ax[0,2], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[7]['82B11'].split(','), ax=ax[0,2], color='orange', orient='h')
sns.stripplot(data=df.loc[6]['92A11'].split(','), dodge=True, linewidth=1, ax=ax[1,0], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[6]['92A11'].split(','), ax=ax[1,0], color='orange', orient='h')
sns.stripplot(data=df.loc[6]['92B11'].split(','), dodge=True, linewidth=1, ax=ax[1,1], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[6]['92B11'].split(','), ax=ax[1,1], color='orange', orient='h')
sns.stripplot(data=df.loc[6]['82B11'].split(','), dodge=True, linewidth=1, ax=ax[1,2], color='black', jitter=False, orient='h')
sns.violinplot(data=df.loc[6]['82B11'].split(','), ax=ax[1,2], color='orange', orient='h')
ax[0,0].set_xlim(0,200)
ax[0,1].set_xlim(0,200)
ax[0,2].set_xlim(0,200)
ax[1,0].set_xlim(0,200)
ax[1,1].set_xlim(0,200)
ax[1,2].set_xlim(0,200)
ax[1,0].set_xlabel('92A11')
ax[1,1].set_xlabel('92B11')
ax[1,2].set_xlabel('82B11')
ax[0,0].set_ylabel('P8')
ax[1,0].set_ylabel('P7')
fig.tight_layout()
CodePudding user response:
Seaborn works most easily with data in long form. pd.melt can convert the 3 individual columns into a variable-name column and a value column. DataFrame.explode can convert lists in cells into multiple rows. As the current entries are strings, they need to be converted to lists of numbers before calling .explode().
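As a minimal sketch of these two reshaping steps, the snippet below runs them on a tiny made-up table. The column names and values are assumptions chosen for illustration and are not taken from the real data; a cell holding only '0' is treated as empty.

```python
import numpy as np
import pandas as pd

# Made-up wide-form data: one row per location, one comma-separated string per dataset
df = pd.DataFrame({
    "Location": ["P1", "P2"],
    "92A11": ["0", "10,15"],
    "92B11": ["5,7,9", "0"],
})

# Wide to long: one row per (Location, Dataset) pair
long_df = df.melt(id_vars="Location", var_name="Dataset", value_name="Datapoint")

# Turn each string into a list of floats; a lone '0' becomes a single NaN
long_df["Datapoint"] = long_df["Datapoint"].map(
    lambda s: [np.nan] if s == "0" else [float(v) for v in s.split(",")]
)

# One row per list element, then make the column numeric
long_df = long_df.explode("Datapoint", ignore_index=True)
long_df["Datapoint"] = pd.to_numeric(long_df["Datapoint"])

print(long_df)
```

Each list element ends up in its own row, which is the shape seaborn's categorical plots expect.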
The following code supposes you want box plots, not bars. It also supposes that an entry containing only '0' should stay empty.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
df = pd.read_csv('raw_data.csv')
df = df.rename(columns={df.columns[0]: 'Location'}) # suppose the first column is named 'Location'
df_melted = df.melt(id_vars='Location', var_name='Dataset', value_name='Datapoint')
df_melted['Datapoint'] = df_melted['Datapoint'].map(
    lambda s: [np.nan] if s == '0' else [float(v) for v in s.split(',')])
df_exploded = df_melted.explode(column='Datapoint', ignore_index=True)
df_exploded['Datapoint'] = pd.to_numeric(df_exploded['Datapoint']) # make sure the column is numeric
g = sns.catplot(data=df_exploded, col='Dataset', kind='box', x='Datapoint', y='Location')
g.set(xlabel='', ylabel='') # remove x and y label
PS: df_melted looks like:
   Location  Dataset  Datapoint
0  P1        92A11    [nan]
1  P2        92A11    [10.0, 15.0]
2  P3        92A11    [100.0, 40.0, 60.0, 50.0, 70.0, 50.0, 30.0, 40...
3  P4        92A11    [55.0, 40.0, 20.0, 15.0, 15.0, 15.0, 15.0, 10....
4  P5        92A11    [nan]
...
df_exploded has each value from the lists in a separate row:
   Location  Dataset  Datapoint
0  P1        92A11    NaN
1  P2        92A11    10.0
2  P2        92A11    15.0
3  P3        92A11    100.0
4  P3        92A11    40.0
...
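If violin plots with the individual points drawn on top are preferred, as in the question's attempt, the same long-form layout can be reused. The following is only a sketch under assumptions: the data is made up, the orange/black colors and the 0 to 200 x-range are copied from the question, and the figure size is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up long-form data with the same column names as df_exploded
long_df = pd.DataFrame({
    "Location": ["P7"] * 3 + ["P8"] * 3 + ["P7"] * 3 + ["P8"] * 3,
    "Dataset": ["92A11"] * 6 + ["92B11"] * 6,
    "Datapoint": [10.0, 15.0, 30.0, 100.0, 40.0, 60.0,
                  55.0, 40.0, 20.0, 50.0, 70.0, 30.0],
})

# One subplot per dataset, one horizontal violin per location
g = sns.catplot(data=long_df, col="Dataset", kind="violin",
                x="Datapoint", y="Location", color="orange",
                height=2, aspect=2)

# Overlay the individual points on every subplot
g.map_dataframe(sns.stripplot, x="Datapoint", y="Location",
                color="black", jitter=False)

# Same x-range on all subplots, no axis labels
g.set(xlim=(0, 200), xlabel="", ylabel="")
```

Because catplot creates the grid of subplots itself, the per-axes calls from the question collapse into a single g.set(...) call.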