How do I draw seaborn boxplot with two data sets of different length?

Time:03-31

I have two data sets, NA and HG

len(NA)=267

NA = [73,49,53...]

len(HG)=176 (HG is similar list like NA)

I want to draw the two data sets in one plot. (I produced an example image by drawing them independently and then combining the plots in Photoshop, which I cannot repeat for another data set because the axes differ.)

Seaborn takes data as a numpy array or a pandas DataFrame, both of which seem to require columns of equal length. That does not hold in my case, because HG has 176 data points and NA has 267.

Currently, I convert each list to a pandas DataFrame and then plot via

HG = sns.boxplot(data=HG, width = 0.2)

I tried HG = sns.boxplot(data=HG NA, width = 0.2) and it returned an empty plot, so... help please.

Thank you so much!

CodePudding user response:

I assume NA and HG are dataframes with one column each, since you're plotting a box plot. You can concat them into one df and then plot. The shorter column will be padded with NaN, but seaborn will ignore those.

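import pandas as pd
import seaborn as sns

# Place the two dataframes side by side; rows missing from the shorter one become NaN
df = pd.concat([NA, HG], axis=1)
sns.boxplot(data=df)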
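
If NA and HG are still plain lists rather than dataframes, the same idea can be sketched roughly as follows. The short lists are made-up stand-ins for the real data, and draw_boxplot is a hypothetical helper name, not part of seaborn; it only wraps the sns.boxplot call so that building the frame can be checked separately from drawing.

```python
import pandas as pd

# Made-up stand-ins for the two unequal-length lists in the question.
NA = [73, 49, 53, 20, 20, 20]
HG = [73, 30, 60]

# Named Series become the columns. axis=1 aligns them on the index,
# so the shorter column is padded with NaN at the end.
df = pd.concat([pd.Series(NA, name="NA"), pd.Series(HG, name="HG")], axis=1)

# Number of non-NaN values in each column.
print(df.count())


def draw_boxplot(frame):
    """Hypothetical helper: draw one box per column of `frame`."""
    import seaborn as sns  # imported here so the frame can be built without seaborn

    return sns.boxplot(data=frame, width=0.2)
```

Calling draw_boxplot(df) should then draw both boxes on one set of axes, with each box computed only from the values actually present in its column.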

CodePudding user response:

The following creates a DataFrame where the missing HG values are just filled with NaNs. These are ignored by the boxplot. There are many ways to generate a DataFrame with the length of the longest list. An alternative to the join shown below would be itertools.zip_longest or pd.concat as suggested by @Kenan.

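import pandas as pd
import seaborn as sns

NA = [73, 49, 53, 20, 20, 20, 20, 20, 20, 20, 20, 20]
HG = [73, 30, 60]

# join aligns on the index of the longer Series; the missing HG rows become NaN
df = pd.Series(NA, name="NA").to_frame().join(pd.Series(HG, name="HG"))
sns.boxplot(data=df, width=0.2)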
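
For the itertools.zip_longest alternative mentioned above, a minimal sketch might look like this. The shortened lists are illustrative only, and passing NaN as the fill value is one choice that keeps the padding explicit.

```python
from itertools import zip_longest

import pandas as pd

# Made-up stand-ins for the two unequal-length lists.
NA = [73, 49, 53, 20, 20, 20]
HG = [73, 30, 60]

# zip_longest pairs items by position and pads the shorter list with fillvalue.
rows = list(zip_longest(NA, HG, fillvalue=float("nan")))

# Each tuple becomes one row; the column order follows the argument order above.
df = pd.DataFrame(rows, columns=["NA", "HG"])
print(df)
```

The resulting df has the same shape as the one built with join, so it can be passed to sns.boxplot(data=df, width=0.2) in the same way.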

Or maybe you are interested in using Holoviews, which gives you fully interactive plots in a very simple manner (when used in Jupyter with a bokeh backend). For your case that would look like the following:

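import holoviews as hv

hv.extension("bokeh")  # select the bokeh plotting backend

NA = [73, 49, 53, 20, 20, 20, 20, 20, 20, 20, 20, 20]
HG = [73, 30, 60]

# The * operator overlays the two box plots on the same axes
hv.BoxWhisker(NA, label="NA") * hv.BoxWhisker(HG, label="HG")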
