Home > Mobile >  Several boxes out of one column [Boxplot]
Several boxes out of one column [Boxplot]

Time:12-28

I have a dataframe with two columns (time_id and param1). For column 2 (param1) I want to create a boxplot according time.

However I want to split it into into three (or n) parts. Which means that there is not only one box in the box plot, but one box per time range. For example (n=3) a box based on the values ​​20,3,4,21,19 [time 1-3] and a box based on 8,9,18,6,4 [time 4-6] etc.

So the following code creates a boxplot of the entire column.

import pandas as pd

# initialize data of lists.
data = {'time_id':[1,1,2,3,3,4,5,5,5,6,7,8,8,9],
        'param1':[20,3,4,21,19,8,9,18,6,4,2,3,7,1]}
 
# Create DataFrame
df = pd.DataFrame(data)

boxplot = df.boxplot(column='param1')

What is an elegant way to divide the column into three boxes so that the box plot looks like this (exemplary):

enter image description here

CodePudding user response:

I am not sure it is the most elegant way to do it but you can use the function cut() from pandas and the method of dataframe .pivot() :

df["class"]=pd.cut(df['time_id'], bins=3, labels=False)
df = df.drop("time_id", axis=1).pivot(columns="class")
boxplot = df.boxplot()

CodePudding user response:

Since your time series is more or less evenly spaced, I would also go with enter image description here

In a real-world example, your time series probably contains float, not integer numbers. These float numbers might be rather long, making the labels also rather annoyingly long. In this case, you can abbreviate their output with bin_labels = ['time {mini:.2} : {maxi:.2}'.for.... but this wouldn't work with integer arrays.

  • Related