Pandas boxplot of data per <year,month>

Time:11-17

I have a dataframe with one row of data per day:

df = A. B. date
     1. 2. 2016-12-31
     7. 2. 2016-12-22
     4. 3. 2016-12-12
     6. 8. 2017-02-03
     8. 3. 2017-01-31
     1. 0. 2016-12-11

And so on. I have around 100 rows per month, from Dec-2016 to now. I want to create an sns.boxplot where the x-axis is <year,month> and the y-axis is the value of column B. How can I do it? Thanks

CodePudding user response:

Given the following data frame:

import io
import pandas as pd

df = pd.read_csv(io.StringIO('''A. B. date
     1. 2. 2016-12-31
     7. 2. 2016-12-22
     4. 3. 2016-12-12
     6. 8. 2017-02-03
     8. 3. 2017-01-31
     1. 0. 2016-12-11'''), sep=r'[\n\.]', engine='python')
df.rename(columns=str.strip, inplace=True)
df['date'] = pd.to_datetime(df['date'])

Just group by the year and month keys, both of which you can access via the date column's .dt accessor. Then aggregate (I use sum here; consider mean) and plot.

m_df = df.groupby([df['date'].dt.year, df['date'].dt.month])['B'].sum()
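The "aggregate and plot" step could look roughly like the sketch below. It assumes a bar chart is acceptable, since the aggregation leaves one number per <year,month> rather than a distribution, and it rebuilds only column B and date from the sample rows. The names year, month and ax, and the Agg backend, are illustrative choices and not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question (column A omitted, it is not plotted)
df = pd.DataFrame({
    "B": [2, 2, 3, 8, 3, 0],
    "date": pd.to_datetime([
        "2016-12-31", "2016-12-22", "2016-12-12",
        "2017-02-03", "2017-01-31", "2016-12-11",
    ]),
})

# Rename the two grouping keys so the resulting index levels are distinguishable
year = df["date"].dt.year.rename("year")
month = df["date"].dt.month.rename("month")

# One aggregated value per (year, month); swap .sum() for .mean() if preferred
m_df = df.groupby([year, month])["B"].sum()

# Bar chart with one bar per (year, month) pair
ax = m_df.plot(kind="bar")
ax.set_ylabel("sum of B")
plt.tight_layout()
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() to display the figure.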


CodePudding user response:

Based on the information you gave, I recreated your dataframe:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

d = {'A': [1., 7., 4., 6., 8., 1.], 'B': [2., 2., 3., 8., 3., 0.], 'date': ["2016-12-31", "2016-12-22", "2016-12-12", "2017-02-03", "2017-01-31", "2016-12-11"]}
df = pd.DataFrame(data=d)
print(df)

>>>
     A    B        date
0  1.0  2.0  2016-12-31
1  7.0  2.0  2016-12-22
2  4.0  3.0  2016-12-12
3  6.0  8.0  2017-02-03
4  8.0  3.0  2017-01-31
5  1.0  0.0  2016-12-11

I then converted the date column to datetime and sorted the rows by it, so the months appear in chronological order on the plot. I also reduced each datetime to <year,month> only, resulting in this:

df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date')
df['date'] = df['date'].dt.strftime('%Y-%m')

>>>
     A    B     date
5  1.0  0.0  2016-12
2  4.0  3.0  2016-12
1  7.0  2.0  2016-12
0  1.0  2.0  2016-12
4  8.0  3.0  2017-01
3  6.0  8.0  2017-02

I then plotted a simple boxplot, since you didn't provide more information.

sns.boxplot(data=df, x="date", y="B")

which produces this plot:

[image: boxplot of B per <year,month>]
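If you would rather not rely on row order for the x-axis, a variation is to keep the original date column and pass an explicit order to sns.boxplot. The following is a minimal sketch under that assumption; the helper column ym, the order list, and the Agg backend are my own illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "A": [1., 7., 4., 6., 8., 1.],
    "B": [2., 2., 3., 8., 3., 0.],
    "date": ["2016-12-31", "2016-12-22", "2016-12-12",
             "2017-02-03", "2017-01-31", "2016-12-11"],
})
df["date"] = pd.to_datetime(df["date"])

# Keep the full date and store the <year,month> label in a separate column
df["ym"] = df["date"].dt.strftime("%Y-%m")

# Zero-padded 'YYYY-MM' strings sort chronologically, so a plain sort is enough
order = sorted(df["ym"].unique())

# One box per <year,month>, drawn in the explicit order
ax = sns.boxplot(data=df, x="ym", y="B", order=order)
ax.set_xlabel("year-month")
plt.tight_layout()
```

With around 100 rows per month, as in the question, each box then summarizes that month's distribution of B.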

Hope this helps.
