I have a dataframe with data per day:
df = A. B. date
1. 2. 2016-12-31
7. 2. 2016-12-22
4. 3. 2016-12-12
6. 8. 2017-02-03
8. 3. 2017-01-31
1. 0. 2016-12-11
And so on. I have around 100 rows per month, from Dec-2016 to now. I want to create a sns.boxplot where the x-axis is <year,month> and the y-axis is the value of column B. How can I do it? Thanks.
CodePudding user response:
Given the following data frame:
import io
import pandas as pd

df = pd.read_csv(io.StringIO('''A. B. date
1. 2. 2016-12-31
7. 2. 2016-12-22
4. 3. 2016-12-12
6. 8. 2017-02-03
8. 3. 2017-01-31
1. 0. 2016-12-11'''), sep=r'[\n\.]', engine='python')
df.rename(columns=str.strip, inplace=True)
df['date'] = pd.to_datetime(df['date'])
Just group by the year and month, which you can access through the .dt accessor of the date column. Then aggregate (I use sum here; consider mean) and plot.
m_df = df.groupby([df['date'].dt.year, df['date'].dt.month])['B'].sum()
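To make the final "plot" step concrete, here is a minimal sketch that aggregates B per (year, month) and draws the result as a bar chart. It uses mean rather than sum and rebuilds the sample data inline; the names `monthly`, `year`, and `month` are illustrative choices, not part of the answer above. Note that aggregating leaves a single number per month, so this gives a bar chart rather than a boxplot.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data from the question, rebuilt inline so the sketch is self-contained.
df = pd.DataFrame({
    "A": [1, 7, 4, 6, 8, 1],
    "B": [2, 2, 3, 8, 3, 0],
    "date": ["2016-12-31", "2016-12-22", "2016-12-12",
             "2017-02-03", "2017-01-31", "2016-12-11"],
})
df["date"] = pd.to_datetime(df["date"])

# Group by year and month; renaming the keys gives the index levels distinct names.
monthly = df.groupby(
    [df["date"].dt.year.rename("year"), df["date"].dt.month.rename("month")]
)["B"].mean()

# groupby sorts its keys, so the bars come out in chronological order.
ax = monthly.plot(kind="bar")
ax.set_ylabel("mean of B")
plt.tight_layout()
# plt.show()  # uncomment when running interactively
```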
CodePudding user response:
Based on the information you gave, I recreated your dataframe:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
d = {'A': [1., 7., 4., 6., 8., 1.], 'B': [2., 2., 3., 8., 3., 0.], 'date': ["2016-12-31", "2016-12-22", "2016-12-12", "2017-02-03", "2017-01-31", "2016-12-11"]}
df = pd.DataFrame(data=d)
print(df)
>>>
     A    B        date
0  1.0  2.0  2016-12-31
1  7.0  2.0  2016-12-22
2  4.0  3.0  2016-12-12
3  6.0  8.0  2017-02-03
4  8.0  3.0  2017-01-31
5  1.0  0.0  2016-12-11
I then converted your date column to datetime and sorted the values, so that the months appear in order on the plot. I also reduced each datetime to just <year,month>, resulting in this:
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date')
df['date'] = df['date'].dt.strftime('%Y-%m')
>>>
     A    B     date
5  1.0  0.0  2016-12
2  4.0  3.0  2016-12
1  7.0  2.0  2016-12
0  1.0  2.0  2016-12
4  8.0  3.0  2017-01
3  6.0  8.0  2017-02
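As an alternative to strftime, the year-month label can also be derived from a monthly period. This is only a sketch of the same step, assuming the date column has already been parsed to datetime; the `month` column is a hypothetical addition, used here so that the original dates are kept alongside the label.

```python
import pandas as pd

# Same sample data as in the question (column A omitted for brevity).
df = pd.DataFrame({
    "B": [2.0, 2.0, 3.0, 8.0, 3.0, 0.0],
    "date": ["2016-12-31", "2016-12-22", "2016-12-12",
             "2017-02-03", "2017-01-31", "2016-12-11"],
})
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date")

# to_period("M") truncates each timestamp to its month;
# casting to str yields labels of the form "YYYY-MM".
df["month"] = df["date"].dt.to_period("M").astype(str)
print(df["month"].tolist())
```

Keeping the full dates in their own column makes it possible to re-sort or re-bin later without parsing the labels back.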
I then plotted a simple boxplot, since you didn't provide more information:
sns.boxplot(data=df, x="date", y="B")
This produces a boxplot with one box per <year,month> on the x-axis.
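Putting the steps together, a self-contained version might look like the sketch below. Passing `order` explicitly is an extra precaution, not something the steps above require, so that the boxes stay chronological even if the frame is not sorted first; the `month` column name and the axis label are likewise assumptions made for illustration.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data from the question.
df = pd.DataFrame({
    "A": [1.0, 7.0, 4.0, 6.0, 8.0, 1.0],
    "B": [2.0, 2.0, 3.0, 8.0, 3.0, 0.0],
    "date": ["2016-12-31", "2016-12-22", "2016-12-12",
             "2017-02-03", "2017-01-31", "2016-12-11"],
})
df["date"] = pd.to_datetime(df["date"])
df["month"] = df["date"].dt.strftime("%Y-%m")

# "YYYY-MM" strings sort lexicographically in chronological order.
order = sorted(df["month"].unique())

# One box per month, showing the distribution of B within that month.
ax = sns.boxplot(data=df, x="month", y="B", order=order)
ax.set_xlabel("<year,month>")
plt.tight_layout()
# plt.show()  # uncomment when running interactively
```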
Hope this helps.