I have a problem. I would like to summarise the different years of the months. Unfortunately, I don't know how to calculate the months on a yearly basis. .value_counts()
calculates the whole months, is there also the possibility to say only for the different years. I tried something like groupby df.groupby(['year', 'month']).count()
, but it doesn't work.
Dataframe
id date
0 1 2021-02-22
1 2 2021-03-22
2 3 2021-04-22
3 4 2021-02-15
4 5 2021-09-15
5 6 2020-01-12
6 7 2022-02-22
Code
import pandas as pd
import seaborn as sns
d = {'id': [1, 2, 3, 4, 5, 6, 7],
'date': ['2021-02-22', '2021-03-22', '2021-04-22',
'2021-02-15', '2021-09-15', '2020-01-12', '2022-02-22'],
}
df = pd.DataFrame(data=d)
print(df)
df['month'] = pd.to_datetime(df['date'], errors='coerce').dt.month
df['year'] = pd.to_datetime(df['date'], errors='coerce').dt.year
sns.lineplot(data=df, x="month", y="", hue="month")
[OUT]
>>> ValueError: Could not interpret value `` for parameter `y`
What I tried
#df_month = df__orders['orderDate_month'].value_counts().reset_index()
#df_month.columns = ['month', 'count']
#sns.lineplot(data=df_month, x='month', y='count')
#df_new = df.groupby(by=["year, month"]).count()
df.groupby(['year', 'month']).count()
df_new
[OUT]
year
2020 1
2021 5
2022 1
dtype: int64
What I want
year month count
2020 1 1
2021 2 2
2021 3 1
2021 4 1
2021 9 1
2022 2 2
CodePudding user response:
Use GroupBy.size
:
df_month = df.groupby(['year', 'month']).size().reset_index(name='count')
print (df_month)
year month count
0 2020 1 1
1 2021 2 2
2 2021 3 1
3 2021 4 1
4 2021 9 1
5 2022 2 1
and then add hue
parameter:
sns.lineplot(data=df_month, x='month', y='count', hue='year')