I am fairly new to working with pandas. I have a dataframe with individual entries like this:
dfImport:
id | date_created | date_closed |
---|---|---|
0 | 01-07-2020 | |
1 | 02-09-2020 | 10-09-2020 |
2 | 07-03-2019 | 02-09-2020 |
I would like to filter it in a way, that I get the total number of created and closed objects (count id's) grouped by Year and Quarter and Month like this:
dfInOut:
Year | Qrt | month | number_created | number_closed |
---|---|---|---|---|
2019 | 1 | March | 1 | 0 |
2020 | 3 | July | 1 | 0 |
September | 1 | 2 |
I guess I'd have to use some combination of crosstab or group_by, but I tried out alot of ideas and already did research on the problem, but I can't seem to figure out a way. I guess it's an issue of understanding. Thanks in advance!
CodePudding user response:
Use DataFrame.melt
with crosstab
:
df['date_created'] = pd.to_datetime(df['date_created'], dayfirst=True)
df['date_closed'] = pd.to_datetime(df['date_closed'], dayfirst=True)
df1 = df.melt(value_vars=['date_created','date_closed']).dropna()
df = (pd.crosstab([df1['value'].dt.year.rename('Year'),
df1['value'].dt.quarter.rename('Qrt'),
df1['value'].dt.month.rename('Month')], df1['variable'])
[['date_created','date_closed']])
print (df)
variable date_created date_closed
Year Qrt Month
2019 1 3 1 0
2020 3 7 1 0
9 1 2
df = df.rename_axis(None, axis=1).reset_index()
print (df)
Year Qrt Month date_created date_closed
0 2019 1 3 1 0
1 2020 3 7 1 0
2 2020 3 9 1 2