Home > Software design >  How to crosstab or count dataframe rows by date in pandas
How to crosstab or count dataframe rows by date in pandas

Time:11-11

I am fairly new to working with pandas. I have a dataframe with individual entries like this:

dfImport:

id date_created date_closed
0 01-07-2020
1 02-09-2020 10-09-2020
2 07-03-2019 02-09-2020

I would like to filter it in a way, that I get the total number of created and closed objects (count id's) grouped by Year and Quarter and Month like this:

dfInOut:

Year Qrt month number_created number_closed
2019 1 March 1 0
2020 3 July 1 0
September 1 2

I guess I'd have to use some combination of crosstab or group_by, but I tried out alot of ideas and already did research on the problem, but I can't seem to figure out a way. I guess it's an issue of understanding. Thanks in advance!

CodePudding user response:

Use DataFrame.melt with crosstab:

df['date_created'] = pd.to_datetime(df['date_created'], dayfirst=True)
df['date_closed'] = pd.to_datetime(df['date_closed'], dayfirst=True)

df1 = df.melt(value_vars=['date_created','date_closed']).dropna()
df = (pd.crosstab([df1['value'].dt.year.rename('Year'), 
                   df1['value'].dt.quarter.rename('Qrt'), 
                  df1['value'].dt.month.rename('Month')], df1['variable'])
         [['date_created','date_closed']])

print (df)
variable        date_created  date_closed
Year Qrt Month                           
2019 1   3                 1            0
2020 3   7                 1            0
         9                 1            2

df = df.rename_axis(None, axis=1).reset_index()
print (df)
   Year  Qrt  Month  date_created  date_closed
0  2019    1      3             1            0
1  2020    3      7             1            0
2  2020    3      9             1            2
  • Related