I have a dataset that consists of the date of the first and the last payment of a customer. The dates are in a separate column so the dataset is something like this:
id | first_payment | last_payment |
---|---|---|
11 | 23/10/2010 | 0 |
11 | 0 | 24/11/2010 |
223 | 11/01/2012 | 0 |
223 | 0 | 12/02/2012 |
How do I group by id in order to look like this?
id | first_payment | last_payment |
---|---|---|
11 | 23/10/2010 | 24/11/2010 |
223 | 11/01/2012 | 12/02/2012 |
I used groupby.df['id'].sum()
but cannot join the dates.
CodePudding user response:
We can use groupby.max
:
df.groupby("id", as_index=False).max()
id first_payment last_payment
0 11 23/10/2010 24/11/2010
1 223 11/01/2012 12/02/2012
But that's kind of hacky.
Safest way is to first convert your dates to datetime, then use first
:
date_cols = ["first_payment", "last_payment"]
df[date_cols] = df[date_cols].apply(pd.to_datetime, errors="coerce")
df.groupby("id").first().reset_index()
id first_payment last_payment
0 11 23/10/2010 24/11/2010
1 223 11/01/2012 12/02/2012
CodePudding user response:
I am assuming that the first_payment
is the only non-zero first_payment column and the last payment
is only non-zerovalue in the last_payment column, since you were trying to aggregate using the sum()
method, if that is true then you can do this:
df.replace('0',np.NaN) # Replace zero by Null values
df.replace(0,np.NaN) # If zeros are numerical
df_new = df.groupby('id').agg({'first_payment': 'first',
'last_payment': 'last'})