Groupby customer's id and keep the dates which are not null in Pandas-CodePudding

I have a dataset that consists of the date of the first and the last payment of a customer. The dates are in a separate column so the dataset is something like this:

id	first_payment	last_payment
11	23/10/2010	0
11	0	24/11/2010
223	11/01/2012	0
223	0	12/02/2012

How do I group by id in order to look like this?

id	first_payment	last_payment
11	23/10/2010	24/11/2010
223	11/01/2012	12/02/2012

I used groupby.df['id'].sum() but cannot join the dates.

CodePudding user response：

We can use groupby.max:

df.groupby("id", as_index=False).max()

    id first_payment last_payment
0   11    23/10/2010   24/11/2010
1  223    11/01/2012   12/02/2012

But that's kind of hacky.

Safest way is to first convert your dates to datetime, then use first:

date_cols = ["first_payment", "last_payment"]
df[date_cols] = df[date_cols].apply(pd.to_datetime, errors="coerce")

df.groupby("id").first().reset_index()

    id first_payment last_payment
0   11    23/10/2010   24/11/2010
1  223    11/01/2012   12/02/2012

CodePudding user response：

I am assuming that the first_payment is the only non-zero first_payment column and the last payment is only non-zerovalue in the last_payment column, since you were trying to aggregate using the sum() method, if that is true then you can do this:

df.replace('0',np.NaN)   # Replace zero by Null values
df.replace(0,np.NaN)     # If zeros are numerical
df_new = df.groupby('id').agg({'first_payment': 'first', 
                               'last_payment': 'last'})