Computing the difference of min/max of two columns, per group, using pandas

Time:10-05

Given the following dataset:

group_id from_date to_date
0 2020-01-01 2020-02-01
0 2020-02-01 2020-03-01
0 2020-03-01 2020-04-01
1 2020-01-01 2020-02-01
1 2020-02-01 2020-03-01

I'm trying to compute, for each group, max(to_date) - min(from_date), so that the result looks like this (give or take a few days, depending on month lengths):

group_id duration_days
0 90
1 60

Using the following correctly computes the duration, but returns an ungrouped dataframe of 5 rows:

(df.groupby(["group_id"])
   .apply(lambda x: x.assign(duration_days=(np.max(x["to_date"])-np.min(x["from_date"])).days)))

I've tried using aggregate but I haven't managed to make it work with a function using two columns.

CodePudding user response:

There is no need for apply with a lambda here. Use melt together with np.ptp (peak to peak, i.e. max minus min):

out = df.melt('group_id').groupby('group_id')['value'].agg(np.ptp).reset_index(name = 'duration_days')
Out[16]: 
   group_id duration_days
0         0       91 days
1         1       60 days
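
To see why this works, it may help to look at what melt produces. The sketch below rebuilds the sample data from the question (how the frame is constructed is an assumption, as are the names `long`, `grouped` and `melted_range`) and replaces np.ptp with an explicit max minus min, which computes the same range. It relies on from_date never being later than to_date within a row, so that the overall maximum comes from to_date and the overall minimum from from_date.

```python
import pandas as pd

# Sample data from the question, with both columns parsed as datetimes.
df = pd.DataFrame({
    "group_id": [0, 0, 0, 1, 1],
    "from_date": pd.to_datetime(
        ["2020-01-01", "2020-02-01", "2020-03-01", "2020-01-01", "2020-02-01"]),
    "to_date": pd.to_datetime(
        ["2020-02-01", "2020-03-01", "2020-04-01", "2020-02-01", "2020-03-01"]),
})

# melt stacks from_date and to_date into a single 'value' column,
# keeping group_id as the identifier, so 5 rows become 10.
long = df.melt("group_id")

# Range of all dates per group: latest date minus earliest date.
grouped = long.groupby("group_id")["value"]
melted_range = (grouped.max() - grouped.min()).reset_index(name="duration_days")

print(long.head())
print(melted_range)
```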

CodePudding user response:

Your approach was almost correct. Just have apply return a single value per group, which gives a Series, and take care of the renaming afterwards:

# ensure datetime
df['from_date'] = pd.to_datetime(df['from_date'])
df['to_date'] = pd.to_datetime(df['to_date'])

(df.groupby('group_id')
   .apply(lambda g: g['to_date'].max()-g['from_date'].min())
   .reset_index(name='duration_days')
)

output:

   group_id duration_days
0         0       91 days
1         1       60 days
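
The question asked for a plain number of days rather than a Timedelta. A minimal sketch of one way to get that, assuming the sample data from the question (rebuilt here so the snippet runs on its own; the name `result` is illustrative), is to take `.days` inside the lambda, as the original attempt did:

```python
import pandas as pd

# Sample data from the question, stored as strings first.
df = pd.DataFrame({
    "group_id": [0, 0, 0, 1, 1],
    "from_date": ["2020-01-01", "2020-02-01", "2020-03-01", "2020-01-01", "2020-02-01"],
    "to_date": ["2020-02-01", "2020-03-01", "2020-04-01", "2020-02-01", "2020-03-01"],
})

# Ensure datetime, as in the answer above.
df["from_date"] = pd.to_datetime(df["from_date"])
df["to_date"] = pd.to_datetime(df["to_date"])

# Returning one integer per group makes apply produce a Series
# indexed by group_id instead of a frame with the original 5 rows.
result = (
    df.groupby("group_id")
      .apply(lambda g: (g["to_date"].max() - g["from_date"].min()).days)
      .reset_index(name="duration_days")
)
print(result)
```

Depending on the pandas version, this apply call may emit a deprecation warning about operating on the grouping columns; the lambda only reads from_date and to_date, so the result should not be affected.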

CodePudding user response:

Compute the maximum and minimum per group (this assumes both columns are datetime objects, which they probably already are):

maxi = df.groupby('group_id').to_date.max()
mini = df.groupby('group_id').from_date.min()

Then subtract them from each other:

(maxi - mini).reset_index(name='duration_days')

Output:

   group_id duration_days
0         0       91 days
1         1       60 days
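
The question mentions not getting aggregate to work with a function over two columns. One possible way around that, sketched here on the assumption that both columns are already datetime, is named aggregation: reduce each column separately in a single agg call, then subtract the two resulting columns. The names `start`, `end`, `bounds` and `result` are illustrative and not from the original post.

```python
import pandas as pd

# Sample data from the question, with both columns parsed as datetimes.
df = pd.DataFrame({
    "group_id": [0, 0, 0, 1, 1],
    "from_date": pd.to_datetime(
        ["2020-01-01", "2020-02-01", "2020-03-01", "2020-01-01", "2020-02-01"]),
    "to_date": pd.to_datetime(
        ["2020-02-01", "2020-03-01", "2020-04-01", "2020-02-01", "2020-03-01"]),
})

# One row per group: earliest from_date and latest to_date.
bounds = df.groupby("group_id").agg(
    start=("from_date", "min"),
    end=("to_date", "max"),
)

# Subtract the two columns and convert the Timedelta to whole days.
result = (bounds["end"] - bounds["start"]).dt.days.reset_index(name="duration_days")
print(result)
```

This keeps the two-column logic outside the aggregation step, so no custom function has to be passed to agg.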