Pandas DF - Efficient way to loop through DF to find minimum values of one column from rows with com-CodePudding

I have a dataframe that looks something like:

matter	work_date
1	01/01/2020
2	01/02/2020
1	01/04/2020
2	01/05/2020

I want a new column which finds the minimum work_date of all rows with the same matter number so that I can do some time delta calculations. so the final result would look like this:

matter	work_date	first_date
1	01/01/2020	01/01/2020
2	01/02/2020	01/02/2020
1	01/04/2020	01/01/2020
2	01/05/2020	01/02/2020

Right now, I'm using the following code, but it is taking quite a while to run (the dataframe has approx 300k rows and I'm on an ancient computer).

min_dict = {}
def check_dict(val):
    return min_dict.setdefault(val,min(df[df['tmatter']==val]['tworkdt']))

df['first_day'] = df.apply (lambda row: check_dict(row.tmatter), axis = 1)

Is there a better way to approach this?

CodePudding user response：

transform does what you want and should be fast

The steps are (1) group the rows together that have the same matter (2) for each group calculate the minimum work_date and (3) save these values as a new column.

import pandas as pd
import io

df = pd.read_csv(io.StringIO("""
matter  work_date
1   01/01/2020
2   01/02/2020
1   01/04/2020
2   01/05/2020
"""), delim_whitespace=True)

df['first_date'] = df.groupby('matter')['work_date'].transform('min')
print(df)