How do I get the minimum date of each group in pandas, along with the corresponding value from a different column?
import pandas as pd

data = {'date_value': ['2014-06-22', '2014-06-24', '2014-06-25', '2014-06-25', '2014-07-02', '2014-07-06', '2014-07-05', '2014-07-27'],
        'type': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C'],
        'sum_col': [1, 2, 3, 5, 4, 3, 2, 1]}
df = pd.DataFrame(data, columns=['date_value', 'type', 'sum_col'])
df['date_value'] = pd.to_datetime(df['date_value'])
# transform('min') takes the minimum of each column independently within the group
df[['min_date', 'new_sum_col']] = df.groupby('type')[['date_value', 'sum_col']].transform('min')
The result I get is:
  date_value type  sum_col   min_date  new_sum_col
0 2014-06-22    A        1 2014-06-22            1
1 2014-06-24    A        2 2014-06-22            1
2 2014-06-25    A        3 2014-06-22            1
3 2014-06-25    B        5 2014-06-25            2
4 2014-07-02    B        4 2014-06-25            2
5 2014-07-06    B        3 2014-06-25            2
6 2014-07-05    B        2 2014-06-25            2
7 2014-07-27    C        1 2014-07-27            1
The result I want is:
  date_value type  sum_col   min_date  new_sum_col
0 2014-06-22    A        1 2014-06-22            1
1 2014-06-24    A        2 2014-06-22            1
2 2014-06-25    A        3 2014-06-22            1
3 2014-06-25    B        5 2014-06-25            5
4 2014-07-02    B        4 2014-06-25            5
5 2014-07-06    B        3 2014-06-25            5
6 2014-07-05    B        2 2014-06-25            5
7 2014-07-27    C        1 2014-07-27            1
Any help would be appreciated.
Thank you,
Sam
CodePudding user response:
You can use transform('first') after sorting by date_value to get both column values from the row with the minimum date:
df[['min_date', 'new_sum_col']] = df.sort_values('date_value', ascending=True).groupby('type')[['date_value', 'sum_col']].transform('first')
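For reference, here is a self-contained sketch of the sort-then-first approach using the question's data. It assumes a reasonably recent pandas; the name `firsts`, the `kind='stable'` argument, and the column-by-column assignment are my own additions, chosen so that ties on the date keep their original row order and the result aligns on the index regardless of version.

```python
import pandas as pd

df = pd.DataFrame({
    'date_value': pd.to_datetime(['2014-06-22', '2014-06-24', '2014-06-25', '2014-06-25',
                                  '2014-07-02', '2014-07-06', '2014-07-05', '2014-07-27']),
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C'],
    'sum_col': [1, 2, 3, 5, 4, 3, 2, 1],
})

# Sort by date so the earliest row of each group comes first, then
# broadcast that row's values to every member of the group.
firsts = (
    df.sort_values('date_value', kind='stable')
      .groupby('type')[['date_value', 'sum_col']]
      .transform('first')
)

# Series assignment aligns on the index, so the sorted row order of
# `firsts` does not affect which row receives which value.
df['min_date'] = firsts['date_value']
df['new_sum_col'] = firsts['sum_col']
print(df)
```

Because both values come from the same sorted row, new_sum_col for type B is 5 (the value on 2014-06-25) rather than the group-wide minimum of 2.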
CodePudding user response:
An alternative method using idxmin:
df[['min_date', 'new_sum_col']] = df.loc[df.groupby('type')['date_value'].transform('idxmin'), ['date_value', 'sum_col']].values
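A variant of the same idea, sketched under the assumption that you would rather compute idxmin once per group and map it back onto the rows. The names `idx` and `rows` are mine. Assigning one column at a time also sidesteps a possible dtype issue: calling .values on a frame that mixes datetimes and integers yields an object array, so the new columns may come back as object dtype.

```python
import pandas as pd

df = pd.DataFrame({
    'date_value': pd.to_datetime(['2014-06-22', '2014-06-24', '2014-06-25', '2014-06-25',
                                  '2014-07-02', '2014-07-06', '2014-07-05', '2014-07-27']),
    'type': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C'],
    'sum_col': [1, 2, 3, 5, 4, 3, 2, 1],
})

# Index label of the earliest row in each group (one label per type).
idx = df.groupby('type')['date_value'].idxmin()

# Map every row's type to that label, then pull the matching rows.
rows = df.loc[df['type'].map(idx), ['date_value', 'sum_col']]

# `rows` carries repeated index labels, so assign positionally via
# to_numpy(); doing it per column keeps each column's own dtype.
df['min_date'] = rows['date_value'].to_numpy()
df['new_sum_col'] = rows['sum_col'].to_numpy()
print(df)
```

Note that idxmin returns the first label when several rows share the minimum date, so ties resolve to the earliest row in the original order.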