I have a table with three main columns: Company ID, Post Valuation, and Deal Date. I would like to group the data by Company ID and get the highest Post Valuation per Company ID, together with its corresponding Deal Date.
Question: how do I include the corresponding Deal Date?
The data:
|     | Company ID | Post Valuation | Deal Date  |
|-----|------------|----------------|------------|
| 60  | 119616-85  | NaN            | 2022-03-01 |
| 80  | 160988-50  | 6.77           | 2022-02-10 |
| 85  | 108827-47  | NaN            | 2022-02-01 |
| 89  | 154876-33  | 1.40           | 2022-01-27 |
| 104 | 435509-92  | 6.16           | 2022-01-05 |
| 107 | 186777-73  | 17.26          | 2022-01-03 |
| 111 | 232001-47  | NaN            | 2022-01-01 |
| 113 | 160988-50  | NaN            | 2021-12-31 |
| 119 | 114196-78  | NaN            | 2021-12-15 |
| 128 | 481375-00  | 2.82           | 2021-12-01 |
| 130 | 128348-20  | NaN            | 2021-11-25 |
| 131 | 166855-60  | 658.36         | 2021-11-25 |
| 150 | 113503-87  | NaN            | 2021-10-20 |
| 156 | 178448-68  | 21.75          | 2021-10-07 |
| 170 | 479007-64  | NaN            | 2021-09-13 |
| 182 | 128479-51  | NaN            | 2021-09-01 |
| 185 | 113503-87  | NaN            | 2021-08-31 |
| 186 | 128348-20  | NaN            | 2021-08-30 |
| 191 | 108643-42  | 8.02           | 2021-08-13 |
| 192 | 186272-74  | NaN            | 2021-08-12 |
The attempt:
df_X.sort_values('Post Valuation', ascending=True).groupby('Company ID', as_index=False)['Post Valuation'].first()
CodePudding user response:
Sort and drop duplicates. Sorting by Post Valuation and keeping the last row per Company ID returns the whole row, so the Deal Date comes along for free. One caveat: NaN values sort to the end by default, so push them to the front with `na_position='first'` so that `keep='last'` lands on the real maximum:

result = df.sort_values('Post Valuation', na_position='first').drop_duplicates(subset='Company ID', keep='last')
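A minimal runnable sketch of this approach, using a small hypothetical frame in the same shape as the question's data (the values are abbreviated from the table above):

```python
import pandas as pd

# Sample data: two companies, each with some NaN Post Valuations.
df = pd.DataFrame({
    'Company ID': ['160988-50', '160988-50', '113503-87', '113503-87'],
    'Post Valuation': [6.77, None, None, None],
    'Deal Date': ['2022-02-10', '2021-12-31', '2021-10-20', '2021-08-31'],
})

# NaNs are sorted to the front so that keep='last' lands on the
# highest non-NaN Post Valuation for each Company ID; the Deal Date
# column rides along because whole rows are kept.
result = (
    df.sort_values('Post Valuation', na_position='first')
      .drop_duplicates(subset='Company ID', keep='last')
)
print(result)
```

For `160988-50` this keeps the row with Post Valuation 6.77 and Deal Date 2022-02-10; a company with only NaN valuations (like `113503-87` here) still survives with one NaN row rather than being dropped.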