I have the following pandas dataframe:
Year | Quarter | Year-Month | Current Value | Future Value | Date |
---|---|---|---|---|---|
2021 | Q3 | 2021-10 | 0.411 | NaN | 2021-10-01 |
2021 | Q4 | 2022-01 | -0.100 | NaN | 2022-01-01 |
2022 | Q1 | 2022-04 | -0.224 | NaN | 2022-04-01 |
2022 | Q1 | 2022-03 | 0.110 | 0.092 | 2022-03-01 |
The last two rows have the same Year and Quarter values but different Year-Month values. I want to identify such cases. When Year and Quarter are duplicated, I want to take the Future Value from the last row, move it into the row above, and delete the last row. The result should look like this:
Year | Quarter | Year-Month | Current Value | Future Value | Date |
---|---|---|---|---|---|
2021 | Q3 | 2021-10 | 0.411 | NaN | 2021-10-01 |
2021 | Q4 | 2022-01 | -0.100 | NaN | 2022-01-01 |
2022 | Q1 | 2022-04 | -0.224 | 0.092 | 2022-04-01 |
CodePudding user response:
Assuming the rows within each (Year, Quarter) group are sorted by decreasing Year-Month, you can use GroupBy.agg with a custom aggregation dictionary that takes the 'first' value of every column by default and the 'last' value for "Future Value":
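# default: keep the first value of every column within each group
d = {c: 'first' for c in df}
# exception: take the last value of "Future Value"
d['Future Value'] = 'last'
out = df.groupby(['Year', 'Quarter'], as_index=False).agg(d)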
Output:
   Year Quarter Year-Month  Current Value  Future Value        Date
0  2021      Q3    2021-10          0.411           NaN  2021-10-01
1  2021      Q4    2022-01         -0.100           NaN  2022-01-01
2  2022      Q1    2022-04         -0.224         0.092  2022-04-01
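For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question, flags the rows that share a (Year, Quarter) pair, and then applies the same first/last aggregation. The names `keys`, `dupes`, and `agg_spec` are only illustrative, and keeping Year-Month and Date as plain strings is an assumption, since the question does not show the dtypes. The key columns are left out of the dictionary here because `as_index=False` already returns them as columns.

```python
import pandas as pd

# Sample data copied from the question; dtypes are an assumption.
df = pd.DataFrame({
    "Year": [2021, 2021, 2022, 2022],
    "Quarter": ["Q3", "Q4", "Q1", "Q1"],
    "Year-Month": ["2021-10", "2022-01", "2022-04", "2022-03"],
    "Current Value": [0.411, -0.100, -0.224, 0.110],
    "Future Value": [float("nan"), float("nan"), float("nan"), 0.092],
    "Date": ["2021-10-01", "2022-01-01", "2022-04-01", "2022-03-01"],
})

keys = ["Year", "Quarter"]

# Identify the rows whose (Year, Quarter) pair occurs more than once.
dupes = df[df.duplicated(subset=keys, keep=False)]

# 'first' for every non-key column, 'last' for "Future Value".
agg_spec = {col: "first" for col in df.columns if col not in keys}
agg_spec["Future Value"] = "last"

# Collapse each (Year, Quarter) group into a single row.
out = df.groupby(keys, as_index=False).agg(agg_spec)

print(dupes)
print(out)
```

Note that the 'first' and 'last' group aggregations skip missing values by default, so "Future Value" ends up with the last non-NaN entry of each group, and stays NaN when the whole group is NaN.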