Home > database >  Merge rows and convert a string in row value to a user-defined one when condition related to other c
Merge rows and convert a string in row value to a user-defined one when condition related to other c

Time:06-09

Assuming I'm dealing with this dataframe:

ID Qualified Year Amount A Amount B
1 No 2020 0 150
1 No 2019 0 100
1 Yes 2019 10 15
1 No 2018 0 100
1 Yes 2018 10 150
2 Yes 2020 0 200
2 No 2017 0 100
... ... ... ...

My desired output should be like this:

ID Qualified Year Amount A Amount B
1 No 2020 0 150
1 Partial 2019 10 115
1 Partial 2018 10 250
2 Yes 2020 0 200
2 No 2017 0 100
... ... ... ...

As you can see, Qualified column creates new merged values (Yes & No -> Partial, amount A B ) from a condition: a year in an ID includes both Yes and No in Qualified column.

Don't know how to approach it. Anyone could provide any methodology?

CodePudding user response:

You can use the function agg() and groupby() to perform this operation.

agg() allows you to use not only common aggregation functions (such as sum, mean, etc.) but also custom defined functions.

I would do as follows:

def agg_qualify(x):
    values = x.unique()
    if len(x)>1:
        return 'Partial'
    return values[0]

df.groupby(['ID', 'Year']).agg({
    'Qualified': lambda x: agg_qualify(x),
    'Amount A': 'sum',
    'Amount B': 'sum',
    }).reset_index()

Output:

   ID   Year Qualified  Amount A  Amount B
0  1   2018    Partial        10     250.0
1  1   2019    Partial        10     115.0
2  1   2020        No          0     150.0
3  2   2020       Yes          0     200.0
  • Related