Home > Enterprise >  How can I Merge rows by some value in a column in pandas on specific condition?
How can I Merge rows by some value in a column in pandas on specific condition?

Time:09-23

I need a help in pandas to group the rows based on a specific condition. I have a dataset as follows:

    Name    Source  Country     Severity

    ABC     XYZ     USA         Low     
    DEF     XYZ     England     High        
    ABC     XYZ     India       Medium   
    EFG     XYZ     Algeria     High   
    DEF     XYZ     UK          Medium  
    

I want to group these rows based on the Name field in such a way that Country should be appended by rows in the column and Severity is set based on Its Highest priority.

After that output table looks like this:

    Name    Source  Country      Severity

    ABC     XYZ     USA, India   Medium     
    DEF     XYZ     England, UK  High        
    EFG     XYZ     Algeria      High 

I am able to aggregate the first 3 columns using below code but not get solution for merging severity.

df = df.groupby('Name').agg({'source':'first', 'Country': ', '.join })

CodePudding user response:

You should convert your Severity to an ordered Categorical.

This enables you to use a simple max:

df['Severity'] = pd.Categorical(df['Severity'],
                                categories=['Low', 'Medium', 'High'],
                                ordered=True)

out = (df
 .groupby('Name')
 .agg({'Source':'first',
       'Country': ', '.join,
       'Severity': 'max'})
)

output:

     Source      Country Severity
Name                             
ABC     XYZ   USA, India   Medium
DEF     XYZ  England, UK     High
EFG     XYZ      Algeria     High

CodePudding user response:

You can try converting the Severity column to a number, then aggregating Severity-number based on max, and then converting the Severity-number column back to a word like so:

import pandas

dataframe = pandas.DataFrame({'Name': ['ABC', 'DEF', 'ABC', 'EFG', 'DEF'], 'Source': ['XYZ', 'XYZ', 'XYZ', 'XYZ', 'XYZ'], 'Country': ['USA', 'England', 'India', 'Algeria', 'UK'], 'Severity': ['Low', 'High', 'Medium', 'High', 'Medium']})
severity_to_number = {'Low': 1, 'Medium': 2, 'High': 3}
severity_to_word = inv_map = {v: k for k, v in severity_to_number.items()}

dataframe['Severity-number'] = dataframe['Severity'].replace(severity_to_number)
dataframe = dataframe.groupby('Name').agg({'Source':'first', 'Country': ', '.join, 'Severity-number':'max' })
dataframe['Severity'] = dataframe['Severity-number'].replace(severity_to_word)
del dataframe['Severity-number']
print(dataframe)

Output

     Source      Country Severity
Name
ABC     XYZ   USA, India   Medium
DEF     XYZ  England, UK     High
EFG     XYZ      Algeria     High
  • Related