I need a help in pandas to group the rows based on a specific condition. I have a dataset as follows:
Name Source Country Severity
ABC XYZ USA Low
DEF XYZ England High
ABC XYZ India Medium
EFG XYZ Algeria High
DEF XYZ UK Medium
I want to group these rows based on the Name field in such a way that Country should be appended by rows in the column and Severity is set based on Its Highest priority.
After that output table looks like this:
Name Source Country Severity
ABC XYZ USA, India Medium
DEF XYZ England, UK High
EFG XYZ Algeria High
I am able to aggregate the first 3 columns using below code but not get solution for merging severity.
df = df.groupby('Name').agg({'source':'first', 'Country': ', '.join })
CodePudding user response:
You should convert your Severity to an ordered Categorical
.
This enables you to use a simple max
:
df['Severity'] = pd.Categorical(df['Severity'],
categories=['Low', 'Medium', 'High'],
ordered=True)
out = (df
.groupby('Name')
.agg({'Source':'first',
'Country': ', '.join,
'Severity': 'max'})
)
output:
Source Country Severity
Name
ABC XYZ USA, India Medium
DEF XYZ England, UK High
EFG XYZ Algeria High
CodePudding user response:
You can try converting the Severity column to a number, then aggregating Severity-number based on max, and then converting the Severity-number column back to a word like so:
import pandas
dataframe = pandas.DataFrame({'Name': ['ABC', 'DEF', 'ABC', 'EFG', 'DEF'], 'Source': ['XYZ', 'XYZ', 'XYZ', 'XYZ', 'XYZ'], 'Country': ['USA', 'England', 'India', 'Algeria', 'UK'], 'Severity': ['Low', 'High', 'Medium', 'High', 'Medium']})
severity_to_number = {'Low': 1, 'Medium': 2, 'High': 3}
severity_to_word = inv_map = {v: k for k, v in severity_to_number.items()}
dataframe['Severity-number'] = dataframe['Severity'].replace(severity_to_number)
dataframe = dataframe.groupby('Name').agg({'Source':'first', 'Country': ', '.join, 'Severity-number':'max' })
dataframe['Severity'] = dataframe['Severity-number'].replace(severity_to_word)
del dataframe['Severity-number']
print(dataframe)
Output
Source Country Severity
Name
ABC XYZ USA, India Medium
DEF XYZ England, UK High
EFG XYZ Algeria High