Duplicate and rename column value where condition is satisfied in dataframe

Time:02-28

age  gender       occupation
19   Female       High School
45   Male         Designer
34   Both Gender  Coder

This is how my dataframe looks. Wherever the gender is 'Both Gender', I would like to duplicate the row so that there is one row for each gender.

Expected result:

age  gender  occupation
19   Female  High School
45   Male    Designer
34   Female  Coder
34   Male    Coder

CodePudding user response:

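import pandas as pd

dict1 = {"age": [19, 45, 34],
         "gender": ["F", "M", "B"],
         "occupation": ["HS", "grad", "Coder"]}
df1 = pd.DataFrame(dict1)

# Take two independent copies of the "B" rows, one per gender;
# .copy() avoids assigning into a slice of df1 (SettingWithCopyWarning)
df = df1[df1.gender == "B"].copy()
df2 = df1[df1.gender == "B"].copy()
# Keep only the rows that already have a single gender
df1 = df1[(df1.gender == "M") | (df1.gender == "F")]
df["gender"] = "M"
df2["gender"] = "F"
# Stack the untouched rows and the two relabelled copies
df3 = pd.concat([df1, df, df2])
df3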

I think the above code will work.

CodePudding user response:

Assuming that by dataframe you mean pandas, one approach is to replace each 'Both Gender' value in the 'gender' column with a list of the values you want in the final dataframe, then use the explode function to create one row per list item in that column:

import pandas as pd

data = {'age': [19, 45, 34], 'gender': ['Female', 'Male', 'Both Gender'], 'occupation': ['High School', 'Designer', 'Coder']}
df = pd.DataFrame(data)

df['gender'] = [['Male', 'Female']  if x == 'Both Gender' else x for x in df['gender']]
df = df.explode(column='gender')

This gives you an intermediate step of:

   age          gender   occupation
0   19          Female  High School
1   45            Male     Designer
2   34  [Male, Female]        Coder

Then after using explode, this becomes:

   age  gender   occupation
0   19  Female  High School
1   45    Male     Designer
2   34    Male        Coder
2   34  Female        Coder
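
Note that both exploded rows keep the original index label 2. If a fresh 0..n-1 index is preferred, explode accepts ignore_index=True (available since pandas 1.1). The following is only a sketch of the same idea wrapped in a helper; the function name expand_both_gender and its parameters are made up for illustration and are not part of pandas:

```python
import pandas as pd

def expand_both_gender(df, column="gender", marker="Both Gender",
                       replacements=("Female", "Male")):
    """Hypothetical helper: return a copy of df in which every row whose
    `column` equals `marker` is replaced by one row per replacement value."""
    out = df.copy()  # leave the caller's dataframe untouched
    # Swap the marker for a list; other values stay as plain strings
    out[column] = [list(replacements) if value == marker else value
                   for value in out[column]]
    # One row per list element; ignore_index=True renumbers the rows
    return out.explode(column, ignore_index=True)

data = {'age': [19, 45, 34],
        'gender': ['Female', 'Male', 'Both Gender'],
        'occupation': ['High School', 'Designer', 'Coder']}
df = pd.DataFrame(data)

result = expand_both_gender(df)
print(result)
```

The order of replacements determines the order of the duplicated rows, so ("Female", "Male") matches the expected result in the question.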