age | gender | occupation |
---|---|---|
19 | Female | High School |
45 | Male | Designer |
34 | Both Gender | Coder |
This is how my dataframe looks. Wherever the gender is 'Both Gender', I would like to duplicate the row so that there is one row for each gender.
Expected result:
age | gender | occupation |
---|---|---|
19 | Female | High School |
45 | Male | Designer |
34 | Female | Coder |
34 | Male | Coder |
CodePudding user response:
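import pandas as pd

dict1 = {"age": [19, 45, 34],
         "gender": ["F", "M", "B"],
         "occupation": ["HS", "grad", "Coder"]}
df1 = pd.DataFrame(dict1)
# Take explicit copies of the "B" rows so the assignments below modify
# independent frames (this avoids SettingWithCopyWarning)
df = df1[df1.gender == "B"].copy()
df2 = df1[df1.gender == "B"].copy()
# Keep only the rows that already have a single gender
df1 = df1[(df1.gender == "M") | (df1.gender == "F")]
df["gender"] = "M"
df2["gender"] = "F"
# Stack the untouched rows and both relabeled copies
df3 = pd.concat([df1, df, df2])
df3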
I think the above code will work.
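Applied to the labels used in the question, the same split-and-concatenate idea could be sketched roughly as follows. The function name `split_both` and its parameters are hypothetical, introduced only for illustration. The stable sort on the index is an added step, not part of the code above, so that each duplicated pair stays where the original row was.

```python
import pandas as pd


def split_both(df, column="gender", both="Both Gender", parts=("Female", "Male")):
    """Replace each row where `column` equals `both` with one row per entry in `parts`."""
    mask = df[column] == both
    # assign() returns a new frame, so the input is never modified in place
    pieces = [df[mask].assign(**{column: part}) for part in parts]
    out = pd.concat([df[~mask], *pieces])
    # mergesort is stable, so copies sharing an index label keep the order of `parts`
    return out.sort_index(kind="mergesort").reset_index(drop=True)


df = pd.DataFrame({
    "age": [19, 45, 34],
    "gender": ["Female", "Male", "Both Gender"],
    "occupation": ["High School", "Designer", "Coder"],
})
result = split_both(df)
print(result)
```

Rows with any other gender value are passed through unchanged, which differs slightly from the code above, where only "M" and "F" rows are kept.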
CodePudding user response:
Assuming by dataframe you mean a pandas DataFrame, one approach is to replace 'Both Gender' in the 'gender' column with a list of the values you want in the final dataframe, then use the explode function to create one row for each item in that list:
import pandas as pd
data = {'age': [19, 45, 34], 'gender': ['Female', 'Male', 'Both Gender'], 'occupation': ['High School', 'Designer', 'Coder']}
df = pd.DataFrame(data)
df['gender'] = [['Male', 'Female'] if x == 'Both Gender' else x for x in df['gender']]
df = df.explode(column='gender')
After the list comprehension, and before explode is called, the intermediate dataframe is:
   age          gender   occupation
0   19          Female  High School
1   45            Male     Designer
2   34  [Male, Female]        Coder
Then after using explode, this becomes:
   age  gender   occupation
0   19  Female  High School
1   45    Male     Designer
2   34    Male        Coder
2   34  Female        Coder
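Note that explode repeats the original index label, so 2 appears twice in the result. A minimal variant, assuming pandas 1.1.0 or later (where explode accepts `ignore_index`), renumbers the rows and lists 'Female' first to match the order of the expected result in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [19, 45, 34],
    "gender": ["Female", "Male", "Both Gender"],
    "occupation": ["High School", "Designer", "Coder"],
})

# Swap the combined label for a list; every other value passes through unchanged
df["gender"] = [["Female", "Male"] if x == "Both Gender" else x for x in df["gender"]]

# ignore_index=True relabels the rows 0..n-1 instead of repeating index 2
df = df.explode("gender", ignore_index=True)
print(df)
```

If the original index carries meaning, leaving `ignore_index` at its default keeps it, at the cost of duplicate labels.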