I have a column in a DataFrame that states the relationship of one person to another: Son, Mother, etc. I want to create a new column with the values Close Family, Extended Family, or Outsider. So basically, where Relationship = Daughter, then Close Family.
I have figured out the following way to do it, but am eager to find a more efficient way, as I have a few more columns to do a similar thing with:
def func(a):
    if "Acquaintance" in a:
        return "Outsider"
    elif "Wife" in a:
        return "Close Family"
    elif "Stranger" in a:
        return "Unknown"
    elif "Girlfriend" in a:
        return "Close Family"
    elif "Ex-Husband" in a:
        return "Extended Family"
    elif "Brother" in a:
        return "Close Family"
    elif "Stepdaughter" in a:
        return "Extended Family"
    elif "Husband" in a:
        return "Close Family"
    elif "Sister" in a:
        return "Close Family"
    elif "Friend" in a:
        return "Outsider"
    elif "Family" in a:
        return "Undefined Family"
    elif "Neighbour" in a:
        return "Outsider"
    elif "Father" in a:
        return "Close Family"
    elif "In-Law" in a:
        return "Extended Family"
    elif "Son" in a:
        return "Close Family"
    elif "Ex-Wife" in a:
        return "Extended Family"
    elif "Boyfriend" in a:
        return "Unmarried Partner"
    elif "Mother" in a:
        return "Close Family"
    elif "Common-Law Husband" in a:
        return "Close Family"
    elif "Common-Law Wife" in a:
        return "Close Family"
    elif "Stepfather" in a:
        return "Extended Family"
    elif "Stepson" in a:
        return "Extended Family"
    elif "Stepmother" in a:
        return "Extended Family"
    elif "Daughter" in a:
        return "Close Family"
    elif "Boyfriend/Girlfriend" in a:
        return "Unmarried Partner"
    elif "Employer" in a:
        return "Outsider"
    elif "Employee" in a:
        return "Close Family"
    else:
        return "Unknown"

df["relationship_type"] = df.relationship.apply(lambda x: func(x))
df
As you can see, it's a very long-winded piece of code, so hopefully there are some more efficient ways to do this!
Thanks :)
CodePudding user response:
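As Nin suggested, create a dictionary and use .map. .apply can be inefficient, as you lose the benefits of vectorisation. Note that .map looks up each whole value, so anything that is not a key in the dictionary becomes NaN, which .fillna('Unknown') then replaces.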
d = {
    "Acquaintance": "Outsider",
    "Wife": "Close Family",
    "Stranger": "Unknown",
    "Girlfriend": "Close Family",
    "Ex-Husband": "Extended Family",
    "Brother": "Close Family",
    "Stepdaughter": "Extended Family",
    "Husband": "Close Family",
    "Sister": "Close Family",
    "Friend": "Outsider",
    "Family": "Undefined Family",
    "Neighbour": "Outsider",
    "Father": "Close Family",
    "In-Law": "Extended Family",
    "Son": "Close Family",
    "Ex-Wife": "Extended Family",
    "Boyfriend": "Unmarried Partner",
    "Mother": "Close Family",
    "Common-Law Husband": "Close Family",
    "Common-Law Wife": "Close Family",
    "Stepfather": "Extended Family",
    "Stepson": "Extended Family",
    "Stepmother": "Extended Family",
    "Daughter": "Close Family",
    "Boyfriend/Girlfriend": "Unmarried Partner",
    "Employer": "Outsider",
    "Employee": "Close Family",
}

df['relationship_type'] = df['relationship'].map(d).fillna('Unknown')
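One caveat when moving from the original function to .map: the function tests `"Wife" in a` (a substring check), while .map needs the cell to equal a key exactly. If the column holds longer strings that merely contain a key, one possible approach is to pull the key out with a regular expression first and then map it. This is only a sketch: the shortened dictionary and the sample values are made up for illustration, and matching is assumed to be case-sensitive, as in the original function.

```python
import re

import pandas as pd

# Shortened, illustrative mapping; in practice use the full dictionary `d`.
d = {
    "Wife": "Close Family",
    "Ex-Wife": "Extended Family",
    "Friend": "Outsider",
}

# Hypothetical sample data: some cells contain a key instead of equalling it.
df = pd.DataFrame(
    {"relationship": ["Wife", "Ex-Wife (divorced)", "Old Friend", "Colleague"]}
)

# Put longer keys first so a key is never shadowed by a shorter key
# that is its prefix; escape them because keys may contain regex characters.
keys = sorted(d, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(k) for k in keys) + ")"

# Extract the first key found in each cell (NaN when nothing matches),
# then translate it with the dictionary and fill the gaps.
matched = df["relationship"].str.extract(pattern, expand=False)
df["relationship_type"] = matched.map(d).fillna("Unknown")

print(df)
```

If every cell is already exactly one of the keys, the plain .map(d) call is simpler and this extra step is not needed.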
CodePudding user response:
Same as @Umar.H, but I reversed the dictionary so it is easier to maintain:
d = {'Outsider': ['Acquaintance', 'Friend', 'Neighbour', 'Employer'],
     'Close Family': ['Wife', 'Girlfriend', 'Brother', 'Husband', 'Sister',
                      'Father', 'Son', 'Mother', 'Common-Law Husband',
                      'Common-Law Wife', 'Daughter', 'Employee'],
     'Extended Family': ['Ex-Husband', 'Stepdaughter', 'In-Law', 'Ex-Wife',
                         'Stepfather', 'Stepson', 'Stepmother'],
     'Unmarried Partner': ['Boyfriend', 'Boyfriend/Girlfriend'],
     'Unknown': ['Stranger'], 'Undefined Family': ['Family']}

MAPPING = {v: k for k, l in d.items() for v in l}
df['relationship_type'] = df['relationship'].map(MAPPING)
Output:
>>> df
      relationship  relationship_type
0     Acquaintance           Outsider
1           In-Law    Extended Family
2        Neighbour           Outsider
3         Stranger            Unknown
4           Mother       Close Family
5         Daughter       Close Family
6           Sister       Close Family
7          Husband       Close Family
8  Common-Law Wife       Close Family
9         Daughter       Close Family
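Since the question mentions a few more columns that need the same treatment, the reversed-dictionary idea could be wrapped in a small helper and looped over the columns. This is only a sketch: `invert`, `GROUPS`, and the `location` column with its categories are hypothetical names invented for illustration, not taken from the question. The helper also raises if a value is listed under two categories, and .fillna("Unknown") mirrors the else branch of the original function.

```python
import pandas as pd


def invert(groups):
    """Turn {category: [values]} into {value: category}, rejecting duplicates."""
    mapping = {}
    for category, values in groups.items():
        for value in values:
            if value in mapping:
                raise ValueError(
                    f"{value!r} is listed under both "
                    f"{mapping[value]!r} and {category!r}"
                )
            mapping[value] = category
    return mapping


# Hypothetical groupings, one entry per column to classify.
# The "location" column and its categories are made up for this example.
GROUPS = {
    "relationship": {
        "Close Family": ["Wife", "Mother"],
        "Outsider": ["Friend", "Neighbour"],
    },
    "location": {
        "Indoors": ["Home", "Office"],
        "Outdoors": ["Park"],
    },
}

# Hypothetical sample data; "Pen Pal" is deliberately missing from GROUPS.
df = pd.DataFrame(
    {
        "relationship": ["Wife", "Neighbour", "Pen Pal"],
        "location": ["Park", "Home", "Office"],
    }
)

# Add one "<column>_type" column per configured column.
for column, groups in GROUPS.items():
    df[f"{column}_type"] = df[column].map(invert(groups)).fillna("Unknown")

print(df)
```

Keeping all the groupings in one structure means a new column only needs a new entry in `GROUPS`, not another long if/elif function.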