Is there a more efficient way to create a new column based on string values in another column?

Time:04-23

I have a column in a DataFrame that states a person's relationship to another person: Son, Mother, etc. I want to create a new column with categories such as Close Family, Extended Family and Outsider. So basically: where relationship = Daughter, then Close Family.

I have figured out the following way to do it, but I am eager to find a more efficient way, as I have a few more columns to do something similar with:

def func(a):
    # These are substring tests evaluated in order, so the longer labels
    # ("Ex-Wife", "Boyfriend/Girlfriend") must be checked before the shorter
    # labels they contain ("Wife", "Girlfriend"), otherwise they are unreachable.
    if "Acquaintance" in a:
        return "Outsider"
    elif "Ex-Wife" in a:
        return "Extended Family"
    elif "Wife" in a:
        return "Close Family"
    elif "Stranger" in a:
        return "Unknown"
    elif "Boyfriend/Girlfriend" in a:
        return "Unmarried Partner"
    elif "Girlfriend" in a:
        return "Close Family"
    elif "Ex-Husband" in a:
        return "Extended Family"
    elif "Brother" in a:
        return "Close Family"
    elif "Stepdaughter" in a:
        return "Extended Family"
    elif "Husband" in a:
        return "Close Family"
    elif "Sister" in a:
        return "Close Family"
    elif "Friend" in a:
        return "Outsider"
    elif "Family" in a:
        return "Undefined Family"
    elif "Neighbour" in a:
        return "Outsider"
    elif "Father" in a:
        return "Close Family"
    elif "In-Law" in a:
        return "Extended Family"
    elif "Son" in a:
        return "Close Family"
    elif "Boyfriend" in a:
        return "Unmarried Partner"
    elif "Mother" in a:
        return "Close Family"
    elif "Common-Law Husband" in a:
        return "Close Family"
    elif "Common-Law Wife" in a:
        return "Close Family"
    elif "Stepfather" in a:
        return "Extended Family"
    elif "Stepson" in a:
        return "Extended Family"
    elif "Stepmother" in a:
        return "Extended Family"
    elif "Daughter" in a:
        return "Close Family"
    elif "Employer" in a:
        return "Outsider"
    elif "Employee" in a:
        return "Close Family"
    else:
        return "Unknown"

df["relationship_type"] = df["relationship"].apply(func)
df

As you can see, it's a very long-winded piece of code, so hopefully there are more efficient ways to do this!

Thanks :)

Answer:

As Nin suggested, create a dictionary and use .map.

.apply can be inefficient because it calls a Python function once per row, so you lose the benefits of vectorisation.

d = {
    "Acquaintance": "Outsider",
    "Wife": "Close Family",
    "Stranger": "Unknown",
    "Girlfriend": "Close Family",
    "Ex-Husband": "Extended Family",
    "Brother": "Close Family",
    "Stepdaughter": "Extended Family",
    "Husband": "Close Family",
    "Sister": "Close Family",
    "Friend": "Outsider",
    "Family": "Undefined Family",
    "Neighbour": "Outsider",
    "Father": "Close Family",
    "In-Law": "Extended Family",
    "Son": "Close Family",
    "Ex-Wife": "Extended Family",
    "Boyfriend": "Unmarried Partner",
    "Mother": "Close Family",
    "Common-Law Husband": "Close Family",
    "Common-Law Wife": "Close Family",
    "Stepfather": "Extended Family",
    "Stepson": "Extended Family",
    "Stepmother": "Extended Family",
    "Daughter": "Close Family",
    "Boyfriend/Girlfriend": "Unmarried Partner",
    "Employer": "Outsider",
    "Employee": "Close Family",
}

# Values missing from the dictionary become NaN; fill them like the question's else-branch.
df['relationship_type'] = df['relationship'].map(d).fillna('Unknown')
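One difference worth noting: .map looks up whole values exactly, while the in tests in the question also match substrings, so a value such as "Wife (estranged)" gets Close Family from func but Unknown from the dictionary lookup. If substring matching is what you want, the following is a sketch (not part of the original answer) that stays vectorised using pandas' Series.str.extract: build one regex alternation from the dictionary keys, longest first, extract the matched label, then map it. The trimmed dictionary and the sample rows are assumptions for illustration. Regex search returns the leftmost match in the string, so this approximates, rather than exactly reproduces, the priority order of the if/elif chain.

```python
import re

import pandas as pd

# Trimmed version of the dictionary `d` above (assumed enough for the demo).
d = {
    "Wife": "Close Family",
    "Ex-Wife": "Extended Family",
    "Girlfriend": "Close Family",
    "Boyfriend": "Unmarried Partner",
    "Boyfriend/Girlfriend": "Unmarried Partner",
    "Daughter": "Close Family",
    "Stepdaughter": "Extended Family",
    "Acquaintance": "Outsider",
}

# Longest keys first, so at any position "Ex-Wife" is tried before "Wife"
# and "Boyfriend/Girlfriend" before "Girlfriend" or "Boyfriend".
keys = sorted(d, key=len, reverse=True)
pattern = "(" + "|".join(re.escape(k) for k in keys) + ")"

df = pd.DataFrame({"relationship": [
    "Ex-Wife", "Wife (estranged)", "Boyfriend/Girlfriend",
    "Stepdaughter", "Acquaintance", "Cousin",
]})

# With one capture group and expand=False, str.extract returns a Series
# holding the matched label, or NaN where nothing matched.
matched = df["relationship"].str.extract(pattern, expand=False)
df["relationship_type"] = matched.map(d).fillna("Unknown")
print(df)
```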

Answer:

Same as @Umar.H, but I reversed the dictionary to make it easier to maintain:

d = {'Outsider': ['Acquaintance', 'Friend', 'Neighbour', 'Employer'],
     'Close Family': ['Wife', 'Girlfriend', 'Brother', 'Husband', 'Sister',
                      'Father', 'Son', 'Mother', 'Common-Law Husband',
                      'Common-Law Wife', 'Daughter', 'Employee'],
     'Extended Family': ['Ex-Husband', 'Stepdaughter', 'In-Law', 'Ex-Wife',
                         'Stepfather', 'Stepson', 'Stepmother'],
     'Unmarried Partner': ['Boyfriend', 'Boyfriend/Girlfriend'],
     'Unknown': ['Stranger'], 'Undefined Family': ['Family']}

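# Invert to {relationship: category} so .map can look each value up directly.
MAPPING = {v: k for k, labels in d.items() for v in labels}

# Values not listed in the dictionary become NaN; fill them like the question's else-branch.
df['relationship_type'] = df['relationship'].map(MAPPING).fillna('Unknown')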

Output:

>>> df
      relationship relationship_type
0     Acquaintance          Outsider
1           In-Law   Extended Family
2        Neighbour          Outsider
3         Stranger           Unknown
4           Mother      Close Family
5         Daughter      Close Family
6           Sister      Close Family
7          Husband      Close Family
8  Common-Law Wife      Close Family
9         Daughter      Close Family
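The comprehension that inverts the grouped dictionary silently keeps the last category when the same label is listed under two categories, which is an easy mistake to make as the lists grow. A minimal sketch of an explicit check, assuming a hypothetical helper named invert and a trimmed version of the grouped dictionary:

```python
import pandas as pd

# Trimmed version of the grouped {category: [labels]} dictionary above.
groups = {
    "Outsider": ["Acquaintance", "Friend", "Neighbour"],
    "Close Family": ["Wife", "Mother", "Daughter"],
    "Extended Family": ["In-Law", "Ex-Wife"],
}


def invert(groups):
    """Hypothetical helper: flatten {category: [labels]} into {label: category},
    raising instead of silently overwriting when a label has two categories."""
    mapping = {}
    for category, labels in groups.items():
        for label in labels:
            if label in mapping and mapping[label] != category:
                raise ValueError(
                    f"{label!r} is listed under both {mapping[label]!r} and {category!r}"
                )
            mapping[label] = category
    return mapping


MAPPING = invert(groups)

df = pd.DataFrame({"relationship": ["Acquaintance", "In-Law", "Mother", "Cousin"]})
# fillna gives unlisted labels the same "Unknown" as the question's else-branch.
df["relationship_type"] = df["relationship"].map(MAPPING).fillna("Unknown")
print(df)
```

Keeping the grouped form as the source of truth and deriving the flat lookup from it means each category's members are edited in one place, while the check catches conflicting entries at load time rather than as a wrong label in the output.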