How to produce three new columns from one column that has unequal length in Python (pandas or Numpy)

Time:08-12

I have a column containing a list of names. Some names have titles and some don't. How do I split each row into the right columns when the rows have different numbers of parts?

Names

Simon Cool
Mrs. Sarah Smart
Mr Harry Adams
Rupert Clever
Miss Jane Super

But I would like to have

Title    First Name     Last Name
         Simon          Cool
Mrs.     Sarah          Smart
Mr       Harry          Adams
         Rupert         Clever
Miss     Jane           Super

My code that does not work is:

import pandas as pd
title = []
firstname = []
lastname = []

saluations = ['Mr', 'Mr.', 'Mrs', 'Mrs.', 'Miss', 'Ms', 'Ms.', 'Dr.', 'Lord', 'Lady']


x = df.names.str.split(' ').str[0]
y = df.names.str.split(' ').str[1]
z = df.names.str.split(' ').str[-1]

if x in saluations:
    title.append(x)
    firstname.append(y)
else:
    firstname.append(x)
lastname.append(z)

I have also tried:

import pandas as pd
for a in df.names.str.split(' ').str[0]:
    for b in df.names.str.split(' ').str[1]:
        if a in saluations:
            title.append(a)
            firstname.append(b)
        else:
            firstname.append(a)
        
for c in df.names.str.split(' ').str[-1]:
    lastname.append(c)

I would be very grateful for any help, and also for any pointers that would make my code neater.

CodePudding user response:

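import pandas as pd
from io import StringIO

# Titles to recognize. Matching is exact, so 'Lordina' is not mistaken for 'Lord'.
salutations = [
    'Mr', 'Mr.', 'Mrs', 'Mrs.',
    'Miss', 'Ms', 'Ms.', 'Dr.',
    'Lord', 'Lady',
    'Sir', 'Master',
]

data = '''
Names
Simon Cool
Mrs. Sarah Smart
Mr Harry Adams
Rupert Clever
Miss Jane Super
Lordina Mahama
Ladya Cheryl
'''

# Read the single column as a Series, then split each name into at most three parts.
# (read_csv's squeeze=True argument was removed in pandas 2.0; .squeeze('columns') replaces it.)
df = (
    pd.read_csv(StringIO(data))
    .squeeze('columns')
    .str.split(n=2, expand=True)
    .set_axis(['Title', 'First Name', 'Last Name'], axis='columns')
)

# Rows whose first token is not a salutation have no title: shift them one column to the right.
not_title = ~df['Title'].isin(salutations)
df[not_title] = df[not_title].shift(1, axis=1)

# Replace the missing titles with empty strings.
df = df.fillna('')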
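
The key step in the code above is the row-wise shift. As a minimal sketch of that step in isolation, the snippet below uses a hypothetical two-row frame and a shortened salutation list (both are illustrative assumptions, not part of the answer):

```python
import pandas as pd

# Hypothetical frame mimicking the result of the split: the second row has
# no title, so its tokens sit one column too far to the left.
parts = pd.DataFrame(
    [['Mrs.', 'Sarah', 'Smart'], ['Simon', 'Cool', None]],
    columns=['Title', 'First Name', 'Last Name'],
)

salutations = ['Mr', 'Mrs.', 'Miss']  # shortened list, for illustration only

# True for rows whose first token is not a known salutation.
not_title = ~parts['Title'].isin(salutations)

# Move those rows one column to the right; the vacated 'Title' cell becomes missing.
parts[not_title] = parts[not_title].shift(1, axis=1)

# Show missing titles as empty strings.
parts = parts.fillna('')
print(parts)
```

Only the rows selected by `not_title` are touched, so names that already start with a title keep their positions.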

CodePudding user response:

Just convert the data row by row. You don't need to hunt for fancy pandas APIs.

Here is some example pseudocode:

import pandas

df = ...
honorifics = ['Mr', ...]

new_data = []
for name in df.names:
    tokens = name.split()
    # No honorific at the front: pad with an empty string so every row has three fields.
    if tokens[0] not in honorifics:
        tokens.insert(0, '')
    new_data.append(tokens)
new_frame = pandas.DataFrame(new_data, columns=['honorific', 'first', 'last'])
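
A runnable sketch of the row-by-row idea, using the sample names and salutation list from the question. The helper `split_name` is a hypothetical name introduced here, and treating everything after the first name as the last name is an assumption, made so that names with more than three tokens do not break the three-column layout:

```python
import pandas as pd

# Sample data and salutations copied from the question; the column name
# `names` matches the question's `df.names`.
df = pd.DataFrame({'names': [
    'Simon Cool',
    'Mrs. Sarah Smart',
    'Mr Harry Adams',
    'Rupert Clever',
    'Miss Jane Super',
]})
honorifics = {'Mr', 'Mr.', 'Mrs', 'Mrs.', 'Miss', 'Ms', 'Ms.', 'Dr.', 'Lord', 'Lady'}


def split_name(name):
    """Return (title, first, last) for one name (hypothetical helper)."""
    tokens = name.split()
    # Take the first token as the title only if it is an exact match.
    title = tokens.pop(0) if tokens and tokens[0] in honorifics else ''
    first = tokens[0] if tokens else ''
    # Assumption: everything after the first name belongs to the last name.
    last = ' '.join(tokens[1:])
    return title, first, last


rows = [split_name(name) for name in df['names']]
new_frame = pd.DataFrame(rows, columns=['Title', 'First Name', 'Last Name'])
print(new_frame)
```

Using a set for `honorifics` keeps the membership test cheap, and the exact match means a name such as "Lordina" is not mistaken for the title "Lord".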