How to produce three new columns from one column that has unequal length in Python (pandas or NumPy)

I have a column containing a list of names. Some have titles and some don't. How do I split each row into the right columns when the rows have different numbers of parts?

Names

Simon Cool
Mrs. Sarah Smart
Mr Harry Adams
Rupert Clever
Miss Jane Super

But I would like to have

Title    First Name     Last Name
         Simon          Cool
Mrs.     Sarah          Smart
Mr       Harry          Adams
         Rupert         Clever
Miss     Jane           Super

My code that does not work is:

import pandas as pd
title = []
firstname = []
lastname = []

saluations = ['Mr', 'Mr.', 'Mrs', 'Mrs.', 'Miss', 'Ms', 'Ms.', 'Dr.', 'Lord', 'Lady']


x = df.names.str.split(' ').str[0]
y = df.names.str.split(' ').str[1]
z = df.names.str.split(' ').str[-1]

if x in saluations:
    title.append(x)
    firstname.append(y)
else:
    firstname.append(x)
lastname.append(z)

I have also tried:

import pandas as pd
for a in df.names.str.split(' ').str[0]:
    for b in df.names.str.split(' ').str[1]:
        if a in saluations:
            title.append(a)
            firstname.append(b)
        else:
            firstname.append(a)
        
for c in df.names.str.split(' ').str[-1]:
    lastname.append(c)

I would be very grateful for any help, and also for any pointers that would make my code neater.

CodePudding user response：

import pandas as pd
from io import StringIO

salutations = [
    'Mr', 'Mr.', 'Mrs', 'Mrs.',
    'Miss', 'Ms', 'Ms.', 'Dr.',
    'Lord', 'Lady',
    'Sir', 'Master',
]

data = '''
Names
Simon Cool
Mrs. Sarah Smart
Mr Harry Adams
Rupert Clever
Miss Jane Super
Lordina Mahama
Ladya Cheryl
'''

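# squeeze=True was removed from read_csv in pandas 2.0, so select the column instead.
df = (
    pd.read_csv(StringIO(data))['Names']
    .str.split(n=2, expand=True)
    .set_axis(['Title', 'First Name', 'Last Name'], axis='columns')
)

# Rows whose first token is not a salutation have no title:
# shift them one column to the right so the title slot is empty.
not_title = ~df['Title'].isin(salutations)
df[not_title] = df[not_title].shift(1, axis=1)

df.fillna('', inplace=True)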
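
To see what the shift step does in isolation, here is a minimal sketch on a three-row frame built by hand. The frame and the shortened salutation list are illustrative assumptions, not the data above. Rows whose first token is not an exact match for a salutation are moved one column to the right, which leaves the title empty.

```python
import pandas as pd

# Hypothetical mini-frame: what the split produces before any shifting.
parts = pd.DataFrame(
    [["Simon", "Cool", None],
     ["Mrs.", "Sarah", "Smart"],
     ["Lordina", "Mahama", None]],
    columns=["Title", "First Name", "Last Name"],
)

salutations = ["Mr", "Mrs.", "Lord"]  # shortened list, for illustration only

# True where the first token is NOT an exact match for a salutation.
no_title = ~parts["Title"].isin(salutations)

# Move those rows one column to the right; the vacated Title slot becomes NaN.
parts.loc[no_title] = parts.loc[no_title].shift(1, axis=1)

# Replace the remaining missing values with empty strings.
result = parts.fillna("")

print(result)
```

Because isin compares whole strings, "Lordina" is not mistaken for "Lord", which is presumably why the sample data includes such names.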

CodePudding user response：

Just convert the data row by row. You don't need to hunt for fancy pandas APIs.

Here is an example in pseudocode:

import pandas

df = ...
honorifics = ['Mr', ...]

new_data = []
for name in df.names:
    tokens = name.split()
    # No honorific at the front: pad with an empty one so every row has three items.
    if tokens[0] not in honorifics:
        tokens.insert(0, '')
    new_data.append(tokens)
new_frame = pandas.DataFrame(new_data, columns=['honorific', 'first', 'last'])
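
A runnable sketch of the same row-by-row idea, assuming the column is called names as in the question and reusing the question's salutation list. The helper split_name is a hypothetical name introduced here, and it additionally assumes that everything after the first name belongs to the last name.

```python
import pandas as pd

# Assumption: salutation list copied from the question; extend as needed.
HONORIFICS = {'Mr', 'Mr.', 'Mrs', 'Mrs.', 'Miss', 'Ms', 'Ms.', 'Dr.', 'Lord', 'Lady'}


def split_name(name):
    """Return [honorific, first, last] for one full name (hypothetical helper)."""
    tokens = name.split()
    # Take the first token as the honorific only if it is an exact match.
    honorific = tokens.pop(0) if tokens and tokens[0] in HONORIFICS else ''
    first = tokens[0] if tokens else ''
    # Assumption: any remaining tokens all belong to the last name.
    last = ' '.join(tokens[1:])
    return [honorific, first, last]


# Sample data taken from the question.
df = pd.DataFrame({'names': ['Simon Cool', 'Mrs. Sarah Smart', 'Mr Harry Adams',
                             'Rupert Clever', 'Miss Jane Super']})

new_frame = pd.DataFrame(
    [split_name(name) for name in df['names']],
    columns=['honorific', 'first', 'last'],
)
print(new_frame)
```

Returning a fixed three-item list per row means the DataFrame constructor does not fail on names with more than three words, at the cost of guessing where the last name starts.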