I have a column containing a list of names. Some of them have titles and some don't. How do I split each row into the right columns when the number of parts varies from row to row?
Names
Simon Cool
Mrs. Sarah Smart
Mr Harry Adams
Rupert Clever
Miss Jane Super
But I would like to have
Title  First Name  Last Name
       Simon       Cool
Mrs.   Sarah       Smart
Mr     Harry       Adams
       Rupert      Clever
Miss   Jane        Super
My code that does not work is:
import pandas as pd
title = []
firstname = []
lastname = []
saluations = ['Mr', 'Mr.', 'Mrs', 'Mrs.', 'Miss', 'Ms', 'Ms.', 'Dr.', 'Lord', 'Lady']
x = df.names.str.split(' ').str[0]
y = df.names.str.split(' ').str[1]
z = df.names.str.split(' ').str[-1]
if x in saluations:
    title.append(x)
    firstname.append(y)
else:
    firstname.append(x)
lastname.append(z)
I have also tried:
import pandas as pd
for a in df.names.str.split(' ').str[0]:
    for b in df.names.str.split(' ').str[1]:
        if a in saluations:
            title.append(a)
            firstname.append(b)
        else:
            firstname.append(a)
for c in df.names.str.split(' ').str[-1]:
    lastname.append(c)
I would be very grateful for any help, and also for any pointers that would make my code neater.
CodePudding user response:
import pandas as pd
from io import StringIO
salutations = [
    'Mr', 'Mr.', 'Mrs', 'Mrs.',
    'Miss', 'Ms', 'Ms.', 'Dr.',
    'Lord', 'Lady',
    'Sir', 'Master',
]
data = '''
Names
Simon Cool
Mrs. Sarah Smart
Mr Harry Adams
Rupert Clever
Miss Jane Super
Lordina Mahama
Ladya Cheryl
'''
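# read_csv's squeeze=True was removed in pandas 2.0, so select the column directly.
# Each name is then split on whitespace into at most three parts.
df = (
    pd.read_csv(StringIO(data))['Names']
    .str.split(n=2, expand=True)
    .set_axis(['Title', 'First Name', 'Last Name'], axis='columns')
)
# Rows whose first token is not a salutation have no title: shift them one column to the right.
not_title = ~df['Title'].isin(salutations)
df[not_title] = df[not_title].shift(1, axis=1)
# Replace the missing values left by the split and the shift with empty strings.
df = df.fillna('')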
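If you would rather do the split in one pass, a regular expression with an optional title group is another possibility. This is only a sketch, not a drop-in solution: the pattern, the names `pattern` and `result`, and the sample Series are my own assumptions, and it treats everything after the first name as the last name.

```python
import re

import pandas as pd

salutations = ['Mr', 'Mr.', 'Mrs', 'Mrs.', 'Miss', 'Ms', 'Ms.', 'Dr.', 'Lord', 'Lady']

# Sample data from the question plus 'Lordina Mahama' from the answer above.
names = pd.Series(
    ['Simon Cool', 'Mrs. Sarah Smart', 'Mr Harry Adams',
     'Rupert Clever', 'Miss Jane Super', 'Lordina Mahama'],
    name='Names',
)

# Escape each salutation (so '.' is literal) and join them into one alternation.
alternatives = '|'.join(re.escape(s) for s in sorted(salutations, key=len, reverse=True))

# Optional title followed by whitespace, then the first name, then the rest as last name.
pattern = rf'^(?:({alternatives})\s+)?(\S+)\s+(.+)$'

result = (
    names.str.extract(pattern)
    .set_axis(['Title', 'First Name', 'Last Name'], axis='columns')
    .fillna('')  # rows without a title get an empty string instead of NaN
)
print(result)
```

Because the title must be followed by whitespace, a first name such as "Lordina" is not mistaken for the title "Lord".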
CodePudding user response:
Just convert the data row by row; you don't need to hunt for fancy pandas APIs.
Here is some example pseudocode:
import pandas

df = ...
honorifics = ['Mr', ...]

new_data = []
for name in df.names:
    tokens = name.split()
    # No leading honorific: pad with an empty one so every row has three fields.
    if tokens[0] not in honorifics:
        tokens.insert(0, '')
    new_data.append(tokens)

new_frame = pandas.DataFrame(new_data, columns=['honorific', 'first', 'last'])
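For reference, here is one way the row-by-row idea might look as a complete script. Treat it as a sketch under a few assumptions of mine: the honorific list is copied from the question, the helper `split_name` is a hypothetical name, and the last sample row ("Dr. Mary Ann Lee") is made up to show that extra tokens are folded into the last name.

```python
import pandas as pd

# Honorifics copied from the question; a set makes the membership test cheap.
honorifics = {'Mr', 'Mr.', 'Mrs', 'Mrs.', 'Miss', 'Ms', 'Ms.', 'Dr.', 'Lord', 'Lady'}

df = pd.DataFrame({'names': [
    'Simon Cool', 'Mrs. Sarah Smart', 'Mr Harry Adams',
    'Rupert Clever', 'Miss Jane Super',
    'Dr. Mary Ann Lee',  # hypothetical row with more than three tokens
]})


def split_name(name):
    """Return (honorific, first, last) for one name string."""
    tokens = name.split()
    # Take the first token as the honorific only if it is a known one.
    honorific = tokens.pop(0) if tokens and tokens[0] in honorifics else ''
    first = tokens[0] if tokens else ''
    # Everything after the first name is treated as the last name.
    last = ' '.join(tokens[1:])
    return honorific, first, last


new_frame = pd.DataFrame(
    [split_name(name) for name in df['names']],
    columns=['honorific', 'first', 'last'],
)
print(new_frame)
```

Returning a fixed three-element tuple keeps the DataFrame constructor from failing when a name has more or fewer tokens than expected.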