How to turn last name, name and second name into initials?

I have a dataframe with a single column holding last name, first name and second name:

name
Johnson John William
Peterson Andrew James
Burnham Edward Alexander
....

I want to create a new column "initials" that keeps the last name and appends, separated by underscores, the first letters of the first and second names:

name                         initials
Johnson John William        Johnson_J_W
Peterson Andrew James       Peterson_A_J
Burnham Edward Alexander    Burnham_E_A
....

How could I do that in a short way? My idea is to use split() to create three columns, extract the first letters from two of them, and then join all three again with underscores, but that seems inefficient.

CodePudding user response：

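Assuming pandas, you can use a simple regex, which lets you benefit from a vectorized (i.e. fast) string operation. The pattern matches the whitespace before each subsequent word, captures that word's first letter, and replaces the whole match with an underscore plus that letter: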

df['initials'] = df['name'].str.replace(r'\s+([A-Z])[a-z]+', r'_\1', regex=True)

If the case doesn't matter:

df['initials'] = df['name'].str.replace(r'\s+(\w)\w+', r'_\1', regex=True)

output:

                       name      initials
0      Johnson John William   Johnson_J_W
1     Peterson Andrew James  Peterson_A_J
2  Burnham Edward Alexander   Burnham_E_A
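
The regex idea can also be checked on plain strings without a dataframe. The following is a minimal sketch using the standard library's `re` module; `to_initials_regex` is a hypothetical helper name, and the pattern uses `\w*` instead of `\w+` (a small variation, so that a one-letter name is matched as well).

```python
import re

# Whitespace, then one captured word character, then the rest of the word.
# Assumption: names are separated by whitespace and contain only word characters.
PATTERN = re.compile(r"\s+(\w)\w*")


def to_initials_regex(name: str) -> str:
    """Replace every word after the first with '_' plus its first letter."""
    return PATTERN.sub(r"_\1", name)


for full_name in ["Johnson John William", "Peterson Andrew James"]:
    print(full_name, "->", to_initials_regex(full_name))
```

The first word is left untouched because it is not preceded by whitespace, so only the second and later words are shortened.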

CodePudding user response：

I would use pandas' apply method, passing a function ('to_initials') that processes each entry in the 'name' column of the dataframe.

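import pandas as pd

def to_initials(x):
    # Split "Last First Second" into its three parts
    last, first, second = x.split(" ")
    # Keep the last name, append the first letter of the other two
    return last + "_" + first[0] + "_" + second[0]

df = pd.DataFrame({"name":["Johnson John William","Peterson Andrew James","Burnham Edward Alexander"]})

df["initials"] = df["name"].apply(to_initials)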

Alternatively, you can use a Python lambda function and do it in one line:

df["initials"] = df["name"].apply(lambda x: x.split(" ")[0] + "_" + x.split(" ")[1][0] + "_" + x.split(" ")[2][0])

If there are entries with fewer than three names, this code raises an error (there is nothing to unpack or index for the missing parts), so you would have to extend the function.
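
One possible extension, as a rough sketch: split on any whitespace and build the result from however many parts are present. `to_initials_safe` is a hypothetical name, and returning an empty string for empty input is an assumption about the desired behavior, not something the question specifies.

```python
def to_initials_safe(name: str) -> str:
    """Return 'Last_F_S' for any number of name parts (hypothetical helper)."""
    # split() without arguments also tolerates repeated or leading/trailing spaces
    parts = name.split()
    if not parts:
        # Assumption: an empty or blank entry maps to an empty string
        return ""
    # Keep the first part whole, reduce every later part to its first letter
    return "_".join([parts[0]] + [p[0] for p in parts[1:]])


for full_name in ["Johnson John William", "Johnson John", "Johnson"]:
    print(repr(full_name), "->", repr(to_initials_safe(full_name)))
```

It can be passed to apply in the same way as above, for example df["name"].apply(to_initials_safe).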