Python transferring names in Pandas dataframe column into a dictionary alphabetically

Time:08-05

I want to create a dictionary of full names, grouped by the first letter of the last name and ordered alphabetically by last name. The output would look something like this:

{'A': ['V. Aakalu', 'J.L. Accini', 'Kimberly A. Aeling', 'Konstantin Afanaciev', 'T. Afzar', 'Heidi Agnic'],

'B': ['Nicholas P. Breznay', 'B. Breznock', 'Rebecca Brincks', 'M.A. Brito Sanfiel', 'Reinhard Brunmeir'] .... }

until every name has been read and placed in its respective sub-list in the dictionary.

I've tried creating a new dictionary with every letter of the English alphabet as a key: if the first letter of a last name matched one of the keys, I appended the full name to the list stored under that key. However, every key ends up with the same list containing all of the names.

Example output: {'A': ['V. Aakalu', 'J.L. Accini', 'Kimberly A. Aeling', 'Konstantin Afanaciev', 'T. Afzar', 'Heidi Agnic'... 'Nicholas P. Breznay', 'B. Breznock', 'Rebecca Brincks', 'M.A. Brito Sanfiel', 'Reinhard Brunmeir'],

'B': ['V. Aakalu', 'J.L. Accini', 'Kimberly A. Aeling', 'Konstantin Afanaciev', 'T. Afzar', 'Heidi Agnic'... 'Nicholas P. Breznay', 'B. Breznock', 'Rebecca Brincks', 'M.A. Brito Sanfiel', 'Reinhard Brunmeir'],

'C':...... }

I've also tried the built-in dict method .update(), but each call overwrote the previously stored name, so only the last name for each letter was kept. The output I got looks something like this:

{'A': 'M. Azizad',

'B': 'M. Bänninger',

'C': 'S. Czempiel',

'D': 'S. Días-Mondragón', }

My question is what is the best way for me to separate the names into their respective sub-lists? Thank you in advance!

Some of my code:

import string

sorted_main_db = main_db.sort_values(by="auth_surname")
sorted_main_dict = sorted_main_db.to_dict()

norm_dict = dict.fromkeys(string.ascii_uppercase, [])
unnorm_dict = {}

for key, value in sorted_main_dict.items():  # for each column name and its {row index: value} dict
    for i in value:  # i iterates over the row indices of this column
        if 'auth_name' in key:  # focus on the author's name
            if sorted_main_dict['auth_surname'][i] == None or sorted_main_dict['auth_surname'][i][0] not in norm_dict:  # accounts for null and letters not in the English alphabet
                unnorm_dict.update({key: value})
            if sorted_main_dict['auth_surname'][i][0] in norm_dict.keys():  # if the first letter of the last name matches one of the keys
                norm_dict[sorted_main_dict['auth_surname'][i][0]].append(sorted_main_dict['auth_name'][i])  # append that name to the dictionary
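A likely explanation for both failed attempts, sketched under assumptions: `dict.fromkeys(keys, [])` evaluates the `[]` only once, so every key refers to the same list object and an append through any key shows up under all of them. `update({key: value})` assigns rather than accumulates, so each call replaces the previous name. The sketch below uses a small hypothetical DataFrame with the question's `auth_name` and `auth_surname` columns; `unnorm_names` is a hypothetical stand-in for the question's `unnorm_dict`, and the loop reads the two columns directly instead of going through `to_dict()`.

```python
import string

import pandas as pd

# Pitfall 1: fromkeys evaluates [] once, so both keys share ONE list object.
shared = dict.fromkeys('AB', [])
shared['A'].append('V. Aakalu')
print(shared)  # {'A': ['V. Aakalu'], 'B': ['V. Aakalu']}

# Pitfall 2: update() assigns, so each call replaces the previous value.
overwritten = {}
for name in ['V. Aakalu', 'Heidi Agnic']:
    overwritten.update({'A': name})
print(overwritten)  # {'A': 'Heidi Agnic'}

# Hypothetical sample data using the column names from the question's code.
main_db = pd.DataFrame({
    'auth_name': ['Heidi Agnic', 'V. Aakalu', 'B. Breznock', 'Anon Author'],
    'auth_surname': ['Agnic', 'Aakalu', 'Breznock', None],
})
sorted_main_db = main_db.sort_values(by='auth_surname')  # missing surnames sort last

# Fix: a dict comprehension builds a fresh, separate list for every letter.
norm_dict = {letter: [] for letter in string.ascii_uppercase}
unnorm_names = []  # hypothetical bucket for missing or non A-Z surnames

for name, surname in zip(sorted_main_db['auth_name'], sorted_main_db['auth_surname']):
    # Take the surname's first character; None/NaN or empty surnames give None.
    initial = surname[0] if isinstance(surname, str) and surname else None
    if initial in norm_dict:
        norm_dict[initial].append(name)  # appends only to this letter's list
    else:
        unnorm_names.append(name)

# Drop letters that received no names, matching the desired output shape.
norm_dict = {letter: group for letter, group in norm_dict.items() if group}
print(norm_dict)  # {'A': ['V. Aakalu', 'Heidi Agnic'], 'B': ['B. Breznock']}
print(unnorm_names)  # ['Anon Author']
```

An equivalent alternative to pre-creating all 26 lists is `norm_dict.setdefault(initial, []).append(name)`, which creates each list the first time its letter appears.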
            

Answer:

Assuming you are using pandas, here is some quick code:

df['l_name_initial'] = df['name'].apply(lambda x: x.split(" ")[-1][0])
df2 = df.groupby('l_name_initial')['name'].apply(list)
print(df2)

Which results in:

l_name_initial
A    [V. Aakalu, J.L. Accini, Kimberly A. Aeling, K...
B    [Nicholas P. Breznay, B. Breznock, Rebecca Bri...
S                                 [M.A. Brito Sanfiel]

Basically, you extract the first letter of the last word of each name (treated here as the last name) into a separate column, then use groupby to collect the names for each letter into a list.
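The same groupby idea can produce the dictionary the question asks for by adding `to_dict()`. A minimal sketch, assuming a DataFrame with the question's `auth_name` and `auth_surname` columns (the sample rows are hypothetical): since the surname already has its own column, its first character can serve as the key directly, which keeps 'M.A. Brito Sanfiel' under 'B' rather than the 'S' produced by splitting off the last word.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question's code.
main_db = pd.DataFrame({
    'auth_name': ['B. Breznock', 'V. Aakalu', 'Heidi Agnic', 'M.A. Brito Sanfiel'],
    'auth_surname': ['Breznock', 'Aakalu', 'Agnic', 'Brito Sanfiel'],
})

# Sort by surname first so each per-letter list comes out in alphabetical order.
sorted_main_db = main_db.sort_values(by='auth_surname')

# First character of the surname column; renamed so it is not confused
# with the existing 'auth_surname' column when used as the group key.
initials = sorted_main_db['auth_surname'].str[0].rename('initial')

# groupby(...).apply(list) collects the names per initial (row order is kept),
# and to_dict() turns the resulting Series into {letter: [names]}.
name_dict = sorted_main_db.groupby(initials)['auth_name'].apply(list).to_dict()
print(name_dict)
# {'A': ['V. Aakalu', 'Heidi Agnic'], 'B': ['B. Breznock', 'M.A. Brito Sanfiel']}
```

Rows with a missing surname get `NaN` from `.str[0]`, and `groupby` drops `NaN` keys by default, so such names would not appear in `name_dict`.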

Answer:

It's quite easy; the real task here is to extract the last names correctly. For your example it could look like this:

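import pandas as pd
from pprint import pprint

names = ['V. Aakalu', 'J.L. Accini', 'Kimberly A. Aeling', 'Konstantin Afanaciev',
         'T. Afzar', 'Heidi Agnic', 'Nicholas P. Breznay', 'B. Breznock', 'Rebecca Brincks',
         'M.A. Brito Sanfiel', 'Reinhard Brunmeir']

s = pd.Series(names)
# The regex extracts the first letter of a last name: skip the shortest prefix
# ending in an optional dot and a space, then capture a capital letter that is
# not followed by a dot (so initials such as "A." are skipped).
out = s.groupby(s.str.extract(r'^.*?\.? ([A-Z])[^\.]', expand=False)).apply(list).to_dict()
pprint(out)

Output:
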
{'A': ['V. Aakalu',
       'J.L. Accini',
       'Kimberly A. Aeling',
       'Konstantin Afanaciev',
       'T. Afzar',
       'Heidi Agnic'],
 'B': ['Nicholas P. Breznay',
       'B. Breznock',
       'Rebecca Brincks',
       'M.A. Brito Sanfiel',
       'Reinhard Brunmeir']}
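One caveat with this pattern: names it cannot match, such as a surname whose initial lies outside A-Z or a single-word name, get `NaN` from `str.extract`, and `groupby` drops `NaN` keys by default, so those names silently disappear. A minimal sketch that keeps them instead, using `fillna` to route them to a catch-all key (the sample names and the `'#'` key are hypothetical choices):

```python
import pandas as pd

# Hypothetical sample: an ordinary name, a surname starting with a
# non-ASCII letter, and a single-word name with no surname.
names = ['V. Aakalu', 'J. Östlund', 'Plato']
s = pd.Series(names)

# Same pattern as above; unmatched names come back as NaN.
initials = s.str.extract(r'^.*?\.? ([A-Z])[^\.]', expand=False)

# Without fillna, groupby would drop the NaN rows; '#' collects them instead.
out = s.groupby(initials.fillna('#')).apply(list).to_dict()
print(out)  # {'#': ['J. Östlund', 'Plato'], 'A': ['V. Aakalu']}
```

This plays the same role as the `unnorm_dict` in the question: names that don't fit the A-Z scheme are set aside rather than lost.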