I want to create a dictionary of full names, grouped by the first letter of the last name and sorted alphabetically by last name. The output would look something like this:
{'A': ['V. Aakalu', 'J.L. Accini', 'Kimberly A. Aeling', 'Konstantin Afanaciev', 'T. Afzar', 'Heidi Agnic'],
'B': ['Nicholas P. Breznay', 'B. Breznock', 'Rebecca Brincks', 'M.A. Brito Sanfiel', 'Reinhard Brunmeir'] .... }
and so on, until all the names have been read and placed into their respective sub-lists in the dictionary.
I've tried creating a new dictionary with all the letters of the English alphabet as keys: if the first letter of a last name matched one of the keys, I appended the full name to the list stored as that key's value. However, every key ends up holding a list of all the names.
Example output: {'A': ['V. Aakalu', 'J.L. Accini', 'Kimberly A. Aeling', 'Konstantin Afanaciev', 'T. Afzar', 'Heidi Agnic'... 'Nicholas P. Breznay', 'B. Breznock', 'Rebecca Brincks', 'M.A. Brito Sanfiel', 'Reinhard Brunmeir'],
'B': ['V. Aakalu', 'J.L. Accini', 'Kimberly A. Aeling', 'Konstantin Afanaciev', 'T. Afzar', 'Heidi Agnic'... 'Nicholas P. Breznay', 'B. Breznock', 'Rebecca Brincks', 'M.A. Brito Sanfiel', 'Reinhard Brunmeir'],
'C':...... }
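This symptom (every key holding every name) is what `dict.fromkeys(string.ascii_uppercase, [])`, used in the code further down, produces: `fromkeys` evaluates `[]` once and assigns that same list object to every key, so appending through one key appends to the single shared list. A minimal sketch of the effect, and of a dict comprehension that gives each letter its own list:

```python
import string

# dict.fromkeys evaluates [] only once, so every key points at the SAME list object
shared = dict.fromkeys(string.ascii_uppercase, [])
shared['A'].append('V. Aakalu')
print(shared['B'])  # ['V. Aakalu']; the name shows up under every key

# A dict comprehension evaluates [] once per key, so each letter gets its own list
separate = {letter: [] for letter in string.ascii_uppercase}
separate['A'].append('V. Aakalu')
print(separate['A'])  # ['V. Aakalu']
print(separate['B'])  # []
```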
I've also tried using the dictionary method .update(), but it overwrote the names from all previous iterations. The output I got looks something like this:
{'A': 'M. Azizad',
'B': 'M. Bänninger',
'C': 'S. Czempiel',
'D': 'S. D�\xadas-Mondragón', }
My question: what is the best way to separate the names into their respective sub-lists? Thank you in advance!
Some of my code:
import string

sorted_main_db = main_db.sort_values(by="auth_surname")
sorted_main_dict = sorted_main_db.to_dict()  # {column: {row label: value}}
norm_dict = dict.fromkeys(string.ascii_uppercase, [])
unnorm_dict = {}
for key, value in sorted_main_dict.items():  # for key, value in the sorted main dictionary
    for i in value:  # i iterates over the row labels of this column
        if 'auth_name' in key:  # focus on the author's name
            if sorted_main_dict['auth_surname'][i] is None or sorted_main_dict['auth_surname'][i][0] not in norm_dict:  # accounts for null and letters not in the English alphabet
                unnorm_dict.update({key: value})
            if sorted_main_dict['auth_surname'][i][0] in norm_dict.keys():  # if the first letter of the last name matches one of the keys
                norm_dict[sorted_main_dict['auth_surname'][i][0]].append(sorted_main_dict['auth_name'][i])  # append that name to the dictionary
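A minimal sketch of a corrected loop, assuming `main_db` has the `auth_name` and `auth_surname` columns used above; the small DataFrame here is made-up stand-in data, and collecting unmatched names in a list called `unnorm_names` is a hypothetical choice. It iterates the sorted rows directly instead of going through `to_dict()`, gives each letter its own list, and uses `if`/`else` so a missing surname is never subscripted (in the code above, the second `if` still runs after the first, so a `None` surname would raise a `TypeError` on `[0]`). `pd.isna` catches both `None` and `NaN`:

```python
import string
import pandas as pd

# Hypothetical stand-in for main_db, using the column names from the question
main_db = pd.DataFrame({
    'auth_name': ['Reinhard Brunmeir', 'V. Aakalu', 'A. Ørsted', 'Anon', 'B. Breznock'],
    'auth_surname': ['Brunmeir', 'Aakalu', 'Ørsted', None, 'Breznock'],
})

norm_dict = {letter: [] for letter in string.ascii_uppercase}  # one list per letter
unnorm_names = []  # missing surnames, or surnames not starting with A-Z

sorted_main_db = main_db.sort_values(by='auth_surname')  # missing surnames sort last
for name, surname in zip(sorted_main_db['auth_name'], sorted_main_db['auth_surname']):
    if pd.isna(surname) or surname[:1] not in norm_dict:
        unnorm_names.append(name)
    else:
        norm_dict[surname[0]].append(name)

# Optionally drop letters that received no names
norm_dict = {letter: group for letter, group in norm_dict.items() if group}
print(norm_dict)     # {'A': ['V. Aakalu'], 'B': ['B. Breznock', 'Reinhard Brunmeir']}
print(unnorm_names)  # ['A. Ørsted', 'Anon']
```

Because the rows are sorted by surname before the loop, each per-letter list comes out in last-name order.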
Answer:
Assuming you are using pandas, here is a quick solution:
df['l_name_initial'] = df['name'].apply(lambda x: x.split(" ")[-1][0])  # first letter of the last space-separated word
df2 = df.groupby('l_name_initial')['name'].apply(list)  # one list of names per initial
print(df2)
Which results in:
l_name_initial
A [V. Aakalu, J.L. Accini, Kimberly A. Aeling, K...
B [Nicholas P. Breznay, B. Breznock, Rebecca Bri...
S [M.A. Brito Sanfiel]
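Basically, you extract the initial of the last name into a separate column, then use groupby to collect the names for each initial into a list. Note that this treats the last space-separated word as the last name, which is why 'M.A. Brito Sanfiel' ends up under S.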
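Since the question's DataFrame already has a surname column (`auth_surname`), a variant of the same groupby idea can key on that column directly instead of splitting the full name, which keeps multi-word surnames such as 'Brito Sanfiel' under B. A minimal sketch with made-up data using the question's column names; `.to_dict()` turns the grouped Series into the requested dictionary:

```python
import pandas as pd

# Hypothetical data using the column names from the question
df = pd.DataFrame({
    'auth_name': ['V. Aakalu', 'J.L. Accini', 'M.A. Brito Sanfiel', 'B. Breznock'],
    'auth_surname': ['Aakalu', 'Accini', 'Brito Sanfiel', 'Breznock'],
})

df = df.sort_values('auth_surname')   # so each list comes out in last-name order
initials = df['auth_surname'].str[0]  # first character of each surname; NaN stays NaN
result = df.groupby(initials)['auth_name'].apply(list).to_dict()
print(result)
# {'A': ['V. Aakalu', 'J.L. Accini'], 'B': ['B. Breznock', 'M.A. Brito Sanfiel']}
```

Rows whose surname is missing produce a `NaN` key and are dropped by `groupby` by default, so they would need to be collected separately if you want to keep them.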
Answer:
It's quite easy; the real task here is extracting the last names correctly. For your example, it could look like this:
import pandas as pd

names = ['V. Aakalu', 'J.L. Accini', 'Kimberly A. Aeling', 'Konstantin Afanaciev',
         'T. Afzar', 'Heidi Agnic', 'Nicholas P. Breznay', 'B. Breznock', 'Rebecca Brincks',
         'M.A. Brito Sanfiel', 'Reinhard Brunmeir']
s = pd.Series(names)
# the regex extracts the first letter of the last name
out = s.groupby(s.str.extract(r'^.*?\.? ([A-Z])[^\.]', expand=False)).apply(list).to_dict()
>>> out
{'A': ['V. Aakalu',
'J.L. Accini',
'Kimberly A. Aeling',
'Konstantin Afanaciev',
'T. Afzar',
'Heidi Agnic'],
'B': ['Nicholas P. Breznay',
'B. Breznock',
'Rebecca Brincks',
'M.A. Brito Sanfiel',
'Reinhard Brunmeir']}
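To see what the regex captures for individual names, here is a small sketch that runs the same `str.extract` call on a few of them. 'Madonna' is a hypothetical single-word name added only to show the no-match case:

```python
import pandas as pd

names = ['Kimberly A. Aeling', 'J.L. Accini', 'M.A. Brito Sanfiel', 'Madonna']
s = pd.Series(names)

# ^.*?     lazily consume as few leading characters as possible
# \.?      optionally a period (end of an initial such as "A." or "J.L.")
# ' '      a space
# ([A-Z])  capture the capital letter that starts the next word...
# [^\.]    ...only if the next character is not a period, so "A." is skipped
initials = s.str.extract(r'^.*?\.? ([A-Z])[^\.]', expand=False)
print(dict(zip(names, initials)))
```

'Kimberly A. Aeling' maps to 'A' (from Aeling, not the middle initial), 'M.A. Brito Sanfiel' maps to 'B', and 'Madonna' yields `NaN`. Rows that come back as `NaN` are silently dropped by `groupby` (its default `dropna=True`), so it is worth checking for them on real data.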