Is there a way to rank male and female names in a list based on their average ranking number?

Time:03-17

I am trying to figure out how to compile a list of the top 10 male and female names based on their average rank and a second list of the top 10 male and female names based on their birth rate. I have included my code so far that has put each name into a list with their birth rate and what their ranking was over the past 10 years.

# Importing Required Modules
import requests
import re

#Function that sorts the lists into dictionaries
def sexName(unorderedList):
    orderedList = []
    checker = True
    for individualPerson in unorderedList:
        counter = int(0)
        if len(orderedList) == 0:
            orderedList.append(individualPerson)
        else:
            for groupedPerson in orderedList:
                if groupedPerson[0] == individualPerson[0]:
                    checker = True
                    break
                else:
                    checker = False
                counter += 1
            
            if checker == False:
                orderedList.append(individualPerson)
            else:
                orderedList[counter][1] += individualPerson[1]
                orderedList[counter].append(individualPerson[2])

    return orderedList


# Declarations
pattern = r"<td>([0-9]{1,4})</td> <td>([A-z]*)</td><td>([0-9]*,*[0-9]*)</td>\n <td>([A-z]*)</td>\n<td>([0-9]*,*[0-9]*)</td>"
year = 2010
listOfnames = []
maleNamesunorded = list()
femaleNamesunorded = list()

#Loop that pulls data
while(year <= 2019):
    #List that holds the tuples, cleared for every year
    listOfnames.clear()
    
    #Pulling the data and adding it to the list
    url = "https://www.ssa.gov/cgi-bin/popularnames.cgi"
    dataSent = {'year':year, 'top':10, 'number':'n'}
    dataReceived = requests.post(url, data=dataSent).text
    listOfnames = re.findall(pattern, dataReceived)
    
    #Creating the lists of top ten male and female names from 2010-2019
    for name in listOfnames:
        if(int(name[0]) <= 10):
            maleNamesunorded.append([name[1], int(name[2].replace(",","")), int(name[0])])    #Name, births, rank (of first year the name appears)
            femaleNamesunorded.append([name[3], int(name[4].replace(",","")), int(name[0])])
            
        else:   #If their rank isn't in the top 10 this line skips them
            pass
    year += 1

maleNamesordered = sexName(maleNamesunorded)
femaleNamesordered = sexName(femaleNamesunorded)


print("Female names: ")
for i in femaleNamesordered:
    print(i)

print()
print("Male names: ")
for i in maleNamesordered:
    print(i)

print("Number of female names: ", len(femaleNamesordered))
print("Number of male names: ", len(maleNamesordered))

Thank you in advance!!

CodePudding user response:

Put your data in a dataframe and use pandas rank: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rank.html

CodePudding user response:

If I understand your question, the problem seems to be with sorting the lists as you've prepared them.

Ranking based on the birth rates is slightly simpler, so we can start there.

A simple way to sort in python is to use sorted (see: https://realpython.com/python-sort/)

The call will look like:

sortedByRate = sorted(maleNamesordered, key=birthRateKey, reverse=True)

Here, birthRateKey is a function created to return the 'key' you want to sort by, in this case the birthrate, which is at index [1] in your list, so that function looks like:

def birthRateKey(v):
    return v[1]

reverse=True puts them in descending order, so the name with the most births comes first. Strictly speaking, there isn't a need to define a function; you can use an anonymous function (a lambda) that accomplishes the same thing:

maleNamesorderedByBirthRate = sorted(maleNamesordered, key=lambda i: i[1], reverse=True)
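To see the sort in isolation, here is a minimal sketch on made-up data. The names and numbers are placeholders rather than real SSA figures, and the list uses the same [name, births, rank, ...] layout as the question. The [:10] slice is an assumed way to keep only the top 10.

```python
# Hypothetical sample in the question's layout: [name, total births, rank, rank, ...]
sample_names = [
    ["Avery", 1200, 3, 4],
    ["Blake", 3400, 1, 1, 2],
    ["Casey", 2100, 2, 3],
]

# Sort by total births (index 1), largest first, and keep at most the first 10.
top_by_births = sorted(sample_names, key=lambda entry: entry[1], reverse=True)[:10]

for entry in top_by_births:
    print(entry[0], entry[1])
```

Slicing a list that has fewer than 10 entries simply returns all of them, so no length check is needed.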

The other part of your question, ranking by average rank, is slightly more involved. First, we want to average all the ranks over the past 10 years for each name. A kink is that some names weren't in the top 10 every year, so they have fewer than 10 ranks recorded.

I am not sure how you want to handle this. To average all the ranks you have for each name, you could do something like:

femaleNameAverageRanking = []

for nameDataList in femaleNamesordered:
    aggregate = sum(nameDataList[2:])
    average = aggregate / 10.0
    femaleNameAverageRanking.append([nameDataList[0], average])  

Here, I use sum() to add up all the ranks that you have, and then divide by 10 to calculate an average rank. I don't believe this is exactly what you need, since a name with fewer than 10 recorded ranks ends up with an artificially low average, but I provide it as an example. Note that [2:] is the Python slice syntax, and it will give you a list starting at index 2 and extending to the end. sum() is a function that, in this case, adds up all the elements of that list.
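One alternative, assuming you only want to average over the years in which a name actually made the top 10, is to divide by the number of recorded ranks instead of a fixed 10. This is a sketch on made-up data; average_rank is a hypothetical helper, not something from the question's code.

```python
def average_rank(entry):
    """Return the mean of the ranks stored from index 2 onward."""
    ranks = entry[2:]
    return sum(ranks) / len(ranks)

# Hypothetical sample in the question's layout: [name, total births, rank, rank, ...]
sample_names = [
    ["Avery", 1200, 3, 4],
    ["Blake", 3400, 1, 1, 2],
    ["Casey", 2100, 2, 3],
]

# A lower average rank is better, so the default ascending sort is what we want.
top_by_average = sorted(sample_names, key=average_rank)[:10]

for entry in top_by_average:
    print(entry[0], average_rank(entry))
```

Every entry built by the question's code has at least one rank, so the division by len(ranks) does not hit zero here.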

Hopefully this puts you on the right track. I believe you may need to take a closer look at how you are paring down your list -- you may want to get the annual ranks for names outside of the 10 year window and use that for the averaged ranking.
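If fetching ranks beyond the top 10 isn't practical, another option is to count each year a name is missing as a fixed placeholder rank. The sketch below assumes a placeholder of 11 (one worse than the last listed rank) and a 10-year window; both values and the padded_average_rank helper are assumptions for illustration, not something the SSA data provides.

```python
YEARS = 10         # 2010 through 2019 inclusive
MISSING_RANK = 11  # assumed placeholder for a year outside the top 10

def padded_average_rank(entry, years=YEARS, missing_rank=MISSING_RANK):
    """Average the recorded ranks, counting each missing year as missing_rank."""
    ranks = entry[2:]
    missing_years = years - len(ranks)
    return (sum(ranks) + missing_years * missing_rank) / years

# Hypothetical entry: in the top 10 for two years, absent for the other eight.
sample = ["Avery", 1200, 3, 4]
print(padded_average_rank(sample))  # (3 + 4 + 8 * 11) / 10 = 9.5
```

The placeholder understates how far a name really fell, so this only approximates the true average; it does keep names with a single good year from jumping to the top.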

I've tried to keep the code as simple as possible, you can definitely find more concise ways to accomplish this using some of python's fancier features, but I tried to use things that you were more likely to have come across based on your pasted code style.
