Identify all records with a certain score, and print the corresponding name

Time:04-21

I have data in a .csv file and have turned every column of the .csv into an entry in a data dictionary, e.g. data_dict['Date'] gives me all the date records. There are about 170k records.

What I am trying to do is identify all countries with a score above a certain value, let's say 100, and print them. Countries is one column and Score is another, but there are about 50 columns in total. My thought process was to find the scores above 100 and then print the corresponding countries.

My data dictionaries look roughly like this (these are just examples):

data_dict['Countries'] = AAA, AAB, AAC, AAD, ...
data_dict['Score'] = 20, 30, 40, 50, ...

Note: country AAA's score is 20; they are in the same record. The output I want should be something like: the countries with scores higher than 100 are x, y, z, ...

I don't even know where to start, so I can't really provide code. Bonus points if you can divide every 'Score' record by 10 before printing the countries. I know this is a long shot, but any assistance would be appreciated :)

CodePudding user response:

list_of_dicts is a list of dicts loaded from the csv. countries_with_score_higher_than_100 is your answer.

list_of_dicts = [
    {'Country': 'Germany', 'Score': 50, 'Some_other_data': 3},
    {'Country': 'Poland', 'Score': 90, 'Some_other_data': 7},
    {'Country': 'Hungary', 'Score': 90, 'Some_other_data': 3},
    {'Country': 'America', 'Score': 110, 'Some_other_data': 3},
    {'Country': 'Spain', 'Score': 120, 'Some_other_data': 4},
]

countries_with_score_higher_than_100 = []
for dic in list_of_dicts:
    # keep only the records whose score is above 100
    if dic['Score'] > 100:
        # divide the score by 10 (this also changes the record inside list_of_dicts)
        dic['Score'] = dic['Score'] / 10
        countries_with_score_higher_than_100.append(dic)

print(countries_with_score_higher_than_100)
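
In case it helps with the "loaded from csv" part, here is a minimal sketch of how such a list could be built with the standard csv module. The column names 'Country' and 'Score', the helper name load_rows and the inline sample data are assumptions for illustration; the real file will have its own headers.

```python
import csv
import io

# Stand-in for the real file (assumed layout). With a file on disk you would
# pass open("your_file.csv", newline="") instead of this StringIO object.
sample_csv = io.StringIO(
    "Country,Score,Some_other_data\n"
    "Germany,50,3\n"
    "America,110,3\n"
    "Spain,120,4\n"
)

def load_rows(file_obj):
    """Read CSV rows into a list of dicts, converting Score to a number."""
    rows = []
    for row in csv.DictReader(file_obj):
        # DictReader yields every value as a string, so convert before comparing
        row["Score"] = float(row["Score"])
        rows.append(row)
    return rows

list_of_dicts = load_rows(sample_csv)

# Collect only the names of the countries whose score is above 100
high_scorers = [row["Country"] for row in list_of_dicts if row["Score"] > 100]
print(high_scorers)
```

Without the float() conversion the comparison against 100 would fail, because a string cannot be compared with an integer.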

CodePudding user response:

If I understand correctly:

data_dict = {
    "Countries":
        [
            'AAA',
            'AAB',
            'AAC',
            'AAD'
        ],
    "Score":
        [
            20,
            30,
            400,
            500
        ]
}
# first create an index:
i = 0

# Now loop all the countries:
while i < len(data_dict['Countries']):
    # get the score for this country:
    country = data_dict['Countries'][i]
    score = data_dict['Score'][i]
    
    # do the check
    if score > 100:
        # do the print. divide by 10
        print(f"{country} has a score of: {score / 10}")

    # increment the index to process the next record
    i = i + 1
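
The same loop can also be written without a manual index by pairing the two lists with zip. This is just a sketch of an alternative, assuming both lists are in the same order; the function name countries_above and its parameters are made up for this example.

```python
data_dict = {
    "Countries": ["AAA", "AAB", "AAC", "AAD"],
    "Score": [20, 30, 400, 500],
}

def countries_above(data, threshold=100, divisor=10):
    """Return (country, scaled_score) pairs for scores above the threshold."""
    return [
        (country, score / divisor)
        # zip pairs the i-th country with the i-th score
        for country, score in zip(data["Countries"], data["Score"])
        if score > threshold
    ]

result = countries_above(data_dict)
for country, scaled in result:
    print(f"{country} has a score of: {scaled}")
```

Note that zip stops at the shorter list, so a missing score would go unnoticed. On Python 3.10 or later, zip(..., strict=True) raises a ValueError when the lengths differ.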

Fair warning: your data should really not be structured like this. It gets messy when you have to deal with a country that doesn't have a score, or any scenario where the data isn't 100% perfect and ordered the same way in every list.

A cleaner data format would be:

data_dict = {
    "Countries":
        {
            'AAA': {
                "Score": 20, 
                "Date": '2022-04-20'
            },
            'AAB': {
                "Score": 30, 
                "Date": '2022-04-20'
            },
            'AAC': {
                "Score": 400,
                "Date": '2022-04-20'
            },
            'AAD': {
                "Score": 500,
                "Date": '2022-04-20'
            }
        }
}

That way you can keep track of each country's data points by country:

for country_name, data_points in data_dict["Countries"].items():
    print(f"Country name: {country_name}")
    print(f"Score: {data_points['Score']}")
    print(f"Date: {data_points['Date']}")
    print("----")

Output:

Country name: AAA
Score: 20
Date: 2022-04-20
----
Country name: AAB
Score: 30
Date: 2022-04-20
----
Country name: AAC
Score: 400
Date: 2022-04-20
----
Country name: AAD
Score: 500
Date: 2022-04-20
----
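
To tie this back to the original question, here is a rough sketch of the "above 100, divided by 10" check on the nested format. It assumes each country appears only once (with 170k records that may not hold, since a repeated country would overwrite its earlier entry). The function name scaled_high_scores and the score-less entry 'AAE' are hypothetical, added only to show how a missing score could be handled.

```python
data_dict = {
    "Countries": {
        "AAA": {"Score": 20, "Date": "2022-04-20"},
        "AAB": {"Score": 30, "Date": "2022-04-20"},
        "AAC": {"Score": 400, "Date": "2022-04-20"},
        "AAD": {"Score": 500, "Date": "2022-04-20"},
        "AAE": {"Date": "2022-04-20"},  # hypothetical record without a score
    }
}

def scaled_high_scores(countries, threshold=100, divisor=10):
    """Map each country above the threshold to its score divided by divisor."""
    result = {}
    for name, points in countries.items():
        score = points.get("Score")  # None when no score was recorded
        if score is not None and score > threshold:
            result[name] = score / divisor
    return result

high_scores = scaled_high_scores(data_dict["Countries"])
print("The countries with scores higher than 100 are: " + ", ".join(high_scores))
```

Because the score lives next to the country it belongs to, a missing value only affects that one record instead of shifting every later index.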