Loop through one csv file and use the values as parameters to find records in another csv file

Time:07-19

I have two csv files that look like this:

table 1:

ID Urgent number
123 2
234 3

table 2:

ID Part Date
123 A 01/01/2022
123 A 01/01/2022
123 A 01/01/2022
123 A 01/01/2022
123 A 01/01/2022
234 B 01/01/2022
234 B 01/01/2022
234 B 01/01/2022
234 B 01/01/2022
234 B 01/01/2022
234 B 01/01/2022

I'm trying to take each ID from the first table and use it to find the matching records in the second table, then mark as many of those records as the 'urgent number' column specifies by assigning a value to a new column. Table 2 can have more records for an ID than the 'urgent number' in table 1, so I'm guessing that I need to use loops to do it.

I've put the data into data frames and found some code that has helped me loop through the first table, but I'm stuck on how to use this in another loop to search the second table.

    import pandas as pd

    table1 = pd.read_csv(r'CustomerFloat.csv')
    table2 = pd.read_csv(r'EKanbanOrderbook.csv')

    # Collect the ID and urgent deficit of every row in table1
    content_of_rows = {}
    for row in table1.itertuples():
        index = row[0]
        uniqueID = row[1]
        urgentDeficit = row[6]

        content_of_rows.update({index: {"uniqueID": uniqueID,
                                        "urgentDeficit": urgentDeficit}})
    #print(content_of_rows)

    for row in content_of_rows:
        print(content_of_rows[row])

I tried this to start looping through the second table, but it just returns 0, and I'm not sure what else to try or search for to solve this.

    for row in content_of_rows:
        i = 0
        while i < urgentDeficit:
        
            print(urgentDeficit)
            i += 1

Can anyone point me in the right direction?

The output should look like this:

ID Part Date urgent
123 A 01/01/2022 x
123 A 01/01/2022 x
123 A 01/01/2022 x
123 A 01/01/2022
123 A 01/01/2022
234 B 01/01/2022 x
234 B 01/01/2022 x
234 B 01/01/2022
234 B 01/01/2022
234 B 01/01/2022
234 B 01/01/2022

Thanks

CodePudding user response:

You can achieve this with the pandas package. The question isn't entirely clear, so please add the expected output. I don't know whether this gives the result you expect, but you can try it:

import pandas as pd

file_one = pd.read_csv('CustomerFloat.csv')
file_two = pd.read_csv('EKanbanOrderbook.csv')

# Keep the rows of file_two whose ID also appears in file_one
final_data = file_two[file_two['ID'].isin(file_one['ID'])]
print(final_data)

Use pandas' vectorized operations for the comparison rather than iterating over the file, because a row-by-row loop can become slow as the data grows.

CodePudding user response:

You can join the two tables based on ID which is common to both:

import pandas as pd

table1 = pd.read_csv(r'CustomerFloat.csv')
table2 = pd.read_csv(r'EKanbanOrderbook.csv')
table2.merge(table1, on='ID')

By default, merge performs an inner join (how='inner'), so it keeps only the rows of table2 whose ID also appears in table1, i.e. the rows that have an urgent number. If you want all rows from table2, do a left join:

table2.merge(table1, on='ID', how='left')

In this case, you get every row from table2, with NaN in the urgent number column for rows whose ID has no entry in table1.
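To make the difference between the two joins concrete, here is a small sketch on in-memory data rather than the real CSV files. The frame names `orders` and `urgent` and the ID 999 (which has no urgent number) are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for table2 and table1; ID 999 has no urgent number
orders = pd.DataFrame({"ID": [123, 123, 999], "Part": ["A", "A", "C"]})
urgent = pd.DataFrame({"ID": [123], "Urgent number": [2]})

# Inner join (the default): the row with ID 999 is dropped
inner = orders.merge(urgent, on="ID")

# Left join: every row of orders is kept, missing urgent numbers become NaN
left = orders.merge(urgent, on="ID", how="left")

print(len(inner), len(left))
print(left)
```

Because the left join keeps unmatched rows, it is probably the safer choice when the final output has to contain every record from table2.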

After that, group by ID and, within each group, mark only the first 'urgent number' rows as urgent. If you provide part of the dataset I can help you with that.
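The group-and-mark step can be sketched without an explicit loop. This is only one possible approach, assuming the column names from the sample tables in the question ("ID", "Part", "Date", "Urgent number") and using inline data in place of the real CSV files. It numbers the rows within each ID using `groupby(...).cumcount()` and marks a row when its position is below that ID's urgent number.

```python
import io
import pandas as pd

# Inline stand-ins for CustomerFloat.csv and EKanbanOrderbook.csv,
# built from the sample tables in the question
table1 = pd.read_csv(io.StringIO("ID,Urgent number\n123,2\n234,3\n"))
table2 = pd.read_csv(io.StringIO(
    "ID,Part,Date\n"
    + "123,A,01/01/2022\n" * 5
    + "234,B,01/01/2022\n" * 6
))

# Attach each ID's urgent number to every matching row of table2
merged = table2.merge(table1, on="ID", how="left")

# Position of each row within its ID group: 0, 1, 2, ...
position = merged.groupby("ID").cumcount()

# Mark the first "Urgent number" rows of each group; a comparison with NaN
# is False, so IDs missing from table1 stay unmarked
is_urgent = position < merged["Urgent number"]
merged["urgent"] = is_urgent.map({True: "x", False: ""})

result = merged.drop(columns="Urgent number")
print(result)
```

With table 1 as posted (2 for ID 123, 3 for ID 234), this marks the first two rows of ID 123 and the first three rows of ID 234. "First" here means the order in which the rows appear in table2, so sort table2 beforehand if a different priority (for example by Date) is wanted.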
