Home > Mobile >  Comparing a specific string in two csv files and making third csv based on those strings
Comparing a specific string in two csv files and making third csv based on those strings

Time:02-12

I have this csv file:

OU                                   Mailfile   
CORP:Jenny Smith:                    mail/246802.nsf
"CORP:John Smith:,John Smith:"       mail/123456.nsf
STORE:Mary Poppins:                  mail/789012.nsf
STORE:Tony Stark:                    mail/345678.nsf
STORE:Carmen Sandiego:               mail/901234.nsf
NEWS:Peter Parker:                   mail/567890.nsf
NEWS:Clark Kent:                     mail/654321.nsf
STORES:store123                      mail/369121.nsf
NEWS:CORPmanager                     mail/137112.nsf
NEWS:STOREmanager                    mail/083561.nsf

Then file2.csv:

OU                      
CORP
STORE     
NEWS

For every line in file2.csv that has 'CORP', 'STORE', or 'NEWS', I want to search through file1.csv and create a file, such as STOREall.csv, CORPall.csv, and NEWSall.csv.

Such as NEWSall.csv

OU                                   Mailfile

NEWS:Peter Parker:                   mail/567890.nsf
NEWS:Clark Kent:                     mail/654321.nsf
NEWS:CORPmanager                     mail/137112.nsf
NEWS:STOREmanager                    mail/083561.nsf

But I'm getting this output instead:

In CORPall.csv

OU                                   Mailfile
CORP:Jenny Smith:                    mail/246802.nsf
CORP:John Smith:,John Smith:         mail/123456.nsf
NEWS:CORPmanager                     mail/137112.nsf

Then NEWSall.csv

OU                                   Mailfile
NEWS:Peter Parker:                   mail/567890.nsf
NEWS:Clark Kent:                     mail/654321.nsf

Since CORP is also in news, it appears in the CORPall.csv. I don't want that. If it says NEWS before the : then that should be in NEWSall.csv, etc.

I used this snippet that someone showed me on here:

for dept in OUList['OU']:
    df[df['OU'].str.contains(dept)]
    df_dept['OU'].to_csv(f'{dept}all.csv', index=False)

It works great. But it only looks for what's in the whole file. So I need a way to search before the : (mind there's two in some cases)

CodePudding user response:

You can use str.startswith to filter the dataframe by checking if the start of each string element matches the pattern in the OUList.

for dept in OUList['OU']:
    df_dept = df[df['OU'].str.startswith(f'{dept}:')]
    df_dept.to_csv(f'{dept}all.csv', index=False)

CodePudding user response:

It sounds like the "OU" column should be split into two columns on the : character. You can do this with df['OU'].str.split(':'). Save the output to new columns and then you can use the same filter technique on the column created from the left of :

  • Related