Comparing a specific string in two csv files and making third csv based on those strings-CodePudding

I have this csv file:

OU                                   Mailfile   
CORP:Jenny Smith:                    mail/246802.nsf
"CORP:John Smith:,John Smith:"       mail/123456.nsf
STORE:Mary Poppins:                  mail/789012.nsf
STORE:Tony Stark:                    mail/345678.nsf
STORE:Carmen Sandiego:               mail/901234.nsf
NEWS:Peter Parker:                   mail/567890.nsf
NEWS:Clark Kent:                     mail/654321.nsf
STORES:store123                      mail/369121.nsf
NEWS:CORPmanager                     mail/137112.nsf
NEWS:STOREmanager                    mail/083561.nsf

Then file2.csv:

OU                      
CORP
STORE     
NEWS

For every line in file2.csv that has 'CORP', 'STORE', or 'NEWS', I want to search through file1.csv and create a file, such as STOREall.csv, CORPall.csv, and NEWSall.csv.

Such as NEWSall.csv

OU                                   Mailfile

NEWS:Peter Parker:                   mail/567890.nsf
NEWS:Clark Kent:                     mail/654321.nsf
NEWS:CORPmanager                     mail/137112.nsf
NEWS:STOREmanager                    mail/083561.nsf

But I'm getting this output instead:

In CORPall.csv

OU                                   Mailfile
CORP:Jenny Smith:                    mail/246802.nsf
CORP:John Smith:,John Smith:         mail/123456.nsf
NEWS:CORPmanager                     mail/137112.nsf

Then NEWSall.csv

OU                                   Mailfile
NEWS:Peter Parker:                   mail/567890.nsf
NEWS:Clark Kent:                     mail/654321.nsf

Since CORP is also in news, it appears in the CORPall.csv. I don't want that. If it says NEWS before the : then that should be in NEWSall.csv, etc.

I used this snippet that someone showed me on here:

for dept in OUList['OU']:
    df[df['OU'].str.contains(dept)]
    df_dept['OU'].to_csv(f'{dept}all.csv', index=False)

It works great. But it only looks for what's in the whole file. So I need a way to search before the : (mind there's two in some cases)

CodePudding user response：

You can use str.startswith to filter the dataframe by checking if the start of each string element matches the pattern in the OUList.

for dept in OUList['OU']:
    df_dept = df[df['OU'].str.startswith(f'{dept}:')]
    df_dept.to_csv(f'{dept}all.csv', index=False)

CodePudding user response：

It sounds like the "OU" column should be split into two columns on the : character. You can do this with df['OU'].str.split(':'). Save the output to new columns and then you can use the same filter technique on the column created from the left of :