I have this csv file:
OU Mailfile
CORP:Jenny Smith: mail/246802.nsf
"CORP:John Smith:,John Smith:" mail/123456.nsf
STORE:Mary Poppins: mail/789012.nsf
STORE:Tony Stark: mail/345678.nsf
STORE:Carmen Sandiego: mail/901234.nsf
NEWS:Peter Parker: mail/567890.nsf
NEWS:Clark Kent: mail/654321.nsf
STORES:store123 mail/369121.nsf
NEWS:CORPmanager mail/137112.nsf
NEWS:STOREmanager mail/083561.nsf
Then file2.csv:
OU
CORP
STORE
NEWS
For every line in file2.csv that has 'CORP', 'STORE', or 'NEWS', I want to search through file1.csv and create a file, such as STOREall.csv, CORPall.csv, and NEWSall.csv.
Such as NEWSall.csv
OU Mailfile
NEWS:Peter Parker: mail/567890.nsf
NEWS:Clark Kent: mail/654321.nsf
NEWS:CORPmanager mail/137112.nsf
NEWS:STOREmanager mail/083561.nsf
But I'm getting this output instead:
In CORPall.csv
OU Mailfile
CORP:Jenny Smith: mail/246802.nsf
CORP:John Smith:,John Smith: mail/123456.nsf
NEWS:CORPmanager mail/137112.nsf
Then NEWSall.csv
OU Mailfile
NEWS:Peter Parker: mail/567890.nsf
NEWS:Clark Kent: mail/654321.nsf
Since CORP is also in news, it appears in the CORPall.csv. I don't want that. If it says NEWS
before the :
then that should be in NEWSall.csv, etc.
I used this snippet that someone showed me on here:
for dept in OUList['OU']:
df[df['OU'].str.contains(dept)]
df_dept['OU'].to_csv(f'{dept}all.csv', index=False)
It works great. But it only looks for what's in the whole file. So I need a way to search before the :
(mind there's two in some cases)
CodePudding user response:
You can use str.startswith
to filter the dataframe by checking if the start of each string element matches the pattern in the OUList
.
for dept in OUList['OU']:
df_dept = df[df['OU'].str.startswith(f'{dept}:')]
df_dept.to_csv(f'{dept}all.csv', index=False)
CodePudding user response:
It sounds like the "OU" column should be split into two columns on the :
character. You can do this with df['OU'].str.split(':')
. Save the output to new columns and then you can use the same filter technique on the column created from the left of :