Home > other >  Parsing and searching a CSV file
Parsing and searching a CSV file

Time:01-31

I want to parse through a contacts list CSV file that looks like this:

First Name  Last Name  Full Name                            Short Name  Phone Number
Jenny       Smith      CN=Jenny Smith/OU=CORP/O=COMPANY     jesmi       6468675309
Mary        Poppin     CN=Mary Poppins/OU=STORE/O=COMPANY   mapop       7005555578
Tony        Stark      CN=Tony Stark/OU=STORE/O=COMPANY     tostar      6007777798
Peter       Parker     CN=Peter Parker/OU=NEWS/O=COMPANY    pepar       5008889090

I want to be able to search through column "Full Name" and select string "OU=STORE" and move all rows that contain "OU=STORE" aside, and then move it to it's own csv file called "store.csv". Then repeat the same process for "OU=CORP" and "OU=NEWS".


This is what I want my output to look like:

Store.csv should contain only this information once the process is done.

First Name  Last Name  Full Name                            Short Name  Phone Number
Mary        Poppin     CN=Mary Poppins/OU=STORE/O=COMPANY   mapop       7005555578
Tony        Stark      CN=Tony Stark/OU=STORE/O=COMPANY     tostar      6007777798

corp.csv

First Name  Last Name  Full Name                            Short Name  Phone Number
Jenny       Smith      CN=Jenny Smith/OU=CORP/O=COMPANY     jesmi       6468675309

news.csv

First Name  Last Name  Full Name                            Short Name  Phone Number    
Peter       Parker     CN=Peter Parker/OU=NEWS/O=COMPANY    pepar       5008889090

I have a small script of what I've done so far but I'm not sure what to do in the end:

import pandas as pd
import csv

#this is the source folder    
source_dir = 'C:/Users/username/documents/contacts/contactslist.csv'

#this is the folder where I want to move the parsed data.
store_target_dir = 'C:/Users/username/documents/contacts/store/'
corp_target_dir = 'C:/Users/username/documents/contacts/corp/'
news_target_dir = 'C:/Users/username/documents/contacts/news/'

col_list = ["Full Name"]

store = 'OU=STORE'
corp = 'OU=CORP'
news = 'OU=NEWS'

#When it comes time to move the data to their folders with their csv name
csvName = store_target_dir   "/"   "store.csv"
csvName2 = corp_target_dir   "/"   "corp.csv"
csvName3 = news_target_dir   "/"  "news.csv"

#opening the file
file = open(source_dir)

#reading the csv file
df = pd.read_csv(file)

CodePudding user response:

To filter you DataFrame, you could do something like this:

# key is the value you are looking for, e.g. 'OU=STORE'    
indices = [key in value for value in df['File Name']]
subset = df[indices]

indices is a bool list indicating whether a line contains key or not

CodePudding user response:

You could extract the OU= value and add it as another column. .unique() could then be used to determine the 3 possible values and then each CSV created based on the that value. For example:

import pandas as pd

df = pd.read_csv('contactslist.csv', dtype={'Phone Number': str})
df['file'] = df['Full Name'].str.extract(r'OU=(\S )/')

for key in df['file'].unique():
    df_filtered = df.loc[df['file'] == key]
    df_filtered = df_filtered.drop(['file'], axis=1)
    df_filtered.to_csv(f"{key}.csv", index=False)      
  •  Tags:  
  • Related