I'm trying to read the text from a column called location
in a CSV
file, and then retrieve all the lines that do not contain the specific word(s) in that column, but the column that I want to read has the city name as well as the country name.
Here are the column names and example lines of data from the CSV
file:
company // level // title // location
Amazon // L5 // Solution Architect // Sydney, NS, Australia
IBM // Band // Analyst // Chicago, IL
Oracle // M3 // Architect // San Francisco, CA
Ubisoft // L4 // Manager // Tokyo, TY, Japan
Here is the output I am looking for:
company // level // title // location
IBM // Band // Analyst // Chicago, IL
Oracle // M3 // Architect // San Francisco, CA
I have something like this so far:
dataset = pd.read_csv("raw_data.csv")
other_countries = ['Australia', 'Brazil', 'Japan']
After this point, I feel very lost about how to go about reading the specific column, searching for the particular words in that column, and then returning the lines of data that I am interested in.
CodePudding user response:
I personally like using boolean masks for stuff like this.
sort_out = ['Australia', 'Japan', 'Brazil']
col = df['location']
bool_mask = [not any([y in sort_out for y in x.replace(' ','').split(',')]) for x in col]
df = df[bool_mask]
A boolean mask is simply a list of boolean values. In this case representing a row in a dataframe. When applied if the value is True the row is displayed. If it's False it's hidden. You can use this to perform a check on each row to see if it matches any number of conditions. I can explain the process further if you need.
CodePudding user response:
Try this code , it doesn't require any data frame manipulation module
other_countries = ['Australia', 'Brazil', 'Japan']
file = open("test.csv" , "r")
file = file.read().splitlines()
file.pop(0)
new_list = []
for line in file :
location = line.split("//")[3]
t = 0
flag = 0
while t < len(other_countries) :
if other_countries[t] in location :
flag = flag 1
t= t 1
if flag == 0 :
new_list.append(line)
new_file = open("newcsvfile" , "w")
for item in new_list :
new_file.write(item "\n")
new_file.close()