I am new to Python and working on a project in which I need to clean my data.
I want to add a new column containing only the word "IN" or "OUT" ("UIT" in the data).
[Image: the full data, showing where the words "IN" and "OUT" appear]
import pandas as pd
df = pd.read_excel("tourniquets_26.07.2022_test.xls")  # Read the Excel file as a DataFrame
# My attempt: use a loop to find it
for word in df["Deur intelligente eenheid"]:
    if word == "IN" or word == "UIT":
        df['IN/UIT'] = word
#Display top 5 rows
print(df.head(5))
#To save it back as Excel
df.to_excel("tourniquets_26.07.2022_test.xls") #Write DataFrame back as Excel file
Thank you in advance!
CodePudding user response:
From what I understand, the column in your current df always contains (somewhere within it) either the word "IN" or "UIT", and you want a new column that holds only the corresponding "IN" or "UIT", right?
If so, you can use Series.apply to do this. Create a function that returns "IN" or "UIT" depending on which word is in a sentence:
def my_fun(sentence):
    if "IN" in sentence:
        return "IN"
    else:  # assuming that if it doesn't contain "IN", it must contain "UIT"
        return "UIT"
Then apply this function to the column to create the new one. Assuming your column is called "col_1":
df['IN/UIT'] = df["col_1"].apply(my_fun)
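To try this out without the Excel file, here is a minimal sketch on made-up data. The column name "col_1", the sample strings, and the helper name extract_direction are all assumptions, since the real values are not shown. It also differs slightly from my_fun above: it compares whole words, so a value that merely contains the letters "IN" inside a longer word is not labeled "IN".

```python
import pandas as pd

# Hypothetical sample data; the real column contents are not shown in the question.
df = pd.DataFrame(
    {"col_1": ["Tourniquet 1 IN", "Tourniquet 2 UIT", "Tourniquet 1 UIT"]}
)


def extract_direction(sentence):
    """Return "IN" or "UIT" if it appears as a whole word, else None."""
    words = str(sentence).split()  # split on whitespace into separate words
    if "IN" in words:
        return "IN"
    if "UIT" in words:
        return "UIT"
    return None  # neither word was found


# apply calls the function once per cell and collects the results in a new column
df["IN/UIT"] = df["col_1"].apply(extract_direction)
print(df)
```

Returning None for rows with neither word makes unexpected values easy to spot afterwards, for example with df["IN/UIT"].isna().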
CodePudding user response:
You can use np.where:
import numpy as np
df['IN/UIT'] = np.where(df['col'].str.startswith('IN'), 'IN', 'UIT')
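Note that the np.where line labels every string that does not start with "IN" as "UIT". If the word can appear anywhere in the cell, one possible alternative (a different technique, not np.where) is Series.str.extract with a regular expression. This is only a sketch: the column name "col" and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample data; replace "col" with the real column name.
df = pd.DataFrame({"col": ["IN deur 1", "UIT deur 2", "deur 3 UIT", "onbekend"]})

# \b is a word boundary, so only the standalone words IN or UIT match.
# expand=False returns a Series instead of a one-column DataFrame.
df["IN/UIT"] = df["col"].str.extract(r"\b(IN|UIT)\b", expand=False)
print(df)
```

Rows where neither word is found get a missing value rather than a default label.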