I want to iterate through a column in a pandas DataFrame and manipulate the data to create a new column based on the existing column. For example...
for row in df['column_variable']:
    if 'substring1' in row:
        df['new_column'] = ...
    elif 'substring2' in row:
        df['new column'] = ...
    elif 'substring3' in row:
        df['new column'] = ...
    else:
        df['new column'] = 'Not Applicable'
Even though type(row) returns 'str', meaning each value is a string, this code keeps returning the new column as all 'Not Applicable'. In other words, it is not detecting any of the substrings in any of the rows of the DataFrame, even when I can see they are there.
I am sure there is an easy way to do this...PLEASE HELP!
I have tried the following as well...
for row in df['column_variable']:
    if row.find('substring1') != -1:
        df['new_column'] = ...
    elif row.find('substring2') != -1:
        df['new column'] = ...
    elif row.find('substring3') != -1:
        df['new column'] = ...
    else:
        df['new column'] = 'Not Applicable'
And I continue to get all entries of the new column being 'Not Applicable'. Once again it is not finding the string in the existing column.
Is it an issue with the data type or something?
CodePudding user response:
You could use a nested for loop:
# For each (index label, value) pair in the column
for idx, row in df['column_variable'].items():
    # Set boolean to indicate if a substring was found
    substr_found = False
    # For each substring
    for sub_str in ["substring1", "substring2"]:
        # If the substring is in the row
        if sub_str in row:
            # Execute code... (assign to this row only, not the whole column)
            df.loc[idx, 'new_column'] = ...
            # Substring was found!
            substr_found = True
    # If substring was not found
    if not substr_found:
        # Mark this row as not applicable
        df.loc[idx, 'new_column'] = 'Not Applicable'
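One thing worth keeping in mind with loops like these: assigning a scalar with df['col'] = value sets every row of the column, not just the current one. That would explain a column that ends up entirely 'Not Applicable' whenever the last row has no match. A minimal sketch of the effect, using made-up sample data (the two strings below are assumptions, not from the question):

```python
import pandas as pd

# Hypothetical sample data, not from the original post
df = pd.DataFrame({"column_variable": ["has substring1", "nothing here"]})

for row in df["column_variable"]:
    if "substring1" in row:
        # A scalar assignment overwrites EVERY row of the column
        df["new_column"] = "Found 1"
    else:
        # ...and so does this one, on a later iteration
        df["new_column"] = "Not Applicable"

# The last row had no match, so the whole column is "Not Applicable"
result = df["new_column"].tolist()
print(result)
```

The substring test itself works here; it is the whole-column assignment on the final iteration that hides the earlier matches.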
CodePudding user response:
You can create an empty list, add the new values to it, and then create the new column as the last step:
all_data = []
for row in df["column_variable"]:
    if "substring1" in row:
        all_data.append("Found 1")
    elif "substring2" in row:
        all_data.append("Found 2")
    elif "substring3" in row:
        all_data.append("Found 3")
    else:
        all_data.append("Not Applicable")
df["new column"] = all_data
print(df)
Prints:
      column_variable  new column
0  this is substring1     Found 1
1  this is substring2     Found 2
2  this is substring1     Found 1
3  this is substring3     Found 3
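If the loop gets slow on a large frame, the same if/elif logic can probably be expressed without iterating at all. Here is a rough sketch using Series.str.contains to build one boolean mask per substring and numpy.select to pick the first match. The sample rows and the "Found N" labels are placeholders carried over from the example above, not the asker's real data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the spirit of the output above
df = pd.DataFrame({
    "column_variable": [
        "this is substring1",
        "this is substring2",
        "nothing to see",
    ]
})

col = df["column_variable"]

# One boolean mask per substring; regex=False matches the text literally,
# na=False treats missing values as "no match"
conditions = [
    col.str.contains("substring1", regex=False, na=False),
    col.str.contains("substring2", regex=False, na=False),
    col.str.contains("substring3", regex=False, na=False),
]
choices = ["Found 1", "Found 2", "Found 3"]

# The first condition that is True wins, like an if/elif chain
df["new column"] = np.select(conditions, choices, default="Not Applicable")
print(df)
```

Because numpy.select takes the first True condition per row, the order of the conditions list plays the role of the if/elif order.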
CodePudding user response:
Maybe the shortest way I can think of:
import pandas as pd

# Dummy DataFrame
df = pd.DataFrame([[1, "substr1"], [3, "bla"], [5, "bla"]], columns=["abc", "col_to_check"])
substrings = ["substr1", "substr2", "substr3"]
content = df["col_to_check"].unique().tolist()  # Unique content of column
for subs in substrings:  # Go through all your substrings
    if any(subs in value for value in content):  # Check if substring occurs in any value
        df[subs] = 0  # Fill your new column with whatever you want
CodePudding user response:
import pandas as pd
Create DataFrame
tup_lst = []
for i in ['substring1', 'substring2', 'substring3', 'substring4']:
    tup = (i, 'to_be_replaced')
    print(tup)
    tup_lst.append(tup)
('substring1', 'to_be_replaced')
('substring2', 'to_be_replaced')
('substring3', 'to_be_replaced')
('substring4', 'to_be_replaced')
df = pd.DataFrame.from_records(tup_lst)
df.columns = ['column_variable','other_column']
print(df)
  column_variable    other_column
0      substring1  to_be_replaced
1      substring2  to_be_replaced
2      substring3  to_be_replaced
3      substring4  to_be_replaced
Modify DataFrame using .loc
df.loc[:, 'other_column'] = 'Not Applicable'
df.loc[df['column_variable'] == 'substring1', 'other_column'] = '...'
df.loc[df['column_variable'] == 'substring2', 'other_column'] = 'something_else'
df.loc[df['column_variable'] == 'substring3', 'other_column'] = 'Yes'
print(df)
  column_variable    other_column
0      substring1             ...
1      substring2  something_else
2      substring3             Yes
3      substring4  Not Applicable
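The .loc example above compares with ==, which only matches when the whole cell equals the string. Since the question is about substrings inside longer text, a possible variant is to build the mask with Series.str.contains instead. This is only a sketch: the sample strings are invented for illustration, and the labels are borrowed from the example above.

```python
import pandas as pd

# Hypothetical sample data where the substrings sit inside longer text
df = pd.DataFrame(
    {"column_variable": ["abc substring1", "substring2 xyz", "other"]}
)

# Start with the default value for every row
df["other_column"] = "Not Applicable"

# Overwrite only the rows whose text contains each substring
mask1 = df["column_variable"].str.contains("substring1", regex=False, na=False)
df.loc[mask1, "other_column"] = "Yes"

mask2 = df["column_variable"].str.contains("substring2", regex=False, na=False)
df.loc[mask2, "other_column"] = "something_else"

print(df)
```

If a row could contain more than one substring, the later assignment overwrites the earlier one, so the order of the .loc lines decides which label wins.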