Python - Find a substring within a string using an IF statement when iterating through a pandas Data

Time:09-30

I want to iterate through a column in a pandas DataFrame and manipulate the data to create a new column based on the existing column. For example...

for row in df['column_variable']:
    if 'substring1' in row:
        df['new_column'] = ...
    elif 'substring2' in row:
        df['new column'] = ...
    elif 'substring3' in row:
        df['new column'] = ...
    else:
        df['new column'] = 'Not Applicable'

type(row) returns 'str', so each row is a string. Even so, this code keeps setting every entry of the new column to 'Not Applicable', as if it were not detecting any of the substrings in any row of the DataFrame, even though I can see they are there.

I am sure there is an easy way to do this...PLEASE HELP!

I have also tried the following...

for row in df['column_variable']:
    if row.find('substring1') != -1:
        df['new_column'] = ...
    elif row.find('substring2') != -1:
        df['new column'] = ...
    elif row.find('substring3') != -1:
        df['new column'] = ...
    else:
        df['new column'] = 'Not Applicable'

I still get 'Not Applicable' for every entry of the new column. Once again, it is not finding the substring in the existing column.

Is it an issue with the data type or something?

CodePudding user response:

You could use a nested for loop:

# Collect one value per row, then assign the column once at the end
new_values = []

# For each row in the dataframe
for row in df['column_variable']:
    # Default value, kept if no substring is found
    value = 'Not Applicable'

    # For each substring
    for sub_str in ["substring1", "substring2"]:
        # If the substring is in the row
        if sub_str in row:
            # Value for this row...
            value = ...
            # Stop at the first match
            break

    new_values.append(value)

# Assigning inside the loop would overwrite the whole column on every row
df['new_column'] = new_values
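
One likely reason the loops in the question fill the column with 'Not Applicable': assigning a single value with df['new column'] = value sets that value on every row, so each iteration overwrites the result of the previous one and only the branch taken for the last row survives. A minimal sketch of this, using a made-up DataFrame (the data and the "Found 1" / "Found 2" labels are assumptions, not the asker's real values):

```python
import pandas as pd

# Hypothetical data; the last row contains none of the substrings
df = pd.DataFrame(
    {"column_variable": ["has substring1", "has substring2", "nothing here"]}
)

for row in df["column_variable"]:
    if "substring1" in row:
        # A scalar assignment fills the whole column, not just this row
        df["new column"] = "Found 1"
    elif "substring2" in row:
        df["new column"] = "Found 2"
    else:
        df["new column"] = "Not Applicable"

# Only the value assigned while processing the final row is left
print(df["new column"].tolist())
```

If the last row of the real data matches none of the substrings, this would give exactly the all-'Not Applicable' column described in the question, so the data type is probably not the problem.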

CodePudding user response:

You can create an empty list, append the new values to it, and then create the new column as the last step:

all_data = []
for row in df["column_variable"]:
    if "substring1" in row:
        all_data.append("Found 1")
    elif "substring2" in row:
        all_data.append("Found 2")
    elif "substring3" in row:
        all_data.append("Found 3")
    else:
        all_data.append("Not Applicable")

df["new column"] = all_data

print(df)

Prints:

      column_variable new column
0  this is substring1    Found 1
1  this is substring2    Found 2
2  this is substring1    Found 1
3  this is substring3    Found 3
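
The same mapping can probably be written without an explicit loop. The sketch below is one option, assuming numpy is available: it builds one boolean mask per substring with Series.str.contains and lets numpy.select pick the first matching label. The sample rows, including the extra "no match" row, are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the printed example, plus one non-matching row
df = pd.DataFrame({"column_variable": [
    "this is substring1", "this is substring2",
    "this is substring1", "this is substring3", "no match",
]})

col = df["column_variable"]

# One boolean mask per substring; regex=False treats the pattern as plain text
conditions = [
    col.str.contains("substring1", regex=False),
    col.str.contains("substring2", regex=False),
    col.str.contains("substring3", regex=False),
]
choices = ["Found 1", "Found 2", "Found 3"]

# The first condition that is True wins, like an if/elif chain
df["new column"] = np.select(conditions, choices, default="Not Applicable")
print(df)
```

If the column can contain missing values, passing na=False to str.contains keeps the masks strictly boolean.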

CodePudding user response:

Maybe the shortest way I can think of:

import pandas as pd

# Dummy DataFrame
df = pd.DataFrame([[1,"substr1"],[3,"bla"],[5,"bla"]],columns=["abc","col_to_check"])

substrings = ["substr1","substr2", "substr3"]

for subs in substrings: # Go through all your substrings
    # Check if any value in the column contains the substring (plain text, no regex)
    if df["col_to_check"].str.contains(subs, regex=False).any():
        df[subs] = 0 # Fill your new column with whatever you want

CodePudding user response:

import pandas as pd

Create DataFrame

tup_lst = []

for i in ['substring1','substring2','substring3','substring4']:
    tup = (i,'to_be_replaced')
    print(tup)
    tup_lst.append(tup)

Prints:

('substring1', 'to_be_replaced')
('substring2', 'to_be_replaced')
('substring3', 'to_be_replaced')
('substring4', 'to_be_replaced')

df = pd.DataFrame.from_records(tup_lst)
df.columns = ['column_variable','other_column']
print(df)

Prints:

  column_variable    other_column
0      substring1  to_be_replaced
1      substring2  to_be_replaced
2      substring3  to_be_replaced
3      substring4  to_be_replaced

Modify DataFrame using .loc

df.loc[:, 'other_column'] = 'Not Applicable'
df.loc[df['column_variable'] == 'substring1', 'other_column'] = '...'
df.loc[df['column_variable'] == 'substring2', 'other_column'] = 'something_else'
df.loc[df['column_variable'] == 'substring3', 'other_column'] = 'Yes'
print(df)

Prints:

  column_variable    other_column
0      substring1             ...
1      substring2  something_else
2      substring3             Yes
3      substring4  Not Applicable
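
The == comparisons above only match cells that are exactly equal to the substring. For the case in the question, where the substring sits inside a longer string, the mask could instead be built with Series.str.contains. A rough sketch, with made-up rows and labels:

```python
import pandas as pd

# Hypothetical data: the substrings sit inside longer strings
df = pd.DataFrame({"column_variable": [
    "abc substring1 xyz", "substring2 at the start", "nothing relevant",
]})

col = df["column_variable"]

# Start with the default, then overwrite only the rows that contain each substring
df["other_column"] = "Not Applicable"
df.loc[col.str.contains("substring1", regex=False), "other_column"] = "Found 1"
df.loc[col.str.contains("substring2", regex=False), "other_column"] = "Found 2"
print(df)
```

Unlike an if/elif chain, later .loc assignments overwrite earlier ones, so a row containing both substrings would end up with the last label assigned.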