How to find defined strings in DataFrame column values using a loop in Pandas (Python)?


I have the following code in Pandas (Python):

df (both columns are strings, i.e. "object" data type):

col1 col2
123445 one
653144 bbt
aaaBc tro

code:

nominal_variables = ['col1', 'col2']
list_of_numbers = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
variables_with_numbers = list()

for col in nominal_variables:
    for el in list_of_numbers:
        if df[col].str.contains(f'{el}').any():
            variables_with_numbers.append(col)

I need to fill my list "variables_with_numbers" with the names of columns which contain digits (as strings).

  1. First, I loop over all columns in my df.
  2. Second, I loop over the list of digit strings which I will search for in the values of the columns from the first loop.
  3. Third, I look in the columns from the first loop for the elements from the second loop, and if the check is True I add that column to my list.

Nevertheless, when I run my code I get the error AttributeError: Can only use .str accessor with string values!, but I do not know why, because all columns in my DF are "object".

So as a result I need: variables_with_numbers = ['col1']

How can I modify my code in Python Pandas?

CodePudding user response:

IIUC, you want to check which columns of your data frame contain at least one element from a defined list, e.g. list_of_numbers. If that is the case, you can join your list with | into a single regex pattern and check row-wise whether each value contains one of the elements from your list. This results in a boolean mask that can be used to index your data frame. Then, drop the columns that contain only NaN values (dropna(axis=1, how="all")) and read off the columns attribute.

Code:

import pandas as pd

df = pd.DataFrame({
    "col1": ["123445", "653144", "aaaBc"],
    "col2": ["one", "bbt", "tro"]
})

list_of_numbers = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

# Join the digits into one regex pattern ("0|1|...|9"), mark values that contain a digit,
# keep only those values, drop all-NaN columns, and read off the remaining column names.
list(df[df.apply(lambda x: x.str.contains("|".join(list_of_numbers)), axis=1)].dropna(axis=1, how="all").columns)

# Or this (equivalent to the line above):
# list(df.columns[df.apply(lambda x: x.str.contains("|".join(list_of_numbers)), axis=1).any()])

# Result: ['col1']
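
For completeness, here is a minimal sketch (same df and list_of_numbers as above) that stores the result in the asker's variables_with_numbers list; the intermediate names pattern and mask are only illustrative:

pattern = "|".join(list_of_numbers)  # regex alternation: "0|1|2|...|9"
mask = df.apply(lambda x: x.str.contains(pattern), axis=1)  # boolean DataFrame, True where a value contains a digit
variables_with_numbers = list(df.columns[mask.any()])       # columns with at least one match
print(variables_with_numbers)  # ['col1']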

You can also produce a Series of booleans indicating which columns contain at least one value from your list:

df.apply(lambda x: x.str.contains("|".join(list_of_numbers)), axis=1).any()

Output:

col1     True
col2    False
dtype: bool

If you want to check which columns contain values that are entirely numeric, you can simply use str.isnumeric instead:

list(df[df.apply(lambda x: x.str.isnumeric(), axis=1)].dropna(axis=1, how="all").columns)
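
A side note on the AttributeError from the question: that message typically means a column holds non-string values (e.g. actual numbers or NaN) even though its dtype shows as "object". As a minimal sketch, assuming that mixed-type diagnosis is indeed the cause (not part of the answer above), the asker's original loop could be made to work by casting each column to strings with astype(str) first and breaking after the first match so each column is appended only once:

nominal_variables = ['col1', 'col2']
list_of_numbers = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
variables_with_numbers = []

for col in nominal_variables:
    # Cast to str so the .str accessor works even on mixed-type "object" columns
    # (assumption: some values in the real data are not strings).
    col_as_str = df[col].astype(str)
    for el in list_of_numbers:
        if col_as_str.str.contains(el).any():
            variables_with_numbers.append(col)
            break  # stop after the first digit found, so each column is appended at most once

print(variables_with_numbers)  # ['col1']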