I have code in Pandas Python like below:
df (both columns are string - "object" data type):
col1 | col2 |
---|---|
123445 | one |
653144 | bbt |
aaaBc | tro |
code:
nominal_variables = ['col1', 'col2']
list_of_numbers = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
variables_with_numbers = list()
for col in nominal_variables:
for el in list_of_numbers:
if df[col].str.contains(f'{el}').any():
variables_with_numbers.append(col)
I need to fill my list "variables_with_numbers" by columns wich contain numbers (string).
- First of all I make a loop by all kolumns in my df
- Secondly I make a loop by list of numbers in string which I will search in values in columns from the first loop
- Thirtly I am looking for in columns from the tirst loop elements from the second loop and if it is True I add this col to my list
Nevertheless using my code I have Error: AttributeError: Can only use .str accessor with string values!
but I do not know why, because all columns in my DF are "object".
So as a result I need: variables_with_numbers = ['col1']
How can I modify my code in Python Pandas ?
CodePudding user response:
IIUC, you want to check which columns of your data frame contain at least one element from a defined list e.g. list_of_numbers
. If that is true, you can start joining your list by |
and check columns row-wise if one value contains one element from your list. This results in a boolean mask that can be used to query your data frame. Then, just drop nan values column-wisely (if they only contain nan values) and call the columns argument.
Code:
import pandas as pd
df = pd.DataFrame({
"col1": ["123445", "653144", "aaaBc"],
"col2": ["one", "bbt", "tro"]
})
list_of_numbers = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
list(df[df.apply(lambda x: x.str.contains("|".join(list_of_numbers)), axis=1)].dropna(axis=1, how="all").columns)
# Or this (equivalent to the line above):
# list(df.columns[df.apply(lambda x: x.str.contains("|".join(list_of_numbers)), axis=1).any()])
# Result: ['col1']
You can also produce a series with booleans for columns that contain at least one value from your list:
df.apply(lambda x: x.str.contains("|".join(list_of_numbers)), axis=1).any()
--------------
col1 True
col2 False
dtype: bool
--------------
If you want to check which columns contain numeric values, you can simple use this approach:
list(df[df.apply(lambda x: x.str.isnumeric(), axis=1)].dropna(axis=1, how="all").columns)