Scenario
I want to filter a pandas DataFrame. The function should return a DataFrame containing only the rows that have a numeric value in every column of a given list of columns; the list can have any length.
Example
   a  b  c
1  1  1  1
2  1  g  8
3  h  1  1
4  2  2  2
- If I call my function with columns [b, c], I expect rows 1, 3, and 4.
- If I call my function with columns [a], I expect rows 1, 2, and 4.
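To reproduce the example, the frame could be built roughly like this. This is a sketch that assumes every cell is stored as a string (the use of .str.isnumeric() in the implementation suggests so) and that the row labels 1 to 4 are the index:

```python
import pandas as pd

# Example frame from the question.
# Assumption: all cells are strings and the row labels 1..4 form the index.
df = pd.DataFrame(
    {
        "a": ["1", "1", "h", "2"],
        "b": ["1", "g", "1", "2"],
        "c": ["1", "8", "1", "2"],
    },
    index=[1, 2, 3, 4],
)

print(df)
```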
I came up with this implementation that doesn't feel pythonic but works:
import typing
import pandas as pd

def filter_df(dataframe: pd.DataFrame, filter_columns: typing.List[str]) -> pd.DataFrame:
    and_connected_filters = None
    for column_name in filter_columns:
        condition: pd.Series = dataframe[column_name].str.isnumeric()
        if and_connected_filters is None:
            and_connected_filters = condition
        else:
            and_connected_filters = and_connected_filters & condition
    return dataframe[and_connected_filters]
Is there a more pythonic way to chain a list of items with an operator (&)? I'm thinking of an equivalent of ",".join(...) but couldn't find anything.
Answer
I would use a different approach: evaluate the condition on the whole 2D selection of columns at once and aggregate row-wise with all:
cols = ['b', 'c']
s = (df[cols].apply(pd.to_numeric, errors='coerce')
             .notna().all(axis=1)
    )
out = s[s].index.to_list()
Or, if you are sure that the columns contain only strings:
cols = ['b', 'c']
s = (df[cols]
     .apply(lambda s: s.str.isnumeric())
     .all(axis=1)
    )
out = s[s].index.to_list()
output: [1, 3, 4]
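Since the question asks for a DataFrame rather than a list of index labels, the same boolean mask can also be used to index the frame directly. A minimal sketch, reusing the filter_df signature from the question and assuming the example data is stored as strings:

```python
import typing
import pandas as pd


def filter_df(dataframe: pd.DataFrame, filter_columns: typing.List[str]) -> pd.DataFrame:
    # Coerce the selected columns; cells that cannot be parsed become NaN
    numeric = dataframe[filter_columns].apply(pd.to_numeric, errors="coerce")
    # Keep only the rows where every selected column parsed as a number
    return dataframe[numeric.notna().all(axis=1)]


# Example frame from the question (assumption: all cells are strings)
df = pd.DataFrame(
    {"a": ["1", "1", "h", "2"], "b": ["1", "g", "1", "2"], "c": ["1", "8", "1", "2"]},
    index=[1, 2, 3, 4],
)

print(filter_df(df, ["b", "c"]).index.to_list())  # [1, 3, 4]
print(filter_df(df, ["a"]).index.to_list())       # [1, 2, 4]
```

Note that the two checks are not equivalent: pd.to_numeric also accepts strings such as "1.5" or "-2", whereas str.isnumeric() rejects them because of the dot and the minus sign.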
Variant with numpy.logical_and.reduce:
import numpy as np
df.index[np.logical_and.reduce([df[c].str.isnumeric() for c in cols])].to_list()
output: [1, 3, 4]
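For the literal question of joining a list of items with an operator, the closest standard-library counterpart to ",".join(...) is probably functools.reduce combined with operator.and_. A sketch, again assuming string-valued columns and the example data from the question:

```python
import functools
import operator

import pandas as pd

# Example frame from the question (assumption: all cells are strings)
df = pd.DataFrame(
    {"a": ["1", "1", "h", "2"], "b": ["1", "g", "1", "2"], "c": ["1", "8", "1", "2"]},
    index=[1, 2, 3, 4],
)
cols = ["b", "c"]

# One boolean Series per column
masks = [df[c].str.isnumeric() for c in cols]
# Fold the list with "&": masks[0] & masks[1] & ...
mask = functools.reduce(operator.and_, masks)

out = df[mask]
print(out.index.to_list())  # [1, 3, 4]
```

Without an initial value, functools.reduce raises a TypeError on an empty list, so an empty cols would need separate handling.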