I would like to get the number of rows in which any of the values contain a space
My starting point is:
li = [['erer','sfsddf', 'dfdf dfdf'],['erer','sfs/ df', 'dfdf dfdf'],['erer','sfsd/df', 'dfdf dfdf'],['er/er','sfsddf', 'dfdf dfdf'],['er/er','sfsddf', 'dfdf dfdf'],['er er','sfsddf', 'dfdf dfdf']]
df = pd.DataFrame(li, columns=[str(i) for i in range(3)])
df.apply(lambda col: col.str.contains(' ', na=False), axis=1)
(following: Filter pandas DataFrame by substring criteria)
But from that I dont know how to use that boolean DF to filter the original DF. What I want is to filter out all the rows where the boleans are False, i.e. not having space.
CodePudding user response:
For filtering rows contains space use DataFrame.any
for boolean mask and filter in boolean indexing
:
df1 = df[df.apply(lambda col: col.str.contains(' ', na=False), axis=1).any(axis=1)]
print (df1)
0 1 2
0 erer sfsddf dfdf dfdf
1 erer sfs/ df dfdf dfdf
2 erer sfsd/df dfdf dfdf
3 er/er sfsddf dfdf dfdf
4 er/er sfsddf dfdf dfdf
5 er er sfsddf dfdf dfdf
If need count rows with space use Series.str.count
with sum
:
df['sum'] = df.apply(lambda col: col.str.count(' '), axis=1).sum(axis=1)
print (df)
0 1 2 sum
0 erer sfsddf dfdf dfdf 1
1 erer sfs/ df dfdf dfdf 2
2 erer sfsd/df dfdf dfdf 1
3 er/er sfsddf dfdf dfdf 1
4 er/er sfsddf dfdf dfdf 1
5 er er sfsddf dfdf dfdf 2