Home > Back-end >  python: filter pandas DataFrame getting all the rows in which in any of the values the string contai
python: filter pandas DataFrame getting all the rows in which in any of the values the string contai

Time:10-15

I would like to get the number of rows in which any of the values contain a space

My starting point is:

li = [['erer','sfsddf', 'dfdf dfdf'],['erer','sfs/ df', 'dfdf dfdf'],['erer','sfsd/df', 'dfdf dfdf'],['er/er','sfsddf', 'dfdf dfdf'],['er/er','sfsddf', 'dfdf dfdf'],['er er','sfsddf', 'dfdf dfdf']]
df = pd.DataFrame(li, columns=[str(i) for i in range(3)])
df.apply(lambda col: col.str.contains(' ', na=False), axis=1)

(following: Filter pandas DataFrame by substring criteria)

But from that I dont know how to use that boolean DF to filter the original DF. What I want is to filter out all the rows where the boleans are False, i.e. not having space.

CodePudding user response:

For filtering rows contains space use DataFrame.any for boolean mask and filter in boolean indexing:

df1 = df[df.apply(lambda col: col.str.contains(' ', na=False), axis=1).any(axis=1)]
print (df1)
       0        1          2
0   erer   sfsddf  dfdf dfdf
1   erer  sfs/ df  dfdf dfdf
2   erer  sfsd/df  dfdf dfdf
3  er/er   sfsddf  dfdf dfdf
4  er/er   sfsddf  dfdf dfdf
5  er er   sfsddf  dfdf dfdf

If need count rows with space use Series.str.count with sum:

df['sum'] = df.apply(lambda col: col.str.count(' '), axis=1).sum(axis=1)
print (df)
       0        1          2  sum
0   erer   sfsddf  dfdf dfdf    1
1   erer  sfs/ df  dfdf dfdf    2
2   erer  sfsd/df  dfdf dfdf    1
3  er/er   sfsddf  dfdf dfdf    1
4  er/er   sfsddf  dfdf dfdf    1
5  er er   sfsddf  dfdf dfdf    2
  • Related