Pandas: Search for char in row and write finding into new column

Time:12-12

I'd like to search for a list of special characters in my dataframe rows.
If one of the characters has been found, I'd like to write 1 into a new column.
If not, the column should contain a 0.

import pandas as pd

s1 = pd.Series([1, 'test1', 'name1', 'a'])
s2 = pd.Series([2, 'test2', 'name2', 'b'])
s3 = pd.Series([3, 'ttei3', 'name3', 'c'])
s4 = pd.Series([4, 'test4', 'Nome4', 'd'])
s5 = pd.Series([5, 'test5', 'name5', 'e'])

df1 = pd.DataFrame([list(s1), list(s2), list(s3), list(s4), list(s5)], columns=["id", "A", "B", "C"])

print(df1)

This prints:

   id        A        B  C
0   1    test1    name1  a
1   2    test2    name2  b
2   3    ttei3    name3  c
3   4    test4    Nome4  d
4   5    test5    name5  e

My approach below doesn't work, because it seems str.contains only works on string columns:

key_chars = ['i', 'o']
for idx, row in df1.iterrows():
    for c in key_chars:
        if row.str.contains(c):
            df1[idx, 'found'] = 1
        else:
            df1[idx, 'found'] = 0
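A minimal sketch of why the loop above fails and one way to repair it, assuming pandas is installed (pattern and found are illustrative names, not from the question). row.str.contains(c) returns a whole boolean Series, and putting a Series in an if raises pandas' "truth value of a Series is ambiguous" ValueError; the integer id cell also yields NaN rather than a boolean. Separately, df1[idx, 'found'] would add a column labelled by the tuple instead of setting one cell (df1.loc[idx, 'found'] does that), and the inner loop overwrites the result for each character, so only the last one would count. Casting the row to str and collapsing it with .any() avoids these problems:

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'test1', 'name1', 'a'],
     [2, 'test2', 'name2', 'b'],
     [3, 'ttei3', 'name3', 'c'],
     [4, 'test4', 'Nome4', 'd'],
     [5, 'test5', 'name5', 'e']],
    columns=['id', 'A', 'B', 'C'],
)

pattern = '[io]'  # regex character class: matches 'i' or 'o'
found = []
for idx, row in df1.iterrows():
    # astype(str) turns the integer id into text, so str.contains never yields NaN;
    # .any() collapses the per-cell booleans into the single value that `if`/int() needs
    found.append(int(row.astype(str).str.contains(pattern).any()))

# Assign the whole column at once instead of cell by cell
df1['found'] = found
print(df1)
```

Collecting the flags in a list and assigning once also keeps found as an integer column; setting individual cells of a new column with .loc would leave it as floats.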

How can I search for my characters across the whole row?

The target dataframe would be:

   id        A        B  C found
0   1    test1    name1  a     0
1   2    test2    name2  b     0
2   3    ttei3    name3  c     1
3   4    test4    Nome4  d     1
4   5    test5    name5  e     0

Answer:

In [19]: df1.select_dtypes('object').apply(lambda x: x.str.contains('[io]')).any(axis=1)
Out[19]:
0    False
1    False
2     True
3     True
4    False
dtype: bool

select_dtypes keeps only the columns of the given dtype (here object, which is the dtype these string columns have). Alternatively, if you know the columns up front, you can select them directly with df1[['A', 'B', 'C']].
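The answer above returns a boolean Series but does not write it back. A sketch of storing it as the requested 0/1 column follows. It selects the columns explicitly (the alternative just mentioned) and, assuming the "special characters" may include regex metacharacters, escapes them with re.escape, since str.contains treats its pattern as a regular expression by default. The '.' in key_chars is a hypothetical example of such a character:

```python
import re

import pandas as pd

df1 = pd.DataFrame(
    [[1, 'test1', 'name1', 'a'],
     [2, 'test2', 'name2', 'b'],
     [3, 'ttei3', 'name3', 'c'],
     [4, 'test4', 'Nome4', 'd'],
     [5, 'test5', 'name5', 'e']],
    columns=['id', 'A', 'B', 'C'],
)

# '.' is a hypothetical extra: unescaped, it would match any character
key_chars = ['i', 'o', '.']
# re.escape makes each character literal; '|' joins them as alternatives
pattern = '|'.join(re.escape(c) for c in key_chars)

text_cols = df1[['A', 'B', 'C']]
df1['found'] = (
    text_cols.apply(lambda col: col.str.contains(pattern))
    .any(axis=1)   # True if any selected column in the row matched
    .astype(int)   # True/False -> 1/0
)
print(df1)
```

Without re.escape, the pattern i|o|. would flag every row here, because every row has a non-empty string for the '.' to match.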

Answer:

You can also use stack / unstack:

df1['found'] = df1.stack().str.contains('[io]').unstack().sum(1).astype(int)
print(df1)

# Output:
   id      A      B  C  found
0   1  test1  name1  a      0
1   2  test2  name2  b      0
2   3  ttei3  name3  c      1
3   4  test4  Nome4  d      1
4   5  test5  name5  e      0
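One caveat, shown on a hypothetical two-row frame (demo is an illustrative name): sum(1) counts the matching cells in each row, so a row that matches in two columns gets 2, and astype(int) does not cap it. If found must be strictly 0 or 1, .any(axis=1) is the safer reduction. This sketch also casts every cell with astype(str) instead of relying on numeric cells turning into NaN:

```python
import pandas as pd

# Hypothetical frame in which row 0 matches in both columns
demo = pd.DataFrame({'A': ['io', 'xx'], 'B': ['oo', 'yy']})
matches = demo.astype(str).apply(lambda col: col.str.contains('[io]'))

counts = matches.sum(axis=1)             # number of matching cells per row
flags = matches.any(axis=1).astype(int)  # 1 if at least one cell matches, else 0
print(counts.tolist(), flags.tolist())
```

For the question's data both reductions agree, because no row matches in more than one column.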

Answer:

You can also treat the strings as sequences and use the in keyword:

for i, r in df1.iterrows():
    # 'in' on a str is a substring test
    if 'test' in r['A']:
        df1.loc[i, 'newcol'] = 1
    else:
        df1.loc[i, 'newcol'] = 0
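Applied to the original question, the same in keyword can check every cell against every character in key_chars. A minimal sketch, where has_key_char is a hypothetical helper name: str(value) lets the integer id be checked as well, and because in is a plain substring test rather than a regex, characters such as '.' need no escaping.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, 'test1', 'name1', 'a'],
     [2, 'test2', 'name2', 'b'],
     [3, 'ttei3', 'name3', 'c'],
     [4, 'test4', 'Nome4', 'd'],
     [5, 'test5', 'name5', 'e']],
    columns=['id', 'A', 'B', 'C'],
)
key_chars = ['i', 'o']

def has_key_char(row):
    # Hypothetical helper: 1 if any cell in the row contains any key character
    return int(any(c in str(value) for value in row for c in key_chars))

# axis=1 passes each row (as a Series) to the function
df1['found'] = df1.apply(has_key_char, axis=1)
print(df1)
```

df1.apply with axis=1 still calls Python once per row, so on large frames the vectorized str.contains answers are likely to be faster.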