I'd like to search for a list of special characters in my dataframe rows.
If one of the characters has been found, I'd like to write 1 into a new column.
If not, the column should contain a 0.
import pandas as pd

s1 = pd.Series([1, 'test1', 'name1', 'a'])
s2 = pd.Series([2, 'test2', 'name2', 'b'])
s3 = pd.Series([3, 'ttei3', 'name3', 'c'])
s4 = pd.Series([4, 'test4', 'Nome4', 'd'])
s5 = pd.Series([5, 'test5', 'name5', 'e'])
df1 = pd.DataFrame([list(s1), list(s2), list(s3), list(s4), list(s5)], columns=["id", "A", "B", "C"])
print(df1)
Will print this:
   id      A      B  C
0   1  test1  name1  a
1   2  test2  name2  b
2   3  ttei3  name3  c
3   4  test4  Nome4  d
4   5  test5  name5  e
Here is my approach, which doesn't work (it seems str.contains only works on string columns):
key_chars = ['i', 'o']
for idx, row in df1.iterrows():
    for c in key_chars:
        if row.str.contains(c):
            df1[idx, 'found'] = 1
        else:
            df1[idx, 'found'] = 0
How am I able to find my characters in the whole row?
Target dataframe would be:
   id      A      B  C  found
0   1  test1  name1  a      0
1   2  test2  name2  b      0
2   3  ttei3  name3  c      1
3   4  test4  Nome4  d      1
4   5  test5  name5  e      0
CodePudding user response:
In [19]: df1.select_dtypes('object').apply(lambda x: x.str.contains('[io]')).any(axis=1)
Out[19]:
0    False
1    False
2     True
3     True
4    False
dtype: bool
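select_dtypes selects the columns with the given dtype. Alternatively, if you know the columns up front, you can select them directly with df1[['A', 'B', 'C']]. The result is a boolean Series; to get the 0/1 column from the question, convert it with .astype(int) and assign it to df1['found'].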
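For completeness, here is a minimal sketch of that approach carried through to the `found` column. It uses the explicit column list rather than `select_dtypes`, on the assumption that the string columns are known to be `A`, `B` and `C`; the name `mask` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, "test1", "name1", "a"],
     [2, "test2", "name2", "b"],
     [3, "ttei3", "name3", "c"],
     [4, "test4", "Nome4", "d"],
     [5, "test5", "name5", "e"]],
    columns=["id", "A", "B", "C"],
)

# True for every row where at least one string cell contains 'i' or 'o'.
# The column list is an assumption: adjust it to your own string columns.
mask = (
    df1[["A", "B", "C"]]
    .apply(lambda col: col.str.contains("[io]"))
    .any(axis=1)
)

# Convert the boolean mask to the 0/1 flag asked for in the question.
df1["found"] = mask.astype(int)
print(df1)
```

Note that `[io]` is case-sensitive, so an uppercase `I` or `O` would not be flagged unless you pass `case=False` to `str.contains`.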
CodePudding user response:
You can also use stack / unstack:
df1['found'] = df1.stack().str.contains('[io]').unstack().sum(1).astype(int)
print(df1)
# Output:
   id      A      B  C  found
0   1  test1  name1  a      0
1   2  test2  name2  b      0
2   3  ttei3  name3  c      1
3   4  test4  Nome4  d      1
4   5  test5  name5  e      0
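One thing to keep in mind is that summing counts the matching cells, so a row with two hits would get 2 rather than 1. The sketch below is a possible variant, not the answer's exact code: it stacks only the string columns (assumed here to be `A`, `B` and `C`), uses `any` instead of `sum`, and builds the pattern from the `key_chars` list with `re.escape` so that characters such as `.` or `$` are matched literally.

```python
import re

import pandas as pd

df1 = pd.DataFrame(
    [[1, "test1", "name1", "a"],
     [2, "test2", "name2", "b"],
     [3, "ttei3", "name3", "c"],
     [4, "test4", "Nome4", "d"],
     [5, "test5", "name5", "e"]],
    columns=["id", "A", "B", "C"],
)

key_chars = ["i", "o"]

# Escape each character so regex metacharacters are treated literally,
# then join them into a single alternation pattern.
pattern = "|".join(re.escape(c) for c in key_chars)

# Stack the string columns into one long Series, test every cell,
# then unstack back to the original row/column layout.
hits = df1[["A", "B", "C"]].stack().str.contains(pattern).unstack()

# any(axis=1) yields one flag per row no matter how many cells matched.
df1["found"] = hits.any(axis=1).astype(int)
print(df1)
```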
CodePudding user response:
You can treat strings as sequences and use the in keyword to test for a substring.
for i, r in df1.iterrows():
    if 'test' in r['A']:
        df1.loc[i, 'newcol'] = 1
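To apply the same `in` idea to the whole row and to every character in the list, something like the following sketch could work. The helper name `row_has_key_char` is made up for illustration, and values are passed through `str()` so the integer `id` column does not raise an error.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, "test1", "name1", "a"],
     [2, "test2", "name2", "b"],
     [3, "ttei3", "name3", "c"],
     [4, "test4", "Nome4", "d"],
     [5, "test5", "name5", "e"]],
    columns=["id", "A", "B", "C"],
)

key_chars = ["i", "o"]


def row_has_key_char(row):
    """Return 1 if any cell in the row contains any key character, else 0."""
    # str(value) lets non-string cells (such as the integer id) be checked too.
    return int(any(c in str(value) for value in row for c in key_chars))


# axis=1 passes each row to the helper as a Series.
df1["found"] = df1.apply(row_has_key_char, axis=1)
print(df1)
```

This is a plain Python loop per row, so it will likely be slower than the vectorized `str.contains` approaches on large frames, but it needs no regular expressions.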