This is auxiliary to this post: Is there better way to iterate over nested loop for rows (30000)?
I would like to find rows in following table, which contain non dutch bank numbers. I create an extra column in the dataframe in the following code with value 'ja', when a bank number is not dutch, i.e. it does not contain characters 'NL'. Any better ways to loop over the rows?
import pandas as pd
import re
data={'IBAN':['NLAMRO123456789','NLINGB126656723','BGFFEO128856754','NLAMRO123896763','DUDMRO567456722','NLRABO123456712']}
df=pd.DataFrame(data)
df['No Dutch Bank']=None
substring='NL'
row_count=len(df.index)
for i in range (0, row_count):
mainstring=df['IBAN'][i]
if re.search(substring, mainstring) is None:
df['No Dutch Bank'][i]='ja'
df.head()
So far I have reached to this expression after browsing through internet:
df['No Dutch Bank'] = list(map(lambda x: "None" if x else 'ja', (re.search('NL', x))))
CodePudding user response:
A loop isn't necessary here. Neither is regex.
If you're only looking for IBAN (not) starting with "NL":
import pandas as pd
data={'IBAN':['NLAMRO123456789','NLINGB126656723','BGFFEO128856754','NLAMRO123896763','DUDMRO567456722','NLRABO123456712']}
df=pd.DataFrame(data)
df['No Dutch Bank'] = df['IBAN'].str[0:2] != 'NL'
Output:
IBAN No Dutch Bank
0 NLAMRO123456789 False
1 NLINGB126656723 False
2 BGFFEO128856754 True
3 NLAMRO123896763 False
4 DUDMRO567456722 True
5 NLRABO123456712 False
Note that a boolean column is much more convenient than string "ja" or "None" for filtering: use df[df['No Dutch Bank']]
or df[~df['No Dutch Bank']]
If "NL" can be found anywhere in the string:
df['No Dutch Bank'] = df['IBAN'].str.find('NL') == -1
CodePudding user response:
I finally used the above answer by Tranbi in this format:
df.loc[df['IBAN'].str.find('NL') == -1, 'No Dutch Bank']='ja'