How to use regex as an alternative to nested loop with re.search in Pandas?-CodePudding

This is auxiliary to this post: Is there better way to iterate over nested loop for rows (30000)?

I would like to find rows in following table, which contain non dutch bank numbers. I create an extra column in the dataframe in the following code with value 'ja', when a bank number is not dutch, i.e. it does not contain characters 'NL'. Any better ways to loop over the rows?

import pandas as pd
import re

data={'IBAN':['NLAMRO123456789','NLINGB126656723','BGFFEO128856754','NLAMRO123896763','DUDMRO567456722','NLRABO123456712']}
df=pd.DataFrame(data)

df['No Dutch Bank']=None
substring='NL'
row_count=len(df.index)


for i in range (0, row_count):
    mainstring=df['IBAN'][i]
    if re.search(substring, mainstring) is None:
        df['No Dutch Bank'][i]='ja'
        
df.head()

So far I have reached to this expression after browsing through internet:

df['No Dutch Bank'] = list(map(lambda x: "None" if x else 'ja', (re.search('NL', x))))

CodePudding user response：

A loop isn't necessary here. Neither is regex.

If you're only looking for IBAN (not) starting with "NL":

import pandas as pd

data={'IBAN':['NLAMRO123456789','NLINGB126656723','BGFFEO128856754','NLAMRO123896763','DUDMRO567456722','NLRABO123456712']}
df=pd.DataFrame(data)

df['No Dutch Bank'] = df['IBAN'].str[0:2] != 'NL'

Output:

              IBAN  No Dutch Bank
0  NLAMRO123456789          False
1  NLINGB126656723          False
2  BGFFEO128856754           True
3  NLAMRO123896763          False
4  DUDMRO567456722           True
5  NLRABO123456712          False

Note that a boolean column is much more convenient than string "ja" or "None" for filtering: use df[df['No Dutch Bank']] or df[~df['No Dutch Bank']]

If "NL" can be found anywhere in the string:

df['No Dutch Bank'] = df['IBAN'].str.find('NL') == -1

CodePudding user response：

I finally used the above answer by Tranbi in this format:

df.loc[df['IBAN'].str.find('NL') == -1, 'No Dutch Bank']='ja'