I want to replace patterns in a DataFrame using regular expressions.
For example, I have the following table. I want to replace each digit of the account number with N, e.g. if the account number has 5 digits, it should be replaced with five N's: NNNNN.
Source
Account_Num,Facility Name,Address,City
10605,SAGE MEMORIAL HOSPITAL,STATE ROUTE 264 SOUTH 191,GANADO
2425,WOODRIDGE BEHAVIORAL CENTER,600 NORTH 7TH STREET,XDSDSD
Target
Account_Num,Facility Name,Address,City
NNNNN,AAAA AAAAAAAA AAAAAAA,STATE ROUTE 264 SOUTH 191,GANADO
NNNN,WOODRIDGE BEHAVIORAL CENTER,600 NORTH 7TH STREET,XDSDSD
I tried the following code:
print(df.replace(to_replace=([re.search(r'\d ', str(df_str))]), value='NNNNN', regex=True))
CodePudding user response:
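You can use .replace with a list of regular expressions and a matching list of replacements. Cast the frame to str first, because regex replacement only applies to string values:
df = df.astype(str).replace([r'[a-zA-Z]', r'\d'], ['A', 'N'], regex=True)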
Output:
>>> df
   Account_Num                Facility Name                    Address    City
0        NNNNN       AAAA AAAAAAAA AAAAAAAA  AAAAA AAAAA NNN AAAAA NNN  AAAAAA
1         NNNN  AAAAAAAAA AAAAAAAAAA AAAAAA       NNN AAAAA NAA AAAAAA  AAAAAA
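Note that the one-liner above masks every column. If only Account_Num should change and the other columns should stay as they are, one option is to apply the replacement to that column alone. This is a minimal sketch, assuming the sample rows from the question; the variable name `masked` is just for illustration:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Account_Num": [10605, 2425],
    "Facility Name": ["SAGE MEMORIAL HOSPITAL", "WOODRIDGE BEHAVIORAL CENTER"],
    "Address": ["STATE ROUTE 264 SOUTH 191", "600 NORTH 7TH STREET"],
    "City": ["GANADO", "XDSDSD"],
})

# Work on a copy so the original frame is left untouched.
masked = df.copy()

# Cast the integer column to str, then replace each digit with "N".
masked["Account_Num"] = (
    masked["Account_Num"].astype(str).str.replace(r"\d", "N", regex=True)
)

print(masked)
```

Because the pattern matches one digit at a time, each account number keeps its original length, so a 5-digit number becomes NNNNN and a 4-digit number becomes NNNN.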