I have a pandas DataFrame with a text column. In the text, much of the personal data has been replaced by XX, so there are many runs such as X, XX, XXX... How can I remove all the repeated X's? I'm trying the following code, but I have to list every possible run of X, so it doesn't look like a practical approach.
import re

def cleanning(Complaint):
    Complaint = re.sub(r'# /', ' ', Complaint)
    # "\s" is not a valid replacement string in re.sub; use a literal space
    Complaint = re.sub(r"\d", " ", Complaint)
    Complaint = re.sub("XX", "XXXX", Complaint)
    Complaint = re.sub("xx", "xxxx", Complaint)
    Complaint = re.sub("@", "XXXXXXXX", Complaint)
    Complaint = Complaint.replace('\n', ' ')
    Complaint = Complaint.replace('\r', ' ')
    return Complaint
CodePudding user response:
You could replace "X" with an empty string, for example:
df[column].str.replace('X', '')
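Note that replacing the single character "X" also removes it from ordinary words. If that matters, one possible variation is a regex that matches only whole runs of X. The sketch below assumes the column is called "Complaint" (taken from the question's function argument) and uses made-up sample rows:

```python
import pandas as pd

# Hypothetical sample data; the column name "Complaint" is an assumption.
df = pd.DataFrame({
    "Complaint": [
        "Call XXXX at XX/XX/XXXX about my XBOX",
        "Account xx closed",
    ]
})

# \b[xX]+\b matches a standalone "word" made only of X characters, of any length,
# so a word such as "XBOX" is left untouched.
masked_run = r"\b[xX]+\b"

# str.replace returns a new Series, so assign it back to keep the result.
df["Complaint"] = df["Complaint"].str.replace(masked_run, "", regex=True)
```

The `+` quantifier is what avoids listing X, XX, XXX and so on separately.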
CodePudding user response:
I am not sure whether you want to replace runs of X up to length 10 with an empty string. To replace any such run of X with an empty string, you can use this:
df.replace(to_replace=r'[xX]{1,10}', value='', regex=True)
If you share a sample of your dataframe, it would be easier to help with other cases.
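A character class without word boundaries also strips x from inside normal words (for example "tax"). As a rough sketch of an alternative that could be dropped into the question's cleaning function, the helper below removes only standalone runs and then tidies the leftover whitespace. The name `strip_masks` is hypothetical, and treating every standalone run of X (including a single X) as a mask is an assumption:

```python
import re

# Standalone runs of x/X of any length; IGNORECASE covers both cases.
MASK_RUN = re.compile(r"\bx+\b", re.IGNORECASE)
# Any run of whitespace, including \n and \r.
EXTRA_SPACE = re.compile(r"\s+")

def strip_masks(text):
    """Remove masked runs of X and collapse the whitespace they leave behind."""
    text = MASK_RUN.sub(" ", text)
    return EXTRA_SPACE.sub(" ", text).strip()

cleaned = strip_masks("Paid XXXX to xx bank\non Xmas")
```

It could then be applied to a column with something like `df[column].apply(strip_masks)`.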