Python_Pandas: Eliminate repeated character

Time:06-29

I have a pandas data frame with a text column. In the text, much of the personal data has been replaced by XX, so there are a lot of X, XX, XXX... How can I eliminate all the repeated X's? I'm trying the following code, but I had to list every possible run of X, so it doesn't look like a practical approach.

import re

def cleanning(Complaint):
    Complaint = re.sub(r'# /', ' ', Complaint)
    # "\s" is not a valid replacement string (re.error: bad escape); use a literal space
    Complaint = re.sub(r'\d', ' ', Complaint)
    Complaint = re.sub("XX", "XXXX", Complaint)
    Complaint = re.sub("xx", "xxxx", Complaint)
    Complaint = re.sub("@", "XXXXXXXX", Complaint)
    Complaint = Complaint.replace('\n', ' ')
    Complaint = Complaint.replace('\r', ' ')
    return Complaint

CodePudding user response:

You could replace every "X" with an empty string. For example:

df[column].str.replace('X', '')
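
To make the difference between a literal and a regex replacement concrete, here is a minimal sketch. The sample rows and the column name `Complaint` are assumptions for illustration, not the asker's data. Passing `regex` explicitly avoids relying on the default, which has changed between pandas versions.

```python
import pandas as pd

# Hypothetical sample data; "Complaint" is an assumed column name.
df = pd.DataFrame({"Complaint": ["Paid XXXX on XX/XX/XXXX", "no masks here"]})

# Literal replacement: every capital X is dropped, one character at a time.
literal = df["Complaint"].str.replace("X", "", regex=False)

# Regex replacement: each run of X or x is matched as a whole and dropped.
pattern = df["Complaint"].str.replace(r"[Xx]+", "", regex=True)

# Neither call modifies df; assign the result back to keep it.
df["Complaint"] = pattern
```

Both calls return a new Series, so the last line is what actually updates the data frame. The regex version also catches lowercase x, which the literal one does not.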

CodePudding user response:

I am not sure whether you want to replace runs of X up to 10 characters long with an empty string. To replace any such run of X (upper or lower case) with an empty string, you can use this:

df.replace(to_replace=r'[xX]{1,10}', value='', regex=True)
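
A note on the quantifier: because every match is replaced in turn, a run longer than ten characters is still removed in full by `{1,10}`, so `+` does the same job and is the more usual spelling. What the pattern does not do is leave ordinary words alone: the x in a word such as "tax" is removed too. Below is a minimal sketch using only the standard `re` module, assuming the masks are standalone runs of X delimited by spaces or punctuation. `MASK` and `strip_masks` are hypothetical names, not part of pandas.

```python
import re

# Standalone runs of X/x only: \b keeps letters inside ordinary words untouched.
MASK = re.compile(r"\b[Xx]+\b")

def strip_masks(text):
    """Remove standalone runs of X and tidy up the whitespace left behind."""
    without_masks = MASK.sub("", text)
    # Collapse the gaps (and any \n or \r) into single spaces.
    return re.sub(r"\s+", " ", without_masks).strip()

cleaned = strip_masks("My tax ID XXXX was sent to XX XXX on XX/XX/XXXX")
```

It could then be applied to the column with something like `df[column].apply(strip_masks)`. A mask glued to letters or digits (for example `XXXX1234`) does not sit between word boundaries and is left as-is, so the pattern would need adjusting if the data contains such cases.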

If you share a sample of your dataframe, it would be easier to help with further cases.
