Why does replacing a substring not work in a Pandas dataframe?

Time:05-27

I am trying to remove the symbols " and - wherever they appear at the start or end of a string in the dataframe:

dtnew.applymap(lambda x: x.replace('^-', ''))
dtnew.applymap(lambda x: x.replace('^"', ''))

But the output dataframe still contains these symbols.

CodePudding user response:

Well, if performance is NOT an issue, you can iterate over the columns and rows and use a simple replace (see below). Note that this removes every - and " in each cell, not only those at the start or end. Again, I would only use this if the dataframe is not enormous and you have no concern for performance.

for column in df.columns:
    for i in df.index:
        # df.at avoids chained indexing, which may assign to a temporary copy
        df.at[i, column] = df.at[i, column].replace('-', '').replace('"', '')
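
As for why the original attempt leaves the symbols in place: Python's built-in str.replace matches literal text, so '^-' is searched for as the two characters ^ and -, not as "a dash at the start of the string". In addition, applymap returns a new DataFrame rather than modifying dtnew, so the result has to be assigned to something. A minimal sketch of the literal-versus-regex difference, using the standard re module and a made-up sample string (both are illustrative assumptions, not taken from the question):

```python
import re

s = '-abc'  # hypothetical sample value

# str.replace looks for the literal two-character text '^-', which is not present
literal = s.replace('^-', '')

# re.sub interprets '^' as a start-of-string anchor, so the leading dash is removed
anchored = re.sub('^-', '', s)

print(literal)   # -abc
print(anchored)  # abc
```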

CodePudding user response:

Assuming this example and that you only want to replace the leading character(s):

import pandas as pd
df = pd.DataFrame([['- abc', 'def -'], ['" ghi-', '--jkl']])

        0      1
0   - abc  def -
1  " ghi-  --jkl

Use str.lstrip.

df2 = df.apply(lambda c: c.str.lstrip('- "'))

output:

      0      1
0   abc  def -
1  ghi-    jkl

# as list: [['abc', 'def -'], ['ghi-', 'jkl']]

For only the first character, use str.replace:

df2 = df.apply(lambda c: c.str.replace('^[- "]', '', regex=True))

output:

       0      1
0    abc  def -
1   ghi-   -jkl

# as list: [[' abc', 'def -'], [' ghi-', '-jkl']]

generalization:

  • to strip all matching characters from both start and end, use str.strip: df.apply(lambda c: c.str.strip('- "'))

  • to remove all matching characters (anywhere): df.apply(lambda c: c.str.replace('[- "]', '', regex=True))

  • to remove first or last matching character: df.apply(lambda c: c.str.replace('(^[- "]|[- "]$)', '', regex=True))
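
The start-and-end variants can be sketched on the same example frame. This is only an illustration under the assumption that every cell is a string; the names both_ends and one_each are made up for the example:

```python
import pandas as pd

df = pd.DataFrame([['- abc', 'def -'], ['" ghi-', '--jkl']])

# Strip every leading and trailing '-', space, or '"' from each cell
both_ends = df.apply(lambda c: c.str.strip('- "'))

# Remove at most one matching character from each end of each cell
one_each = df.apply(lambda c: c.str.replace(r'(^[- "]|[- "]$)', '', regex=True))

print(both_ends.values.tolist())  # [['abc', 'def'], ['ghi', 'jkl']]
print(one_each.values.tolist())   # [[' abc', 'def '], [' ghi', '-jkl']]
```

str.strip keeps removing characters from the given set until it hits one that is not in it, while the anchored regex removes a single character per end, which is why the second result still has leading spaces and one dash.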
