Multiple character replacement in a column

Time:12-29

I am trying to replace several different characters in a string column with a single character. I can do it with multiple lines of code, but is there something like this that does it in a single line?

df['Column'].str.replace(['_','-','/'], ' ')

I could write three lines of code, each calling str.replace() for one character, but I don't think that would be efficient.
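For comparison, the one-character-at-a-time approach from the question can still be written as a single chained expression. This is a minimal sketch with made-up sample data; each .str.replace call makes its own pass over the column, and regex=False makes every pattern a literal string.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"Column": ["A_b-C/d", "x/y"]})

# Replace each character one at a time; regex=False means the pattern is
# matched literally, so no escaping is needed.
chained = (
    df["Column"]
    .str.replace("_", " ", regex=False)
    .str.replace("-", " ", regex=False)
    .str.replace("/", " ", regex=False)
)
print(chained.tolist())  # ['A b C d', 'x y']
```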

Answer:

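pandas' Series.str.replace takes either a regex pattern or a plain string as its first argument, so you can provide a regex that matches several patterns at once. Pass regex=True explicitly: since pandas 2.0 the default is regex=False, which treats the pattern as a literal string.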

code:

import pandas as pd
check_df = pd.DataFrame({"Column":["abc", "A_bC", "A_b-C/d"]})
check_df['Column'].str.replace("_|-|/", " ", regex=True)

Output:

0        abc
1       A bC
2    A b C d
Name: Column, dtype: object
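Whether the first argument is treated as a regex is controlled by the regex parameter, so the same pattern can behave very differently. A small sketch contrasting both settings on the sample data above:

```python
import pandas as pd

s = pd.Series(["abc", "A_bC", "A_b-C/d"])

# regex=True: "_|-|/" is an alternation matching "_" or "-" or "/".
as_regex = s.str.replace("_|-|/", " ", regex=True)

# regex=False: the literal text "_|-|/" is searched for; it does not occur
# in any of these strings, so nothing changes.
as_literal = s.str.replace("_|-|/", " ", regex=False)

print(as_regex.tolist())    # ['abc', 'A bC', 'A b C d']
print(as_literal.tolist())  # ['abc', 'A_bC', 'A_b-C/d']
```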

Answer:

You can use a regular expression with alternation:

df['Column'].str.replace(r"_|-|/", " ", regex=True)

| means "either of these", so the pattern matches "_", "-" or "/".

Or you can use str.maketrans to build a translation table and pass it to .str.translate:

df['Column'].str.translate(str.maketrans(dict.fromkeys("_-/", " ")))

Note that this only works for single-character replacements: every key in the translation table must be exactly one character long.
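A self-contained sketch of the translation-table approach on made-up data, including what happens when a key is longer than one character:

```python
import pandas as pd

s = pd.Series(["abc", "A_bC", "A_b-C/d"])

# Map each of "_", "-" and "/" to a space; dict.fromkeys gives every
# character in the string the same replacement value.
table = str.maketrans(dict.fromkeys("_-/", " "))
translated = s.str.translate(table)
print(translated.tolist())  # ['abc', 'A bC', 'A b C d']

# Multi-character keys are rejected, which is the single-character limitation.
try:
    str.maketrans({"__": " "})
except ValueError as exc:
    print("ValueError:", exc)
```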


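If the characters are produced dynamically, e.g., stored in a list chars, use "|".join(map(re.escape, chars)) as the pattern for the first approach and "".join(chars) for the second. re.escape is needed in the first approach to escape characters that are special in regexes: for example, "$" is the end-of-string anchor, so to replace a literal "$" the pattern must contain "\$" instead, which re.escape takes care of. Escape each character before joining; escaping the already-joined string would also escape the "|" separators and turn them into literal "|" characters.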
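A minimal sketch of both approaches with a character list built at runtime. The list chars and the sample strings are hypothetical; "$" and "." are included because they are special in regexes.

```python
import re

import pandas as pd

chars = ["_", "-", "/", "$", "."]  # hypothetical runtime list
s = pd.Series(["a_b-c/d", "cost$5.00"])

# Escape each character separately, then join with "|" for alternation.
pattern = "|".join(map(re.escape, chars))
via_regex = s.str.replace(pattern, " ", regex=True)

# Escaping the joined string also escapes the "|" separators, so the regex
# looks for the literal text "_|-|/|$|." and replaces nothing here.
broken = s.str.replace(re.escape("|".join(chars)), " ", regex=True)

# str.translate needs no escaping: every character is taken literally.
via_translate = s.str.translate(str.maketrans(dict.fromkeys("".join(chars), " ")))

print(via_regex.tolist())      # ['a b c d', 'cost 5 00']
print(broken.tolist())         # ['a_b-c/d', 'cost$5.00']
print(via_translate.tolist())  # ['a b c d', 'cost 5 00']
```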

Answer:

You could use a character class [/_-] listing the characters that you want to replace.
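The position of "-" inside the character class matters: at the start or end of the class it is a literal hyphen, but between two characters it denotes a range. A short sketch using Python's re module (which pandas uses for regex=True) on a made-up string:

```python
import re

text = "A1b/c_d-e"

# "-" placed last is a literal hyphen: only "/", "_" and "-" are replaced.
print(re.sub(r"[/_-]", " ", text))  # 'A1b c d e'

# "-" between "/" and "_" creates the range "/" (0x2F) through "_" (0x5F),
# which also covers digits and uppercase letters, but not "-" itself (0x2D).
print(re.sub(r"[/-_]", " ", text))  # '  b c d-e'
```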

Note that if you have multiple consecutive characters and replace each of them with a space, you will get gaps of several spaces. If you don't want that, you can repeat the character class with a + to match one or more characters and replace the whole match with a single space.

If you don't want leading and trailing spaces, you can also call .str.strip().

Example

import pandas as pd

df = pd.DataFrame({"Column":["       a//b_c__-d", "a//////b      "]})
df['Column'] = df['Column'].str.replace(r"[/_-]", ' ', regex=True)
print(df)

print("\n---------v2---------\n")

df_v2 = pd.DataFrame({"Column":["       a//b_c__-d", "a//////b      "]})
df_v2['Column'] = df_v2['Column'].str.replace(r"[/_-]+", ' ', regex=True).str.strip()
print(df_v2)

Output

              Column
0         a  b c   d
1     a      b      

---------v2---------

    Column
0  a b c d
1      a b