I am trying to replace several different characters in a string with a single character. I can do it with multiple lines of code, but I was wondering whether something like this would do it in a single line:
df['Column'].str.replace(['_','-','/'], ' ')
I can write three separate str.replace() calls and change the characters one by one, but I don't think that would be efficient.
CodePudding user response:
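The pandas Series.str.replace method takes a string or a regex pattern as its first argument, so you can provide a regex that matches several characters at once. Pass regex=True so that the pattern is treated as a regex; since pandas 2.0 the default is regex=False, which treats the pattern as a literal string.
Code:
import pandas as pd
check_df = pd.DataFrame({"Column":["abc", "A_bC", "A_b-C/d"]})
check_df['Column'].str.replace("_|-|/", " ", regex=True)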
Output:
0 abc
1 A bC
2 A b C d
Name: Column, dtype: object
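To make the role of the regex flag concrete, here is a minimal sketch that passes the flag explicitly both ways, so the result does not depend on the installed version's default. The variable names are only illustrative.

```python
import pandas as pd

s = pd.Series(["abc", "A_bC", "A_b-C/d"], name="Column")

# regex=False: the pattern is a literal substring, so "_|-|/" never matches here.
literal = s.str.replace("_|-|/", " ", regex=False)

# regex=True: "|" is alternation, so each of "_", "-" and "/" is replaced.
as_regex = s.str.replace("_|-|/", " ", regex=True)

print(literal.tolist())
print(as_regex.tolist())
```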
CodePudding user response:
You can use a regular expression with alternation:
df['Column'].str.replace(r"_|-|/", " ", regex=True)
| means "either of these".
Or you can use str.maketrans to make a translation table and pass it to .str.translate:
df['Column'].str.translate(str.maketrans(dict.fromkeys("_-/", " ")))
Note that this only works when each thing being replaced is a single character.
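As a rough illustration of the translation-table approach, the sketch below uses plain Python strings; the same table should be usable with .str.translate on a pandas Series, which is documented as equivalent to the built-in str.translate. It also shows what happens when a key is longer than one character.

```python
# Map each single character to a space; dict keys must be one character long.
table = str.maketrans(dict.fromkeys("_-/", " "))
cleaned = "A_b-C/d".translate(table)
print(cleaned)

# A multi-character key is rejected, which is why this approach
# only covers single-character replacements.
multi_char_error = None
try:
    str.maketrans({"__": " "})
except ValueError as exc:
    multi_char_error = str(exc)
print(multi_char_error)
```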
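If the characters are produced dynamically, e.g., in a list chars, then "|".join(map(re.escape, chars)) can be used as the pattern for the first way, and "".join(chars) in place of the literal string for the second way. re.escape handles special characters in the first way: for example, if "$" is to be replaced, it must be written as "\$" because "$" is the end-of-string anchor in regexes, and re.escape takes care of that. Escape each character individually rather than the joined string; otherwise the "|" separators would be escaped too and would no longer mean "either of these".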
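A small sketch of building both forms from a list follows. The list chars and the sample string are made up for illustration, and the standard re module is used so that the pattern can be checked without pandas; the same pattern string could then be passed to .str.replace(..., regex=True).

```python
import re

chars = ["_", "-", "/", "$", "|"]  # hypothetical, dynamically produced list
sample = "a_b-c/d$e|f"

# Escape each character on its own, then join with "|" so that the
# separators stay alternation operators.
pattern = "|".join(map(re.escape, chars))
result = re.sub(pattern, " ", sample)

# Escaping the joined string instead also escapes the separators, so the
# pattern only matches the literal text "_|-|/|$||" and nothing is replaced.
wrong_pattern = re.escape("|".join(chars))
unchanged = re.sub(wrong_pattern, " ", sample)

# Translation-table variant: no escaping is needed at all.
table = str.maketrans(dict.fromkeys("".join(chars), " "))
translated = sample.translate(table)

print(result, unchanged, translated, sep="\n")
```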
CodePudding user response:
You could use a character class, [/_-], listing the characters that you want to replace.
Note that if you have multiple consecutive characters and you replace each of them with a space, you will get gaps of several spaces. If you don't want that, you can follow the character class with a + to match one or more characters and replace that whole match with a single space.
If you don't want the leading and trailing spaces, you can use .str.strip()
Example
import pandas as pd
df = pd.DataFrame({"Column":[" a//b_c__-d", "a//////b "]})
df['Column'] = df['Column'].str.replace(r"[/_-]", ' ', regex=True)
print(df)
print("\n---------v2---------\n")
df_v2 = pd.DataFrame({"Column":[" a//b_c__-d", "a//////b "]})
df_v2['Column'] = df_v2['Column'].str.replace(r"[/_-]+", ' ', regex=True).str.strip()
print(df_v2)
Output
        Column
0   a  b c   d
1    a      b 
---------v2---------
    Column
0  a b c d
1      a b
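Because printed output tends to hide the exact number of spaces, the sketch below compares the two patterns on one of the sample strings using the standard re module, which follows the same regex rules. The variable names are illustrative only.

```python
import re

text = " a//b_c__-d"

# One space per matched character: runs of separators leave wide gaps.
one_each = re.sub(r"[/_-]", " ", text)

# "+" matches a whole run at once, so each run becomes a single space.
collapsed = re.sub(r"[/_-]+", " ", text)

# strip() then removes the leading and trailing spaces.
stripped = collapsed.strip()

# repr() makes the spacing visible.
print(repr(one_each))
print(repr(collapsed))
print(repr(stripped))
```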