Replace a text string in an entire column after the first occurrence

Time:08-30

I'm trying to replace all but the first occurrence of a text string in an entire column. In my case I'm replacing underscores with periods in data that looks like client_19_Aug_21_22_2022, and I need it to become client_19.Aug.21.22.2022.

If I use '_'[1], I get this error: string index out of range.
'_'[:1] replaces all occurrences (it doesn't skip the first one).
'_'[1:] inserts a . after every character instead of finding _ and replacing it.

df1['Client'] = df1['Client'].str.replace('_'[:1],'.')
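All three observations follow from the fact that the slices operate on the one-character string '_' itself, not on the matches inside the column. A quick plain-Python check (no pandas involved; the variable name `pattern` is just for illustration):

```python
pattern = "_"

# Indexing past the end of a one-character string raises IndexError
try:
    pattern[1]
except IndexError as exc:
    print("pattern[1] ->", exc)  # string index out of range

# [:1] is just "_" again, so every underscore gets replaced
print(repr(pattern[:1]))  # '_'

# [1:] is the empty string; replacing "" inserts the new text
# between every pair of characters and at both ends
print(repr(pattern[1:]))  # ''
print("a_b".replace(pattern[1:], "."))  # .a._.b.
```

So slicing the pattern cannot express "skip the first match"; that has to be handled by the replacement logic itself.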

Answer:

Not the simplest, but a working solution:

import re
# Replace every '_' with '.', then turn the first '.' back into '_'
df1['Client'] = df1['Client'].apply(lambda s: re.sub(r'^(.*?)\.', r'\1_', s.replace('_', '.')))

In the lambda we first replace every _ with a ., then replace the first . back with _, and finally apply the lambda to each value in the column.
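The same two-step idea can be written without a regex, because Python's built-in str.replace accepts an optional count argument. This is a minimal sketch; the helper name underscores_to_dots_except_first is hypothetical. Like the regex version, it assumes the value contains no . before the first underscore (for example "v1.2_a_b" would come out as "v1_2.a.b", because the wrong dot is turned back into _).

```python
import pandas as pd

def underscores_to_dots_except_first(s):
    # Hypothetical helper.
    # Step 1: turn every underscore into a dot.
    # Step 2: turn only the first dot back into an underscore (count=1).
    return s.replace("_", ".").replace(".", "_", 1)

df1 = pd.DataFrame({"Client": ["client_19_Aug_21_22_2022", "client123"]})
df1["Client"] = df1["Client"].apply(underscores_to_dots_except_first)
print(df1["Client"].tolist())
# ['client_19.Aug.21.22.2022', 'client123']
```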

Answer:

A pandas Series has a .map method that you can use to apply an arbitrary function to every element in the Series.

In your case you can write your own replace_underscores_except_first function, something like:

def replace_underscores_except_first(s):
    newstring = ''
    seen_first = False  # becomes True once the first '_' has been kept
    for ch in s:
        if ch == '_':
            newstring += '.' if seen_first else '_'
            seen_first = True
        else:
            newstring += ch
    return newstring

and then pass that to .map like:

df1['Client'] = df1['Client'].map(replace_underscores_except_first)
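If you would rather avoid the character-by-character loop, the function body can lean on str.partition, which splits a string at the first occurrence of a separator into (head, separator, tail). This is a sketch of one possible body that keeps the same function name; unlike the replace-then-restore trick, it is not affected by dots already present in the value.

```python
import pandas as pd

def replace_underscores_except_first(s):
    # partition splits at the FIRST "_" only:
    # "a_b_c".partition("_") -> ("a", "_", "b_c")
    head, sep, tail = s.partition("_")
    # Keep the first underscore (sep) and convert the rest to dots.
    # If there is no "_", sep and tail are "" and s comes back unchanged.
    return head + sep + tail.replace("_", ".")

df1 = pd.DataFrame({"Client": ["client_19_Aug_21_22_2022", "client123"]})
df1["Client"] = df1["Client"].map(replace_underscores_except_first)
print(df1["Client"].tolist())
```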

Answer:

Here is an example using map. The function checks whether the string contains an underscore; if it does, it splits on it and joins all parts except the first back together with a dot.

import pandas as pd

items = [
    "client_19_Aug_21_22_2022",
    "client123"
]


def replace_underscore_with_dot_except_first(s):
    if "_" in s:
        parts = s.split("_")
        return f"{parts[0]}_{'.'.join(parts[1:])}"
    return s


df1 = pd.DataFrame(items, columns=["Client"])

df1['Client'] = df1['Client'].map(replace_underscore_with_dot_except_first)
print(df1)

Output

                     Client
0  client_19.Aug.21.22.2022
1                 client123
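The same result can be had with pandas' built-in string methods instead of a custom per-row function. This sketch uses Series.str.partition, which returns a DataFrame with three columns labelled 0, 1 and 2 (the text before the first _, the separator itself, and the rest); only the last column gets its underscores replaced. It assumes the column holds strings.

```python
import pandas as pd

df1 = pd.DataFrame({"Client": ["client_19_Aug_21_22_2022", "client123"]})

# Split each value at its first "_" into columns 0 (head), 1 (sep), 2 (tail)
parts = df1["Client"].str.partition("_")

# Rejoin: head + first underscore + tail with remaining underscores as dots.
# Rows without "_" have empty sep/tail, so they come back unchanged.
df1["Client"] = parts[0] + parts[1] + parts[2].str.replace("_", ".", regex=False)
print(df1)
```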