I'm trying to replace all but the first occurrence of a text string in an entire column. My specific case is replacing underscores with periods in data that looks like client_19_Aug_21_22_2022 and I need this to be client_19.Aug.21.22.2022
If I use [1], I get this error: string index out of range.
With [:1], every occurrence is replaced (the first one is not skipped).
With [1:], a . is inserted after every character, but the underscores are not found or replaced.
df1['Client'] = df1['Client'].str.replace('_'[:1],'.')
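A likely reason for all three results is that the slice is applied to the one-character string literal '_' before str.replace ever sees it, not to the values in the column. The short sketch below uses plain Python strings only, and the variable name pattern is just for illustration:

```python
pattern = '_'

print(repr(pattern[:1]))   # '_'  -> the full pattern, so every underscore is replaced
print(repr(pattern[1:]))   # ''   -> an empty pattern, which matches between every character

try:
    pattern[1]             # a one-character string has no index 1
except IndexError as exc:
    print(exc)             # string index out of range

# An empty pattern is why dots show up around every character:
print('abc'.replace('', '.'))  # .a.b.c.
```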
CodePudding user response:
Not the simplest solution, but it works:
import re
df1['Client'] = df1['Client'].apply(lambda s: re.sub(r'^(.*?)\.', r'\1_', s.replace('_', '.')))
In the lambda, we first replace every _ with . and then change the first . back to _. Finally, apply runs the lambda on each value in the column.
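To try the idea outside of pandas, here is a small sketch on plain strings. Both function names are made up for this example. The second variant relies on the optional count argument of Python's str.replace instead of a regex. Note that both assume the value has no . before its first underscore; if it does, the wrong character gets turned back.

```python
import re

def dots_except_first_regex(s):
    # Replace every underscore, then turn the first dot back into an underscore.
    return re.sub(r'^(.*?)\.', r'\1_', s.replace('_', '.'))

def dots_except_first_count(s):
    # Same idea without a regex: the third argument limits the number of replacements.
    return s.replace('_', '.').replace('.', '_', 1)

for value in ['client_19_Aug_21_22_2022', 'client123']:
    print(dots_except_first_regex(value), dots_except_first_count(value))
```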
CodePudding user response:
A pandas Series has a .map method that you can use to apply an arbitrary function to every value in the Series.
In your case, you can write your own replace_underscores_except_first function, looking something like:
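def replace_underscores_except_first(s):
    newstring = ''
    seen_first = False  # becomes True once the first underscore has passed
    for ch in s:
        # Keep the first underscore; turn every later one into a dot.
        newstring += '.' if ch == '_' and seen_first else ch
        seen_first = seen_first or ch == '_'
    return newstring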
and then pass that to .map like:
df1['Client'] = df1['Client'].map(replace_underscores_except_first)
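As a compact alternative to a character-by-character loop, the same result can probably be had with str.find and slicing. This is only a sketch, and the helper name below is hypothetical:

```python
def dots_after_first_underscore(s):
    # Position of the first underscore, or -1 if there is none.
    i = s.find('_')
    # Keep everything up to and including that underscore unchanged,
    # and replace underscores only in the remainder.
    # When i is -1 the head is empty and the tail is the whole string,
    # which contains no underscores, so s comes back unchanged.
    return s[:i + 1] + s[i + 1:].replace('_', '.')

print(dots_after_first_underscore('client_19_Aug_21_22_2022'))
```

It can be passed to .map in the same way as any other single-argument function.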
CodePudding user response:
Here is an example using map. The function checks whether the string contains an underscore; if it does, it splits on the underscore and joins every part after the first with a dot.
import pandas as pd
items = [
    "client_19_Aug_21_22_2022",
    "client123",
]
def replace_underscore_with_dot_except_first(s):
    if "_" in s:
        parts = s.split("_")
        return f"{parts[0]}_{'.'.join(parts[1:])}"
    return s
df1 = pd.DataFrame(items, columns=["Client"])
df1['Client'] = df1['Client'].map(replace_underscore_with_dot_except_first)
print(df1)
Output:
                     Client
0  client_19.Aug.21.22.2022
1                 client123
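If you would rather avoid a custom function, a column-wise variant is possible with the .str accessor. The sketch below assumes Series.str.partition, which splits each value at the first occurrence of the separator into three columns; the sample data is the same as above.

```python
import pandas as pd

df1 = pd.DataFrame({"Client": ["client_19_Aug_21_22_2022", "client123"]})

# Split each value at the first underscore into three columns:
# 0 = text before it, 1 = the underscore itself (or ''), 2 = the rest.
parts = df1["Client"].str.partition("_")

# Replace underscores only in the remainder, then glue the pieces back together.
df1["Client"] = parts[0] + parts[1] + parts[2].str.replace("_", ".", regex=False)

print(df1)
```

Values without an underscore end up with empty second and third parts, so they pass through unchanged.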