I have a data frame like this:
document_group
A12J3/381
A02J3/40
B12P4/2536
C10P234/3569
and I would like to get this:
document_group
A12J3/38
A02J3/40
B12P4/25
C10P234/35
I have tried to adapt a function for a single string, like this:
def remove_str_start(s, start):
    return s[:start] + s[start]
and it works with this sample:
s = 'H02J3/381'
s.find('/')
remove_str_start(s, s.find('/') + 2)
It returns 'H02J3/38', which is what I want. I would like to do the same when s is the data frame column and start is the position at which each string should be cut.
But when I tried it with the data frame:
remove_str_start(df['document_group'], df['document_group'].str.find('/') + 2)
the call raises an error.
Could anyone help me with this kind of situation?
CodePudding user response:
We can use str.replace here:
df["document_group"] = df["document_group"].str.replace(r'(/\d{2})\d+$', r'\1', regex=True)
Here is a Python regex demo showing that the replacement logic is working.
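As a self-contained check of the replacement above, here is a minimal sketch that rebuilds the sample frame from the question. The capture group is assumed to include the slash, so that the slash survives the substitution; values that already have only two digits after the slash do not match and are left unchanged.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"document_group": ["A12J3/381", "A02J3/40", "B12P4/2536", "C10P234/3569"]}
)

# Keep the slash and the first two digits, drop any further trailing digits.
df["document_group"] = df["document_group"].str.replace(
    r"(/\d{2})\d+$", r"\1", regex=True
)

print(df["document_group"].tolist())
```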
CodePudding user response:
You can also use str.split, remove the unwanted parts, and put the pieces back together:
s = df.document_group.str.split('/')
df['document_group'] = s.str[0] + "/" + s.str[1].str[:2]
prints:
document_group
0 A12J3/38
1 A02J3/40
2 B12P4/25
3 C10P234/35
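A possible variant of the split approach, sketched under the assumption that every value contains at least one "/": passing n=1 and expand=True splits only on the first slash and returns the two pieces as columns 0 and 1. A row without a slash would get a missing value in column 1, so the result for that row would be NaN.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"document_group": ["A12J3/381", "A02J3/40", "B12P4/2536", "C10P234/3569"]}
)

# Split once on the first "/" into two columns: 0 (prefix) and 1 (suffix).
parts = df["document_group"].str.split("/", n=1, expand=True)

# Rebuild the value from the prefix and the first two characters of the suffix.
df["document_group"] = parts[0] + "/" + parts[1].str[:2]

print(df)
```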
CodePudding user response:
You are trying too hard. Just create the column you want:
for each value, take the string up to the position where you find "/" plus 3 (because you want the "/" and the next 2 characters).
df['new_column'] = [e[:e.find('/') + 3] for e in df['your_initial_column']]
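One caveat with the slicing above: str.find returns -1 when there is no "/", so such a value would be cut to its first two characters. A minimal sketch that guards against this is shown below; the helper name truncate_after_slash and its keep parameter are hypothetical, introduced only for illustration.

```python
import pandas as pd


def truncate_after_slash(value, keep=2):
    """Hypothetical helper: keep everything up to '/' plus `keep` characters after it."""
    pos = value.find("/")
    if pos == -1:
        # No slash found: return the value unchanged instead of slicing it.
        return value
    return value[: pos + 1 + keep]


# Sample data copied from the question.
df = pd.DataFrame(
    {"document_group": ["A12J3/381", "A02J3/40", "B12P4/2536", "C10P234/3569"]}
)

df["new_column"] = [truncate_after_slash(v) for v in df["document_group"]]

print(df)
```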
Regards,