I have some dataframes whose id column looks like this:
A12-B-56
E1234B115
It is always some letters followed by several digits, then either -B- or B. I want to keep the substring before '-B-' or 'B'. One way I came up with is a for loop with re.split('(\d+)', some_text). Is there a faster way to do this?
CodePudding user response:
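Remove the `-` characters first, then use a lookahead assertion to extract the alphanumerics from the start of the string that are followed by `B`. The hyphens have to go first because `\w` does not match `-`: on `A12-B-56` the run of word characters would stop at `A12`, the next character would be `-` rather than `B`, and the lookahead would fail. Code below: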
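import pandas as pd
df = pd.DataFrame({'column': ['A12-B-56', 'A123B567']})
# Drop the hyphens, then capture the leading word characters that are followed by B.
df = df.assign(column=df['column'].str.replace('-', '', regex=False).str.extract(r'(^\w+(?=B))', expand=False))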
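If you would rather not modify the string before extracting, a possible variant is to let the lookahead accept an optional hyphen. This is only a sketch: it assumes every id is letters, then digits, then `B` or `-B`, as described in the question. The `prefix` column name and the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, built from the ids shown in the question and the answer.
df = pd.DataFrame({'column': ['A12-B-56', 'E1234B115', 'A123B567']})

# Letters then digits, anchored at the start, kept only if followed by "B" or "-B".
pattern = r'^([A-Za-z]+\d+)(?=-?B)'

# expand=False returns a Series for a single capture group.
df['prefix'] = df['column'].str.extract(pattern, expand=False)
print(df)
```

Rows that do not fit the assumed shape come back as NaN rather than raising, so it is worth checking `df['prefix'].isna()` on real data.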