Python DataFrame: Get Substring

I have some DataFrames whose id column looks like this:

A12-B-56
E1234B115

It is always some letters, then several digits, then either `-B-` or `B`, and I want to keep the substring before `-B-` or `B`. One way I came up with is a for loop with `re.split(r'(\d+)', some_text)`. Is there a faster way to do this?
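For comparison, here is one plausible version of the loop-based approach described in the question. It is a sketch, not the asker's actual code. The helper name `prefix_before_b` is hypothetical, and it assumes every id contains at least one digit. With a capturing group, `re.split` keeps the digit runs in its result, so the letters plus the first digit run give the wanted prefix.

```python
import re

def prefix_before_b(value: str) -> str:
    # Hypothetical helper. The capturing group keeps the digits in the result:
    # 'A12-B-56' -> ['A', '12', '-B-', '56', '']
    parts = re.split(r'(\d+)', value)
    # Letters (parts[0]) followed by the first run of digits (parts[1]).
    # Assumes the id contains digits; otherwise parts[1] raises IndexError.
    return parts[0] + parts[1]

ids = ['A12-B-56', 'E1234B115']
prefixes = []
for value in ids:  # row-by-row loop in plain Python
    prefixes.append(prefix_before_b(value))

print(prefixes)  # ['A12', 'E1234']
```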

Answer:

Use a lookahead assertion to extract the word characters from the start of the string that are followed by `B`. Remove the `-` characters first, so that `A12-B-56` becomes `A12B56` and both formats look the same:

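```python
import pandas as pd

df = pd.DataFrame({'column': ['A12-B-56', 'A123B567']})
# Drop '-' so 'A12-B-56' becomes 'A12B56', then keep the word characters before 'B'
df = df.assign(column=df['column'].str.replace('-', '', regex=False)
                                  .str.extract(r'(^\w+(?=B))', expand=False))
```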
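If the ids really always follow the "letters, digits, optional `-`, then `B`" pattern from the question, the `-` removal can be folded into the pattern itself. This is a different regex from the lookahead one above, offered as an alternative under that assumption; the `prefix` column name is only for illustration. Unlike a greedy `\w+` lookahead, it stops at the first `B` after the digits, and rows that don't match come back as NaN.

```python
import pandas as pd

df = pd.DataFrame({'column': ['A12-B-56', 'E1234B115', 'A123B567']})

# Letters, then digits, then an optional '-' before 'B'.
# Only the capture group (letters + digits) is returned; no replace step is needed.
# expand=False returns a Series because there is a single group.
df['prefix'] = df['column'].str.extract(r'^([A-Za-z]+\d+)-?B', expand=False)

print(df)
```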