Hello, I have a dataframe with a column A in the form number/invoice number, and I want to create a new column C containing only the number, which is a 6-digit number that often starts with 2. I can get it by taking the first 6 characters with df['C'] = df['A'].str[:6], but there are often cases where the order is swapped to invoice number/number, and then my result is wrong. I need help fixing this. My code:
import pandas as pd
df = pd.DataFrame({
    'A': ['208953/2005016337RRH', '209265/03983RH', '00468RH/209408', '209664/2076585rrh', '200501633721/208953'],
    'B': ['208953', '108953', '347', '209664', '208953'],
})
df['C'] = df['A'].str[:6]
This won't work because some of the resulting numbers would be correct and some would be wrong.
CodePudding user response:
You could split on "/" and keep only the part that has 6 characters:
splitA = df["A"].str.split("/", expand=True)
df["C"] = splitA[0].where(splitA[0].str.len()==6, splitA[1])
>>> df
                      A       B       C
0  208953/2005016337RRH  208953  208953
1        209265/03983RH  108953  209265
2        00468RH/209408     347  209408
3     209664/2076585rrh  209664  209664
4   200501633721/208953  208953  208953
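
As a variation on the split approach above, a regular expression can pick out the part that is exactly six digits, whichever side of the "/" it is on. This is only a sketch: the pattern is my own assumption that the number is always exactly 6 digits and separated from the invoice number by a single "/", and the sample frame is rebuilt from the rows shown in the output above so the snippet runs on its own.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the output above.
df = pd.DataFrame({
    "A": [
        "208953/2005016337RRH",
        "209265/03983RH",
        "00468RH/209408",
        "209664/2076585rrh",
        "200501633721/208953",
    ]
})

# Assumed pattern: a run of exactly 6 digits that starts at the beginning of
# the string or right after "/", and ends at "/" or at the end of the string.
pattern = r"(?:^|/)(\d{6})(?=/|$)"

# expand=False returns a Series for the single capture group;
# rows with no 6-digit part become NaN.
df["C"] = df["A"].str.extract(pattern, expand=False)

print(df)
```

One difference from the length check: when neither side is a 6-digit number, this leaves NaN instead of falling back to the second part, which may or may not be what you want.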