Strip integers from a row with conditions

Time:10-12

Hello, I have a dataframe with a column A in the form number/invoice number, and I wish to create a new column C holding only the number, which is a 6-digit number often starting with 2. I can get it by taking the first 6 characters using df['C'] = df['A'].str[:6], but there are often cases where the order is swapped to invoice number/number, and in that case my result would be wrong. I need help fixing this. My code:

import pandas as pd

df = pd.DataFrame({'A': ['208953/2005016337RRH', '209265/03983RH', '00468RH/209408', '209664/2076585rrh'], 'B' : ['208953', '108953', '347', '209664']})

df['C'] = df['A'].str[:6]

This won't work because I would get some correct and some wrong numbers.
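
To make the failure concrete, here is a small sketch of what the slice returns on the sample data above (the name first_six is just for illustration):

```python
import pandas as pd

# Same column A as in the question.
df = pd.DataFrame({'A': ['208953/2005016337RRH', '209265/03983RH',
                         '00468RH/209408', '209664/2076585rrh']})

# Take the first 6 characters of every value, regardless of order.
first_six = df['A'].str[:6]
print(first_six.tolist())
# ['208953', '209265', '00468R', '209664']
```

Row 2 comes out as '00468R' because the invoice number is written first there.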

CodePudding user response:

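You could split on "/" and keep the first part when it has exactly 6 characters, falling back to the second part otherwise (row 4 in the output below is an extra test row that is not in the question's DataFrame):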

splitA = df["A"].str.split("/", expand=True)
df["C"] = splitA[0].where(splitA[0].str.len()==6, splitA[1])

>>> df
                      A       B       C
0  208953/2005016337RRH  208953  208953
1        209265/03983RH  108953  209265
2        00468RH/209408     347  209408
3     209664/2076585rrh  209664  209664
4   200501633721/208953  208953  208953
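
As a possible alternative to the split above, here is a minimal sketch using a regular expression with str.extract. It assumes the number you want is always exactly 6 digits and makes up a whole "/"-separated part on its own. The pattern is my own assumption, not something from the question:

```python
import pandas as pd

# The four rows from the question plus the extra row shown in the output above.
df = pd.DataFrame({'A': ['208953/2005016337RRH', '209265/03983RH',
                         '00468RH/209408', '209664/2076585rrh',
                         '200501633721/208953']})

# Capture exactly 6 digits that start at the beginning of the string or
# right after "/", and end right before "/" or the end of the string.
pattern = r'(?:^|/)(\d{6})(?=/|$)'
df['C'] = df['A'].str.extract(pattern, expand=False)

print(df['C'].tolist())
# ['208953', '209265', '209408', '209664', '208953']
```

One difference from the split version: when neither part is a 6-digit number, this gives NaN instead of falling back to the second part, which may make bad rows easier to spot. If both parts happen to be 6-digit numbers, it returns the first one, so you would need an extra rule (for example, that the number starts with 2) to decide between them.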