Python :How to get the position number of the token in a cell dataframe that seperate by symbol (|)-CodePudding

I have a dataframe as below:

Car        Code          DOMAINS_616
Proton     CFC_HIC_U1    PRA_U1|PRD_U1|CFC_HIC_U1|CFC_HPC_U1|CFC_HIC_U1
Perudua    CFC_HVC_U1    PRA_U1|PRD_U1|CFC_HVC_U1|CFC_HPC_U1|CFC_HIC_U1
Toyota     CFC_HIC_U1    PRA_D1|PRD_D1|CFC_HKC_U1|CFC_HPC_U1|CFC_HOC_U1
Proton     CFC_HPC_U1    PRE_U1|PRD_U1|CFC_HPC_U1|CFC_HPC_U1|CFC_HOC_U1

Based on the code column I wish to know the position number at DOMAINS_616 which will seperate by | , expected output as below:

Car        Code          DOMAINS_616                                       Position
Proton     CFC_HIC_U1    PRA_U1|PRD_U1|CFC_HIC_U1|CFC_HPC_U1|CFC_HIC_U2    3
Perudua    CFC_HVC_U1    PRA_U1|PRD_U1|CFC_HVC_U1|CFC_HPC_U1|CFC_HIC_U1    3
Toyota     CFC_HIC_U1    PRA_D1|PRD_D1|CFC_HKC_U1|CFC_HPC_U1|CFC_HIC_U1    5   
Proton     CFC_HPC_U1    PRE_U1|PRD_U1|CFC_HLC_U1|CFC_HPC_U1|CFC_HOC_U1    4

Anyone have ideas?

CodePudding user response：

if Code appears once and only once in DOMAINS_616,

first we split DOMAINS_616 into a list and explode it

step1 = df.assign(dummy=df['DOMAINS_616'].apply(lambda x: list(enumerate(x.split('|'), 1)))).explode('dummy')

then we expand the dummy column into the Position and domain column

step1[['Position', 'domain']] = step1['dummy'].apply(pd.Series)

finally keep rows where Code matches with domain

step1[step1['Code']==step1['domain']].drop(columns=['dummy', 'domain'])

CodePudding user response：

df['Position'] = df.apply(lambda row: row['DOMAINS_616'].split(|).index(row['Code']) 1, axis = 1)