How to extract the data at a specific occurrence of a separator

Time:03-11

I have a dataframe as follows:

import pandas as pd

data = {'ID':[44,65,23,40,67,90,64,92],
    'Title': ['abc // ghj // kbc // asd.g // 234 // gg',
              'bhx // adj // mkf // fg.bhx',
              '',
              'bhs // jk',
              '---',
              'aghd',
              'abd // ghh // 786',
              'Ak8'],
}
df = pd.DataFrame(data)

df

    ID  Title
0   44  abc // ghj // kbc // asd.g // 234 // gg
1   65  bhx // adj // mkf // fg.bhx
2   23  
3   40  bhs // jk
4   67  ---
5   90  aghd
6   64  abd // ghh // 786
7   92  Ak8

I want to extract only the second element of each title, where one exists. The expected output is:

    ID  Extracted
0   44  ghj
1   65  adj
2   23  
3   40  jk
4   67  
5   90  
6   64  ghh
7   92     

CodePudding user response:

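Split each title on the separator, which gives a list per row, then take the second element of that list. Strip the surrounding spaces and fill rows that have no second element with an empty string:

df["Extracted"] = df['Title'].str.split('//').str[1].str.strip().fillna('')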



  ID                                    Title   Extracted
0  44  abc // ghj // kbc // asd.g // 234 // gg      ghj 
1  65              bhx // adj // mkf // fg.bhx      adj 
2  23                                                   
3  40                                bhs // jk        jk
4  67                                      ---          
5  90                                     aghd          
6  64                        abd // ghh // 786      ghh 
7  92                                      Ak8          
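The same split-and-index idea can be wrapped in a small helper that takes the position as a parameter. This is only a sketch: `nth_field` is a made-up name, not a pandas function, and it assumes every value in the column is a string and that the separator is always `//`.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [44, 65, 23, 40, 67, 90, 64, 92],
    "Title": [
        "abc // ghj // kbc // asd.g // 234 // gg",
        "bhx // adj // mkf // fg.bhx",
        "",
        "bhs // jk",
        "---",
        "aghd",
        "abd // ghh // 786",
        "Ak8",
    ],
})


def nth_field(titles, n, sep="//"):
    """Hypothetical helper: n-th (0-based) field of each string, '' if absent."""
    fields = titles.str.split(sep)      # one list of fields per row
    picked = fields.str[n]              # NaN where the list is too short
    return picked.str.strip().fillna("")  # drop padding, blank out missing


df["Extracted"] = nth_field(df["Title"], 1)
print(df[["ID", "Extracted"]])
```

Calling `nth_field(df["Title"], 2)` would return the third field instead, again with an empty string for rows that are too short.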

CodePudding user response:

We can use str.extract here:

df["Extracted"] = df["Title"].str.extract(r'^\S+ // (\S+)', expand=False).fillna('')

Here is a regex demo showing that the extraction logic is working.
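To see what the regex approach does on rows that do not match, here is a small sketch. It assumes the intended pattern is `^\S+ // (\S+)`, so each field must be a run of non-whitespace characters. The looser second pattern and the "red fox" row are my own illustrations, not part of the question's data.

```python
import pandas as pd

titles = pd.Series([
    "abc // ghj // kbc // asd.g // 234 // gg",
    "bhs // jk",
    "---",
    "",
    "red fox // lazy dog // end",  # made-up row with spaces inside fields
])

# Strict pattern: first and second fields must contain no whitespace.
strict = titles.str.extract(r"^\S+ // (\S+)", expand=False).fillna("")

# Hypothetical looser pattern: capture whatever sits between the first
# "//" and the next "//" (or the end of the string), minus padding.
loose = titles.str.extract(r"^.*?//\s*(.*?)\s*(?://|$)", expand=False).fillna("")

print(pd.DataFrame({"strict": strict, "loose": loose}))
```

Rows without a separator yield NaN from `str.extract`, which `fillna("")` turns into the empty strings the question asks for.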

CodePudding user response:

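If you want to do this in pandas without regexes, split each title yourself and keep the second piece. Writing the result to a new column leaves Title intact:

parts = df['Title'].apply(lambda x: x.split('//'))
df['Extracted'] = parts.apply(lambda x: x[1].strip() if len(x) >= 2 else '')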
df
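If the column might also hold missing values, the lambda would fail on `x.split`. One way around that, sketched here with a hypothetical helper named `second_field` and a made-up row containing `None`, is to check the type first:

```python
import pandas as pd


def second_field(value, sep="//"):
    """Hypothetical helper: stripped second field, or '' when there is none."""
    if not isinstance(value, str):  # e.g. None or NaN in the column
        return ""
    parts = value.split(sep)
    return parts[1].strip() if len(parts) >= 2 else ""


# The last row (ID 1, Title None) is invented to exercise the missing-value path.
df = pd.DataFrame({
    "ID": [44, 40, 67, 1],
    "Title": ["abc // ghj // kbc", "bhs // jk", "---", None],
})
df["Extracted"] = df["Title"].apply(second_field)
print(df)
```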