I have a dataframe as follows:
import pandas as pd

data = {'ID': [44, 65, 23, 40, 67, 90, 64, 92],
        'Title': ['abc // ghj // kbc // asd.g // 234 // gg',
                  'bhx // adj // mkf // fg.bhx',
                  '',
                  'bhs // jk',
                  '---',
                  'aghd',
                  'abd // ghh // 786',
                  'Ak8'],
        }
df = pd.DataFrame(data)
df
   ID  Title
0  44  abc // ghj // kbc // asd.g // 234 // gg
1  65  bhx // adj // mkf // fg.bhx
2  23
3  40  bhs // jk
4  67  ---
5  90  aghd
6  64  abd // ghh // 786
7  92  Ak8
I want to extract only the second element of each title, if one exists. The expected output is:
   ID  Extracted
0  44  ghj
1  65  adj
2  23
3  40  jk
4  67
5  90
6  64  ghh
7  92
CodePudding user response:
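Split each title on the ' // ' separator, which gives a list per row, then take the second element of that list. Rows with fewer than two elements yield NaN, which fillna('') replaces with an empty string.
df["Extracted"] = df['Title'].str.split(' // ').str[1].fillna('')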
   ID  Title                                    Extracted
0  44  abc // ghj // kbc // asd.g // 234 // gg  ghj
1  65  bhx // adj // mkf // fg.bhx              adj
2  23
3  40  bhs // jk                                jk
4  67  ---
5  90  aghd
6  64  abd // ghh // 786                        ghh
7  92  Ak8
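A self-contained sketch of the split-and-slice approach on the question's data. This variant assumes splitting on the bare '//' and stripping whitespace afterwards, so irregular spacing around the separator is tolerated; the stripping step is an addition for illustration, not something the question requires.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [44, 65, 23, 40, 67, 90, 64, 92],
    'Title': ['abc // ghj // kbc // asd.g // 234 // gg',
              'bhx // adj // mkf // fg.bhx',
              '',
              'bhs // jk',
              '---',
              'aghd',
              'abd // ghh // 786',
              'Ak8'],
})

# Split every title into a list of parts.
parts = df['Title'].str.split('//')

# .str[1] picks the second part; rows with only one part give NaN.
# Strip the spaces left around the part, then turn NaN into ''.
df['Extracted'] = parts.str[1].str.strip().fillna('')

print(df[['ID', 'Extracted']])
```

Rows 2, 4, 5 and 7 have no second element, so their Extracted value ends up as an empty string.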
CodePudding user response:
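We can use str.extract here. The pattern matches the first run of non-whitespace characters, the ' // ' separator, and then captures the second run; rows that do not match give NaN, which fillna('') turns into an empty string:
df["Extracted"] = df["Title"].str.extract(r'^\S+ // (\S+)', expand=False).fillna('')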
Here is a regex demo showing that the extraction logic is working.
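The extraction logic can also be checked locally with Python's re module. This is a minimal sketch that assumes the pattern uses `+` quantifiers (`^\S+ // (\S+)`) and that elements never contain spaces, since `\S+` stops at the first whitespace character.

```python
import re

# First element, the ' // ' separator, then the captured second element.
pattern = re.compile(r'^\S+ // (\S+)')

titles = ['abc // ghj // kbc // asd.g // 234 // gg',
          'bhx // adj // mkf // fg.bhx',
          '',
          'bhs // jk',
          '---',
          'aghd',
          'abd // ghh // 786',
          'Ak8']

extracted = []
for title in titles:
    match = pattern.match(title)
    # No match means there is no second element; use '' in that case.
    extracted.append(match.group(1) if match else '')

print(extracted)
```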
CodePudding user response:
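If you want to use pandas without regexes, split on ' // ' with plain Python and write the result to a new column so that Title is not overwritten:
df['Extracted'] = df['Title'].apply(lambda x: x.split(' // '))
df['Extracted'] = df['Extracted'].apply(lambda x: x[1] if len(x) >= 2 else '')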
df
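The two apply steps can also be folded into a single named helper. This is a rough sketch: `second_element` is a hypothetical name, and returning '' for non-string values such as None or NaN is an added assumption, since the question's data contains only strings.

```python
import pandas as pd


def second_element(title, sep='//'):
    """Return the second sep-delimited part of title, or '' if there is none."""
    # Missing values (None, NaN) are not strings; treat them as empty.
    if not isinstance(title, str):
        return ''
    parts = title.split(sep)
    # Strip the spaces that surround the separator in the source data.
    return parts[1].strip() if len(parts) >= 2 else ''


df = pd.DataFrame({'Title': ['bhs // jk', '---', None]})
df['Extracted'] = df['Title'].map(second_element)
print(df)
```

A named function is easier to unit-test than chained lambdas, at the cost of a few extra lines.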