I am getting NaN for rows that do not match the pattern when using split in pandas.
Source Data:
Attr
[ABC].[xyz]
CDE
Code Used:
df['Extr_Attr'] = np.where((df.Attr.str.contains('.')),df['Attr'].str.split('.',1).str[1], df.Attr)
This returns NaN for rows that do not contain a '.' in the source data.
Expected output:
Attr Extr_Attr
[ABC].[xyz] [xyz]
CDE CDE
CodePudding user response:
Assuming you want the last chunk after a dot (or the full string if there is no dot), use rsplit and slice the last item:
df['Extr_Attr'] = df['Attr'].str.rsplit('.', n=1).str[-1]
Or, more efficiently, with extract (capture all non-. characters at the end of the string):
df['Extr_Attr'] = df['Attr'].str.extract(r'([^.]+)$')
Output:
Attr Extr_Attr
0 [ABC].[xyz] [xyz]
1 CDE CDE
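As a self-contained sketch of the two approaches above, assuming the sample data from the question (the column names via_rsplit and via_extract are only illustrative, and n is passed by keyword because recent pandas versions require that):

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({"Attr": ["[ABC].[xyz]", "CDE"]})

# Split from the right at most once and keep the last piece;
# rows without a dot yield a one-element list, so [-1] is the full string
df["via_rsplit"] = df["Attr"].str.rsplit(".", n=1).str[-1]

# Capture the trailing run of non-dot characters;
# expand=False returns a Series instead of a one-column DataFrame
df["via_extract"] = df["Attr"].str.extract(r"([^.]+)$", expand=False)

print(df)
```

Both columns should agree on this data; they would differ only for values that end in a dot, where rsplit gives an empty string and extract gives NaN.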
CodePudding user response:
I think we can skip the str.contains and use .split and .fillna:
df['Extr_Attr'] = df['Attr'].str.split('.').str[1].fillna(df['Attr'])
Attr Extr_Attr
0 [ABC].[xyz] [xyz]
1 CDE CDE
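For context on why the np.where version in the question yields NaN: str.contains treats its pattern as a regular expression by default, so '.' matches any character and the condition is True for every non-empty row. np.where then takes the split result, which is NaN for 'CDE'. A minimal sketch, assuming the question's sample data, that matches the dot literally with regex=False and compares the result with the fillna approach (the names has_dot, after_dot and Extr_Attr_fillna are only illustrative):

```python
import numpy as np
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({"Attr": ["[ABC].[xyz]", "CDE"]})

# regex=False makes "." a literal dot rather than "any character"
has_dot = df["Attr"].str.contains(".", regex=False)

# Text after the first dot; NaN where there is no dot
after_dot = df["Attr"].str.split(".", n=1).str[1]

# Fall back to the original value when the row has no dot
df["Extr_Attr"] = np.where(has_dot, after_dot, df["Attr"])

# Same result without the explicit condition: fill the NaN rows
df["Extr_Attr_fillna"] = df["Attr"].str.split(".").str[1].fillna(df["Attr"])

print(df)
```

Note that .str[1] takes the second piece, so for values with more than one dot these variants can differ from each other and from the rsplit approach, which takes the last piece.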