Split function returning NaN for non matching patterns in pandas

Time:11-18

I am getting NaN for rows that do not match the split pattern in pandas.

Source Data:

Attr          
[ABC].[xyz]    
CDE

Code Used:

df['Extr_Attr'] = np.where((df.Attr.str.contains('.')),df['Attr'].str.split('.',1).str[1], df.Attr)

This returns NaN for rows that do not contain a '.' in the source data.

Expected output:

Attr           Extr_Attr
[ABC].[xyz]    [xyz]
CDE            CDE
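One likely cause, worth checking in your own environment: str.contains treats its pattern as a regular expression by default, and in a regex '.' matches any character. That makes the condition True for "CDE" as well, so np.where takes the split branch for that row. Because "CDE" splits into a one-item list, .str[1] returns NaN. A minimal sketch on the sample data that passes regex=False so the dot is matched literally (the name has_dot is only illustrative):

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Attr": ["[ABC].[xyz]", "CDE"]})

# Default regex=True: '.' is a wildcard, so every non-empty string matches
print(df["Attr"].str.contains(".").tolist())  # [True, True]

# regex=False: only rows containing a literal dot match
has_dot = df["Attr"].str.contains(".", regex=False)

# n is passed as a keyword; newer pandas releases deprecate, then reject,
# passing it positionally
df["Extr_Attr"] = np.where(
    has_dot,
    df["Attr"].str.split(".", n=1).str[1],  # text after the first dot
    df["Attr"],                             # no dot: keep the original value
)
print(df)
```

With the condition fixed, the original np.where idea works; the answers here avoid the condition altogether.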

Answer:

This assumes you want the last chunk after a dot if there is one, and the full string otherwise.

If you want to split, use rsplit and slice the last item:

df['Extr_Attr'] = df['Attr'].str.rsplit('.', n=1).str[-1]
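A self-contained sketch of the rsplit approach, with n=1 passed as a keyword. The third value, "a.b.c", is hypothetical and only there to show that rsplit keeps the last chunk when a string has several dots:

```python
import pandas as pd

# Question's sample data plus a hypothetical multi-dot value
df = pd.DataFrame({"Attr": ["[ABC].[xyz]", "CDE", "a.b.c"]})

# Split at most once, starting from the right. A row without a dot becomes a
# one-item list, so .str[-1] returns the whole string for it.
df["Extr_Attr"] = df["Attr"].str.rsplit(".", n=1).str[-1]
print(df["Extr_Attr"].tolist())  # ['[xyz]', 'CDE', 'c']
```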

Or, more efficiently, use extract to capture all non-dot characters at the end of the string:

df['Extr_Attr'] = df['Attr'].str.extract(r'([^.]+)$', expand=False)

Output:

          Attr Extr_Attr
0  [ABC].[xyz]     [xyz]
1          CDE       CDE
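A runnable sketch of the extract approach, assuming the intended pattern is ([^.]+)$, i.e. one or more non-dot characters anchored at the end of the string. expand=False makes extract return a Series for a single capture group instead of a one-column DataFrame. The value "ABC." is hypothetical and shows one behavioural difference from rsplit: when nothing follows the final dot, the pattern does not match and the result is NaN.

```python
import pandas as pd

# Question's sample data plus a hypothetical value that ends in a dot
df = pd.DataFrame({"Attr": ["[ABC].[xyz]", "CDE", "ABC."]})

# Capture the run of non-dot characters at the end of each string;
# a single group with expand=False yields a Series
df["Extr_Attr"] = df["Attr"].str.extract(r"([^.]+)$", expand=False)

# "ABC." has no non-dot characters after its last dot, so that row is NaN
print(df)
```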

Answer:

I think we can skip the str.contains check and use .split with .fillna instead:

df['Extr_Attr'] = df['Attr'].str.split('.').str[1].fillna(df['Attr'])

          Attr Extr_Attr
0  [ABC].[xyz]     [xyz]
1          CDE       CDE
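A runnable sketch of this fillna approach. One caveat worth checking against real data: .str[1] picks the second chunk rather than the last, so a hypothetical value with two dots such as "a.b.c" gives "b":

```python
import pandas as pd

# Question's sample data plus a hypothetical multi-dot value
df = pd.DataFrame({"Attr": ["[ABC].[xyz]", "CDE", "a.b.c"]})

# A row without a dot splits into a one-item list, so .str[1] is NaN there;
# fillna then falls back to the original value, aligned on the index
df["Extr_Attr"] = df["Attr"].str.split(".").str[1].fillna(df["Attr"])
print(df["Extr_Attr"].tolist())  # ['[xyz]', 'CDE', 'b']
```

If the last chunk is what you need, the rsplit or extract variants behave differently on such rows.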