Split and type cast columns values using Pandas

Time:12-29

How do I add an extra column to a DataFrame that splits each value and converts the pieces to integers, but holds np.nan for values that are plain strings?

Col1   
1|2|3
"string"

so that the result looks like this:

Col1      ExtraCol
1|2|3     [1,2,3]
"string"  nan

I tried a long, contorted approach, but it failed:

df['extracol'] = df["col1"].str.strip().str.split("|").str[0].apply(lambda x: x.astype(np.float) if x.isnumeric() else np.nan).astype("Int32")

CodePudding user response:

Another possible solution:

import re
import numpy as np

# convert only when the value is numeric once the '|' separators are removed
df['ExtraCol'] = df['Col1'].apply(lambda x: [int(y) for y in re.split(
    r'\|', x)] if x.replace('|', '').isnumeric() else np.nan)

Output:

     Col1   ExtraCol
0   1|2|3  [1, 2, 3]
1  string        NaN
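
A variant of the same idea without a regex, as a minimal sketch: try to convert every piece and fall back to np.nan when any piece is not an integer. The helper name `to_int_list` is hypothetical (it is not part of pandas), and the sample frame only mirrors the two rows from the question.

```python
import numpy as np
import pandas as pd


def to_int_list(value):
    """Return a list of ints for values like '1|2|3', otherwise np.nan.

    `to_int_list` is a hypothetical helper name, not a pandas function.
    """
    if not isinstance(value, str):
        # missing values or non-string cells cannot be split
        return np.nan
    try:
        return [int(part) for part in value.strip().split("|")]
    except ValueError:
        # at least one piece is not an integer, e.g. 'string' or '1||2'
        return np.nan


# sample data assumed from the question
df = pd.DataFrame({"Col1": ["1|2|3", "string"]})
df["ExtraCol"] = df["Col1"].apply(to_int_list)
```

Compared with the isnumeric() check, this also returns NaN for inputs such as '1||2', where an empty piece would otherwise make int() raise.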

CodePudding user response:

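You can use a regex with Series.str.match to select the rows whose value can be split into a list, and split only those rows; the other rows become NaN when the result is assigned back (the list elements remain strings):

df['ExtraCol'] = df.loc[df['Col1'].str.match(r'\|?\d+\|?'), 'Col1'].str.split('|')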
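
Two caveats about the regex approach above: Series.str.match only anchors at the start of the string, so a value such as '1|x' would also be selected, and str.split leaves the pieces as strings. A stricter sketch, assuming the values are non-negative integers separated by '|' and that Series.str.fullmatch is available (pandas 1.1 or later), could look like this. The third sample row is an added assumption used to show the stricter match.

```python
import pandas as pd

# sample data; the '1|x' row is an assumed extra case, not from the question
df = pd.DataFrame({"Col1": ["1|2|3", "string", "1|x"]})

# True only for rows made up entirely of digit groups separated by '|'
is_int_list = df["Col1"].str.fullmatch(r"\d+(?:\|\d+)*")

# split only the matching rows and convert each piece to int;
# rows left out of the assignment are filled with NaN on index alignment
df["ExtraCol"] = (
    df.loc[is_int_list, "Col1"]
    .str.split("|")
    .apply(lambda parts: [int(p) for p in parts])
)
```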