pandas: making a new column with multiple strings from another column

I have a column with values that look like A_B_C_D. I want to create a new column that grabs the first and the last values to create A_D. I used split, but that didn't work. It created the new column but gave me all NaNs. There are NaNs present in the first column.

dfNC['collapsedntdom'] = np.where(dfNC[ntdom].isnull(), dfNC[ntdom],
                                  (dfNC[ntdom].str.split('_')[0]) and "_" and (dfNC[ntdom].str.split('_')[3]))

What am I missing?

CodePudding user response：

This is a simple solution, and it will work in your case:

dfNC['collapsedntdom'] = dfNC['ntdom'].apply(lambda x: x if str(x)=='nan' else f"{x.split('_')[0]}_{x.split('_')[3]}")

The output being:

     ntdom collapsedntdom
0  A_B_C_D            A_D
1  E_F_G_H            E_H
2      NaN            NaN
3  J_K_L_M            J_M
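
As a side note on why the original attempt returned NaNs: `dfNC[ntdom].str.split('_')[0]` selects the row labelled 0 from the Series of lists, not the first piece of every row, and `and` returns one of its operands instead of concatenating strings. A minimal sketch of the same idea with per-row element access (`.str[0]`, `.str[-1]`) and `+` follows; the sample frame is made up for illustration and only mirrors the output above.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption), shaped like the example in this answer
df = pd.DataFrame({"ntdom": ["A_B_C_D", "E_F_G_H", np.nan, "J_K_L_M"]})

# One list of pieces per row; NaN rows stay NaN
parts = df["ntdom"].str.split("_")

# parts[0] would be the list stored in row 0;
# parts.str[0] / parts.str[-1] take the first / last piece of every row
df["collapsedntdom"] = parts.str[0] + "_" + parts.str[-1]

print(df)
```

Because NaN propagates through `.str` and through `+`, no `np.where` guard is needed for the missing rows.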

CodePudding user response：

Here is an efficient method (apply is slow):

# first piece (split once from the left) + '_' + last piece (split once from the right)
df['new_col'] = (  df['col'].str.split('_', n=1, expand=True)[0]
                 + '_'
                 + df['col'].str.rsplit('_', n=1, expand=True)[1]
                )

example:

            col  new_col
0           NaN      NaN
1       1_2_3_4      1_4
2       A_B_C_D      A_D
3  abc_def__ghi  abc_ghi

generic method

Here is a generic method to combine arbitrary positions (here 0, 2 and -1):

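from functools import reduce
# one column per piece, then keep only the positions of interest
df2 = df['col'].str.split('_', expand=True).iloc[:, [0, 2, -1]]
# items() yields (label, Series) pairs; join the Series pairwise with '_'
df['new_col'] = reduce(lambda a, b: (None, a[1] + '_' + b[1]), df2.items())[1]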

output:

            col   new_col
0           NaN       NaN
1       1_2_3_4     1_3_4
2       A_B_C_D     A_C_D
3  abc_def__ghi  abc__ghi
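
One caveat with the `expand=True` version: `-1` there means the last column of the expanded frame, so a row with fewer pieces than the longest row ends up NaN. If rows can have different numbers of pieces, a possible variant is to index each row's list with `.str[i]` instead. The helper name `pick_parts` and the sample data below are hypothetical, just a sketch of that idea.

```python
import numpy as np
import pandas as pd

def pick_parts(s, positions, sep="_"):
    """Join the pieces of each string found at the given positions.

    Hypothetical helper: negative positions count from the end of each
    row's own list; rows that are NaN or too short yield NaN.
    """
    parts = s.str.split(sep)
    out = parts.str[positions[0]]
    for pos in positions[1:]:
        out = out + sep + parts.str[pos]
    return out

# Illustrative data (an assumption); the last row has only three pieces
df = pd.DataFrame({"col": [np.nan, "1_2_3_4", "A_B_C_D", "abc_def__ghi", "x_y_z"]})
df["new_col"] = pick_parts(df["col"], [0, 2, -1])
print(df)
```

For rows with four pieces this should give the same result as the output above; the three-piece row resolves `-1` to its own last piece rather than to an empty fourth column.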