I have a column with values that look like A_B_C_D. I want to create a new column that grabs the first and the last values to create A_D. I used split, but that didn't work: it created the new column but gave me all NaNs. There are NaNs present in the first column.
dfNC['collapsedntdom'] = np.where(dfNC[ntdom].isnull(), dfNC[ntdom],
(dfNC[ntdom].str.split('_')[0]) and "_" and (dfNC[ntdom].str.split('_')[3]))
What am I missing?
CodePudding user response:
This is a simple solution, and it will work in your case:
dfNC['collapsedntdom'] = dfNC['ntdom'].apply(lambda x: x if str(x)=='nan' else f"{x.split('_')[0]}_{x.split('_')[3]}")
The output being:
ntdom collapsedntdom
0 A_B_C_D A_D
1 E_F_G_H E_H
2 NaN NaN
3 J_K_L_M J_M
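As a hedged illustration of what goes wrong in the question's attempt: indexing the result of str.split with [0] looks up the row labelled 0 instead of taking the first piece of every row, and `and` does not concatenate strings. The sketch below uses sample data assumed from the output table above and the element-wise `.str[...]` accessor, which leaves NaN rows as NaN.

```python
import numpy as np
import pandas as pd

# Sample data assumed for illustration (mirrors the table above).
s = pd.Series(['A_B_C_D', 'E_F_G_H', np.nan, 'J_K_L_M'])
parts = s.str.split('_')

# Label-based lookup: this is the whole list for row 0, not a column of first pieces.
row0 = parts[0]

# Element-wise access: first and last piece of every row (NaN stays NaN).
firsts = parts.str[0]
lasts = parts.str[-1]

# Use + to concatenate; `and` would not build the joined string.
collapsed = firsts + '_' + lasts
print(collapsed)
```

Because missing values propagate through `.str` and `+`, no separate np.where guard is needed in this variant.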
CodePudding user response:
Here is an efficient method (apply is slow):
df['new_col'] = (df['col'].str.split('_', n=1, expand=True)[0]    # text before the first '_'
                 + '_'
                 + df['col'].str.rsplit('_', n=1, expand=True)[1]  # text after the last '_'
                 )
example:
col new_col
0 NaN NaN
1 1_2_3_4 1_4
2 A_B_C_D A_D
3 abc_def__ghi abc_ghi
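To make the example reproducible, here is a minimal self-contained sketch of the split/rsplit approach. The DataFrame contents are assumed from the table above, and `n` is passed by keyword because recent pandas versions no longer accept it positionally.

```python
import numpy as np
import pandas as pd

# Sample data assumed for illustration (mirrors the example table).
df = pd.DataFrame({'col': [np.nan, '1_2_3_4', 'A_B_C_D', 'abc_def__ghi']})

# Split once from the left and keep the head; split once from the right and keep the tail.
head = df['col'].str.split('_', n=1, expand=True)[0]
tail = df['col'].str.rsplit('_', n=1, expand=True)[1]

# NaN rows stay NaN through the concatenation.
df['new_col'] = head + '_' + tail
print(df)
```

Limiting each split to one cut avoids materialising every piece when only the two ends are needed.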
generic method
Here is a generic method to combine arbitrary positions (here 0/2/-1):
from functools import reduce
df2 = df['col'].str.split('_', expand=True).iloc[:, [0, 2, -1]]
df['new_col'] = reduce(lambda a, b: (None, a[1] + '_' + b[1]), df2.items())[1]
output:
col new_col
0 NaN NaN
1 1_2_3_4 1_3_4
2 A_B_C_D A_C_D
3 abc_def__ghi abc__ghi
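The same idea can be sketched without reduce, as one possible alternative. `pick_parts` is a hypothetical helper name introduced here for illustration, and the sample data is assumed from the table above. Unlike the expand=True version, it resolves negative positions per row, so -1 is always each row's own last piece even when rows have different numbers of pieces.

```python
import numpy as np
import pandas as pd

def pick_parts(series, positions, sep='_'):
    """Join the pieces found at `positions` in each sep-delimited string.

    Hypothetical helper, not part of pandas.
    """
    parts = series.str.split(sep)
    # .str[i] indexes into each row's list; NaN or too-short rows give NaN.
    picked = [parts.str[i] for i in positions]
    out = picked[0]
    for piece in picked[1:]:
        out = out + sep + piece
    return out

# Sample data assumed for illustration (mirrors the output table).
df = pd.DataFrame({'col': [np.nan, '1_2_3_4', 'A_B_C_D', 'abc_def__ghi']})
df['new_col'] = pick_parts(df['col'], [0, 2, -1])
print(df)
```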