The task is simple, but I'm a little confused. Say I have a dataset with 3 columns, any of which may sometimes contain an empty value. I need to combine the non-empty values of each row into a single string in a separate column.
Sample source dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [np.nan, 'b_1', 'c_1'], 'col2': ['a_2', np.nan, 'c_2'],
                   'col3': ['a_3', 'b_3', np.nan]})
col1 col2 col3
NaN b_1 c_1
a_2 NaN c_2
a_3 b_3 NaN
After merging, I need the following result (that is, the non-empty values are combined with a comma between them):
col1 col2 col3 col4
NaN b_1 c_1 b_1, c_1
a_2 NaN c_2 a_2, c_2
a_3 b_3 NaN a_3, b_3
I tried this earlier, but wherever a value was empty, only a blank was left in its place. How can this command be improved?
df['col4'] = df['col1'].map(str) + ', ' + df['col2'].map(str) + ', ' + df['col3'].map(str)
col1 col2 col3 col4
NaN b_1 c_1 , b_1, c_1
a_2 NaN c_2 a_2, ,c_2
a_3 b_3 NaN a_3, b_3,
CodePudding user response:
Try this:
df['col4'] = df.apply(lambda d: ', '.join([x for x in d if x == x]), axis=1)
Sample Input:
col1 col2 col3
0 NaN a_2 a_3
1 b_1 NaN b_3
2 c_1 c_2 NaN
Sample Output:
col1 col2 col3 col4
0 NaN a_2 a_3 a_2, a_3
1 b_1 NaN b_3 b_1, b_3
2 c_1 c_2 NaN c_1, c_2
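The `x == x` test works because NaN is the only value that does not compare equal to itself, so the comprehension keeps every cell except the missing ones. If that reads as too cryptic, here is a rough sketch of the same idea using `pd.notna` instead. The helper name `join_non_missing` is made up for illustration, and the `str(x)` call is my addition so that non-string cells would not break `join`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 'b_1', 'c_1'],
                   'col2': ['a_2', np.nan, 'c_2'],
                   'col3': ['a_3', 'b_3', np.nan]})

def join_non_missing(row):
    # Hypothetical helper: keep only the non-missing cells of one row,
    # convert each to str, and join them with ', '.
    return ', '.join(str(x) for x in row if pd.notna(x))

df['col4'] = df.apply(join_non_missing, axis=1)
print(df)
```

This should give the same `col4` as the one-liner above; it is just more explicit about what is being filtered out.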
What if only specific columns are required?
df['col4'] = df[['col1', 'col2', 'col3']].apply(lambda d: ', '.join([x for x in d if x == x]), axis=1)
Sample Input:
col1 col2 col3 colz
0 NaN a_2 a_3 z_1
1 b_1 NaN b_3 NaN
2 c_1 c_2 NaN z_2
Sample Output:
col1 col2 col3 colz col4
0 NaN a_2 a_3 z_1 a_2, a_3
1 b_1 NaN b_3 NaN b_1, b_3
2 c_1 c_2 NaN z_2 c_1, c_2
NOTE: colz values are not added to the col4 result.
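For completeness, the column-subset case can also be written with `Series.dropna()` rather than the `x == x` comparison. This is only a sketch under the assumption that the frame looks like the sample input above; the `cols` list is a name I introduced to hold the columns that should feed `col4`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.nan, 'b_1', 'c_1'],
                   'col2': ['a_2', np.nan, 'c_2'],
                   'col3': ['a_3', 'b_3', np.nan],
                   'colz': ['z_1', np.nan, 'z_2']})

cols = ['col1', 'col2', 'col3']  # only these columns feed col4; colz is left out

# For each row: drop the missing cells, cast the rest to str, join with ', '.
df['col4'] = df[cols].apply(lambda row: ', '.join(row.dropna().astype(str)), axis=1)
print(df)
```

A row where every selected column is missing would end up with an empty string in `col4`, which may or may not be what you want.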