Concatenating multiple DataFrame column values into one string per row when empty values may occur (pandas)

Time:08-03

The task is simple, but I'm a little confused. Let's say I have a dataset with 3 columns, and any of them may sometimes contain an empty value. I need to combine the values of each row into one string in a separate column.

Sample source dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [np.nan, 'b_1', 'c_1'], 'col2': ['a_2', np.nan, 'c_2'],
                   'col3': ['a_3', 'b_3', np.nan]})

col1    col2    col3
NaN     a_2     a_3
b_1     NaN     b_3
c_1     c_2     NaN

After merging, the result should look like this (that is, the non-empty values in each row are combined, with a comma between them):

col1    col2    col3    col4
NaN     a_2     a_3     a_2, a_3
b_1     NaN     b_3     b_1, b_3
c_1     c_2     NaN     c_1, c_2

I tried this earlier, but where a value was empty, there was just a blank there and the comma was still added. How can this command be improved?

df['col4'] = df['col1'].map(str) + ', ' + df['col2'].map(str) + ', ' + df['col3'].map(str)

col1    col2    col3    col4
NaN     a_2     a_3     , a_2, a_3
b_1     NaN     b_3     b_1, , b_3
c_1     c_2     NaN     c_1, c_2, 

CodePudding user response:

Try this:

df['col4'] = df.apply(lambda d: ', '.join([x for x in d if x == x]), axis=1)

Sample Input:

  col1 col2 col3
0  NaN  a_2  a_3
1  b_1  NaN  b_3
2  c_1  c_2  NaN

Sample Output:

  col1 col2 col3      col4
0  NaN  a_2  a_3  a_2, a_3
1  b_1  NaN  b_3  b_1, b_3
2  c_1  c_2  NaN  c_1, c_2
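
For reference, the `x == x` test in the one-liner above works because NaN is the only value here that does not compare equal to itself. Below is a minimal, self-contained sketch of the same idea that spells the check out with `pd.notna` instead. It assumes every non-missing value is already a string, as in the sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [np.nan, 'b_1', 'c_1'],
    'col2': ['a_2', np.nan, 'c_2'],
    'col3': ['a_3', 'b_3', np.nan],
})

# Keep only the non-missing values of each row, then join them with ', '.
# pd.notna(x) is an explicit spelling of the `x == x` NaN check.
df['col4'] = df.apply(
    lambda row: ', '.join(x for x in row if pd.notna(x)),
    axis=1,
)

print(df)
```

Either spelling should give the same col4 on this data; `pd.notna` mainly makes the intent easier to read.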

What if only specific columns are required?

df['col4'] = df[['col1', 'col2', 'col3']].apply(lambda d: ', '.join([x for x in d if x == x]), axis=1)

Sample Input:

  col1 col2 col3 colz
0  NaN  a_2  a_3  z_1
1  b_1  NaN  b_3  NaN
2  c_1  c_2  NaN  z_2

Sample Output:

  col1 col2 col3 colz      col4
0  NaN  a_2  a_3  z_1  a_2, a_3
1  b_1  NaN  b_3  NaN  b_1, b_3
2  c_1  c_2  NaN  z_2  c_1, c_2

NOTE: colz values are not included in the col4 result.
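
If the selected columns might also hold non-string values (numbers, for example), `', '.join` would raise a TypeError. One possible way around that is sketched below: drop the missing values of each row with `dropna()` and cast the rest to `str` before joining. The helper name `join_non_null` and its `sep` parameter are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def join_non_null(frame, columns, sep=', '):
    """Hypothetical helper: join the non-missing values of `columns`, row by row."""
    return frame[columns].apply(
        # dropna() removes NaN/None; astype(str) makes non-string values joinable
        lambda row: sep.join(row.dropna().astype(str)),
        axis=1,
    )


df = pd.DataFrame({
    'col1': [np.nan, 'b_1', 'c_1'],
    'col2': ['a_2', np.nan, 'c_2'],
    'col3': ['a_3', 'b_3', np.nan],
    'colz': ['z_1', np.nan, 'z_2'],
})

# colz is left out of the column list, so it does not appear in col4.
df['col4'] = join_non_null(df, ['col1', 'col2', 'col3'])
print(df)
```

A row in which every selected column is missing ends up as an empty string with this approach.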
