Concatenate multiple columns of a dataframe with a separating character for non-null values

Time:09-03

I have a data frame like this:

df:
C1   C2  C3
1    4    6
2   NaN   9
3    5   NaN
NaN  7    3

I want to concatenate the 3 columns into a single column with a comma as the separator, but I want the comma (",") only between non-null values.

I tried this, but it doesn't handle the null values:

df['New_Col'] = df[['C1','C2','C3']].agg(','.join, axis=1)

This gives me the output:

New_Col
1,4,6
2,,9
3,5,
,7,3

This is my ideal output:

New_Col
1,4,6
2,9
3,5
7,3

Can anyone help me with this?

CodePudding user response:

Judging by your (wrong) output, your dataframe holds strings, and the NaN values are actually empty strings. Otherwise ','.join would throw TypeError: expected str instance, float found, because NaN is a float.
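To make that reasoning concrete, here is a minimal sketch in plain Python (no pandas needed). The two sample rows are assumptions modeled on the second row of the question's data, one with a real float NaN and one with an empty string in its place:

```python
# A row holding a real NaN contains a float, which str.join rejects.
row_with_nan = ["2", float("nan"), "9"]
try:
    ",".join(row_with_nan)
    join_error = None
except TypeError as exc:
    join_error = exc

# A row where the missing value is an empty string joins without error,
# but leaves a doubled comma behind, as in the question's output.
row_with_empty = ["2", "", "9"]
joined = ",".join(row_with_empty)
print(type(join_error).__name__, joined)
```

If the second case matches what you see, the missing values are most likely empty strings rather than NaN.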

Since you're dealing with strings, which pandas is not optimized for, a vanilla Python list comprehension is probably the most efficient choice here.

df['New_Col'] = [','.join([e for e in x if e]) for x in df[['C1','C2','C3']].values]
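For reference, a self-contained sketch of the list-comprehension approach. The sample frame is an assumption: it rebuilds the question's data as strings, with empty strings standing in for the missing values, and the cols list is just an illustrative helper name:

```python
import pandas as pd

# Assumed sample data: the question's frame as strings, "" where a value is missing.
df = pd.DataFrame({
    "C1": ["1", "2", "3", ""],
    "C2": ["4", "", "5", "7"],
    "C3": ["6", "9", "", "3"],
})

cols = ["C1", "C2", "C3"]  # only join the source columns, not New_Col itself

# Keep the non-empty strings of each row, then join them with commas.
df["New_Col"] = [",".join([e for e in row if e]) for row in df[cols].to_numpy()]
print(df["New_Col"].tolist())
```

Selecting the three columns explicitly keeps an earlier New_Col attempt from being joined into the result.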


CodePudding user response:

In your case, use stack:

df['new'] = df.stack().astype(int).astype(str).groupby(level=0).agg(','.join)
Out[254]: 
0    1,4,6
1      2,9
2      3,5
3      7,3
dtype: object
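A runnable sketch of the same idea, assuming the columns are numeric with real NaN values (not strings). The explicit dropna() is an addition here: I believe newer pandas versions can keep NaN entries when stacking, so the sketch drops them itself instead of relying on stack to do it:

```python
import numpy as np
import pandas as pd

# Assumed sample data: the question's frame as numbers with real NaN values.
df = pd.DataFrame({
    "C1": [1, 2, 3, np.nan],
    "C2": [4, np.nan, 5, 7],
    "C3": [6, 9, np.nan, 3],
})

cols = ["C1", "C2", "C3"]

df["New_Col"] = (
    df[cols]
    .stack()            # one long Series indexed by (row, column)
    .dropna()           # drop missing values explicitly
    .astype(int)        # 1.0 -> 1, so the text has no trailing ".0"
    .astype(str)
    .groupby(level=0)   # regroup by the original row label
    .agg(",".join)
)
print(df["New_Col"].tolist())
```

The astype(int) step only makes sense if every non-null value is a whole number, as in the question's data.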

CodePudding user response:

You can use filter to get rid of NaNs (this assumes the non-null values are strings):

df['New_Col'] = df[['C1','C2','C3']].apply(lambda row: ','.join(filter(pd.notna, row)), axis=1)