Merge multiple columns in Pandas but skip certain items-CodePudding

I have a dataframe like this:

Fruit   | Drink | Napkin
------------------------
Banana  | Coke  | No
Apple   | No    | Red
Kumquat | Beer  | White
No      | No    | Green

And I want to combine the columns into a new column but, and this is the bit I'm having trouble with, I have to skip the "Nos", i.e. to produce:

Fruit   | Drink | Napkin | Combined
-----------------------------------
Banana  | Coke  | No     | Banana, Coke
Apple   | No    | Red    | Apple, Red
Kumquat | Beer  | White  | Kumquat, Beer, White
No      | No    | Green  | Green

I can find lots of advice on combining the columns but nothing on how to do that whilst skipping the nos.

CodePudding user response：

You can replace the No to NaNs, stack to get rid of them and aggregate with GroupBy.agg:

df['combined'] = (df
 .replace('No', pd.NA)
 .stack()
 .groupby(level=0)
 .agg(','.join)
 )

Output:

     Fruit Drink Napkin            combined
0   Banana  Coke     No         Banana,Coke
1    Apple    No    Red           Apple,Red
2  Kumquat  Beer  White  Kumquat,Beer,White
3       No    No  Green               Green

CodePudding user response：

Use sets.

df['combined'] = df.agg(set,1).apply(lambda x: ','.join(x-{'No'}))

     Fruit Drink Napkin            combined
0   Banana  Coke     No         Coke,Banana
1    Apple    No    Red           Red,Apple
2  Kumquat  Beer  White  Beer,White,Kumquat
3       No    No  Green               Green

CodePudding user response：

Not sure if you are looking for a 'pythonic' solution here, but why don't you just remove that bit of string from your 'Combined' column ?

Suggestion:

#Preparing data
import pandas as pd

df=pd.DataFrame()
df['Fruit']=pd.Series(['Banana','Apple','Kumquat','No'])
df['Drink']=pd.Series(['Coke','No','Beer','No'])
df['Napkin']=pd.Series(['No','Red','White','Green'])

df['Combined']=df['Fruit'] ',' df['Drink'] ',' df['Napkin']
df['Combined']=df['Combined'].str.replace(',No','').str.replace('No,','')

The only issue I see with this, is if you encounter a string in your columns that would start with 'No*' (Nostradamus or something). But in that case, you could just remove the values from the original columns ?