Home > Blockchain >  Merge multiple columns in Pandas but skip certain items
Merge multiple columns in Pandas but skip certain items

Time:03-30

I have a dataframe like this:

Fruit   | Drink | Napkin
------------------------
Banana  | Coke  | No
Apple   | No    | Red
Kumquat | Beer  | White
No      | No    | Green

And I want to combine the columns into a new column but, and this is the bit I'm having trouble with, I have to skip the "Nos", i.e. to produce:

Fruit   | Drink | Napkin | Combined
-----------------------------------
Banana  | Coke  | No     | Banana, Coke
Apple   | No    | Red    | Apple, Red
Kumquat | Beer  | White  | Kumquat, Beer, White
No      | No    | Green  | Green

I can find lots of advice on combining the columns but nothing on how to do that whilst skipping the nos.

CodePudding user response:

You can replace the No to NaNs, stack to get rid of them and aggregate with GroupBy.agg:

df['combined'] = (df
 .replace('No', pd.NA)
 .stack()
 .groupby(level=0)
 .agg(','.join)
 )

Output:

     Fruit Drink Napkin            combined
0   Banana  Coke     No         Banana,Coke
1    Apple    No    Red           Apple,Red
2  Kumquat  Beer  White  Kumquat,Beer,White
3       No    No  Green               Green

CodePudding user response:

Use sets.

df['combined'] = df.agg(set,1).apply(lambda x: ','.join(x-{'No'}))

     Fruit Drink Napkin            combined
0   Banana  Coke     No         Coke,Banana
1    Apple    No    Red           Red,Apple
2  Kumquat  Beer  White  Beer,White,Kumquat
3       No    No  Green               Green

CodePudding user response:

Not sure if you are looking for a 'pythonic' solution here, but why don't you just remove that bit of string from your 'Combined' column ?

Suggestion:

#Preparing data
import pandas as pd

df=pd.DataFrame()
df['Fruit']=pd.Series(['Banana','Apple','Kumquat','No'])
df['Drink']=pd.Series(['Coke','No','Beer','No'])
df['Napkin']=pd.Series(['No','Red','White','Green'])

df['Combined']=df['Fruit'] ',' df['Drink'] ',' df['Napkin']
df['Combined']=df['Combined'].str.replace(',No','').str.replace('No,','')

The only issue I see with this, is if you encounter a string in your columns that would start with 'No*' (Nostradamus or something). But in that case, you could just remove the values from the original columns ?

  • Related