I have a dataframe like this:
Fruit | Drink | Napkin
------------------------
Banana | Coke | No
Apple | No | Red
Kumquat | Beer | White
No | No | Green
And I want to combine the columns into a new column but, and this is the bit I'm having trouble with, I have to skip the "Nos", i.e. to produce:
Fruit | Drink | Napkin | Combined
-----------------------------------
Banana | Coke | No | Banana, Coke
Apple | No | Red | Apple, Red
Kumquat | Beer | White | Kumquat, Beer, White
No | No | Green | Green
I can find lots of advice on combining the columns but nothing on how to do that whilst skipping the nos.
CodePudding user response:
You can replace
the No to NaNs, stack
to get rid of them and aggregate with GroupBy.agg
:
df['combined'] = (df
.replace('No', pd.NA)
.stack()
.groupby(level=0)
.agg(','.join)
)
Output:
Fruit Drink Napkin combined
0 Banana Coke No Banana,Coke
1 Apple No Red Apple,Red
2 Kumquat Beer White Kumquat,Beer,White
3 No No Green Green
CodePudding user response:
Use sets.
df['combined'] = df.agg(set,1).apply(lambda x: ','.join(x-{'No'}))
Fruit Drink Napkin combined
0 Banana Coke No Coke,Banana
1 Apple No Red Red,Apple
2 Kumquat Beer White Beer,White,Kumquat
3 No No Green Green
CodePudding user response:
Not sure if you are looking for a 'pythonic' solution here, but why don't you just remove that bit of string from your 'Combined' column ?
Suggestion:
#Preparing data
import pandas as pd
df=pd.DataFrame()
df['Fruit']=pd.Series(['Banana','Apple','Kumquat','No'])
df['Drink']=pd.Series(['Coke','No','Beer','No'])
df['Napkin']=pd.Series(['No','Red','White','Green'])
df['Combined']=df['Fruit'] ',' df['Drink'] ',' df['Napkin']
df['Combined']=df['Combined'].str.replace(',No','').str.replace('No,','')
The only issue I see with this, is if you encounter a string in your columns that would start with 'No*' (Nostradamus or something). But in that case, you could just remove the values from the original columns ?