How do I incorporate an if statement in my lambda function to exclude blank values?

I would like to exclude blank values when using the lambda function below, so that no extra commas appear in my output. If I run the code without an if statement, I get extra commas in the values of the comb_words column. How can I incorporate an if statement to exclude blank values and prevent the extra commas?

code:

import numpy as np
import pandas as pd

# dataframe
df = pd.DataFrame(data={
    'col1': [123, 123, 456, 456, 789, 789],
    'col2': ["", 'I eat cake.', 'We run fast.', 'We eat cake?', 'I run faster!', 'I eat candy.'],
    'col2_new': ["", 'i eat cake', 'we run fast', 'we eat cake', 'i run faster', 'i eat candy'],
})

# words to search on
search_words1 = ['run fast','eat cake','faster','candy']

# create columns based on search words found
for n in search_words1:
    df[n] = np.where(df['col2_new'].str.contains(n), n, "")

# combine words into a single column only if value is not blank
cols = ['run fast','eat cake','faster','candy']

df['comb_words'] = df[cols].apply(lambda row: ' , '.join(row.values.astype(str)) if row else "", axis=1)
df

original dataframe:

col1     col2           col2_new
0   123     
1   123  I eat cake.    i eat cake
2   456  We run fast.   we run fast
3   456  We eat cake?   we eat cake
4   789  I run faster!  i run faster
5   789  I eat candy.   i eat candy

error message:

 ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-117bb81b84df> in <module>
     10 cols = ['run fast','eat cake','faster','candy']
     11 
---> 12 df['comb_words'] = df[cols].apply(lambda row: ' , '.join(row.values.astype(str)) if row else "", axis=1)
     13 
     14 # df = df.drop_duplicates(subset =['call_id','comb_words'])

~\anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   6876             kwds=kwds,
   6877         )
-> 6878         return op.get_result()
   6879 
   6880     def applymap(self, func) -> "DataFrame":

~\anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
    184             return self.apply_raw()
    185 
--> 186         return self.apply_standard()
    187 
    188     def apply_empty_result(self):

~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    294             try:
    295                 result = libreduction.compute_reduction(
--> 296                     values, self.f, axis=self.axis, dummy=dummy, labels=labels
    297                 )
    298             except ValueError as err:

pandas\_libs\reduction.pyx in pandas._libs.reduction.compute_reduction()

pandas\_libs\reduction.pyx in pandas._libs.reduction.Reducer.get_result()

<ipython-input-28-117bb81b84df> in <lambda>(row)
     10 cols = ['run fast','eat cake','faster','candy']
     11 
---> 12 df['comb_words'] = df[cols].apply(lambda row: ' , '.join(row.values.astype(str)) if row else "", axis=1)
     13 
     14 # df = df.drop_duplicates(subset =['call_id','comb_words'])

~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1477     def __nonzero__(self):
   1478         raise ValueError(
-> 1479             f"The truth value of a {type(self).__name__} is ambiguous. "
   1480             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1481         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Desired output:

col1      col2          col2_new     run fast    eat cake   faster  candy   comb_words
0   123                                                                         
1   123   I eat cake.   i eat cake               eat cake                   eat cake
2   456   We run fast.  we run fast  run fast                               run fast
3   456   We eat cake?  we eat cake              eat cake                   eat cake
4   789   I run faster! i run faster run fast               faster          run fast , faster
5   789   I eat candy.  i eat candy                                 candy   candy

CodePudding user response：

Without a conditional statement, you can use:

df['comb_words'] = df[cols].stack().loc[lambda x: x != ''] \
                           .groupby(level=0).apply(lambda x: ' , '.join(x))
print(df)

# Output
   col1           col2      col2_new  run fast  eat cake  faster  candy         comb_words
0   123                                                                                NaN
1   123    I eat cake.    i eat cake            eat cake                          eat cake
2   456   We run fast.   we run fast  run fast                                    run fast
3   456   We eat cake?   we eat cake            eat cake                          eat cake
4   789  I run faster!  i run faster  run fast            faster         run fast , faster
5   789   I eat candy.   i eat candy                              candy              candy
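
Note that row 0 comes out as NaN rather than blank, because filtering the stacked values leaves no entries for that row. A minimal sketch, assuming an empty string is wanted there as in the desired output, is to reindex the combined result against the original index before assigning it. The small frame below is a stand-in for the question's data, and `combined` is a name introduced only for this example:

```python
import numpy as np
import pandas as pd

# Stand-in data: a shortened version of the question's col2_new column
df = pd.DataFrame({'col2_new': ['', 'i eat cake', 'we run fast', 'i run faster']})
cols = ['run fast', 'eat cake', 'faster', 'candy']
for n in cols:
    df[n] = np.where(df['col2_new'].str.contains(n), n, '')

# Stack to one value per (row, column), drop blanks, then join per row
combined = (df[cols].stack()
                    .loc[lambda x: x != '']
                    .groupby(level=0).apply(lambda x: ' , '.join(x)))

# Rows with no matches are absent from `combined`; fill them with ''
df['comb_words'] = combined.reindex(df.index, fill_value='')
print(df['comb_words'].tolist())
```

Calling `.fillna('')` on the column after assignment should give the same result.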

CodePudding user response：

Without using a complicated lambda, you can just write a function and then pass it to apply:

# ...

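def func(row):
    # Testing the whole row with `if not row` raises the same ValueError,
    # so check each value and keep only the non-blank ones
    words = [str(v) for v in row if v != ""]
    return ' , '.join(words)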


df['comb_words'] = df[cols].apply(func, axis=1)
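
To keep the condition inside a lambda, as the question asks, the if can go in a generator expression that tests each value, rather than being applied to the row as a whole (which is what raises the ValueError). A minimal sketch, using a small stand-in frame instead of the question's full data:

```python
import pandas as pd

# Stand-in for the search-word columns built in the question
df = pd.DataFrame({
    'run fast': ['', '', 'run fast', 'run fast'],
    'faster':   ['', '', '', 'faster'],
    'candy':    ['', 'candy', '', ''],
})
cols = ['run fast', 'faster', 'candy']

# The `if v != ''` filter runs per value, so blanks never reach join()
df['comb_words'] = df[cols].apply(
    lambda row: ' , '.join(v for v in row if v != ''), axis=1)
print(df['comb_words'].tolist())
# ['', 'candy', 'run fast', 'run fast , faster']
```

A row with no matches yields an empty string here, since joining an empty sequence returns ''.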