I would like to exclude blank values in the lambda function below, which should prevent extra commas from appearing in my output. If I run the code without an if statement, I get extra commas in the values of the comb_words column. How can I incorporate an if statement to exclude blank values and prevent the extra commas?
code:
import numpy as np
import pandas as pd
# dataframe
df = pd.DataFrame(data={'col1': [123, 123, 456, 456, 789, 789],
                        'col2': ["", 'I eat cake.', 'We run fast.',
                                 'We eat cake?', 'I run faster!', 'I eat candy.'],
                        'col2_new': ["", 'i eat cake', 'we run fast', 'we eat cake',
                                     'i run faster', 'i eat candy']})
# words to search on
search_words1 = ['run fast', 'eat cake', 'faster', 'candy']
# create columns based on search words found
for n in search_words1:
    df[n] = np.where(df['col2_new'].str.contains(n), n, "")
# combine words into a single column only if value is not blank
cols = ['run fast','eat cake','faster','candy']
df['comb_words'] = df[cols].apply(lambda row: ' , '.join(row.values.astype(str)) if row else "", axis=1)
df
original dataframe:
   col1  col2           col2_new
0  123
1  123   I eat cake.    i eat cake
2  456   We run fast.   we run fast
3  456   We eat cake?   we eat cake
4  789   I run faster!  i run faster
5  789   I eat candy.   i eat candy
error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-28-117bb81b84df> in <module>
10 cols = ['run fast','eat cake','faster','candy']
11
---> 12 df['comb_words'] = df[cols].apply(lambda row: ' , '.join(row.values.astype(str)) if row else "", axis=1)
13
14 # df = df.drop_duplicates(subset =['call_id','comb_words'])
~\anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
6876 kwds=kwds,
6877 )
-> 6878 return op.get_result()
6879
6880 def applymap(self, func) -> "DataFrame":
~\anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
184 return self.apply_raw()
185
--> 186 return self.apply_standard()
187
188 def apply_empty_result(self):
~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
294 try:
295 result = libreduction.compute_reduction(
--> 296 values, self.f, axis=self.axis, dummy=dummy, labels=labels
297 )
298 except ValueError as err:
pandas\_libs\reduction.pyx in pandas._libs.reduction.compute_reduction()
pandas\_libs\reduction.pyx in pandas._libs.reduction.Reducer.get_result()
<ipython-input-28-117bb81b84df> in <lambda>(row)
10 cols = ['run fast','eat cake','faster','candy']
11
---> 12 df['comb_words'] = df[cols].apply(lambda row: ' , '.join(row.values.astype(str)) if row else "", axis=1)
13
14 # df = df.drop_duplicates(subset =['call_id','comb_words'])
~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1477 def __nonzero__(self):
1478 raise ValueError(
-> 1479 f"The truth value of a {type(self).__name__} is ambiguous. "
1480 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
1481 )
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Desired output:
   col1  col2           col2_new      run fast  eat cake  faster  candy  comb_words
0  123
1  123   I eat cake.    i eat cake              eat cake                 eat cake
2  456   We run fast.   we run fast   run fast                           run fast
3  456   We eat cake?   we eat cake             eat cake                 eat cake
4  789   I run faster!  i run faster  run fast            faster         run fast , faster
5  789   I eat candy.   i eat candy                               candy  candy
CodePudding user response:
Without a conditional statement, you can use:
df['comb_words'] = df[cols].stack().loc[lambda x: x != ''] \
.groupby(level=0).apply(lambda x: ' , '.join(x))
print(df)
# Output
   col1  col2           col2_new      run fast  eat cake  faster  candy  comb_words
0  123                                                                   NaN
1  123   I eat cake.    i eat cake              eat cake                 eat cake
2  456   We run fast.   we run fast   run fast                           run fast
3  456   We eat cake?   we eat cake             eat cake                 eat cake
4  789   I run faster!  i run faster  run fast            faster         run fast , faster
5  789   I eat candy.   i eat candy                               candy  candy
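Row 0 shows NaN here because every value in that row is blank, so the row disappears after the filter and has no group to join. If a blank string is preferred, as in the desired output, one option is to reindex the combined Series against the original index with an empty fill value. A minimal sketch, assuming the same search words as in the question; the trimmed-down frame and the name combined are only for illustration:

```python
import numpy as np
import pandas as pd

# Trimmed-down version of the question's frame (col2 omitted for brevity)
df = pd.DataFrame({
    'col1': [123, 123, 456, 456, 789, 789],
    'col2_new': ['', 'i eat cake', 'we run fast', 'we eat cake',
                 'i run faster', 'i eat candy'],
})
cols = ['run fast', 'eat cake', 'faster', 'candy']
for n in cols:
    df[n] = np.where(df['col2_new'].str.contains(n), n, '')

# Stack to long form, drop blanks, then join what is left per original row
combined = (
    df[cols].stack()
    .loc[lambda x: x != '']
    .groupby(level=0)
    .apply(' , '.join)
)

# Rows with no matches are missing from `combined`; fill them with ''
df['comb_words'] = combined.reindex(df.index, fill_value='')
print(df['comb_words'].tolist())
```

Calling fillna('') on the column after the assignment should give the same result; reindex just makes the missing rows explicit.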
CodePudding user response:
Without using a complicated lambda, you can just write a function and then pass it to apply:
# ...
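def func(row):
    # Testing the whole row (if not row) raises the same ValueError,
    # because a Series has no single truth value; test each value instead.
    words = [w for w in row.values.astype(str) if w != ""]
    # Joining an empty list gives "", so all-blank rows stay blank
    return ' , '.join(words)
df['comb_words'] = df[cols].apply(func, axis=1)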
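If you would rather keep a lambda with the condition inside it, the if can go into the generator passed to join, so that it is evaluated per value rather than on the row as a whole. A small sketch under that assumption; the flags frame is a hand-written stand-in for the indicator columns built in the question, not part of the original code:

```python
import pandas as pd

# Hypothetical stand-in for df[cols] from the question
flags = pd.DataFrame({
    'run fast': ['', '', 'run fast', '', 'run fast', ''],
    'eat cake': ['', 'eat cake', '', 'eat cake', '', ''],
    'faster':   ['', '', '', '', 'faster', ''],
    'candy':    ['', '', '', '', '', 'candy'],
})

# The condition filters individual values, so blanks never reach join
comb_words = flags.apply(
    lambda row: ' , '.join(w for w in row if w != ''),
    axis=1,
)
print(comb_words.tolist())
```

Because the blanks are filtered out before joining, a row with a single match produces no separator at all, and an all-blank row produces an empty string.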