I have a DataFrame and want to use apply (or a lambda function) on a column of string lists to apply if-else conditions to its values. I have tried iterating with a for loop.
Input DataFrame:
text1 output_column
['bread','bread','bread'] ['bread'] --> [if a value occurs 2 or more times, output that value]
['bread','butter','jam'] ['butter'] --> [if all 3 values are unique, output the element at index 1 (zero-based)]
['bread','jam','jam'] ['jam'] --> [if a value occurs 2 or more times, output that value]
['unknown'] ['unknown'] --> [if any value is blank or null, output 'unknown']
I tried the following code:
output_column = []
df_value = df[['text_col1', 'text_col2', 'text_col3']].values.tolist()
if np.all(df_value <= 1):
    output_column.append(df_value[1])
else:
    output_column.append(max_count[np.argmax(df_value)])
Expected output DataFrame:
text1 output_column
['bread','bread','bread'] ['bread']
['bread','butter','jam'] ['butter']
['bread','jam','jam'] ['jam']
['unknown'] ['unknown']
Answer:
import pandas as pd
df = pd.DataFrame({'text1': [['bread', 'bread', 'bread'],
['bread', 'butter', 'jam'],
['bread', 'jam', 'jam'],
['unknown']]})
Lists stored in cells are awkward to work with, so explode them into one row per element:
df = df.explode('text1')
>>> df.head()
text1
0 bread
0 bread
0 bread
1 bread
1 butter
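One detail that matters for the 'unknown' rule: explode turns an empty list into NaN and leaves scalar missing values such as None untouched, so blank cells show up as missing values after exploding. The small sketch below uses made-up sample data to show this, and also shows that every exploded element keeps the index label of the row it came from.

```python
import pandas as pd

# made-up sample: a normal list, an empty list, a missing cell, and a blank string
s = pd.Series([['bread', 'jam'], [], None, ['']])
exploded = s.explode()

# index labels repeat: both elements of row 0 keep label 0
print(exploded.index.tolist())
# the empty list (row 1) becomes NaN and None (row 2) stays missing; '' is NOT missing
print(exploded.isna().tolist())
```

Blank strings are not treated as missing by isna(), so they need a separate check.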
Now you can use groupby to apply a function to each original row: the exploded rows keep their original index labels, so grouping by index level 0 collects them back together.
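A quick way to confirm that grouping by index level 0 really recovers the original rows is to collect each group back into a list and compare it with the input. This is only an illustrative check on the question's sample data, not part of the answer's method.

```python
import pandas as pd

df = pd.DataFrame({'text1': [['bread', 'bread', 'bread'],
                             ['bread', 'butter', 'jam'],
                             ['bread', 'jam', 'jam'],
                             ['unknown']]})
exploded = df.explode('text1')

# group on the repeated index labels and rebuild one list per original row
regrouped = exploded.groupby(level=0)['text1'].apply(list)
print(regrouped)
```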
The details of the heuristic are up to you, but here's something to start with:
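def get_values(s):
    # a blank, null or explicit 'unknown' value marks the whole row as 'unknown'
    if s.isna().any() or s.eq('').any() or s.eq('unknown').any():
        return "unknown"
    counts = s.value_counts()
    # all values distinct: take the element at position 1, as in the expected output
    if counts.eq(1).all():
        return s.iloc[1] if len(s) > 1 else s.iloc[0]
    # otherwise some value occurs at least twice: return the most frequent one
    return counts.idxmax()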
Apply to each group:
>>> df.groupby(level=0).text1.apply(get_values)
0 bread
1 butter
2 jam
3 unknown
Name: text1, dtype: object
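Since the question asks specifically for apply or a lambda, here is an alternative sketch that skips the explode step and applies the same rules to each list cell with Series.apply and collections.Counter. The helper name pick_value is hypothetical, and it assumes each cell holds a Python list (anything else, such as None, is treated as 'unknown').

```python
from collections import Counter

import pandas as pd


def pick_value(values):
    # hypothetical helper: applies the question's rules to one list cell
    # a missing cell or empty list -> 'unknown'
    if not isinstance(values, list) or len(values) == 0:
        return 'unknown'
    # any blank, null or explicit 'unknown' element -> 'unknown'
    if any(pd.isna(v) or v == '' or v == 'unknown' for v in values):
        return 'unknown'
    # most frequent value and how often it occurs
    value, freq = Counter(values).most_common(1)[0]
    if freq >= 2:
        return value
    # all values distinct: take the element at index 1, matching the expected output
    return values[1] if len(values) > 1 else values[0]


df = pd.DataFrame({'text1': [['bread', 'bread', 'bread'],
                             ['bread', 'butter', 'jam'],
                             ['bread', 'jam', 'jam'],
                             ['unknown']]})
# equivalent to df['text1'].apply(lambda cell: pick_value(cell))
df['output_column'] = df['text1'].apply(pick_value)
print(df)
```

Working on the list cells directly avoids reshaping the frame, at the cost of a Python-level function call per row; for the sample data both approaches give bread, butter, jam, unknown.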