pandas if else conditions for multiple columns using dataframe


I have a dataframe, and I want to use an apply function or a lambda function on string column values to apply if-else conditions across columns. I have tried with for loop iterations.

      Input Dataframe
      text1                                        output_column
     ['bread','bread','bread']                      ['bread']   --> [if a value's count is >= 2, select that value]
     ['bread','butter','jam']                       ['butter']  --> [if all 3 values are unique, select the 1st element value as output]
     ['bread','jam','jam']                           ['jam']     --> [if a value's count is >= 2, select that value]
     ['unknown']                                    ['unknown'] --> [if any of the values came as blank or null, mark it as 'unknown']


         ################## I tried the below lines of code ##################

         output_column = []
         df_value = df[['text_col1','text_col2','text_col3']].values.tolist()
         if np.all(df_value <= 1):
             output_column.append(df_value[1])
         else:
             output_column.append(max_count[np.argmax(df_value)])


       Output Dataframe
      text1                                        output_column
     ['bread','bread','bread']                      ['bread'] 
     ['bread','butter','jam']                       ['butter']
     ['bread','jam','jam']                          ['jam']
     ['unknown']                                    ['unknown']

CodePudding user response:

import pandas as pd

df = pd.DataFrame({'text1': [['bread', 'bread', 'bread'],
                             ['bread', 'butter', 'jam'],
                             ['bread', 'jam', 'jam'],
                             ['unknown']]})

Cells containing lists are awkward to work with in pandas, so let's explode them:

df = df.explode('text1')

>>> df.head()
     text1
0    bread
0    bread
0    bread
1    bread
1   butter

Now you can use groupby to apply a function to each document (by grouping by index level 0).

The details of the heuristic are up to you, but here's something to start with:

def get_values(s):
    counts = s.value_counts()

    # If any value is literally 'unknown', flag the whole document
    if "unknown" in counts:
        return "unknown"

    # Every value occurs exactly once -> take s.iloc[1],
    # matching the expected 'butter' output in the question
    if counts.eq(1).all():
        return s.iloc[1]

    # Otherwise some value repeats -> return the most frequent one
    if counts.max() >= 2:
        return counts.idxmax()
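
As a quick sanity check on a single document (purely illustrative, using the sample values from the question):

>>> get_values(pd.Series(['bread', 'jam', 'jam']))
'jam'
>>> get_values(pd.Series(['bread', 'butter', 'jam']))
'butter'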

Apply to each group:

>>> df.groupby(level=0).text1.apply(get_values)
0      bread
1     butter
2        jam
3    unknown
Name: text1, dtype: object
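
If you also want the result stored next to the original lists as an output_column (as in the question's expected output), one option (assuming you kept the unexploded frame, called df_orig here) is to explode the column on the fly and assign the grouped result back; it is indexed by the original row labels, so it aligns automatically:

df_orig = pd.DataFrame({'text1': [['bread', 'bread', 'bread'],
                                  ['bread', 'butter', 'jam'],
                                  ['bread', 'jam', 'jam'],
                                  ['unknown']]})

# Series.explode keeps the original row index, so the groupby result
# lines up with df_orig's index when assigned as a new column.
df_orig['output_column'] = df_orig['text1'].explode().groupby(level=0).apply(get_values)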