Home > Software design >  Using Janitor based on other columns in the same row to output conditional results
Using Janitor based on other columns in the same row to output conditional results

Time:01-07

What I'm trying to do is have if the row from ColX is in the row ColZ I want a new column to be ColZ if not its colA.

I kept looking around but couldn't find a solution.

My data is a bit more in depth but this example should do it.

Perhaps there is a way out of janitor to do it which I am open to.

Edit:

I put in the wrong example code. Totally my fault. Updating it now.

df  = pd.DataFrame(
    {
'colZ' :["zang", "zang", "zang", "z", "zang"],
'colX' :["A", "B", "B", "A", "Z"],
'colA' :["1", "1", "1", "1", "1"],
    }
)


# Desired Output:

output_df = pd.DataFrame(
    {
'colZ' :["zang", "zang", "zang", "z", "zang"],
'colX' :["A", "B", "B", "A", "Z"],
'colA' :["1", "1", "1", "1", "1"],
'result' :["zang", "1", "1", "1", "zang"]
    }
)

Here is what I have tried.

output_df = jn.case_when(df,

                  df['colZ'].str.contains(df['colX']),  df['colZ'],
                  df['colA'],

                  column_name='result')

# Also tried this and many others

output_df = jn.case_when(df,

                  df['colZ'].isin(df['colX']),  df['colZ'],
                  df['colA'],

                  column_name='result')


CodePudding user response:

We can break it into 2 parts:

  1. Create variables for each columns:
col_a = df['colA']
col_x = df['colX']
col_z = df['colZ']
  1. Iterate over the rows and check if the word in the ColX is included in the colZ:
df['result'] = [col_z[col_index] if col_x[col_index].upper() in col_z[col_index].upper() else col_a[col_index] for col_index in range(df.shape[0])]

Or you can do a bigger one-liner by not initializing new variables on step 1, but it's getting too heavy...
Hope it helps!

Another way would be to zip:

df['result'] = [colz 
                if colx.lower() in colz 
                else cola 
                for colz, colx, cola 
                in zip(df.colZ, df.colX, df.colA)]
  • Related