What I'm trying to do is have if the row from ColX is in the row ColZ I want a new column to be ColZ if not its colA.
I kept looking around but couldn't find a solution.
My data is a bit more in depth but this example should do it.
Perhaps there is a way out of janitor to do it which I am open to.
Edit:
I put in the wrong example code. Totally my fault. Updating it now.
df = pd.DataFrame(
{
'colZ' :["zang", "zang", "zang", "z", "zang"],
'colX' :["A", "B", "B", "A", "Z"],
'colA' :["1", "1", "1", "1", "1"],
}
)
# Desired Output:
output_df = pd.DataFrame(
{
'colZ' :["zang", "zang", "zang", "z", "zang"],
'colX' :["A", "B", "B", "A", "Z"],
'colA' :["1", "1", "1", "1", "1"],
'result' :["zang", "1", "1", "1", "zang"]
}
)
Here is what I have tried.
output_df = jn.case_when(df,
df['colZ'].str.contains(df['colX']), df['colZ'],
df['colA'],
column_name='result')
# Also tried this and many others
output_df = jn.case_when(df,
df['colZ'].isin(df['colX']), df['colZ'],
df['colA'],
column_name='result')
CodePudding user response:
We can break it into 2 parts:
- Create variables for each columns:
col_a = df['colA']
col_x = df['colX']
col_z = df['colZ']
- Iterate over the rows and check if the word in the
ColX
is included in thecolZ
:
df['result'] = [col_z[col_index] if col_x[col_index].upper() in col_z[col_index].upper() else col_a[col_index] for col_index in range(df.shape[0])]
Or you can do a bigger one-liner by not initializing new variables on step 1, but it's getting too heavy...
Hope it helps!
Another way would be to zip:
df['result'] = [colz
if colx.lower() in colz
else cola
for colz, colx, cola
in zip(df.colZ, df.colX, df.colA)]