I have the following dataframe:
d = {'col1': [1, "Avoid", 3, "Avoid"], 'col2': ["AA", "BB", "Avoid", "Avoid"]}
df = pd.DataFrame(data=d)
df
col1 col2
0 1 AA
1 Avoid BB
2 3 Avoid
3 Avoid Avoid
I have to conditionally concat col1 and col2 into col3. Conditions:
- Only concat 2 columns as long as none of them is Avoid.
- If any of col1 and col2 is Avoid, col3 will be equal to Avoid as well.
- When performing concatination, " & " needs to be added between column values in col3. For instance, first row of col3 will be "1 & AA".
The end result is supposed to look as the following:
col1 col2 col3
0 1 AA 1 & AA
1 Avoid BB Avoid
2 3 Avoid Avoid
3 Avoid Avoid Avoid
How can I do this without dealing with for loops?
CodePudding user response:
try this:
df["col3"]=df.apply(lambda x:"Avoid" if x["col1"]=="Avoid" or x["col2"]=="Avoid" else f"{x['col1']} & {x['col2']}",axis=1)
CodePudding user response:
df["col3"] = df["col1"] " & " df["col2"]
df["col3"] = df["col3"].apply(lambda x: "Avoid" if x.contains("Avoid") else x)
CodePudding user response:
Use boolean operations, this enables you to use an arbitrary number of columns:
# is any value in the row "Avoid"?
m = df.eq('Avoid').any(1)
# concatenate all columns unless there was a "Avoid"
df['col3'] = df.astype(str).agg(' & '.join, axis=1).mask(m, 'Avoid')
Alternative that should be faster if you have many rows and few with "Avoid":
m = df.ne('Avoid').all(1)
df.loc[m, 'col3'] = df[m].astype(str).agg(' & '.join, axis=1)
df['col3'] = df['col3'].fillna('Avoid')
output:
col1 col2 col3
0 1 AA 1 & AA
1 Avoid BB Avoid
2 3 Avoid Avoid
3 Avoid Avoid Avoid
CodePudding user response:
Is not an efficient way to work with pandas, but if you can't change the data structure these are solutions::
Solution 1:
def custom_merge(cols):
if cols["col1"]=="Avoid" or cols["col2"]=="Avoid":
return "Avoid"
else:
return f"{cols['col1']} & {cols['col2']}"
df['col3'] = df.apply(custom_merge, axis=1)
Solution 2:
df['col3'] = (df["col1"].astype(str) " & " df["col2"].astype(str)).apply(lambda x: "Avoid" if 'Avoid' in x else x)
Both solutions results in the following:
col1 col2 col3
0 1 AA 1 & AA
1 Avoid BB Avoid
2 3 Avoid Avoid
3 Avoid Avoid Avoid
Execution Time comparison
[in progress]