I would like to add a new column to my DataFrame that is based on values in other columns. I have read endless tutorials, but nothing has worked for me yet. I would like a new column "treatment" that is assigned a value of 0 or 1 depending on whether the value in the column "week" is between the values in the columns week_begin and week_end.
This is what I did:
def conditions(row):
    if row['week'] >= 'week_begin" & row['week'] <= 'week_end':
        return 1
    else:
        return 0

union_accident['treatment'] = union_accident.apply(conditions, axis=1)
union_accident.head()
This returns the error:
'>=' not supported between instances of 'int' and 'list'
CodePudding user response:
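Your immediate error comes from the quoting on line 2 of your function: 'week_begin" opens with a single quote but closes with a double quote. Beyond that typo, the condition compares row['week'] against the literal strings 'week_begin' and 'week_end' rather than against row['week_begin'] and row['week_end'].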
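For reference, here is a minimal sketch of the row-wise function with the quoting fixed and the comparison made against the row's own values. The sample data is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for union_accident
union_accident = pd.DataFrame({
    "week": [3, 8, 5],
    "week_begin": [1, 2, 6],
    "week_end": [4, 6, 9],
})

def conditions(row):
    # Compare against the row's own values, not the literal column names,
    # and use `and` because these are plain scalars inside a single row
    if row["week"] >= row["week_begin"] and row["week"] <= row["week_end"]:
        return 1
    return 0

union_accident["treatment"] = union_accident.apply(conditions, axis=1)
print(union_accident)
```

This works, but it calls a Python function once per row, so it tends to be slow on large frames.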
But, fixing that, you can easily do this by directly comparing the columns to each other! No need for .apply. The sneaky .astype(int) will change the result from a boolean type (True/False) to a numeric type with values of 1/0.
union_accident['treatment'] = (
    # parenthesize each comparison: & binds more tightly than <=
    (union_accident['week_begin'] <= union_accident['week'])
    & (union_accident['week'] <= union_accident['week_end'])
).astype(int)
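A small self-contained sketch of the column-wise comparison, using made-up data (the frame `df` and its values are assumptions, not the asker's data). Each comparison is wrapped in parentheses because `&` has higher precedence than `<=` in Python; without them the expression is parsed as a chained comparison and pandas raises a "truth value of a Series is ambiguous" error. `Series.between`, which is inclusive on both ends by default, is shown as an alternative spelling of the same test.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question
df = pd.DataFrame({
    "week": [3, 8, 5, 6],
    "week_begin": [1, 2, 6, 6],
    "week_end": [4, 6, 9, 9],
})

# Element-wise test: week_begin <= week <= week_end
in_window = (df["week_begin"] <= df["week"]) & (df["week"] <= df["week_end"])
df["treatment"] = in_window.astype(int)

# Alternative: Series.between accepts Series as bounds
same = df["week"].between(df["week_begin"], df["week_end"]).astype(int)

print(df)
```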
But that's a lot of repetition of union_accident right there. You can also use the .eval method to do this in a much less verbose manner:
union_accident['treatment'] = union_accident.eval(
    '(week_begin <= week) & (week <= week_end)'
).astype(int)
CodePudding user response:
You can use boolean indexing with .loc like this. Just change the condition B >= 3 to your own condition; I have written it this way for generality.
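import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
df.loc[df["B"] >= 3, "NewColumn"] = 0
df.loc[df["B"] <= 3, "NewColumn"] = 1  # B == 3 matches both masks; this later assignment wins
print(df)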
   A  B  NewColumn
0  0  1        1.0
1  2  3        1.0
2  4  5        0.0
3  6  7        0.0
4  8  9        0.0
For more information, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
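Applied to the question's columns, the same .loc pattern might look like the sketch below. The data is made up for illustration. Initializing the column to 0 first and then overwriting the matching rows means no row is left as NaN, so the column stays an integer type instead of becoming float.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question
df = pd.DataFrame({
    "week": [3, 8, 5, 6],
    "week_begin": [1, 2, 6, 6],
    "week_end": [4, 6, 9, 9],
})

# Boolean mask: True where week falls inside [week_begin, week_end]
mask = (df["week_begin"] <= df["week"]) & (df["week"] <= df["week_end"])

df["treatment"] = 0             # default for every row
df.loc[mask, "treatment"] = 1   # overwrite only the rows inside the window

print(df)
```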