I am new to Python and I am trying to understand how vectorisation works in pandas dataframes.
Let's take this dataframe as example:
import pandas as pd
df = pd.DataFrame([1,2,3,4,5,6,7,8,9,10])
And let's say I want to add a new column flag with value 0 if the entry of the first column is below the mean of that column (df[0].mean(), which is 5.5 here) and value 1 otherwise.
The result would be:
df["flag"] = [0,0,0,0,0,1,1,1,1,1]
Could anyone explain to me how to apply this element-wise check to the dataframe, and what the difference would be compared to using a loop?
Answer:
You can write the comparison directly on the column: pandas applies it to every element and returns a boolean Series. Take a look at this:
mean_val = df[0].mean() # The mean of your first column
df["flag"] = (df[0] >= mean_val).astype(int)
# the 'astype(int)' is used to get 1/0 instead of True/False
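To see what each step produces, here is a small sketch that rebuilds the example frame and prints the intermediate boolean mask before the astype(int) conversion. The variable name mask is only for illustration.

```python
import pandas as pd

# Rebuild the example frame: a single column labelled 0 holding 1..10
df = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

mean_val = df[0].mean()          # a single scalar, not a Series
mask = df[0] >= mean_val         # boolean Series, one True/False per row
df["flag"] = mask.astype(int)    # True -> 1, False -> 0

print(mean_val)                  # 5.5
print(mask.tolist())             # [False, False, False, False, False, True, True, True, True, True]
print(df["flag"].tolist())       # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The scalar mean_val is broadcast against every element of df[0], so no explicit iteration appears in your code.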
The advantages of doing this are:
- Simpler to read and write.
- Faster: the comparison runs as one operation in NumPy's compiled code over the whole column, instead of the Python interpreter executing a loop body once per element.
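For comparison, here is a sketch of the loop version of the same check next to the vectorised one. The helper names flag_with_loop and flag_vectorised are hypothetical, and the timing lines use the standard-library timeit module only to illustrate the gap; actual numbers depend on your machine and data size.

```python
import timeit

import numpy as np
import pandas as pd


def flag_with_loop(s):
    """Hypothetical helper: build the flag list with an explicit Python loop."""
    mean_val = s.mean()
    flags = []
    for value in s:                      # one interpreter iteration per element
        flags.append(0 if value < mean_val else 1)
    return flags


def flag_vectorised(s):
    """Same rule as the answer above: one comparison over the whole column."""
    return (s >= s.mean()).astype(int)


small = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
loop_result = flag_with_loop(small[0])
vec_result = flag_vectorised(small[0])
print(loop_result)
print(vec_result.tolist())

# A larger column makes the difference visible; exact timings vary.
big = pd.Series(np.arange(100_000))
print("loop:      ", timeit.timeit(lambda: flag_with_loop(big), number=1))
print("vectorised:", timeit.timeit(lambda: flag_vectorised(big), number=1))
```

Both functions produce the same flags. On large columns the vectorised version is usually much faster, but it is worth measuring on your own data.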