Simple example to understand vectorisation in Pandas

I am new to Python and I am trying to understand how vectorisation works in pandas dataframes.

Let's take this dataframe as an example:

import pandas as pd
df = pd.DataFrame([1,2,3,4,5,6,7,8,9,10])

Say I want to add a new column, flag, with value 0 if the entry in the first column is below the column mean (df.mean()) and value 1 otherwise. The result would be:

df["flag"] = [0,0,0,0,0,1,1,1,1,1]

Could anyone explain to me how to apply this element-wise check to the dataframe, and how it differs from using a loop?

CodePudding user response：

You can simply write the comparison and pandas will take care of the rest. Take a look at this:

mean_val = df[0].mean() # The mean of your first column
df["flag"] = (df[0] >= mean_val).astype(int)
# the 'astype(int)' is used to get 1/0 instead of True/False

The advantages of doing this are:

It is simpler to read and write.
It is faster: pandas runs the comparison over the whole column in compiled code, instead of the Python interpreter evaluating it one element at a time in a generic loop.
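
To make the difference from a loop concrete, here is a minimal sketch that computes the flag both ways on the dataframe from the question and checks that the results agree. The names flags_loop, flags_vec and same are only illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
mean_val = df[0].mean()  # mean of the first column

# Loop version: the Python interpreter evaluates the comparison
# once per element and builds the result list step by step.
flags_loop = []
for value in df[0]:
    flags_loop.append(1 if value >= mean_val else 0)

# Vectorised version: a single expression compares the whole column,
# then astype(int) turns True/False into 1/0.
flags_vec = (df[0] >= mean_val).astype(int)

# Both approaches should yield the same flags.
same = flags_vec.tolist() == flags_loop
print(flags_loop)
print(same)
```

On ten rows the two versions are indistinguishable in practice. The speed gap only becomes noticeable on much larger columns, and you can measure it yourself with the standard timeit module.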