Imagine the following dataset df:
Row | Population_density | Distance |
---|---|---|
1 | 400 | 50 |
2 | 500 | 30 |
3 | 300 | 40 |
4 | 200 | 120 |
5 | 500 | 60 |
6 | 1000 | 50 |
7 | 3300 | 30 |
8 | 500 | 90 |
9 | 700 | 100 |
10 | 1000 | 110 |
11 | 900 | 200 |
12 | 850 | 30 |
How can I make a new dummy column that holds a 1 when the value of df['Population_density'] is above the third quartile (the 75th percentile) AND df['Distance'] is below 100, and a 0 for all other rows? With this data, rows 6 and 7 should get a 1 and every other row a 0.
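As a quick check of the expected result, here is a minimal sketch that computes the 75th-percentile threshold and lists the rows above it. It assumes pandas' default linear interpolation in quantile(), which places the value between the 9th and 10th sorted densities (900 and 1000).

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'Row': range(1, 13),
    'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
    'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30],
})

# 75th percentile; by default pandas interpolates linearly between sorted values
threshold = df['Population_density'].quantile(0.75)
print(threshold)  # 925.0

# Rows whose density lies strictly above the threshold
dense = df.loc[df['Population_density'] > threshold, 'Row'].tolist()
print(dense)  # [6, 7, 10]
```

Of rows 6, 7 and 10, only rows 6 and 7 have a Distance below 100 (row 10 has 110), which matches the expected dummy.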
Creating a dummy variable from a single criterion is fairly easy. For instance, the following creates a dummy that is 1 when Distance is below 100 and 0 otherwise: df['Distance_Below_100'] = np.where(df['Distance'] < 100, 1, 0). However, I do not know how to combine conditions when one of them is a quantile selection (in this case, the upper 25% of Population_density).
import pandas as pd
# Data as a dict of lists
data = {'Row': range(1,13,1), 'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30]}
# Create DataFrame
df = pd.DataFrame(data)
Answer:
You can use & (element-wise AND) or | (element-wise OR) to combine the conditions:
import numpy as np
# 1 where density > its 75th percentile AND distance < 100, else 0
df['Distance_Below_100'] = np.where(df['Population_density'].gt(df['Population_density'].quantile(0.75)) & df['Distance'].lt(100), 1, 0)
print(df)
Row Population_density Distance Distance_Below_100
0 1 400 50 0
1 2 500 30 0
2 3 300 40 0
3 4 200 120 0
4 5 500 60 0
5 6 1000 50 1
6 7 3300 30 1
7 8 500 90 0
8 9 700 100 0
9 10 1000 110 0
10 11 900 200 0
11 12 850 30 0
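The same mask can also be written with the comparison operators instead of .gt()/.lt(). A minimal sketch follows; because & binds more tightly than > and <, each comparison needs its own parentheses. Casting the boolean mask with astype(int) is used here as an alternative to np.where, and the column name Dense_And_Close is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Row': range(1, 13),
    'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
    'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30],
})

threshold = df['Population_density'].quantile(0.75)

# Parentheses are required: without them Python would evaluate
# threshold & df['Distance'] first, because & binds tighter than > and <.
mask = (df['Population_density'] > threshold) & (df['Distance'] < 100)

# Convert True/False to 1/0 (hypothetical column name)
df['Dense_And_Close'] = mask.astype(int)
print(df)
```

Replacing & with | would instead flag rows that meet either condition.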
Answer:
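Hi, to apply a function to each row of a DataFrame, I recommend using a lambda.
For example, say this is your function:
def myFunction(value):
    # Replace pass with your own logic
    pass
To create a new column 'new_column', where pick_cell is the name of the column whose value you want to pass to the function (note axis=1: without it, apply passes each column rather than each row to the lambda):
df['new_column'] = df.apply(lambda x: myFunction(x.pick_cell), axis=1)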
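Applied to the original question, the lambda approach could look like the minimal sketch below. The helper name flag is hypothetical, and the threshold is computed once outside the row function so the quantile is not recomputed for every row.

```python
import pandas as pd

df = pd.DataFrame({
    'Row': range(1, 13),
    'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
    'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30],
})

# Compute the 75th percentile once, before the row-wise pass
threshold = df['Population_density'].quantile(0.75)

def flag(density, distance):
    # Hypothetical helper: 1 if the row is dense and close, else 0
    return int(density > threshold and distance < 100)

# axis=1 hands each row (as a Series) to the lambda
df['new_column'] = df.apply(lambda x: flag(x.Population_density, x.Distance), axis=1)
print(df)
```

Row-wise apply calls the Python function once per row, so on large frames it is typically much slower than the vectorized mask used in the previous answer.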