How to use multiple conditions, including selecting on quantile in Python

Time:05-22

Imagine the following dataset df:

Row Population_density Distance
1 400 50
2 500 30
3 300 40
4 200 120
5 500 60
6 1000 50
7 3300 30
8 500 90
9 700 100
10 1000 110
11 900 200
12 850 30

How can I make a new dummy column that is 1 when df['Population_density'] is above the third quartile (the 75th percentile) AND df['Distance'] is < 100, and 0 for all remaining rows? With this data, rows 6 and 7 should get a 1 and every other row a 0.

Creating a dummy variable from a single criterion is fairly easy. For instance, the following creates a dummy that is 1 when Distance is < 100 and 0 otherwise: df['Distance_Below_100'] = np.where(df['Distance'] < 100, 1, 0). However, I do not know how to combine conditions when one of them involves a quantile (in this case, the upper 25% of Population_density).

import pandas as pd  
  
# Data as a dict of column lists
data = {'Row': range(1,13,1), 'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
        'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30]}  
  
# Create DataFrame  
df = pd.DataFrame(data) 
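Before combining the conditions, it may help to check what the "upper 25%" cutoff actually is. A minimal sketch, assuming pandas' default linear interpolation for Series.quantile: with 12 values, the 0.75 quantile sits at position 0.75 × (12 - 1) = 8.25 in the sorted data, a quarter of the way between 900 and 1000, which gives 925. Only rows 6, 7 and 10 exceed it, and row 10 is then ruled out by its Distance of 110. The variable names threshold and above are just for illustration.

```python
import pandas as pd

data = {'Row': range(1, 13),
        'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
        'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30]}
df = pd.DataFrame(data)

# 75th percentile using pandas' default interpolation='linear'
threshold = df['Population_density'].quantile(0.75)
print(threshold)  # 925.0

# Rows whose density is strictly above that cutoff
above = df.loc[df['Population_density'] > threshold, 'Row'].tolist()
print(above)  # [6, 7, 10]
```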

Answer:

You can combine the conditions with & (element-wise AND) or | (element-wise OR):

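import numpy as np

# Cutoff for the top 25% of Population_density (75th percentile)
q75 = df['Population_density'].quantile(0.75)

# Both conditions are boolean Series; & keeps rows where both are True
df['Distance_Below_100'] = np.where(df['Population_density'].gt(q75) & df['Distance'].lt(100), 1, 0)
print(df)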

    Row  Population_density  Distance  Distance_Below_100
0     1                 400        50                   0
1     2                 500        30                   0
2     3                 300        40                   0
3     4                 200       120                   0
4     5                 500        60                   0
5     6                1000        50                   1
6     7                3300        30                   1
7     8                 500        90                   0
8     9                 700       100                   0
9    10                1000       110                   0
10   11                 900       200                   0
11   12                 850        30                   0
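The answer uses the .gt() and .lt() methods, which sidesteps a common pitfall: in Python, & and | bind more tightly than > and <, so if you write the comparisons with operators instead, each one must be wrapped in parentheses or the expression raises an error. The sketch below shows that operator form, converts the boolean mask with .astype(int) as an alternative to np.where (not what the answer itself uses), and adds an | variant for "either condition". The column names High_density_and_near and High_density_or_near are hypothetical.

```python
import pandas as pd

data = {'Row': range(1, 13),
        'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
        'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30]}
df = pd.DataFrame(data)

q75 = df['Population_density'].quantile(0.75)

# Parentheses are required: & and | bind tighter than > and <
# AND: 1 only when density is in the top 25% and distance is below 100
df['High_density_and_near'] = ((df['Population_density'] > q75) & (df['Distance'] < 100)).astype(int)

# OR: 1 when at least one of the two conditions holds
df['High_density_or_near'] = ((df['Population_density'] > q75) | (df['Distance'] < 100)).astype(int)

print(df.loc[df['High_density_and_near'] == 1, 'Row'].tolist())  # [6, 7]
```

Casting a boolean Series with .astype(int) maps True to 1 and False to 0, so for a plain 0/1 dummy it gives the same result as np.where(mask, 1, 0).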

Answer:

Hi, to apply a function to a DataFrame I recommend using a lambda.

For example, suppose this is your function:

def myFunction(value):
    # Placeholder: replace with your own logic and return the new cell value
    return value

To create a new column 'new_column', replace pick_cell with the name of the column whose values you want to pass to the function:

# axis=1 makes apply pass each row, so x.pick_cell is that row's value in column pick_cell
df['new_column'] = df.apply(lambda x: myFunction(x.pick_cell), axis=1)
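Applied to the question's data, the lambda approach might look like the sketch below. The helper name dummy_rule and the column name Dummy are hypothetical, and the quantile is computed once outside the function rather than on every row. Note that apply with axis=1 calls a Python function once per row, so on large frames it is typically much slower than the vectorised mask in the first answer; it is mainly useful when the per-row logic is hard to express with boolean Series.

```python
import pandas as pd

data = {'Row': range(1, 13),
        'Population_density': [400, 500, 300, 200, 500, 1000, 3300, 500, 700, 1000, 900, 850],
        'Distance': [50, 30, 40, 120, 60, 50, 30, 90, 100, 110, 200, 30]}
df = pd.DataFrame(data)

# Compute the 75th-percentile cutoff once, not inside the per-row function
q75 = df['Population_density'].quantile(0.75)

def dummy_rule(row, threshold):
    """Return 1 if density is above threshold AND distance is below 100, else 0."""
    return int(row['Population_density'] > threshold and row['Distance'] < 100)

# axis=1 passes each row (as a Series) to the lambda
df['Dummy'] = df.apply(lambda row: dummy_rule(row, q75), axis=1)
print(df['Dummy'].tolist())  # [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
```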