Home > Back-end >  How do I populate a padas dataframe based on filtering by value range of other columns
How do I populate a padas dataframe based on filtering by value range of other columns

Time:10-12

I have a dataframe, datetime indexed, with one column containing numbers, like so:

             No
2020-01-01   200
2020-01-02   15000
2020-01-03   950
2020-01-04   170000
2020-01-05   29000
2020-01-06   46500  
...

I need to create another column of categories, let's call it 'cat', which is populated by 0,1 or 2 depending on whether the value of 'No' is <10000, >=10000 & <100000, or >=100000, respectively. So I want it to be like this:

             No     cat
2020-01-01   200    0  
2020-01-02   15000  1
2020-01-03   950    0 
2020-01-04   170000 2
2020-01-05   29000  1
2020-01-06   46500  1
...

So far I have tried to populate an empty list and then add this as the values for df.cat using the following code:

    cat = []
    for i in df.No:
        if i >= 100000:
            cat = 2
        elif (i>10000) & (i<100000):
            cat = 1
        else:
            cat = 0  
    df.cat = cat

Thanks!

CodePudding user response:

Use np.select. Check gt and lt, and let default be 1 (in between):

import pandas as pd
import numpy as np

data = {'No': {'2020-01-01': 200, '2020-01-02': 15000, '2020-01-03': 950, 
               '2020-01-04': 170000, '2020-01-05': 29000, '2020-01-06': 46500}}
df = pd.DataFrame(data)

condlist = [df.No.gt(100000),
            df.No.lt(10000)]

choicelist = [2, 0]

df['cat'] = np.select(condlist, choicelist, default=1)

print(df)

                No  cat
2020-01-01     200    0
2020-01-02   15000    1
2020-01-03     950    0
2020-01-04  170000    2
2020-01-05   29000    1
2020-01-06   46500    1

CodePudding user response:

Use pandas.cut:

bins = [0, 10000, 100000, float('inf')]
df['cat'] = pd.cut(df['No'], bins=bins, labels=range(len(bins)-1))

Variant with cat.codes:

bins = [0, 10000, 100000, float('inf')]
df['cat'] = pd.cut(df['No'], bins=bins).cat.codes

output:

                No cat
2020-01-01     200   0
2020-01-02   15000   1
2020-01-03     950   0
2020-01-04  170000   2
2020-01-05   29000   1
2020-01-06   46500   1
  • Related