I have a dataframe, datetime indexed, with one column containing numbers, like so:
No
2020-01-01 200
2020-01-02 15000
2020-01-03 950
2020-01-04 170000
2020-01-05 29000
2020-01-06 46500
...
I need to create another column of categories, let's call it 'cat', which is populated by 0,1 or 2 depending on whether the value of 'No' is <10000, >=10000 & <100000, or >=100000, respectively. So I want it to be like this:
No cat
2020-01-01 200 0
2020-01-02 15000 1
2020-01-03 950 0
2020-01-04 170000 2
2020-01-05 29000 1
2020-01-06 46500 1
...
So far I have tried to populate an empty list and then add this as the values for df.cat using the following code:
cat = []
for i in df.No:
if i >= 100000:
cat = 2
elif (i>10000) & (i<100000):
cat = 1
else:
cat = 0
df.cat = cat
Thanks!
CodePudding user response:
Use np.select
. Check gt
and lt
, and let default
be 1 (in between):
import pandas as pd
import numpy as np
data = {'No': {'2020-01-01': 200, '2020-01-02': 15000, '2020-01-03': 950,
'2020-01-04': 170000, '2020-01-05': 29000, '2020-01-06': 46500}}
df = pd.DataFrame(data)
condlist = [df.No.gt(100000),
df.No.lt(10000)]
choicelist = [2, 0]
df['cat'] = np.select(condlist, choicelist, default=1)
print(df)
No cat
2020-01-01 200 0
2020-01-02 15000 1
2020-01-03 950 0
2020-01-04 170000 2
2020-01-05 29000 1
2020-01-06 46500 1
CodePudding user response:
Use pandas.cut
:
bins = [0, 10000, 100000, float('inf')]
df['cat'] = pd.cut(df['No'], bins=bins, labels=range(len(bins)-1))
Variant with cat.codes
:
bins = [0, 10000, 100000, float('inf')]
df['cat'] = pd.cut(df['No'], bins=bins).cat.codes
output:
No cat
2020-01-01 200 0
2020-01-02 15000 1
2020-01-03 950 0
2020-01-04 170000 2
2020-01-05 29000 1
2020-01-06 46500 1