I have a column in a dataframe called 'STREET_NO' and I'm trying to add a new column in the dataframe called 'BIN'. I am trying to use pd.cut() to take the 'STREET_NO' (such as 1290) and change it to 1200. In other words, for street numbers from 0-300, I would like the new 'BIN' column to show 0. For street numbers from 301-600, I would like the 'BIN' column to show 300, and so on. My 'STREET_NO' values range from 1 - 99999. Currently I have:
df['BIN'] = pd.cut(x=df['STREET_NO'], bins=[0,300,600,900,1200,1500,1800,2100], labels=['0','300','600','900','1200','1500','1800']
I can simply keep adding more and more numbers to my 'bins' and 'labels' until I reach the final 99999, but is there an easier way to do this?
CodePudding user response:
You can use range()
function for both bins and labels, just make sure the ranges overlap the total number of rows:
import pandas as pd
df = pd.DataFrame({'STREET_NO': range(1, 100000)}) # range end is n 1 to reproduce 1-99999
df['BIN'] = pd.cut(x=df['STREET_NO'],
bins=list(range(0, 100500, 300)), # so 99901 would be in the last bin
labels=list(range(0, 100200, 300))) # labels are bins-1
print(df.tail(300))
# Output:
STREET_NO BIN
99599 99600 99300
99600 99601 99600
99601 99602 99600
99602 99603 99600
99603 99604 99600
... ... ...
99994 99995 99900
99995 99996 99900
99996 99997 99900
99997 99998 99900
99998 99999 99900
400 rows × 2 columns