Pandas - How can I iterate through a column to put respondents into appropriate bins?

Time:10-17

I have a DataFrame called df3 with 2 columns - 'fan' and 'Household Income' as seen below. I'm trying to iterate through the 'Household Income' column and if the value of the column is '$0 - $24,999', add it to bin 'low_inc'. If the value of the column is '$25,000 - $49,999', add it to bin 'lowmid_inc', etc. But I'm getting an error saying 'int' object is not iterable.

df3 = df_hif.dropna(subset=['Household Income', 'fan'],how='any')

low_inc = []
lowmid_inc = []
mid_inc = []
midhigh_inc = []
high_inc = []

for inc in df3['Household Income']:
    if inc == '$0 - $24,999':
        low_inc += 1
    elif inc == '$25,000 - $49,999':
        lowmid_inc += 1
    elif inc == '$50,000 - $99,999':
        mid_inc += 1
    elif inc == '$100,000 - $149,999':
        midhigh_inc += 1
    else:
        high_inc += 1
        
#print(low_inc)

Here is a sample of 5 rows of the df used:

      Household Income   fan
774   $25,000 - $49,999  Yes
290   $50,000 - $99,999  No
795   $50,000 - $99,999  Yes
926   $150,000           No
1017  $150,000           Yes

The left column (774, 290, etc.) is the index, showing each respondent's ID. The 5 possible values of the 'Household Income' column are the ranges listed above in my if/else statement, but I'm receiving an error when I try to print out the bins.

For each respondent, I'm trying to add 1 to the matching bucket ('low_inc', 'high_inc', etc.). In other words, I'm trying to count the number of respondents that have a household income between 0-24,999, 25,000-49,999, etc. How can I iterate through a column to count the number of respondents in each of the appropriate bins?

CodePudding user response:

Iterating over rows in pandas is rarely necessary and is usually slow. Instead, you can filter the rows into separate DataFrames:

low_inc = df3[df3['Household Income'] == '$0 - $24,999']
lowmid_inc = df3[df3['Household Income'] == '$25,000 - $49,999']

and so on for the remaining ranges.

len(low_inc), for example, then gives the number of rows (respondents) in that DataFrame.
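As a rough, self-contained sketch of the filtering idea: the DataFrame below is rebuilt by hand from the five sample rows in the question, so treat the data as illustrative only. It also shows that summing the boolean mask gives the same count as len() on the filtered frame. As a side note, the original loop fails because the bins start out as lists, and adding 1 to a list with += requires an iterable; starting each counter at 0 instead of [] would avoid the error.

```python
import pandas as pd

# Illustrative data, rebuilt by hand from the sample rows in the question.
df3 = pd.DataFrame(
    {
        "Household Income": [
            "$25,000 - $49,999",
            "$50,000 - $99,999",
            "$50,000 - $99,999",
            "$150,000",
            "$150,000",
        ],
        "fan": ["Yes", "No", "Yes", "No", "Yes"],
    },
    index=[774, 290, 795, 926, 1017],
)

# Boolean mask: True for rows whose income label matches the range.
mid_mask = df3["Household Income"] == "$50,000 - $99,999"

mid_inc = df3[mid_mask]              # sub-DataFrame of matching respondents
mid_count = len(mid_inc)             # number of matching rows
mid_count_alt = int(mid_mask.sum())  # same count, since True counts as 1

print(mid_count, mid_count_alt)
```

If only the counts are needed, the mask sum avoids building the intermediate DataFrames at all.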

Alternatively, try groupby:

df3.groupby('Household Income').count()
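A minimal sketch of the groupby route, again on data rebuilt by hand from the question's sample rows. It uses size() to get one count per label, shows value_counts() as an equivalent shortcut, and then renames the labels to the bin names from the question. The bin_names mapping is hypothetical, and the exact text of the top label ('$150,000' here, as it appears in the sample) is an assumption.

```python
import pandas as pd

# Illustrative data, rebuilt by hand from the sample rows in the question.
df3 = pd.DataFrame(
    {
        "Household Income": [
            "$25,000 - $49,999",
            "$50,000 - $99,999",
            "$50,000 - $99,999",
            "$150,000",
            "$150,000",
        ],
        "fan": ["Yes", "No", "Yes", "No", "Yes"],
    },
    index=[774, 290, 795, 926, 1017],
)

# One count per distinct income label (labels with no rows do not appear).
counts_by_label = df3.groupby("Household Income").size()

# value_counts() yields the same numbers, sorted by frequency.
value_counts = df3["Household Income"].value_counts()

# Hypothetical mapping from income label to the question's bin names.
# The top label is assumed to read '$150,000', as in the sample rows.
bin_names = {
    "$0 - $24,999": "low_inc",
    "$25,000 - $49,999": "lowmid_inc",
    "$50,000 - $99,999": "mid_inc",
    "$100,000 - $149,999": "midhigh_inc",
    "$150,000": "high_inc",
}

# reindex() adds the empty bins with a count of 0, rename() swaps the labels.
named_counts = (
    value_counts.reindex(list(bin_names), fill_value=0).rename(index=bin_names)
)

print(named_counts)
```

Unlike the loop in the question, this never falls through to an else branch, so an unexpected label is not silently counted as high income.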

CodePudding user response:

I would simply use

# Assumes 'Household Income' holds numeric values, not range labels
income = df3['Household Income']
bins = int((max(income) - min(income)) / 25000)
out = income.hist(bins=bins)

Finally, sum the counts of the related bins. For example, 25,000-50,000 corresponds to 1 bin, whereas 50,000-100,000 spans 2 bins.
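A sketch of how the bin-summing step could look, with two caveats. First, it only applies if the incomes are stored as numbers; the values below are made up for illustration, since the question's column holds range labels. Second, Series.hist draws a plot and returns matplotlib axes rather than counts, so this sketch swaps in numpy.histogram with explicit 25,000-wide bin edges to get the counts directly.

```python
import numpy as np
import pandas as pd

# Made-up numeric incomes; the question's real column holds range labels.
income = pd.Series([12000, 30000, 49000, 60000, 95000, 120000, 160000])

width = 25_000
# Bin edges at 0, 25000, 50000, ... up to the first edge at or above the max.
edges = np.arange(0, income.max() + width, width)

# counts[i] is the number of incomes in [edges[i], edges[i + 1]);
# the last bin also includes its right edge.
counts, _ = np.histogram(income, bins=edges)

# Combine the 25,000-wide bins into the question's five income ranges.
low_inc = int(counts[0])               # 0 - 24,999: one bin
lowmid_inc = int(counts[1])            # 25,000 - 49,999: one bin
mid_inc = int(counts[2:4].sum())       # 50,000 - 99,999: two bins
midhigh_inc = int(counts[4:6].sum())   # 100,000 - 149,999: two bins
high_inc = int(counts[6:].sum())       # 150,000 and above: remaining bins

print(low_inc, lowmid_inc, mid_inc, midhigh_inc, high_inc)
```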
