I have a DataFrame called df3 with two columns, 'fan' and 'Household Income', as shown below. I'm trying to iterate through the 'Household Income' column: if the value is '$0 - $24,999', it should be counted in the bin 'low_inc'; if it is '$25,000 - $49,999', it should be counted in 'lowmid_inc'; and so on. However, I'm getting an error saying 'int' object is not iterable.
df3 = df_hif.dropna(subset=['Household Income', 'fan'],how='any')
low_inc = []
lowmid_inc = []
mid_inc = []
midhigh_inc = []
high_inc = []
for inc in df3['Household Income']:
    if inc == '$0 - $24,999':
        low_inc = 1
    elif inc == '$25,000 - $49,999':
        lowmid_inc = 1
    elif inc == '$50,000 - $99,999':
        mid_inc = 1
    elif inc == '$100,000 - $149,999':
        midhigh_inc = 1
    else:
        high_inc = 1
#print(low_inc)
Here is a sample of 5 rows of the df used:
      Household Income    fan
774   $25,000 - $49,999   Yes
290   $50,000 - $99,999   No
795   $50,000 - $99,999   Yes
926   $150,000            No
1017  $150,000            Yes
The left column (774, 290, etc.) is the index, showing the respondent's ID. The 5 possible values of the 'Household Income' column are the ranges listed above in my if/else statement, but I'm receiving an error when I try to print out the bins.
For each respondent, I'm trying to add 1 to the matching bucket ('low_inc', 'high_inc', etc.). In other words, I'm trying to count the number of respondents that have a household income of 0-24,999, 25,000-49,999, etc. How can I iterate through a column to count the number of respondents in the appropriate bins?
CodePudding user response:
Iterating over rows in pandas is usually not the preferred approach. Instead, you can filter the rows into separate DataFrames:
low_inc = df3[df3['Household Income'] == '$0 - $24,999']
lowmid_inc = df3[df3['Household Income'] == '$25,000 - $49,999']
etc...
Then len(low_inc), for example, gives you the number of rows in that DataFrame.
Alternatively, try groupby:
df3.groupby('Household Income').count()
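As a rough sketch of the counting idea above, value_counts gives the per-category counts in one call, and a boolean mask summed with .sum() gives the count for a single category. The DataFrame below is rebuilt from the five sample rows in the question (the exact income strings are an assumption based on the if/else statement), so treat it as an illustration rather than the real data.

```python
import pandas as pd

# Sample rows reconstructed from the question; the real df3 will be larger.
df3 = pd.DataFrame(
    {
        "Household Income": [
            "$25,000 - $49,999",
            "$50,000 - $99,999",
            "$50,000 - $99,999",
            "$150,000",
            "$150,000",
        ],
        "fan": ["Yes", "No", "Yes", "No", "Yes"],
    },
    index=[774, 290, 795, 926, 1017],
)

# Count how many respondents fall into each income category.
counts = df3["Household Income"].value_counts()

# Look up single categories; .get returns the default (0) if a category is absent.
low_inc = int(counts.get("$0 - $24,999", 0))
mid_inc = int(counts.get("$50,000 - $99,999", 0))

# Equivalent count for one category using a boolean mask.
lowmid_inc = int((df3["Household Income"] == "$25,000 - $49,999").sum())

print(counts)
```

Here each bin ends up as a plain integer count rather than a list, which is what the original loop appeared to be aiming for.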
CodePudding user response:
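I would simply use a histogram. Note that this assumes 'Household Income' holds numeric values rather than range strings such as '$25,000 - $49,999':
df3 = df3['Household Income']
bins = int((max(df3) - min(df3)) / 25000)
out = df3.hist(bins=bins)
Finally, sum the counts of the related bins. For example, 25000-50000 corresponds to 1 bin, whereas 50000-100000 spans 2 bins.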
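If the incomes were available as numbers, a variation on this idea would be to use pd.cut with explicit bin edges, so that unevenly sized ranges do not need to be summed by hand. This is only a sketch: the income values, the edges, and the label names below are invented for illustration and are not taken from the asker's data.

```python
import pandas as pd

# Hypothetical numeric incomes, made up purely for illustration.
income = pd.Series([12000, 30000, 75000, 99000, 120000, 200000])

# Bin edges matching the five ranges in the question; the last bin is open-ended.
edges = [0, 25000, 50000, 100000, 150000, float("inf")]
labels = ["low_inc", "lowmid_inc", "mid_inc", "midhigh_inc", "high_inc"]

# right=False makes each bin include its left edge: [0, 25000), [25000, 50000), ...
binned = pd.cut(income, bins=edges, labels=labels, right=False)

# Count respondents per bin, then order the result by bin rather than by count.
bin_counts = binned.value_counts().sort_index()

print(bin_counts)
```

With the real data, the range strings would first have to be converted to numbers, so counting the string categories directly is probably simpler.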