I am trying to define a function that will take any dataframe with an 'Age' column, bin the ages, and return how many Xs are in each age category.
Consider the following:
def age_range():
    x = input("Enter Dataframe Name: ")
    df = x
    df['Age']
    bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels=['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s','100s']
    pd.df['AgeGroup'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    return print("Age Ranges:", result)
I keep getting a TypeError: string indices must be integers.
I thought that by calling the df['Age'], it would return a one-column series from which the binning and labelling would work effectively. But it isn't working for me.
CodePudding user response:
The problem lies here:
x = input("Enter Dataframe Name: ")  # input() always returns a str, so x is a string
df = x  # df is now that same string, not a DataFrame
df['Age']  # a str can only be indexed with an integer or a slice, so a string key raises TypeError
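The error can be reproduced in isolation without pandas at all. This is a minimal sketch; the variable name `name` and the value "df1" are illustrative stand-ins for whatever the user would type at the prompt.

```python
# input() always returns a str, so this stands in for the typed answer.
name = "df1"

try:
    name["Age"]  # indexing a str with a str key
except TypeError as exc:
    error_message = str(exc)

# The message starts with "string indices must be integers"
# (newer Python versions append more detail).
print(error_message)
```

Typing the name of a variable does not give you the object that variable refers to, which is why the lookup never reaches the DataFrame.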
This would resolve your problem: pass the DataFrame itself to the function instead of asking for its name.
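import pandas as pd
def age_range(df):
    # Eleven edges define ten bins, so there must be exactly ten labels
    bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels=['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    # right=False makes each bin closed on the left: [0, 10), [10, 20), ...
    result = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    return result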
For example, you can run it like:
import random
df = pd.DataFrame({'Age' : [random.randint(1, 99) for i in range(500)]})
df["AgeRange"] = age_range(df)
or
df = pd.DataFrame({'Age' : [random.randint(1, 99) for i in range(500)]})
AgeRangeDf = pd.DataFrame({"Age_Range" :age_range(df)})
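If you really do want to pick the DataFrame by typing its name, one option is to keep the DataFrames in a dict and look the name up there. The sketch below is only an illustration: the `frames` registry, the helper `age_range_by_name`, and the sample ages are all hypothetical and not part of the question.

```python
import pandas as pd

def age_range(df):
    bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels = ['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    return pd.cut(df['Age'], bins=bins, labels=labels, right=False)

# Hypothetical registry mapping a typed name to an actual DataFrame.
frames = {
    "patients": pd.DataFrame({"Age": [5, 23, 37, 64]}),
    "staff": pd.DataFrame({"Age": [19, 41, 58]}),
}

def age_range_by_name(name, registry):
    # Fail with a readable message if the typed name is unknown.
    if name not in registry:
        raise KeyError(f"No DataFrame registered under {name!r}")
    return age_range(registry[name])

groups = age_range_by_name("patients", frames)
print(groups.tolist())
```

In an interactive script the first argument could come from input("Enter Dataframe Name: "), since the string is now used only as a dictionary key.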
CodePudding user response:
Assuming you want the total bin counts over the DataFrame:
from numpy import random
import pandas as pd
# numpy's randint excludes the upper bound, so these ages fall in 1..98
df1 = pd.DataFrame({'Age' : [random.randint(1, 99) for i in range(100)]})
def age_range(df):
    bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels=['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    # Note: this adds an 'AgeGroup' column to the DataFrame that was passed in
    df['AgeGroup'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    # observed=False keeps empty bins in the result with a count of 0
    result = pd.DataFrame(df['AgeGroup'].groupby(df['AgeGroup'], observed=False).count())
    return result
print(age_range(df1))
This returns a single-column DataFrame, indexed by age group, holding the count for each bin.
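As a possible alternative, the counts can also be obtained with value_counts on the binned Series. This is a sketch rather than a drop-in replacement: the function name `age_counts` and the sample ages are made up for illustration, and unlike the function above it returns a Series and does not add a column to the input.

```python
import pandas as pd

def age_counts(df):
    bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels = ['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    groups = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    # sort=False keeps the bins in category order instead of ordering by frequency.
    return groups.value_counts(sort=False)

# Illustrative data only.
counts = age_counts(pd.DataFrame({'Age': [3, 15, 22, 27, 90]}))
print(counts)
```

Because the binned column is categorical, bins with no matching ages still appear in the result with a count of 0.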