How to define a function that will check any data frame for Age column and return bins?

Time:10-28

I am trying to define a function that will take any dataframe with an 'Age' column, bin the ages, and return how many Xs are in each age category.

Consider the following:

def age_range():
        x = input("Enter Dataframe Name: ")
        df = x
        df['Age']
        bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
        labels=['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s','100s']
        pd.df['AgeGroup'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
        return print("Age Ranges:", result)

I keep getting TypeError: string indices must be integers.

I thought that calling df['Age'] would return a one-column Series on which the binning and labelling would work. But it isn't working for me.

Answer:

The problem lies here:

x = input("Enter Dataframe Name: ") # input() always returns a str, not the DataFrame with that name
df = x # so df is also just a str
df['Age'] # indexing a str with another str raises TypeError: string indices must be integers
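To see the failure in isolation, and one possible way to still pick a DataFrame from a typed name, here is a minimal sketch assuming pandas is installed. The `frames` dictionary and the `lookup_frame` helper are hypothetical names used only for illustration; passing the DataFrame object directly is usually simpler.

```python
import pandas as pd

# input() always returns a str, so the question's df is just text like this
name = "df1"
error_message = None
try:
    name['Age']  # indexing a str with a str fails
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)

# Hypothetical registry mapping names a user might type to real DataFrames
frames = {
    "df1": pd.DataFrame({'Age': [5, 17, 42, 88]}),
}

def lookup_frame(frame_name):
    """Return the DataFrame registered under frame_name; raises KeyError if unknown."""
    return frames[frame_name]

df = lookup_frame(name)  # in the question, name would come from input()
print(type(df['Age']))   # a pandas Series, ready for pd.cut
```

A dictionary keeps the set of selectable DataFrames explicit, so a mistyped name fails with a clear KeyError instead of silently turning into a string.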

Pass the DataFrame object itself as a parameter instead of reading its name with input(); this resolves the problem:

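import pandas as pd

def age_range(df):
    # df is the DataFrame object itself, not its name
    bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # 11 edges -> 10 bins
    labels = ['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    # right=False makes the bins [0, 10), [10, 20), ..., [90, 100)
    result = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    return result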
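The bin edges and labels have to line up: 11 edges define 10 bins, so there can only be 10 labels, one fewer than the question's list. The sketch below, assuming pandas is installed, shows how `right=False` assigns boundary ages, that an age of exactly 100 falls outside the last bin, and one possible way to keep a '100s' group by adding an open-ended last edge (`float('inf')` is an assumption, not part of this answer).

```python
import pandas as pd

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # 11 edges -> 10 bins
labels = ['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
ages = pd.Series([0, 9, 10, 99, 100])

# right=False closes each bin on the left: [0, 10), [10, 20), ..., [90, 100)
groups = pd.cut(ages, bins=bins, labels=labels, right=False)
print(groups.tolist())  # 100 is outside [90, 100), so it becomes NaN

# The question's 11 labels do not match 10 bins, so pd.cut raises ValueError
label_error = None
try:
    pd.cut(ages, bins=bins, labels=labels + ['100s'], right=False)
except ValueError as exc:
    label_error = exc
print("ValueError:", label_error)

# Assumption: an open-ended last edge so ages of 100 and above get '100s'
wide_groups = pd.cut(ages, bins=bins + [float('inf')],
                     labels=labels + ['100s'], right=False)
print(wide_groups.tolist())
```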

For example, you can run it like this:

import random

df = pd.DataFrame({'Age' : [random.randint(1, 99) for i in range(500)]})
df["AgeRange"] = age_range(df)

or

df = pd.DataFrame({'Age' : [random.randint(1, 99) for i in range(500)]})
AgeRangeDf = pd.DataFrame({"Age_Range": age_range(df)})

Answer:

Assuming you want the total count in each bin over the whole DataFrame:

from numpy import random
import pandas as pd

# 100 random ages; numpy's randint excludes the high value, so ages are 1-98
df1 = pd.DataFrame({'Age' : [random.randint(1, 99) for i in range(100)]})

def age_range(df):
    bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels=['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    # Note: this adds an AgeGroup column to the DataFrame passed in
    df['AgeGroup'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)

    # Count rows per group; observed=False keeps empty groups with a count of 0
    result = pd.DataFrame(df['AgeGroup'].groupby(df['AgeGroup'], observed=False).count())
    return result

print(age_range(df1))

This returns a single-column DataFrame with one row per age group, holding the number of ages that fall in that group.
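If all you need is the count per age group, a possible alternative that does not add a column to the caller's DataFrame is to call `Series.value_counts(sort=False)` on the binned values. This is a swapped-in technique rather than this answer's code, and `age_group_counts` is a hypothetical name; it is a sketch assuming pandas is installed.

```python
import pandas as pd

def age_group_counts(df, column='Age'):
    """Hypothetical helper: count rows per age bin without modifying df."""
    bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels = ['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    groups = pd.cut(df[column], bins=bins, labels=labels, right=False)
    # For categorical data, sort=False keeps bin order and empty bins count as 0
    return groups.value_counts(sort=False)

ages = pd.DataFrame({'Age': [3, 7, 15, 42, 44, 99]})
print(age_group_counts(ages))
```

This returns a Series rather than a one-column DataFrame; call `.to_frame()` on the result if a DataFrame is needed.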
