KeyError: "None of [Index([(84, 90, 50, 29, 49, 44, 30, 98, 31, 66), (68, 78, 28, 80, 45, 56, 5

I know almost nothing about this stuff. But still, my teacher says she can't help. I have tried looking at this error, but everything is over my head. What am I doing wrong?

import pandas as pd


data = {"student": ["Anayo","Brandon","Claudia","Dave","Evelyn","Finn","Gloria","Hank","Isla", "Julia" ],
        "test_one": [84, 90, 50, 29, 49, 44, 30, 98, 31, 66],
        "test_two": [68, 78, 28, 80, 45, 56, 53, 93, 31, 66],
        "test_three": [42, 35, 30, 40, 28, 85, 80, 99, 38, 48]
    }


test_data = pd.DataFrame(data)

def max_two(one, two):
    return test_data[[one, two]].max(axis=1)

test_data["max_1_and_2"] = max_two(test_data["test_one"], test_data["test_two"])

CodePudding user response：

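The problem is that your function expects the column names, but you are passing the actual columns (the Series themselves) instead. Inside the function, test_data[[one, two]] then tries to look up the column values as if they were column labels, which is why the KeyError lists the numbers. Use: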

import pandas as pd

data = {"student": ["Anayo","Brandon","Claudia","Dave","Evelyn","Finn","Gloria","Hank","Isla", "Julia" ],
        "test_one": [84, 90, 50, 29, 49, 44, 30, 98, 31, 66],
        "test_two": [68, 78, 28, 80, 45, 56, 53, 93, 31, 66],
        "test_three": [42, 35, 30, 40, 28, 85, 80, 99, 38, 48]
    }


test_data = pd.DataFrame(data)

def max_two(one, two):
    return test_data[[one, two]].max(axis=1)

test_data["max_1_and_2"] = max_two("test_one", "test_two")
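
If you would rather keep passing the columns themselves, as in the question, the function has to combine the two Series instead of indexing test_data with them. Here is a minimal sketch of one way to do that with pd.concat. The name max_two_series is made up for this example, and the data is trimmed to the two columns involved:

```python
import pandas as pd

test_data = pd.DataFrame({
    "test_one": [84, 90, 50, 29, 49, 44, 30, 98, 31, 66],
    "test_two": [68, 78, 28, 80, 45, 56, 53, 93, 31, 66],
})

def max_two_series(one, two):
    # Hypothetical helper: takes two Series (not column names),
    # puts them side by side and takes the row-wise maximum.
    return pd.concat([one, two], axis=1).max(axis=1)

test_data["max_1_and_2"] = max_two_series(test_data["test_one"], test_data["test_two"])
print(test_data)
```

Either version works. The difference is only in what the caller passes: names in the answer above, Series here.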

CodePudding user response：

This can be implemented more simply, without a separate function:

import pandas as pd

data = {"student": ["Anayo","Brandon","Claudia","Dave","Evelyn","Finn","Gloria","Hank","Isla", "Julia" ],
        "test_one": [84, 90, 50, 29, 49, 44, 30, 98, 31, 66],
        "test_two": [68, 78, 28, 80, 45, 56, 53, 93, 31, 66],
        "test_three": [42, 35, 30, 40, 28, 85, 80, 99, 38, 48]
    }

test_data = pd.DataFrame(data)

test_data["max_1_and_2"] = test_data.apply(lambda row: max(row["test_one"], row["test_two"]), axis=1)

Or, if the exercise requires you to use a function of your own, it is better from a design standpoint to keep the column-processing logic separate from the dataframe itself. The function should contain only the custom logic for a single pair of values, and you then apply it to the dataframe row by row:

def max_two(one, two):
    return max(one, two)

test_data["max_1_and_2"] = test_data.apply(lambda row: max_two(row["test_one"], row["test_two"]), axis=1)
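
One thing to keep in mind: apply with axis=1 calls the function once per row in Python, so it tends to get slow on large dataframes. For a plain maximum of two columns, a column-wise version avoids that. The sketch below uses numpy.maximum, which is not part of the answers above, and again only includes the two columns that matter:

```python
import numpy as np
import pandas as pd

test_data = pd.DataFrame({
    "test_one": [84, 90, 50, 29, 49, 44, 30, 98, 31, 66],
    "test_two": [68, 78, 28, 80, 45, 56, 53, 93, 31, 66],
})

# Element-wise maximum of the two columns, computed on whole columns at once
# instead of calling a Python function for every row.
test_data["max_1_and_2"] = np.maximum(test_data["test_one"], test_data["test_two"])
print(test_data)
```

For ten students the difference does not matter, so use whichever version you find easier to read.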