Pandas condition does not work on a selected row value by (.loc)

I have this example df:

import pandas as pd

df6 = pd.DataFrame({
    'answer1': ['UK', 'Paris', 'Toronto'],
    'answer2': ['Paris', 'Paris', 'Paris'],
    'answer3': ['CA', 'CA', 'CA'],
    'correct': [0.4, '3.1', 'Answer3']  # mixed types: a float and two strings
})

df6:

    answer1    answer2  answer3 correct
0    UK         Paris     CA    0.4
1    Paris      Paris     CA    3.1
2    Toronto    Paris     CA    Answer3

I want to replace the text "Answer3" in the correct column with just 3, but only in rows where the answer1 column is Toronto.

So I wrote a function and used apply on the rows where answer1 == 'Toronto':

def replace_answer(text):
    return text.replace("Answer", "")

df6.loc[df6['answer1'] == 'Toronto', 'correct'] = df6['correct'].apply(lambda x : replace_answer(x))

I get the following error: AttributeError: 'float' object has no attribute 'replace'

Why is my code processing the whole correct column when I am only selecting the rows where answer1 is Toronto?

CodePudding user response：

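The right-hand side, df6['correct'].apply(...), runs on the whole column, including the float 0.4 in row 0, which has no replace method, so you get the error.

The left-hand side, df6.loc[df6['answer1'] == 'Toronto', 'correct'], only determines which rows receive the result (matched by index). It does not restrict what the right-hand side is computed on.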
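
To see that the error comes from the right-hand side alone, here is a minimal sketch that runs the same apply call on a standalone copy of the correct column, with no .loc assignment involved. The names correct and error_message are only illustrative.

```python
import pandas as pd

# Same values as the 'correct' column in the question
correct = pd.Series([0.4, '3.1', 'Answer3'])

error_message = None
try:
    # apply visits every element, starting with the float 0.4
    correct.apply(lambda x: x.replace("Answer", ""))
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

The call fails on the first element before any assignment could happen, which is why the filter on the left-hand side cannot prevent it.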

Use the filter on both sides:

df6.loc[df6['answer1'] == 'Toronto', 'correct'] = \
    df6.loc[df6['answer1'] == 'Toronto', 'correct'].apply(lambda x: replace_answer(x))
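
As a small variation on the line above, the condition can be stored once and reused on both sides. This is only a sketch; the variable name mask is an assumption, and the data is the example from the question.

```python
import pandas as pd

df6 = pd.DataFrame({
    'answer1': ['UK', 'Paris', 'Toronto'],
    'answer2': ['Paris', 'Paris', 'Paris'],
    'answer3': ['CA', 'CA', 'CA'],
    'correct': [0.4, '3.1', 'Answer3'],
})

def replace_answer(text):
    return text.replace("Answer", "")

# Boolean mask selecting only the Toronto rows
mask = df6['answer1'] == 'Toronto'

# apply now only sees the selected rows, so the float in row 0 is never touched
df6.loc[mask, 'correct'] = df6.loc[mask, 'correct'].apply(replace_answer)

print(df6['correct'].tolist())  # [0.4, '3.1', '3']
```

Note that the replaced value is still the string '3', not a number, because replace_answer returns a string.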

If you want to convert every value to float instead, you can pass the whole column to the function and let it handle each value:

import re

def replace_answer(text):
    """Strip everything except digits and dots, then convert to float."""
    return float(re.sub(r"[^\d.]", "", str(text)))

df6['correct'] = df6['correct'].apply(replace_answer)
print(df6)
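
If only the literal prefix "Answer" needs to go, a vectorized alternative is possible without apply or a regex. The sketch below is not the method from the answer above; it assumes every value becomes a valid number once the prefix is removed, and the name cleaned is only illustrative.

```python
import pandas as pd

df6 = pd.DataFrame({
    'answer1': ['UK', 'Paris', 'Toronto'],
    'correct': [0.4, '3.1', 'Answer3'],
})

# Cast everything to str first so the .str accessor works on the float too,
# drop the literal prefix, then convert the whole column to numbers.
cleaned = df6['correct'].astype(str).str.replace('Answer', '', regex=False)
df6['correct'] = pd.to_numeric(cleaned)

print(df6['correct'].dtype)  # float64
```

pd.to_numeric raises by default if a value cannot be parsed, so unexpected text in the column surfaces as an error rather than passing through silently.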

CodePudding user response：

It's because you are calling apply on the entire correct column of df6, not just the selected rows.

Apply the same filter on the right-hand side instead:

df6.loc[df6['answer1'] == 'Toronto', 'correct'] = df6.loc[df6['answer1'] == 'Toronto', 'correct'].apply(lambda x : replace_answer(x))