I have this example df:
import pandas as pd

df6 = pd.DataFrame({
    'answer1': ['UK', 'Paris', 'Toronto'],
    'answer2': ['Paris', 'Paris', 'Paris'],
    'answer3': ['CA', 'CA', 'CA'],
    'correct': [0.4, '3.1', 'Answer3']
})
df6:
   answer1 answer2 answer3  correct
0       UK   Paris      CA      0.4
1    Paris   Paris      CA      3.1
2  Toronto   Paris      CA  Answer3
I want to replace the text "Answer3" in the correct column with just 3, but only in rows where the answer1 column is Toronto.
So I wrote a function and used apply on the rows where answer1 == 'Toronto':
def replace_answer(text):
    return text.replace("Answer", "")
df6.loc[df6['answer1'] == 'Toronto', 'correct'] = df6['correct'].apply(lambda x : replace_answer(x))
I get the following error: AttributeError: 'float' object has no attribute 'replace'
Why is my code processing the whole correct column when my condition only selects the rows that contain Toronto?
CodePudding user response:
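The right-hand side, df6['correct'].apply(...), runs on the whole column, so replace_answer is also called on the float 0.4 in row 0, and that raises the error.
The left-hand side, df6.loc[df6['answer1'] == 'Toronto', 'correct'], only selects where the results are written; pandas aligns the assigned values on the index.
Use the same filter on both sides: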
df6.loc[df6['answer1'] == 'Toronto', 'correct'] = \
df6.loc[df6['answer1'] == 'Toronto', 'correct'].apply(lambda x: replace_answer(x))
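The same idea can be written with the mask stored once, so the filter is not repeated. This is only a sketch: the name mask is mine, and I swapped apply for the .str.replace accessor, which works here because the filtered rows contain only strings.

```python
import pandas as pd

df6 = pd.DataFrame({
    'answer1': ['UK', 'Paris', 'Toronto'],
    'answer2': ['Paris', 'Paris', 'Paris'],
    'answer3': ['CA', 'CA', 'CA'],
    'correct': [0.4, '3.1', 'Answer3'],
})

# Build the boolean mask once and reuse it on both sides of the assignment.
# "mask" is an illustrative name, not something from the question.
mask = df6['answer1'] == 'Toronto'

# Only the masked rows reach the string method, so the float in row 0
# is never touched. regex=False treats 'Answer' as a literal substring.
df6.loc[mask, 'correct'] = df6.loc[mask, 'correct'].str.replace('Answer', '', regex=False)

print(df6)
```

Note that the replaced value is still the string '3', not a number, and the other rows keep their original types.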
If you want to convert the whole column to float instead, you can pass every value to the function and let it handle the cleaning:
import re

def replace_answer(text):
    """Remove every character that is not a digit or a dot, then convert to float."""
    return float(re.sub(r"[^\d.]", "", str(text)))
df6['correct'] = df6['correct'].apply(replace_answer)
print(df6)
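If you would rather avoid a Python-level function, a vectorized variant might look like the sketch below. It assumes the same three sample values; the names correct, cleaned and as_float are only for illustration. I added errors='coerce' so that anything that still cannot be parsed becomes NaN instead of raising.

```python
import pandas as pd

# Same mixed values as the 'correct' column in the question.
correct = pd.Series([0.4, '3.1', 'Answer3'])

# Cast everything to text first, then strip every character that is
# not a digit or a dot (same pattern as replace_answer above).
cleaned = correct.astype(str).str.replace(r"[^\d.]", "", regex=True)

# Convert to numbers; errors='coerce' turns unparseable strings into NaN.
as_float = pd.to_numeric(cleaned, errors='coerce')

print(as_float)
```

This should give the same numbers as the apply version for the sample data; the difference is mainly that bad values are turned into NaN rather than stopping the conversion with a ValueError.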
CodePudding user response:
It's because you are calling apply on the entire correct column of df6.
You should instead apply it only to the filtered rows, like this:
df6.loc[df6['answer1'] == 'Toronto', 'correct'] = df6.loc[df6['answer1'] == 'Toronto', 'correct'].apply(lambda x : replace_answer(x))