I am new to data science and ran into a problem with pandas in Python. I want to replace every value below 0 in a column with 0, and I wonder why this does not work:
Original:
submit[submit.score<0].score = 0
Fixed:
submit.loc[submit.score<0, 'score'] = 0
I have already solved this problem by using loc, but it really confuses me. Any explanation would be great.
CodePudding user response:
Your first attempt is equivalent to submit[submit['score'] < 0]['score'] = 0. Whenever you see multiple [ and ] pairs back to back in your pandas code, it might be a bad sign. In this case, submit[submit['score'] < 0] creates a copy of your dataframe, so you're assigning 0 to the score column on that copy, and the original dataframe is left unchanged.
By using loc, you eliminate the copy and assign directly to the dataframe.
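To make the difference concrete, here is a minimal sketch. The sample values are made up for illustration, and the explicit .copy() is my addition: it stands in for the copy that the boolean-mask selection produces and keeps pandas from emitting a chained-assignment warning.

```python
import pandas as pd

# Hypothetical data standing in for the asker's `submit` dataframe.
submit = pd.DataFrame({"score": [-2.0, 0.5, -1.0, 3.0]})

# Boolean-mask indexing builds a new dataframe; .copy() makes that explicit
# and avoids the chained-assignment warning in this demonstration.
negatives = submit[submit["score"] < 0].copy()
negatives["score"] = 0

# The original is untouched: the assignment landed on the separate object.
still_negative = int((submit["score"] < 0).sum())

# A single .loc call selects rows and column together and writes in place.
submit.loc[submit["score"] < 0, "score"] = 0
remaining_negative = int((submit["score"] < 0).sum())
```

Here still_negative counts the negative scores left in submit after writing to the copy, and remaining_negative counts them again after the .loc assignment.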
CodePudding user response:
Using .loc is good, as the sibling answer says.
Even better, sometimes, is to chain operations that create new objects instead of mutating an existing one in place. This leads to code that is easy to read and follow.
I would suggest the following:
submit = submit.assign(score=submit.score.clip(0, None))
It's still just one line, but it makes a new dataframe with the score column replaced. The .clip() method is used to clamp the values into an interval, in this case so that anything less than 0 will be zero.
This style makes it easy to add more operations to the chain, and the same pattern shows up in other places.
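As a rough sketch of what a longer chain could look like: the sample data and the extra sort_values and reset_index steps are assumptions added for illustration, not part of the original question. clip(lower=0) is the keyword form of clip(0, None).

```python
import pandas as pd

# Hypothetical data standing in for the asker's `submit` dataframe.
submit = pd.DataFrame({"score": [-2.0, 0.5, -1.0, 3.0]})

# Each step returns a new dataframe, so `submit` itself is never mutated.
clipped = (
    submit
    .assign(score=lambda df: df["score"].clip(lower=0))  # negatives become 0
    .sort_values("score", ascending=False)               # illustrative extra step
    .reset_index(drop=True)                              # tidy the index after sorting
)
```

Passing a lambda to assign lets the step refer to the dataframe as it exists at that point in the chain, rather than to the outer submit variable.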