Why doesn't this Python pandas code work on my dataset?

Time:06-05

I am a newbie in data science, and I ran into a problem with pandas in Python. Basically, I want to replace the values lower than 0 in a column with 0, and I wonder why this does not work:

(Image of my dataset)

Original:

submit[submit.score<0].score = 0

Fixed:

submit.loc[submit.score<0, 'score'] = 0

I have already solved this problem by using .loc, but it really confuses me. Any explanation would be great.

CodePudding user response:

Your first attempt is equivalent to submit[submit['score'] < 0]['score'] = 0. Whenever you see several [ ] pairs in a row in your pandas code (so-called chained indexing), it might be a bad sign. In this case, submit[submit['score'] < 0] creates a copy of your dataframe containing only the matching rows, so you're assigning 0 to the score column of that copy, which leaves submit itself unchanged. Depending on the version, pandas usually warns about this (for example with a SettingWithCopyWarning).
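A minimal sketch reproducing the problem, assuming a small hypothetical submit DataFrame with made-up values (the real dataset isn't shown). It checks that boolean indexing returns a separate object, so writing through it never reaches the original frame.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the asker's dataset: only the 'score' column matters.
submit = pd.DataFrame({"id": [1, 2, 3, 4], "score": [0.5, -0.2, 1.3, -1.0]})

# Boolean indexing builds a new DataFrame holding just the matching rows.
negatives = submit[submit["score"] < 0]
print(negatives is submit)  # False: a separate object

# Chained assignment: the write lands on a temporary object, not on `submit`.
# Depending on the pandas version this emits a SettingWithCopyWarning or a
# ChainedAssignmentError warning; it is silenced here only to keep the demo quiet.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    submit[submit["score"] < 0]["score"] = 0

print(submit["score"].tolist())  # [0.5, -0.2, 1.3, -1.0] (unchanged)
```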

By using .loc with the row mask and the column label in a single indexing call, you avoid the intermediate copy and assign directly to the original dataframe.
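A small sketch of the .loc fix on the same hypothetical submit frame (made-up values, not the asker's data). Because the mask and the column label go into one .loc call, the assignment modifies submit itself.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataset.
submit = pd.DataFrame({"id": [1, 2, 3, 4], "score": [0.5, -0.2, 1.3, -1.0]})

# A single .loc call takes both the row mask and the column label, so pandas
# writes the value straight into `submit` instead of into a temporary copy.
submit.loc[submit["score"] < 0, "score"] = 0

print(submit["score"].tolist())  # [0.5, 0.0, 1.3, 0.0]
```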

CodePudding user response:

Using .loc is good, like the sibling answer says.

Even better, in some cases, is method chaining, where each operation creates a new object instead of mutating an existing one in place. This tends to produce code that is easy to read and follow.

I would suggest the following:

submit = submit.assign(score=submit.score.clip(0, None))

It's still just one line, but it creates a new dataframe with the score column replaced. The .clip() method clamps values into an interval; here the lower bound is 0 and the upper bound is None (no limit), so anything less than 0 becomes 0.
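A quick check of the assign/clip approach, again on a hypothetical submit frame with made-up values. It shows that negatives are floored at 0 and that assign() returns a new DataFrame, leaving the original untouched.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataset.
submit = pd.DataFrame({"id": [1, 2, 3, 4], "score": [0.5, -0.2, 1.3, -1.0]})

# clip(lower, upper): lower=0 and upper=None means "no upper bound",
# so only values below 0 change (they become 0).
clipped = submit.assign(score=submit["score"].clip(0, None))

print(clipped["score"].tolist())  # [0.5, 0.0, 1.3, 0.0]
print(submit["score"].tolist())   # [0.5, -0.2, 1.3, -1.0]: assign() made a new frame
```

In the answer's one-liner the result is simply bound back to the name submit, which replaces the old frame.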

This style makes it easy to add more operations to the chain (a style seen in other places).
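A minimal sketch of what a longer chain might look like. The submit frame is hypothetical, and the sort_values/reset_index steps are illustrative extras, not part of the original answer. Passing a lambda to assign() lets the clip step operate on whatever frame the chain has produced at that point.

```python
import pandas as pd

# Hypothetical input frame with made-up values.
submit = pd.DataFrame({"id": [3, 1, 2], "score": [-0.5, 0.8, 1.2]})

# Each step returns a new DataFrame, so steps can be stacked one per line.
result = (
    submit
    .assign(score=lambda df: df["score"].clip(lower=0))  # floor scores at 0
    .sort_values("id")                                    # illustrative extra step
    .reset_index(drop=True)                               # illustrative extra step
)

print(result)
```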
