Python Pandas. Endless cycle

Time:07-04

Why does this part of the code seem to run as an infinite loop? It can't really be one, because when I interrupt it (in Jupyter Notebook), all the 99999 values have already been changed to oil_mean_by_year[data.loc[i]['year']].

for i in data.index:
    if data.loc[i]['dcoilwtico'] == 99999:
        data.loc[i, 'dcoilwtico'] = oil_mean_by_year[data.loc[i]['year']]

CodePudding user response:

Use merge to align the oil mean of a year with the given row:

Merge on data['year'] vs oil_mean_by_year's index

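import pandas as pd
# Attach each row's yearly mean as a new "oil_mean" column
data_with_oil_mean = pd.merge(data, oil_mean_by_year.rename("oil_mean"),
                              left_on="year", right_index=True, how="left")
# Replace the 99999 placeholders with that yearly mean
data_with_oil_mean['dcoilwtico'] = data_with_oil_mean['dcoilwtico'].mask(lambda xs: xs.eq(99999), data_with_oil_mean['oil_mean'])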
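
To try the merge approach end to end, here is a minimal sketch on made-up data. The sample values are illustrative, and it assumes oil_mean_by_year is a Series indexed by year, computed from the rows that do not hold 99999. Neither detail comes from the question.

```python
import pandas as pd

# Hypothetical sample data; 99999 marks a missing price.
data = pd.DataFrame({
    "year": [2013, 2013, 2013, 2014, 2014],
    "dcoilwtico": [90.0, 99999.0, 100.0, 50.0, 99999.0],
})

# Assumed shape of oil_mean_by_year: a Series indexed by year,
# computed from the valid (non-99999) rows only.
valid = data[data["dcoilwtico"] != 99999]
oil_mean_by_year = valid.groupby("year")["dcoilwtico"].mean()

# Attach each row's yearly mean as a new "oil_mean" column.
data_with_oil_mean = pd.merge(
    data, oil_mean_by_year.rename("oil_mean"),
    left_on="year", right_index=True, how="left",
)

# Replace the placeholder with the yearly mean, all rows at once.
is_missing = data_with_oil_mean["dcoilwtico"].eq(99999)
data_with_oil_mean["dcoilwtico"] = data_with_oil_mean["dcoilwtico"].mask(
    is_missing, data_with_oil_mean["oil_mean"]
)

print(data_with_oil_mean["dcoilwtico"].tolist())
# [90.0, 95.0, 100.0, 50.0, 50.0]
```

Because the replacement runs on whole columns, there is no Python-level loop over the rows.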

CodePudding user response:

This is a common stumbling block with Pandas, and it comes from treating a DataFrame as if it were a Python list. Let's take a look at what actually happens here.

The loop changes the dcoilwtico value of each row whose current dcoilwtico equals 99999, replacing it with the mean for that row's year. Since for i in data.index walks over a fixed, finite index, the loop cannot run forever. It is most likely just very slow. Every data.loc[i]['dcoilwtico'] first builds the whole row i as a Series and only then picks a single value out of it, and this happens for every row in the frame, including rows that need no change. On a large frame that can take long enough to look like a hang. It also fits what you observe: by the time you interrupt the cell, all the 99999 values have already been replaced, and the loop is presumably still working through the remaining rows.

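A simple improvement is to read and write each value with a single data.loc[i, column] lookup. Note the square brackets: .loc is an indexer, so data.loc(i, 'dcoilwtico') with parentheses will not work. Using .get also makes a year that is missing from oil_mean_by_year fall back to 0 instead of raising a KeyError:

for i in data.index:
    if data.loc[i, 'dcoilwtico'] == 99999:
        data.loc[i, 'dcoilwtico'] = oil_mean_by_year.get(data.loc[i, 'year'], 0)
    else:
        pass  # leave every other row unchanged

This is still a row-by-row loop, so it can be slow on a large frame, but it does finish.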
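
If the loop is still too slow, the same replacement can be done without iterating at all. The sketch below is a swapped-in alternative, not the code from the question: it looks up each row's yearly mean with Series.map and overwrites only the 99999 rows through a boolean mask. The sample data is made up, and oil_mean_by_year is assumed to be a Series indexed by year.

```python
import pandas as pd

# Hypothetical sample data; 99999 marks a missing price.
data = pd.DataFrame({
    "year": [2013, 2013, 2014, 2015],
    "dcoilwtico": [99999.0, 98.0, 99999.0, 99999.0],
})

# Assumed: yearly means as a Series indexed by year (2015 is missing on purpose).
oil_mean_by_year = pd.Series({2013: 98.0, 2014: 93.0})

# Boolean mask of the rows that hold the placeholder value.
is_missing = data["dcoilwtico"].eq(99999)

# Look up the mean for every row's year; years without a mean become 0,
# mirroring the .get(..., 0) fallback.
yearly_mean = data["year"].map(oil_mean_by_year).fillna(0)

# Overwrite only the placeholder rows.
data.loc[is_missing, "dcoilwtico"] = yearly_mean[is_missing]

print(data["dcoilwtico"].tolist())
# [98.0, 98.0, 93.0, 0.0]
```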
