Return a pandas series from a loop

Time:07-22

I have a pandas dataframe nike that looks like this:

    rise1      run1       position
    1          0.82       1
    3          1.64       2
    5          3.09       3
    7          5.15       4
    8          7.98       5
    15         11.12      6

I am trying to make a function that calculates grade (rise/run) and returns it as a pandas series. I want to use X points ahead of the current position minus X points behind the current position to calculate grade (i.e. if X = 2, the grade at position 4 is (15-3)/(11.12-1.64)).

def get_grade(dataf, X=n):
    grade = pd.Series(data = None, index = range(dataf.shape[0]))

    for i in range(X, dataf.shape[0] - X):
        rise = dataf.loc[i + X, 'rise1'] - dataf.loc[i - X, 'rise1']
        run = dataf.loc[i + X, 'run1'] - dataf.loc[i - X, 'run1']
        if np.isclose(rise, 0) or np.isclose(run, 0):
            grade[i] = 0
        elif rise / run > 1:
            grade[i] = 1
        elif rise / run < -1:
            grade[i] = -1
        else:
            grade[i] = rise / run

    return grade
   
get_grade(nike, X= 2)

When I call the function, nothing happens. The code executes but nothing appears. What might I be doing wrong? Apologies if this is unclear, I am very new to coding in general with limited vocab in this area.

CodePudding user response:

You have to assign the function's return value to a variable and then print or display that variable, for example df = get_grade(nike, X=2) followed by print(df). Alternatively, put a print call inside the function.

def test_function():
    df = pd.DataFrame({"col1":[1,2,3,4], "col2":[4,3,2,1]})
    return df
df = test_function()
print(df)

Or

def test_print_function():
    df = pd.DataFrame({"col1":[1,2,3,4], "col2":[4,3,2,1]})
    print(df)
test_print_function()
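
Applied to the function from the question, that could look like the sketch below. It rebuilds the nike frame from the sample rows in the question. The `+` signs in `i + X`, the default `X=2`, and the float-typed result series are assumptions, not something the asker confirmed.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
nike = pd.DataFrame({
    "rise1": [1, 3, 5, 7, 8, 15],
    "run1": [0.82, 1.64, 3.09, 5.15, 7.98, 11.12],
    "position": [1, 2, 3, 4, 5, 6],
})

def get_grade(dataf, X=2):
    # Start with NaN so rows without X neighbours on both sides stay empty.
    grade = pd.Series(np.nan, index=range(dataf.shape[0]), dtype=float)
    for i in range(X, dataf.shape[0] - X):
        rise = dataf.loc[i + X, "rise1"] - dataf.loc[i - X, "rise1"]
        run = dataf.loc[i + X, "run1"] - dataf.loc[i - X, "run1"]
        if np.isclose(rise, 0) or np.isclose(run, 0):
            grade[i] = 0
        else:
            # Same effect as the if/elif chain in the question: cap at -1 and 1.
            grade[i] = min(max(rise / run, -1), 1)
    return grade

grades = get_grade(nike, X=2)  # keep the returned Series in a variable
print(grades)                  # then display it explicitly
```

Without the assignment and the print call, a script computes the Series and then discards it, which matches the "nothing appears" symptom.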

CodePudding user response:

The way you are working is suboptimal. In general, a for loop that calls .loc repeatedly in pandas is a sign that you're not taking advantage of the framework.

My suggestion is to use a rolling window, and apply your calculations:

WINDOW = 2
rolled = df[['rise1', 'run1']].rolling(2*WINDOW + 1, center=True)\
                              .apply(lambda s: s.iloc[-1] - s.iloc[0])  # last minus first in each window
print(rolled['rise1'] / rolled['run1'])

0         NaN
1         NaN
2    0.977654
3    1.265823
4         NaN
5         NaN
dtype: float64
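
The same difference can also be written without apply, by shifting the columns. This is a minimal sketch, assuming the nike frame from the question. The helper name get_grade_shifted is made up for illustration, and the final clip mirrors the capping at -1 and 1 in the question's function, which the rolling snippet above does not do.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
nike = pd.DataFrame({
    "rise1": [1, 3, 5, 7, 8, 15],
    "run1": [0.82, 1.64, 3.09, 5.15, 7.98, 11.12],
    "position": [1, 2, 3, 4, 5, 6],
})

def get_grade_shifted(dataf, X=2):  # hypothetical helper, not from the thread
    cols = dataf[["rise1", "run1"]]
    # shift(-X) pulls the value X rows ahead onto the current row,
    # shift(X) pulls the value X rows behind; edge rows become NaN.
    diff = cols.shift(-X) - cols.shift(X)
    grade = diff["rise1"] / diff["run1"]
    # The question's function returns 0 when rise or run is close to 0.
    near_zero = np.isclose(diff["rise1"], 0) | np.isclose(diff["run1"], 0)
    grade = grade.mask(near_zero, 0)
    # Cap the result to the range [-1, 1], as the if/elif chain does.
    return grade.clip(-1, 1)

grades = get_grade_shifted(nike, X=2)
print(grades)
```

Dropping the clip call gives the uncapped ratios, which is what the rolling version prints.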

Now, as to your specific problem, I cannot reproduce it. Copying and pasting your code into a brand new notebook works fine, but apparently it doesn't yield the results you want (i.e. you don't get (15-3)/(11.12-1.64) at position 4 as you intended).
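
To see the mismatch concretely, here is the arithmetic for position 4 with X = 2, using only the numbers from the question. The raw ratio is above 1, so the `elif rise / run > 1` branch in the question's function caps it.

```python
# Position 4 (index 3) with X = 2, values taken from the table in the question.
rise = 15 - 3        # rise1 two rows ahead minus rise1 two rows behind
run = 11.12 - 1.64   # run1 two rows ahead minus run1 two rows behind
raw = rise / run     # the ratio the asker expects to see

# The question's function limits every result to the range [-1, 1].
capped = max(-1, min(1, raw))

print(round(raw, 4), capped)
```

If the uncapped grade is what you want, the two elif branches that assign 1 and -1 are the lines to remove.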
