Home > other >  How to insert a new column into a dataframe and access rows with different indices?
How to insert a new column into a dataframe and access rows with different indices?

Time:04-18

I have a dataframe with one column "Numbers" and I want to add a second column "Result". The values should be the sum of the previous two values in the "Numbers" column, otherwise NaN.

import pandas as pd
import numpy as np

data = {
    "Numbers": [100,200,400,0]
}

df = pd.DataFrame(data,index = ["whatever1", "whatever2", "whatever3", "whatever4"])

def add_prev_two_elems_to_DF(df):
    numbers = "Numbers" # alias
    result = "Result"   # alias
    df[result] = np.nan # empty column
    result_index = list(df.columns).index(result)
    for i in range(len(df)):
        #row = df.iloc[i]
        if i < 2: df.iloc[i,result_index] = np.nan
        else: df.iloc[i,result_index] = df.iloc[i-1][numbers]   df.iloc[i-2][numbers]

add_prev_two_elems_to_DF(df)
display(df)

The output is:

            Numbers Result
whatever1   100     NaN
whatever2   200     NaN
whatever3   400     300.0
whatever4   0       600.0

But this looks quite complicated. Can this be done easier and maybe faster? I am not looking for a solution with sum(). I want a general solution for any kind of function that can fill a column using values from other rows.

Edit 1: I forgot to import numpy.

Edit 2: I changed one line to this:

if i < 2: df.iloc[i,result_index] = np.nan

CodePudding user response:

Looks like you could use rolling.sum together with shift. Since rollling.sum sums until the current row, we have to shift it down one row, so that each row value matches to the sum of the previous 2 rows:

df['Result'] = df['Numbers'].rolling(2).sum().shift()

Output:

           Numbers  Result
whatever1      100     NaN
whatever2      200     NaN
whatever3      400   300.0
whatever4        0   600.0

CodePudding user response:

This is the shortest code I could develop. It outputs exactly the same table.

import numpy as np
import pandas as pd
#import swifter # apply() gets swifter

data = {
    "Numbers": [100,200,400,0]
}

df = pd.DataFrame(data,index = ["whatever1", "whatever2", "whatever3", "whatever4"])

def func(a: np.ndarray) -> float: # we expect 3 elements, but we don't check that
    a.reset_index(inplace=True,drop=True) # the index now starts with 0, 1,...
    return a[0]   a[1] # we use the first two elements, the 3rd is unnecessary

df["Result"] = df["Numbers"].rolling(3).apply(func)
#df["Result"] = df["Numbers"].swifter.rolling(3).apply(func)
display(df)
  • Related