Home > Blockchain >  Use previous row value to calculate next value in Pandas MultiIndex DataFrame
Use previous row value to calculate next value in Pandas MultiIndex DataFrame

Time:03-16

This SO answer comes close to what I am looking for, but I cannot apply it to a MultiIndex DataFrame.

I define a hierarchically-indexed Dataframe like this:

arrays = [["bar", "bar"], ["one", "two"]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
np.random.seed(0)
df = pd.DataFrame(np.random.randn(4, 2), index=["A", "B", "C", "D"], columns=index)

Output:

first        bar             foo
second       one       two   one
A       1.764052  0.400157  None
B       0.978738  2.240893  None
C       1.867558 -0.977278  None
D       0.950088 -0.151357  None

I want to populate a new df['foo', 'one'] column as follows:

  • I want the first row n to be 100.
  • I want row n 1 to be: n * (df['bar', 'one'][n] 1)
  • And so on.

The final table should look like:

first        bar             foo
second       one       two   one
A       1.764052  0.400157  100
B       0.978738  2.240893  276.4
C       1.867558 -0.977278  546.9
D       0.950088 -0.151357  1568.4

The main issue I am facing is accessing the row idx of a MultiIndex DataFrame. df['foo', 'one'][0]=100 works but throws a warning: "A value is trying to be set on a copy of a slice from a DataFrame".

And df.loc[0, 'bar', 'one'] throws "Too many indexers"

CodePudding user response:

IIUC, what you want is a cumprod where you initialize the value to 100. The rest is just indexing:

START = 100
df[('foo', 'one')] = (df[('bar', 'one')]
                      .add(1)
                      .shift(fill_value=START)
                      .cumprod()
                      )

output:

first        bar                    foo
second       one       two          one
A       1.764052  0.400157   100.000000
B       0.978738  2.240893   276.405235
C       1.867558 -0.977278   546.933537
D       0.950088 -0.151357  1568.363633
indexing

Independently of your goal, to index a MultiIndex you would need to use:

df.loc['A', ('bar', 'one')]

or, for a mix of names and relative indexing:

df[('bar', 'one')].iloc[0]

CodePudding user response:

Your index doesn't include 0, also, put the multiindex as a tuple

df.loc['A', ('bar', 'one')]
  • Related