This SO answer comes close to what I am looking for, but I cannot apply it to a MultiIndex DataFrame
.
I define a hierarchically-indexed Dataframe
like this:
arrays = [["bar", "bar"], ["one", "two"]]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
np.random.seed(0)
df = pd.DataFrame(np.random.randn(4, 2), index=["A", "B", "C", "D"], columns=index)
Output:
first bar foo
second one two one
A 1.764052 0.400157 None
B 0.978738 2.240893 None
C 1.867558 -0.977278 None
D 0.950088 -0.151357 None
I want to populate a new df['foo', 'one'] column as follows:
- I want the first row n to be 100.
- I want row n 1 to be: n * (df['bar', 'one'][n] 1)
- And so on.
The final table should look like:
first bar foo
second one two one
A 1.764052 0.400157 100
B 0.978738 2.240893 276.4
C 1.867558 -0.977278 546.9
D 0.950088 -0.151357 1568.4
The main issue I am facing is accessing the row idx of a MultiIndex DataFrame
. df['foo', 'one'][0]=100
works but throws a warning: "A value is trying to be set on a copy of a slice from a DataFrame"
.
And df.loc[0, 'bar', 'one']
throws "Too many indexers"
CodePudding user response:
IIUC, what you want is a cumprod
where you initialize the value to 100. The rest is just indexing:
START = 100
df[('foo', 'one')] = (df[('bar', 'one')]
.add(1)
.shift(fill_value=START)
.cumprod()
)
output:
first bar foo
second one two one
A 1.764052 0.400157 100.000000
B 0.978738 2.240893 276.405235
C 1.867558 -0.977278 546.933537
D 0.950088 -0.151357 1568.363633
indexing
Independently of your goal, to index a MultiIndex you would need to use:
df.loc['A', ('bar', 'one')]
or, for a mix of names and relative indexing:
df[('bar', 'one')].iloc[0]
CodePudding user response:
Your index doesn't include 0
, also, put the multiindex as a tuple
df.loc['A', ('bar', 'one')]