Setting to second level of Pandas multi-index results in NaN

Time:07-13

I am trying to set values in a DataFrame for a specific subset of a multi-index, but instead of the values being set I just get NaN values.

Here is an example:

import numpy as np
import pandas as pd

df_test = pd.DataFrame(np.ones((10,2)),index = pd.MultiIndex.from_product([['even','odd'],[0,1,2,3,4]],names = ['parity','mod5']))
df_test.loc[('even',),1] = pd.DataFrame(np.arange(5) + 5,index = np.arange(5))
df_test
               0    1
parity mod5          
even   0     1.0  NaN
       1     1.0  NaN
       2     1.0  NaN
       3     1.0  NaN
       4     1.0  NaN
odd    0     1.0  1.0
       1     1.0  1.0
       2     1.0  1.0
       3     1.0  1.0
       4     1.0  1.0

whereas I expected the following output:

               0    1
parity mod5          
even   0     1.0  5.0
       1     1.0  6.0
       2     1.0  7.0
       3     1.0  8.0
       4     1.0  9.0
odd    0     1.0  1.0
       1     1.0  1.0
       2     1.0  1.0
       3     1.0  1.0
       4     1.0  1.0

What do I need to do differently to get the expected result? I have tried a few other things like df_test.loc['even']['1'] but that doesn't even affect the DataFrame at all.

CodePudding user response:

In this example, your replacement values happen to be in the same order as the target rows. If index matching matters but the ordering of your indices is not guaranteed, this may be accomplished via Series.update like this:

index = np.arange(5)
np.random.shuffle(index)
df_other = pd.DataFrame(np.arange(5) + 5, index=index).squeeze()
df_test.loc[('even',), 1].update(df_other)

The .squeeze() is needed to convert the DataFrame into a Series (whose shape and indices match those of df_test.loc[('even',), 1]).
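
Because .update here acts on the object returned by .loc[...], whether df_test itself changes can depend on whether pandas hands back a view or a copy; with Copy-on-Write enabled, the update does not propagate back to df_test. A minimal sketch of a variant that avoids this by aligning on mod5 first and then assigning raw values is shown here. The names mask, target_mod5 and other are illustrative and not part of the original answer, and a fixed permutation stands in for np.random.shuffle so the result is reproducible.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame(
    np.ones((10, 2)),
    index=pd.MultiIndex.from_product(
        [["even", "odd"], [0, 1, 2, 3, 4]], names=["parity", "mod5"]
    ),
)

# Replacement values keyed by mod5, deliberately out of order
# (a fixed permutation instead of a random shuffle).
index = np.array([3, 0, 4, 1, 2])
other = pd.Series(index + 5, index=index)

# Boolean mask for the rows to overwrite, and their mod5 labels in frame order.
mask = df_test.index.get_level_values("parity") == "even"
target_mod5 = df_test.index.get_level_values("mod5")[mask]

# Align by label explicitly, then assign a plain array so that pandas
# performs no further index alignment during the assignment.
df_test.loc[mask, 1] = other.reindex(target_mod5).to_numpy()

print(df_test)
```

The assignment goes through df_test.loc directly, so it writes to the original frame rather than to an intermediate selection.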

CodePudding user response:

You have:

df_test.loc[('even',),1] = pd.DataFrame(np.arange(5) + 5,index = np.arange(5))

This assignment causes NaNs in df_test.loc[('even',),1] for two reasons. First, you are trying to assign a pd.DataFrame to a single column. For this to work, the DataFrame needs the same index as the target rows, as well as the same column name (which defaults to 0 here, but we need 1). It would be easier to use a pd.Series, in which case we don't need to worry about the name. Second, even with a pd.Series, you need to match the index (and index = np.arange(5) does not).
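
To make the mismatch concrete, the labels on both sides of the assignment can be inspected directly. This is only an illustrative sketch: the names mask, lhs_rows and rhs are made up for the example, and a boolean mask is used to pick out the 'even' rows.

```python
import numpy as np
import pandas as pd

df_test = pd.DataFrame(
    np.ones((10, 2)),
    index=pd.MultiIndex.from_product(
        [["even", "odd"], [0, 1, 2, 3, 4]], names=["parity", "mod5"]
    ),
)

# Row labels on the left-hand side: full (parity, mod5) tuples.
mask = df_test.index.get_level_values("parity") == "even"
lhs_rows = df_test.index[mask]

# The right-hand side from the question: integer row labels, column label 0.
rhs = pd.DataFrame(np.arange(5) + 5, index=np.arange(5))

print(lhs_rows.tolist())     # tuples such as ('even', 0)
print(rhs.index.tolist())    # plain integers 0 through 4
print(rhs.columns.tolist())  # the default column label 0, not 1
```

Neither the row labels nor the column label of rhs match the target, so label alignment finds nothing to place and fills the target with NaN.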

Try as follows:

df_test.loc[('even',),1] = pd.Series(np.arange(5,10),index = pd.MultiIndex.from_product([['even'],[0,1,2,3,4]]))

# or: pd.DataFrame(np.arange(5,10),columns=[1],
# index = pd.MultiIndex.from_product([['even'],[0,1,2,3,4]]))

# or, if you don't want to bother with the correct index, 
# you could of course simply do: 
# df_test.loc[('even',),1] = np.arange(5,10)

print(df_test)
               0    1
parity mod5          
even   0     1.0  5.0
       1     1.0  6.0
       2     1.0  7.0
       3     1.0  8.0
       4     1.0  9.0
odd    0     1.0  1.0
       1     1.0  1.0
       2     1.0  1.0
       3     1.0  1.0
       4     1.0  1.0
