Home > Blockchain >  Pandas: overwrite values in a multiindex dataframe based on a non-multiindex mask
Pandas: overwrite values in a multiindex dataframe based on a non-multiindex mask

Time:10-07

This is the first time I'm posting a question myself here. So far I've almost always found solutions to my problems in existing questions (What a great forum and community!). Please bear with me, though, if this question (or a very similar one) has been asked and answered elsewhere on stackoverflow.

I have a multiindex dataframe (test_data) that contains different variables (outer level) for the same set of cities (inner level) and the same range of years (columns) that looks like this:

             1990  1991  1992  1993  1994
VAR CITY
1   Berlin    40    41    42    43    44
    Paris     36    35    34    33    32
    London    30    30    30    30    30
2   Berlin    35    34    33    32    31
    Paris     39    38    39    40    41
    London    45    44    43    42    41
3   Berlin    24    25    26    27    28
    Paris     24    24    25    26    27
    London    29    29    29    30    31
2m  Berlin     1     2     3     4     5
    Paris      2     3     4     5     6
    London     3     4     5     6     7

which one could obtain from this code:

test_dict = {(1,'Berlin'): [40,41,42,43,44],
             (1,'Paris'):  [36,35,34,33,32],
             (1,'London'): [30,30,30,30,30],
             (2,'Berlin'): [35,34,33,32,31],
             (2,'Paris'):  [39,38,39,40,41],
             (2,'London'): [45,44,43,42,41],
             (3,'Berlin'): [24,25,26,27,28],
             (3,'Paris'):  [24,24,25,26,27],
             (3,'London'): [29,29,29,30,31],
             ('2m','Berlin'): [1,2,3,4,5],
             ('2m','Paris'):  [2,3,4,5,6],
             ('2m','London'): [3,4,5,6,7]}
test_data = pd.DataFrame(test_dict, index=[1990,1991,1992,1993,1994]).transpose()

Now I want to set all values for variables 1 and 2 to NaN where the sum of variables 1 to 3 is less than 98 or greater than 102, i.e. Berlin 1994, Paris 1991, as well as London 1990 and 1991 (see below).

I have assigned a new DataFrame

df_sum = test_data.loc[[1,2,3]].sum(level=1)
df_sum
        1990  1991  1992  1993  1994
Berlin    99   100   101   102   103
Paris     99    97    98    99   100
London   104   103   102   102   102

and set

mask = (df_sum < 98) | (df_sum > 102)
mask
         1990   1991   1992   1993   1994
Berlin  False  False  False  False   True
Paris   False   True  False  False  False
London   True   True  False  False  False

df_sum and mask are non-multiindex DataFrames obviously and have the same dimensions as test_data.loc[1], ... Now I would like to do something like

for var in [1,2]: test_data.loc[var][mask] = np.nan

I understand why this doesn't work and yields a SettingWithCopy warning. However, so far I didn't manage to figure out a(n elegant) way to do this. I found this thread (Pandas: Apply mask to multiindex dataframe) and thought this was probably the right direction but the difference is that there the mask has the same dimensions as the original multiindex dataframe.

Any help is much appreciated.

edit: I wouldn't think that this would be an elegant solution but it doesn't even work and I really don't understand this behaviour:

for var in range(1,4):
    tmp = test_data.loc[var].copy()
    tmp[test_mask] = np.nan
    test_data.loc[var] = tmp.copy()

This leads to test_data.loc[1], ...loc[2] and ...loc[3] being all NaNs although tmp has only 4 NaNs after applying test_mask.

CodePudding user response:

You're getting the warning because you're working with views, not copies, but if I understand correctly all you need is this:

# you had this bit, but groupby syntax is preferred
df_sum = test_data.loc[[1,2,3]].groupby(level=1).sum()

for city, years in ((df_sum < 98) | (df_sum > 102)).iterrows():
    # get the years for which condition is True
    for year in years[years].index:
        test_data.loc[(slice(1,2), city), year] = np.nan

This uses the slice syntax for Multi-index selection, you can read more about it here.

  • Related