Home > Software engineering >  Pandas: Multiple indices in a dataframe: drop some, keep others
Pandas: Multiple indices in a dataframe: drop some, keep others

Time:11-20

My data has the following structure:

>>> df.head()
                                   value
Date       FIPS_state Date              
2001-01-01 1          2001-03-31  6.4621
           2          2001-03-31 11.3259
           4          2001-03-31  6.3467
           5          2001-03-31  6.0613
           6          2001-03-31  7.5069

[I'd like to post this dataframe here for convenience, but I can't even figure that out now. However see data and the steps outlined further down to recreate it.]

The desired output is:

>>> df.head()                                   
   FIPS_state Date        value      
0  1          2001-03-31  6.4621
1  2          2001-03-31 11.3259
2  4          2001-03-31  6.3467
3  5          2001-03-31  6.0613
4  6          2001-03-31  7.5069

where I want to drop the first Date index but keep the second Date index, and also have the FIPS_state index as a variable.

Maybe I shouldn't be here in the first place. The Date index was created while running the following:

import pandas
from pandas import Timestamp

data = pandas.DataFrame.from_dict({'FIPS_state': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}, 'FIPS_county': {0: 3, 1: 3, 2: 3, 3: 3, 4: 3}, 'value': {0: 3.1, 1: 3.4, 2: 3.9, 3: 5.9, 4: 6.4}, 'Date': {0: Timestamp('2020-12-01 00:00:00'), 1: Timestamp('2020-11-01 00:00:00'), 2: Timestamp('2020-10-01 00:00:00'), 3: Timestamp('2020-09-01 00:00:00'), 4: Timestamp('2020-08-01 00:00:00')}, 'Month/Year': {0: '12/2020', 1: '11/2020', 2: '10/2020', 3: '9/2020', 4: '8/2020'}})

df = data.set_index('Date').groupby(['Date','FIPS_state']).resample('Q')['value'].mean().to_frame()

>>> df.head()
#   FIPS_state FIPS_county  value       Date Month/Year
# 0          1           3 3.1000 2020-12-01    12/2020
# 1          1           3 3.4000 2020-11-01    11/2020
# 2          1           3 3.9000 2020-10-01    10/2020
# 3          1           3 5.9000 2020-09-01     9/2020
# 4          1           3 6.4000 2020-08-01     8/2020

EDIT: This is not even doing the correct calculation, is it? Oh my... Anyways, my question about the index has been answered below by @user17242583, thanks!

CodePudding user response:

I feel like you need

df.groupby([pd.Grouper(key='Date', freq='Q'), 'FIPS_state'])['value'].mean().reset_index()

        Date  FIPS_state     value
0 2020-09-30           1  6.150000
1 2020-12-31           1  3.466667

CodePudding user response:

You can do it by removing the the first Date column from the index (or any Date column - there just shouldn't be duplicate column names):

df.index = df.index.droplevel(0)

Then reset the index:

df = df.reset_index()

Output:

>>> df
   FIPS_state        Date    value
0           1  2001-03-31   6.4621
1           2  2001-03-31  11.3259
2           4  2001-03-31   6.3467
3           5  2001-03-31   6.0613
4           6  2001-03-31   7.5069
  • Related