Home > database >  Filling in multiple missing date values in pandas dataframe
Filling in multiple missing date values in pandas dataframe

Time:03-29

My dataframe looks like this:

data1 = {'date': '2018-09-14', 
        'base_currency': 'EUR', 
        'target_currency': ['NZD', 'AUD', 'HKD'], 
        'exchange_rate': [11.7778,1.6244,9.1755] }
data2 = {'date': '2018-09-17', 
        'base_currency': 'EUR', 
        'target_currency': ['NZD', 'AUD', 'HKD'], 
        'exchange_rate': [1.7742,1.6258,9.1579] }
data3 = {'date': '2018-09-18', 
        'base_currency': 'EUR', 
        'target_currency': ['NZD', 'AUD', 'HKD'], 
        'exchange_rate': [1.7752,1.6260,9.1725] }

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
df = pd.concat([df1,df2, df3], ignore_index=True)
df

date            base target      exchange_rate
0   2018-09-14  EUR  NZD         11.7778
1   2018-09-14  EUR  AUD         1.6244
2   2018-09-14  EUR  HKD         9.1755
3   2018-09-17  EUR  NZD         1.7742
4   2018-09-17  EUR  AUD         1.6258
5   2018-09-17  EUR  HKD         9.1579
6   2018-09-18  EUR  NZD         1.7752
7   2018-09-18  EUR  AUD         1.6260
8   2018-09-18  EUR  HKD         9.1725

I want to fill in 2018-09-15 and 2018-09-16 with the values for the previous date, 2018-09-14 for each currency.

I have converted the dates to the datetime type, created an index and generated a currency list.

df['date'] =  pd.to_datetime(df['date'], format='%Y/%m/%d')
idx = pd.date_range('09-14-2018', '09-18-2018')
curr_lst = df.target_currency.unique().tolist()

Can you help me with a loop from here? I probably need to loop through each currency, and do a ffill().

I can do it with one currency only:

df_nzd = df.loc[df.target_currency == 'NZD']
df_nzd.set_index('date', inplace=True)
df_nzd = df_nzd.resample('D').ffill().reset_index()
df_nzd

    date       base target  exchange_rate
0   2018-09-14  EUR NZD     11.7778
1   2018-09-15  EUR NZD     11.7778
2   2018-09-16  EUR NZD     11.7778
3   2018-09-17  EUR NZD     1.7742
4   2018-09-18  EUR NZD     1.7752

Of course, there may be a better way to do it.

CodePudding user response:

You've already found almost all methods you can use to achieve what you want. You can use unstack/stack to apply on all currencies:

data1 = {'date': '2018-09-14', 
        'base_currency': 'EUR', 
        'target_currency': ['NZD', 'AUD', 'HKD'], 
        'exchange_rate': [11.7778,1.6244,9.1755] }
data2 = {'date': '2018-09-17', 
        'base_currency': 'EUR', 
        'target_currency': ['NZD', 'AUD', 'HKD'], 
        'exchange_rate': [1.7742,1.6258,9.1579] }
data3 = {'date': '2018-09-18', 
        'base_currency': 'EUR', 
        'target_currency': ['NZD', 'AUD', 'HKD'], 
        'exchange_rate': [1.7752,1.6260,9.1725] }

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
df = pd.concat([df1,df2, df3], ignore_index=True)
df['date'] =  pd.to_datetime(df['date'])

df.set_index(['date', 'target_currency'], inplace=True)
df = df.unstack(level=-1)
df = df.resample('D').ffill()
df = df.stack()
print(df)

Output:

                           base_currency  exchange_rate
date       target_currency                             
2018-09-14 AUD                       EUR         1.6244
           HKD                       EUR         9.1755
           NZD                       EUR        11.7778
2018-09-15 AUD                       EUR         1.6244
           HKD                       EUR         9.1755
           NZD                       EUR        11.7778
2018-09-16 AUD                       EUR         1.6244
           HKD                       EUR         9.1755
           NZD                       EUR        11.7778
2018-09-17 AUD                       EUR         1.6258
           HKD                       EUR         9.1579
           NZD                       EUR         1.7742
2018-09-18 AUD                       EUR         1.6260
           HKD                       EUR         9.1725
           NZD                       EUR         1.7752

CodePudding user response:

You can craft a MultiIndex and reindex, then ffill per target:

date_lst = pd.date_range('09-14-2018', '09-18-2018')
curr_lst = df.target_currency.unique().tolist()
idx = pd.MultiIndex.from_product([date_lst, curr_lst], names=['date', 'target_currency'])

df2 = (df
 .set_index(['date', 'target_currency'])
 .reindex(idx)
 .groupby(level='target_currency').ffill()
 .reset_index()
 )

Output:

         date target_currency base_currency  exchange_rate
0  2018-09-14             NZD           EUR        11.7778
1  2018-09-14             AUD           EUR         1.6244
2  2018-09-14             HKD           EUR         9.1755
3  2018-09-15             NZD           EUR        11.7778
4  2018-09-15             AUD           EUR         1.6244
5  2018-09-15             HKD           EUR         9.1755
6  2018-09-16             NZD           EUR        11.7778
7  2018-09-16             AUD           EUR         1.6244
8  2018-09-16             HKD           EUR         9.1755
9  2018-09-17             NZD           EUR         1.7742
10 2018-09-17             AUD           EUR         1.6258
11 2018-09-17             HKD           EUR         9.1579
12 2018-09-18             NZD           EUR         1.7752
13 2018-09-18             AUD           EUR         1.6260
14 2018-09-18             HKD           EUR         9.1725
  • Related