Home > front end >  Pandas efficiently interpolate sections of a larger dataframe
Pandas efficiently interpolate sections of a larger dataframe

Time:01-20

I have data coming in via the bloomberg API, the returned dataframe is a large singular dataframe (66 rows, 5 cols) and then sent out in one block via websocket as a singular JSON str.

I need to linearly/ simple interpolate this 66 row dataframe, however this interpolation must be performed separately for each currency (e.g. KWN = Korean Won, the price is around 1190, whereas for Chinese yuan is only around 6, so we can't interpolated between currencies).

I am currently very inefficiently filtering my dataframe on the index.str so that the first 3 chars match the iterated chosen currency.

I will be very very grateful if anyone has any ideas to help speed this all up/ tips. thanks very much :)

        self.ccy_prefix = ['KWN', 'IRN', 'NTN', 'IHN', 'PPN', 'CCN']
        for ccy in self.ccy_prefix:
            #interp for each section
            df[df.index.str[:3]==ccy] = df[df.index.str[:3]==ccy].interpolate(method='linear', limit_direction='forward', axis=0)
      

Filtered for Korean Wont, note the NAN at the  4Y tenor

This does successfully work, however it is very slow and inefficient, is there a way to use a map or some other clever Pandas method, I'd attempted to find an alternative but couldn't find anything thus far.

Filtered interpolated DF: enter image description here

Large singular DataFrame: enter image description here

Raw DataFrame:

{'BID': {'KWN 1W BGN Curncy': 1192.83, 'KWN 1M BGN Curncy': 1193.46, 'KWN 2M BGN Curncy': 1194.2, 'KWN 3M BGN Curncy': 1194.68, 'KWN 6M BGN Curncy': 1195.74, 'KWN 9M BGN Curncy': 1196.15, 'KWN 12M BGN Curncy': 1195.99, 'KWN 2Y BGN Curncy': 1195.57, 'KWN 3Y BGN Curncy': 1194.0, 'KWN 4Y BGN Curncy': nan, 'KWN 5Y BGN Curncy': 1188.95, 'IRN 1W BGN Curncy': 74.61, 'IRN 1M BGN Curncy': 74.83, 'IRN 2M BGN Curncy': 75.07, 'IRN 3M BGN Curncy': 75.51, 'IRN 6M BGN Curncy': 76.37, 'IRN 9M BGN Curncy': 77.22, 'IRN 12M BGN Curncy': 78.07, 'IRN 2Y BGN Curncy': 81.63, 'IRN 3Y BGN Curncy': nan, 'IRN 4Y BGN Curncy': 87.98, 'IRN 5Y BGN Curncy': 91.65, 'NTN 1W BGN Curncy': 27.576, 'NTN 1M BGN Curncy': 27.517, 'NTN 2M BGN Curncy': 27.442, 'NTN 3M BGN Curncy': 27.372, 'NTN 6M BGN Curncy': 27.174, 'NTN 9M BGN Curncy': 26.98, 'NTN 12M BGN Curncy': 26.784, 'NTN 2Y BGN Curncy': nan, 'NTN 3Y BGN Curncy': nan, 'NTN 4Y BGN Curncy': nan, 'NTN 5Y BGN Curncy': nan, 'IHN 1W BGN Curncy': 14337.8, 'IHN 1M BGN Curncy': 14369.2, 'IHN 2M BGN Curncy': 14417.0, 'IHN 3M BGN Curncy': 14448.8, 'IHN 6M BGN Curncy': 14595.9, 'IHN 9M BGN Curncy': 14703.8, 'IHN 12M BGN Curncy': 14896.0, 'IHN 2Y BGN Curncy': 15504.8, 'IHN 3Y BGN Curncy': nan, 'IHN 4Y BGN Curncy': nan, 'IHN 5Y BGN Curncy': nan, 'PPN 1W BGN Curncy': 51.58, 'PPN 1M BGN Curncy': 51.81, 'PPN 2M BGN Curncy': 52.01, 'PPN 3M BGN Curncy': 52.15, 'PPN 6M BGN Curncy': 52.56, 'PPN 9M BGN Curncy': 52.89, 'PPN 12M BGN Curncy': 53.17, 'PPN 2Y BGN Curncy': 54.32, 'PPN 3Y BGN Curncy': 55.68, 'PPN 4Y BGN Curncy': 56.46, 'PPN 5Y BGN Curncy': 57.72, 'CCN 1W BGN Curncy': 6.361, 'CCN 1M BGN Curncy': 6.373, 'CCN 2M BGN Curncy': 6.3853, 'CCN 3M BGN Curncy': 6.3976, 'CCN 6M BGN Curncy': 6.428, 'CCN 9M BGN Curncy': 6.4541, 'CCN 12M BGN Curncy': 6.4776, 'CCN 2Y BGN Curncy': 6.5653, 'CCN 3Y BGN Curncy': 6.6229, 'CCN 4Y BGN Curncy': 6.7332, 'CCN 5Y BGN Curncy': 6.8305}, 'ASK': {'KWN 1W BGN Curncy': 1193.65, 'KWN 1M BGN Curncy': 1194.46, 'KWN 2M BGN Curncy': 1195.2, 'KWN 3M BGN Curncy': 1195.72, 'KWN 6M BGN Curncy': 1197.06, 'KWN 9M BGN Curncy': 1197.48, 'KWN 12M BGN Curncy': 1197.81, 'KWN 2Y BGN Curncy': 1197.28, 'KWN 3Y BGN Curncy': 1195.0, 'KWN 4Y BGN Curncy': nan, 'KWN 5Y BGN Curncy': 1189.95, 'IRN 1W BGN Curncy': 74.65, 'IRN 1M BGN Curncy': 74.88, 'IRN 2M BGN Curncy': 75.12, 'IRN 3M BGN Curncy': 75.56, 'IRN 6M BGN Curncy': 76.42, 'IRN 9M BGN Curncy': 77.28, 'IRN 12M BGN Curncy': 78.14, 'IRN 2Y BGN Curncy': 81.68, 'IRN 3Y BGN Curncy': nan, 'IRN 4Y BGN Curncy': 89.98, 'IRN 5Y BGN Curncy': 91.99, 'NTN 1W BGN Curncy': 27.606, 'NTN 1M BGN Curncy': 27.533, 'NTN 2M BGN Curncy': 27.472, 'NTN 3M BGN Curncy': 27.402, 'NTN 6M BGN Curncy': 27.204, 'NTN 9M BGN Curncy': 27.014, 'NTN 12M BGN Curncy': 26.829, 'NTN 2Y BGN Curncy': nan, 'NTN 3Y BGN Curncy': nan, 'NTN 4Y BGN Curncy': nan, 'NTN 5Y BGN Curncy': nan, 'IHN 1W BGN Curncy': 14378.0, 'IHN 1M BGN Curncy': 14401.0, 'IHN 2M BGN Curncy': 14439.0, 'IHN 3M BGN Curncy': 14499.7, 'IHN 6M BGN Curncy': 14652.1, 'IHN 9M BGN Curncy': 14803.2, 'IHN 12M BGN Curncy': 14965.0, 'IHN 2Y BGN Curncy': 15545.2, 'IHN 3Y BGN Curncy': nan, 'IHN 4Y BGN Curncy': nan, 'IHN 5Y BGN Curncy': nan, 'PPN 1W BGN Curncy': 51.63, 'PPN 1M BGN Curncy': 51.86, 'PPN 2M BGN Curncy': 52.07, 'PPN 3M BGN Curncy': 52.22, 'PPN 6M BGN Curncy': 52.6, 'PPN 9M BGN Curncy': 52.99, 'PPN 12M BGN Curncy': 53.29, 'PPN 2Y BGN Curncy': 54.4, 'PPN 3Y BGN Curncy': 55.8, 'PPN 4Y BGN Curncy': 57.06, 'PPN 5Y BGN Curncy': 58.1, 'CCN 1W BGN Curncy': 6.366, 'CCN 1M BGN Curncy': 6.3781, 'CCN 2M BGN Curncy': 6.3911, 'CCN 3M BGN Curncy': 6.4026, 'CCN 6M BGN Curncy': 6.433, 'CCN 9M BGN Curncy': 6.4591, 'CCN 12M BGN Curncy': 6.4846, 'CCN 2Y BGN Curncy': 6.5753, 'CCN 3Y BGN Curncy': 6.6441, 'CCN 4Y BGN Curncy': 6.7483, 'CCN 5Y BGN Curncy': 6.8405}, 'MID': {'KWN 1W BGN Curncy': 1193.24, 'KWN 1M BGN Curncy': 1193.96, 'KWN 2M BGN Curncy': 1194.7, 'KWN 3M BGN Curncy': 1195.2, 'KWN 6M BGN Curncy': 1196.4, 'KWN 9M BGN Curncy': 1196.82, 'KWN 12M BGN Curncy': 1196.9, 'KWN 2Y BGN Curncy': 1196.42, 'KWN 3Y BGN Curncy': 1194.5, 'KWN 4Y BGN Curncy': nan, 'KWN 5Y BGN Curncy': 1189.45, 'IRN 1W BGN Curncy': 74.63, 'IRN 1M BGN Curncy': 74.85, 'IRN 2M BGN Curncy': 75.09, 'IRN 3M BGN Curncy': 75.53, 'IRN 6M BGN Curncy': 76.4, 'IRN 9M BGN Curncy': 77.25, 'IRN 12M BGN Curncy': 78.1, 'IRN 2Y BGN Curncy': 81.65, 'IRN 3Y BGN Curncy': nan, 'IRN 4Y BGN Curncy': 88.98, 'IRN 5Y BGN Curncy': 91.82, 'NTN 1W BGN Curncy': 27.591, 'NTN 1M BGN Curncy': 27.525, 'NTN 2M BGN Curncy': 27.457, 'NTN 3M BGN Curncy': 27.387, 'NTN 6M BGN Curncy': 27.189, 'NTN 9M BGN Curncy': 26.997, 'NTN 12M BGN Curncy': 26.806, 'NTN 2Y BGN Curncy': nan, 'NTN 3Y BGN Curncy': nan, 'NTN 4Y BGN Curncy': nan, 'NTN 5Y BGN Curncy': nan, 'IHN 1W BGN Curncy': 14357.9, 'IHN 1M BGN Curncy': 14385.1, 'IHN 2M BGN Curncy': 14428.0, 'IHN 3M BGN Curncy': 14474.2, 'IHN 6M BGN Curncy': 14624.0, 'IHN 9M BGN Curncy': 14753.5, 'IHN 12M BGN Curncy': 14930.5, 'IHN 2Y BGN Curncy': 15525.0, 'IHN 3Y BGN Curncy': nan, 'IHN 4Y BGN Curncy': nan, 'IHN 5Y BGN Curncy': nan, 'PPN 1W BGN Curncy': 51.6, 'PPN 1M BGN Curncy': 51.83, 'PPN 2M BGN Curncy': 52.04, 'PPN 3M BGN Curncy': 52.18, 'PPN 6M BGN Curncy': 52.58, 'PPN 9M BGN Curncy': 52.94, 'PPN 12M BGN Curncy': 53.23, 'PPN 2Y BGN Curncy': 54.36, 'PPN 3Y BGN Curncy': 55.74, 'PPN 4Y BGN Curncy': 56.76, 'PPN 5Y BGN Curncy': 57.91, 'CCN 1W BGN Curncy': 6.3635, 'CCN 1M BGN Curncy': 6.3755, 'CCN 2M BGN Curncy': 6.3882, 'CCN 3M BGN Curncy': 6.4001, 'CCN 6M BGN Curncy': 6.4305, 'CCN 9M BGN Curncy': 6.4566, 'CCN 12M BGN Curncy': 6.4811, 'CCN 2Y BGN Curncy': 6.5703, 'CCN 3Y BGN Curncy': 6.6335, 'CCN 4Y BGN Curncy': 6.7408, 'CCN 5Y BGN Curncy': 6.8355}, 'LAST_BID_TIME_TODAY_REALTIME': {'KWN 1W BGN Curncy': datetime.time(20, 34, 41), 'KWN 1M BGN Curncy': datetime.time(20, 34, 35), 'KWN 2M BGN Curncy': datetime.time(20, 34, 56), 'KWN 3M BGN Curncy': datetime.time(20, 34, 56), 'KWN 6M BGN Curncy': datetime.time(20, 34, 56), 'KWN 9M BGN Curncy': datetime.time(20, 34, 56), 'KWN 12M BGN Curncy': datetime.time(20, 34, 56), 'KWN 2Y BGN Curncy': datetime.time(20, 34, 56), 'KWN 3Y BGN Curncy': datetime.time(20, 34, 34), 'KWN 4Y BGN Curncy': nan, 'KWN 5Y BGN Curncy': datetime.time(20, 31, 31), 'IRN 1W BGN Curncy': datetime.time(19, 50, 20), 'IRN 1M BGN Curncy': datetime.time(20, 34, 49), 'IRN 2M BGN Curncy': datetime.time(20, 34, 48), 'IRN 3M BGN Curncy': datetime.time(20, 34, 48), 'IRN 6M BGN Curncy': datetime.time(20, 34, 48), 'IRN 9M BGN Curncy': datetime.time(20, 34, 43), 'IRN 12M BGN Curncy': datetime.time(20, 34, 48), 'IRN 2Y BGN Curncy': datetime.time(20, 32, 12), 'IRN 3Y BGN Curncy': nan, 'IRN 4Y BGN Curncy': datetime.time(8, 45, 3), 'IRN 5Y BGN Curncy': datetime.time(20, 31, 35), 'NTN 1W BGN Curncy': datetime.time(20, 31, 17), 'NTN 1M BGN Curncy': datetime.time(20, 34, 35), 'NTN 2M BGN Curncy': datetime.time(20, 31, 30), 'NTN 3M BGN Curncy': datetime.time(20, 31, 30), 'NTN 6M BGN Curncy': datetime.time(20, 31, 30), 'NTN 9M BGN Curncy': datetime.time(18, 0, 42), 'NTN 12M BGN Curncy': datetime.time(18, 0, 42), 'NTN 2Y BGN Curncy': nan, 'NTN 3Y BGN Curncy': NaT, 'NTN 4Y BGN Curncy': NaT, 'NTN 5Y BGN Curncy': NaT, 'IHN 1W BGN Curncy': datetime.time(20, 34, 26), 'IHN 1M BGN Curncy': datetime.time(20, 33, 51), 'IHN 2M BGN Curncy': datetime.time(20, 34, 11), 'IHN 3M BGN Curncy': datetime.time(20, 33, 51), 'IHN 6M BGN Curncy': datetime.time(20, 33, 51), 'IHN 9M BGN Curncy': datetime.time(20, 0, 21), 'IHN 12M BGN Curncy': datetime.time(20, 33, 51), 'IHN 2Y BGN Curncy': datetime.time(19, 27, 44), 'IHN 3Y BGN Curncy': nan, 'IHN 4Y BGN Curncy': NaT, 'IHN 5Y BGN Curncy': NaT, 'PPN 1W BGN Curncy': datetime.time(20, 34, 11), 'PPN 1M BGN Curncy': datetime.time(20, 33, 54), 'PPN 2M BGN Curncy': datetime.time(20, 33, 54), 'PPN 3M BGN Curncy': datetime.time(20, 33, 54), 'PPN 6M BGN Curncy': datetime.time(20, 33, 54), 'PPN 9M BGN Curncy': datetime.time(19, 46, 19), 'PPN 12M BGN Curncy': datetime.time(20, 33, 54), 'PPN 2Y BGN Curncy': datetime.time(16, 5, 40), 'PPN 3Y BGN Curncy': datetime.time(20, 34, 56), 'PPN 4Y BGN Curncy': datetime.time(20, 34, 56), 'PPN 5Y BGN Curncy': datetime.time(20, 34, 56), 'CCN 1W BGN Curncy': datetime.time(20, 34, 28), 'CCN 1M BGN Curncy': datetime.time(20, 34, 28), 'CCN 2M BGN Curncy': datetime.time(20, 34, 28), 'CCN 3M BGN Curncy': datetime.time(20, 34, 28), 'CCN 6M BGN Curncy': datetime.time(20, 34, 28), 'CCN 9M BGN Curncy': datetime.time(20, 34, 28), 'CCN 12M BGN Curncy': datetime.time(20, 34, 28), 'CCN 2Y BGN Curncy': datetime.time(20, 32, 13), 'CCN 3Y BGN Curncy': datetime.time(20, 32, 40), 'CCN 4Y BGN Curncy': datetime.time(20, 32, 13), 'CCN 5Y BGN Curncy': datetime.time(20, 23, 29)}, 'SETTLEMENT_DATE_RT': {'KWN 1W BGN Curncy': datetime.datetime(2022, 1, 27, 0, 0), 'KWN 1M BGN Curncy': datetime.datetime(2022, 2, 22, 0, 0), 'KWN 2M BGN Curncy': datetime.datetime(2022, 3, 21, 0, 0), 'KWN 3M BGN Curncy': datetime.datetime(2022, 4, 20, 0, 0), 'KWN 6M BGN Curncy': datetime.datetime(2022, 7, 20, 0, 0), 'KWN 9M BGN Curncy': datetime.datetime(2022, 10, 20, 0, 0), 'KWN 12M BGN Curncy': datetime.datetime(2023, 1, 20, 0, 0), 'KWN 2Y BGN Curncy': datetime.datetime(2024, 1, 22, 0, 0), 'KWN 3Y BGN Curncy': datetime.datetime(2025, 1, 21, 0, 0), 'KWN 4Y BGN Curncy': datetime.datetime(2026, 1, 20, 0, 0), 'KWN 5Y BGN Curncy': datetime.datetime(2027, 1, 20, 0, 0), 'IRN 1W BGN Curncy': datetime.datetime(2022, 1, 27, 0, 0), 'IRN 1M BGN Curncy': datetime.datetime(2022, 2, 22, 0, 0), 'IRN 2M BGN Curncy': datetime.datetime(2022, 3, 21, 0, 0), 'IRN 3M BGN Curncy': datetime.datetime(2022, 4, 20, 0, 0), 'IRN 6M BGN Curncy': datetime.datetime(2022, 7, 20, 0, 0), 'IRN 9M BGN Curncy': datetime.datetime(2022, 10, 20, 0, 0), 'IRN 12M BGN Curncy': datetime.datetime(2023, 1, 20, 0, 0), 'IRN 2Y BGN Curncy': datetime.datetime(2024, 1, 22, 0, 0), 'IRN 3Y BGN Curncy': datetime.datetime(2025, 1, 21, 0, 0), 'IRN 4Y BGN Curncy': datetime.datetime(2026, 1, 20, 0, 0), 'IRN 5Y BGN Curncy': datetime.datetime(2027, 1, 20, 0, 0), 'NTN 1W BGN Curncy': datetime.datetime(2022, 1, 27, 0, 0), 'NTN 1M BGN Curncy': datetime.datetime(2022, 2, 22, 0, 0), 'NTN 2M BGN Curncy': datetime.datetime(2022, 3, 21, 0, 0), 'NTN 3M BGN Curncy': datetime.datetime(2022, 4, 20, 0, 0), 'NTN 6M BGN Curncy': datetime.datetime(2022, 7, 20, 0, 0), 'NTN 9M BGN Curncy': datetime.datetime(2022, 10, 20, 0, 0), 'NTN 12M BGN Curncy': datetime.datetime(2023, 1, 20, 0, 0), 'NTN 2Y BGN Curncy': datetime.datetime(2024, 1, 22, 0, 0), 'NTN 3Y BGN Curncy': datetime.datetime(2025, 1, 21, 0, 0), 'NTN 4Y BGN Curncy': datetime.datetime(2026, 1, 20, 0, 0), 'NTN 5Y BGN Curncy': datetime.datetime(2027, 1, 20, 0, 0), 'IHN 1W BGN Curncy': datetime.datetime(2022, 1, 27, 0, 0), 'IHN 1M BGN Curncy': datetime.datetime(2022, 2, 22, 0, 0), 'IHN 2M BGN Curncy': datetime.datetime(2022, 3, 21, 0, 0), 'IHN 3M BGN Curncy': datetime.datetime(2022, 4, 20, 0, 0), 'IHN 6M BGN Curncy': datetime.datetime(2022, 7, 20, 0, 0), 'IHN 9M BGN Curncy': datetime.datetime(2022, 10, 20, 0, 0), 'IHN 12M BGN Curncy': datetime.datetime(2023, 1, 20, 0, 0), 'IHN 2Y BGN Curncy': datetime.datetime(2024, 1, 22, 0, 0), 'IHN 3Y BGN Curncy': datetime.datetime(2025, 1, 21, 0, 0), 'IHN 4Y BGN Curncy': datetime.datetime(2026, 1, 20, 0, 0), 'IHN 5Y BGN Curncy': datetime.datetime(2027, 1, 20, 0, 0), 'PPN 1W BGN Curncy': datetime.datetime(2022, 1, 26, 0, 0), 'PPN 1M BGN Curncy': datetime.datetime(2022, 2, 22, 0, 0), 'PPN 2M BGN Curncy': datetime.datetime(2022, 3, 21, 0, 0), 'PPN 3M BGN Curncy': datetime.datetime(2022, 4, 19, 0, 0), 'PPN 6M BGN Curncy': datetime.datetime(2022, 7, 19, 0, 0), 'PPN 9M BGN Curncy': datetime.datetime(2022, 10, 19, 0, 0), 'PPN 12M BGN Curncy': datetime.datetime(2023, 1, 19, 0, 0), 'PPN 2Y BGN Curncy': datetime.datetime(2024, 1, 19, 0, 0), 'PPN 3Y BGN Curncy': datetime.datetime(2025, 1, 21, 0, 0), 'PPN 4Y BGN Curncy': datetime.datetime(2026, 1, 20, 0, 0), 'PPN 5Y BGN Curncy': datetime.datetime(2027, 1, 19, 0, 0), 'CCN 1W BGN Curncy': datetime.datetime(2022, 1, 27, 0, 0), 'CCN 1M BGN Curncy': datetime.datetime(2022, 2, 22, 0, 0), 'CCN 2M BGN Curncy': datetime.datetime(2022, 3, 21, 0, 0), 'CCN 3M BGN Curncy': datetime.datetime(2022, 4, 20, 0, 0), 'CCN 6M BGN Curncy': datetime.datetime(2022, 7, 20, 0, 0), 'CCN 9M BGN Curncy': datetime.datetime(2022, 10, 20, 0, 0), 'CCN 12M BGN Curncy': datetime.datetime(2023, 1, 20, 0, 0), 'CCN 2Y BGN Curncy': datetime.datetime(2024, 1, 22, 0, 0), 'CCN 3Y BGN Curncy': datetime.datetime(2025, 1, 21, 0, 0), 'CCN 4Y BGN Curncy': datetime.datetime(2026, 1, 20, 0, 0), 'CCN 5Y BGN Curncy': datetime.datetime(2027, 1, 20, 0, 0)}}

CodePudding user response:

It seems like it should be quicker to groupby and then interpolate. Unfortunately, when I run your code I don't actually get the "filtered interpolated DF" that you list (perhaps you've left out some part of the interpolate where you specify that it should be 15 minute intervals?). You get a slight speedup if you use str.startswith instead of str[:3]:

%%timeit
for ccy in ccy_prefix:
  df[df.index.str[:3]==ccy] = df[df.index.str[:3]==ccy].interpolate(limit_direction='forward')
% 10 loops, best of 5: 25.9 ms per loop

As opposed to:

%%timeit
for ccy in ccy_prefix:
  df[df.index.str.startswith(ccy)] = df[df.index.str.startswith(ccy)].interpolate(limit_direction='forward')
% 10 loops, best of 5: 24.1 ms per loop

Perhaps a better solution is to create a new column with the currency prefixes and then groupby and interpolate, going from a comment provided here.

df['ccy_prefix'] = df.index.str[:3]

def interpolator(df):
  return(df.interpolate(limit_direction='forward'))

Then this should be quickest of them all:

df = df.groupby('ccy_prefix').apply(interpolator)
  •  Tags:  
  • Related