Home > Blockchain >  How do interpolate values between two date columns in my pandas dataframe?
How do interpolate values between two date columns in my pandas dataframe?

Time:08-03

I would like to add a new column between each of the date columns, with the interpolated value using the previous and next value to get a month value.

data = [['Jane', 10,11,45,66,21], ['John',11,55,34,44,22],['Tom',23,43,12,11,44]]
df = pd.DataFrame(data, columns = ['Name', '09-Aug', '02-Sep','18-Oct','02-Nov','14-Dec'])

This returns the following:

enter image description here

In between each column after the first one, I would like to add one which contains the month preceding it, and the interpolated value based on the preceding and next column.

So eg:

enter image description here

I tried to first add a column between each one using the following code:

N = len(df.columns) #
for i in range(0,N): #
    df.insert(i,'','',allow_duplicates=True)

But this only adds columns to the left of the table, not between each one. Once I had added the columns, I was thinking of writing a function to perform the linear interpolation.

Does anyone have a suggestion on the correct way around to tackle this?

CodePudding user response:

# Make Name your index, and Transpose:
df = df.set_index('Name').T

# Convert index to datetime:
df.index = pd.to_datetime(df.index, format='%d-%b')

# Create new values for each month:
new_index_vals = pd.date_range(df.index.min(), df.index.max(), freq='MS')

# Reindex, including these new values:
df = df.reindex(df.index.union(new_index_vals))

# Apply interpolation, accounting for time:
df = df.interpolate('time') # You can also choose just `linear` here~

# Convert back to original format, formatting month_starts differently:
df.index = np.where(df.index.is_month_start, 
                    df.index.strftime('%B'), 
                    df.index.strftime('%d-%b'))

# Transpose back to original format:
df = df.T.reset_index()
print(df.round(2))

Output:

   Name  09-Aug  September  02-Sep  October  18-Oct  November  02-Nov  December  14-Dec
0  Jane    10.0      10.96    11.0    32.43    45.0     64.60    66.0     34.93    21.0
1  John    11.0      53.17    55.0    41.76    34.0     43.33    44.0     28.81    22.0
2   Tom    23.0      42.17    43.0    23.46    12.0     11.07    11.0     33.79    44.0

CodePudding user response:

from numpy import nan
import pandas


# Adding a few values
data = [['Jane', 10,11,45,66,21, 45, 70, 60, 80], ['John',11,55,34,44,22, 32, 24, 25, 45],['Tom',23,43,12,11,44, 77, 108, 24, 45]]
df = pd.DataFrame(data, columns = ['Name', '09-Aug', '02-Sep','18-Oct','02-Nov','14-Dec', 'TestDate1', 'TestDate2', 'TestDate3', 'TestDate4'])

df_c = df.drop('Name', axis=1)
for i in range(1, len(df.columns)   len(df.columns)-3, 2):
    df_c.insert(i, 'Interpolated', pd.Series([nan] * len(df.index)), allow_duplicates=True)

df[['Name']].join(df_c.interpolate(axis=1))

Output

Now you could iterate over all 'Interpolated' columns and use your existing month short form from the left but without the number prefix. The first column would then be called 'Aug'.

  • Related