How to use interpn to interpolate in a dataframe?

Time:01-03

I am trying to interpolate a dataframe but am having no luck. I have a dataframe with a distance header and a wind component header that I am working with.

The wind components are spaced 20 units apart and the distances 10 units apart. I would like to interpolate to a resolution of 1 unit on each axis, but I'm stuck.

I haven't used SciPy before this, and I can't find much in the way of explanations in its docs (that I can understand).

I have a table that I converted with to_dict, and I use that to build the dataframe:

import pandas as pd
data = {'dist': [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420],
     '-60': [520, 600, 670, 740, 810, 880, 950, 1020, 1100, 1170, 1240, 1310, 1380, 1450, 1520, 1600, 1670, 1740, 1810, 1880, 1950, 2020, 2100, 2170, 2240, 2310, 2380, 2450, 2530, 2600, 2670, 2740, 2810],
     '-40': [440, 500, 570, 630, 690, 760, 820, 880, 950, 1010, 1070, 1140, 1200, 1260, 1330, 1390, 1450, 1510, 1580, 1640, 1700, 1770, 1830, 1890, 1960, 2020, 2080, 2150, 2210, 2270, 2340, 2400, 2460],
     '-20': [380, 430, 490, 550, 600, 660, 720, 770, 830, 880, 940, 1000, 1050, 1110, 1170, 1220, 1280, 1340, 1390, 1450, 1510, 1560, 1620, 1680, 1730, 1790, 1850, 1900, 1960, 2020, 2070, 2130, 2190],
     '0': [320, 370, 420, 480, 530, 580, 630, 680, 730, 780, 830, 890, 940, 990, 1040, 1090, 1140, 1190, 1240, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1710, 1760, 1810, 1860, 1910, 1960],
     '20': [280, 320, 370, 420, 470, 510, 560, 610, 650, 700, 750, 790, 840, 890, 930, 980, 1030, 1070, 1120, 1170, 1210, 1260, 1310, 1350, 1400, 1450, 1500, 1540, 1590, 1640, 1680, 1730, 1780],
     '40': [240, 280, 330, 370, 410, 460, 500, 540, 590, 630, 670, 720, 760, 800, 840, 890, 930, 970, 1020, 1060, 1100, 1150, 1190, 1230, 1280, 1320, 1360, 1400, 1450, 1490, 1530, 1580, 1620],
     '60': [210, 250, 290, 330, 370, 410, 450, 490, 530, 570, 610, 650, 690, 730, 770, 810, 850, 890, 930, 970, 1010, 1050, 1090, 1130, 1170, 1210, 1250, 1290, 1330, 1370, 1410, 1450, 1490]}
df = pd.DataFrame(data).set_index(['dist'])
df.columns = df.columns.map(float)
df.columns.name = 'wind'
print(df)

Printing this gives me:

wind  -60.0  -40.0  -20.0    0.0   20.0   40.0   60.0
dist                                                 
100     520    440    380    320    280    240    210
110     600    500    430    370    320    280    250
120     670    570    490    420    370    330    290
130     740    630    550    480    420    370    330
140     810    690    600    530    470    410    370
150     880    760    660    580    510    460    410
160     950    820    720    630    560    500    450
170    1020    880    770    680    610    540    490
180    1100    950    830    730    650    590    530
190    1170   1010    880    780    700    630    570
200    1240   1070    940    830    750    670    610
210    1310   1140   1000    890    790    720    650
220    1380   1200   1050    940    840    760    690
230    1450   1260   1110    990    890    800    730
240    1520   1330   1170   1040    930    840    770
250    1600   1390   1220   1090    980    890    810
260    1670   1450   1280   1140   1030    930    850
270    1740   1510   1340   1190   1070    970    890
280    1810   1580   1390   1240   1120   1020    930
290    1880   1640   1450   1300   1170   1060    970
300    1950   1700   1510   1350   1210   1100   1010
310    2020   1770   1560   1400   1260   1150   1050
320    2100   1830   1620   1450   1310   1190   1090
330    2170   1890   1680   1500   1350   1230   1130
340    2240   1960   1730   1550   1400   1280   1170
350    2310   2020   1790   1600   1450   1320   1210
360    2380   2080   1850   1650   1500   1360   1250
370    2450   2150   1900   1710   1540   1400   1290
380    2530   2210   1960   1760   1590   1450   1330
390    2600   2270   2020   1810   1640   1490   1370
400    2670   2340   2070   1860   1680   1530   1410
410    2740   2400   2130   1910   1730   1580   1450
420    2810   2460   2190   1960   1780   1620   1490

Which is all fine so far. Now what I'm stuck on is how to interpolate so that I can get accurate figures from it. I'm trying to use interpn but I'm obviously doing it wrong. Here is what I'm doing to try and get an interpolated figure for a wind component of -35 and a distance of 103:

arr = np.dstack(np.array_split(df.to_numpy(), 1))
wind = df.columns.to_numpy()
dist = df.index.get_level_values(0).unique().to_numpy()

print(interpn((wind, dist), arr, [float(-35), int(103)]))

To which I get an error of:

ValueError: There are 7 points and 33 values in dimension 0

I have tried reading through the docs but can't seem to get my head around it and all the examples I find elsewhere are for graphical data.

Can someone please help me figure this out? I'm pretty new to this kind of work. Thank you :)

CodePudding user response:

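There's no need to transform your data: df.to_numpy() already gives you a 2D array that you can use as-is. You got the axes the wrong way round. The first axis (axis 0) is the rows of the dataframe (dist, 33 values) and the second axis (axis 1) is the columns (wind, 7 values), so the points tuple must be (dist, wind), not (wind, dist). That mismatch is what the error "There are 7 points and 33 values in dimension 0" is reporting.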

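import numpy as np
from scipy.interpolate import interpn

arr = df.to_numpy()           # shape (33, 7): rows are dist, columns are wind
dist = df.index.to_numpy()    # axis 0
wind = df.columns.to_numpy()  # axis 1

# The order of the points tuple must match the axes of arr: (dist, wind)
print(interpn((dist, wind), arr, [103, -35]))
# [442.25]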
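
To see where 442.25 comes from, here is a minimal sketch that repeats the calculation by hand on the four table cells surrounding the query point. It assumes interpn's default linear method and uses only a 2x2 excerpt of the table; the names td, tw, row_lo and row_hi are illustrative, not part of SciPy.

```python
import numpy as np
from scipy.interpolate import interpn

# A 2x2 corner of the table above: rows are dist, columns are wind.
dist = np.array([100.0, 110.0])
wind = np.array([-40.0, -20.0])
arr = np.array([[440.0, 380.0],
                [500.0, 430.0]])

# interpn needs arr.shape == (len(points[0]), len(points[1])).
assert arr.shape == (len(dist), len(wind))

# Bilinear interpolation by hand for dist=103, wind=-35.
td = (103 - dist[0]) / (dist[1] - dist[0])  # fraction of the way from 100 to 110
tw = (-35 - wind[0]) / (wind[1] - wind[0])  # fraction of the way from -40 to -20

# First interpolate along wind within each dist row ...
row_lo = arr[0, 0] + tw * (arr[0, 1] - arr[0, 0])  # value at dist=100
row_hi = arr[1, 0] + tw * (arr[1, 1] - arr[1, 0])  # value at dist=110
# ... then interpolate between the two rows along dist.
by_hand = row_lo + td * (row_hi - row_lo)

by_scipy = interpn((dist, wind), arr, [103, -35])[0]
print(by_hand, by_scipy)
```

Both values should agree with the result above, which is a quick way to confirm that the axis order passed to interpn is the intended one.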

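As an alternative, you can also use interp2d, where the axes are the other way round (x is the columns, y is the rows). Note that interp2d was deprecated in SciPy 1.10 and removed in SciPy 1.14, so this only works with older versions:

from scipy.interpolate import interp2d

f = interp2d(wind, dist, arr)
print(f(-35, 103))
# [442.25]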
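
For the original goal of looking up values at 1-unit resolution, one possible approach is to build a reusable interpolator with scipy.interpolate.RegularGridInterpolator and evaluate it on a fine grid. The sketch below assumes linear interpolation and uses a 3x3 excerpt of the table instead of the full dataframe; fine_dist, fine_wind and fine_df are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import RegularGridInterpolator

# 3x3 excerpt of the table: rows are dist, columns are wind.
dist = np.array([100.0, 110.0, 120.0])
wind = np.array([-40.0, -20.0, 0.0])
arr = np.array([[440.0, 380.0, 320.0],
                [500.0, 430.0, 370.0],
                [570.0, 490.0, 420.0]])

# Axis order of the points tuple matches arr: (dist, wind).
f = RegularGridInterpolator((dist, wind), arr)

# Single lookup, same query as above.
print(f([103, -35]))

# Evaluate on a grid with a step of 1 along both axes.
fine_dist = np.arange(dist[0], dist[-1] + 1)  # 100, 101, ..., 120
fine_wind = np.arange(wind[0], wind[-1] + 1)  # -40, -39, ..., 0
D, W = np.meshgrid(fine_dist, fine_wind, indexing="ij")
fine = f(np.stack([D, W], axis=-1))           # shape (21, 41)

# Wrap the result in a dataframe laid out like the original table.
fine_df = pd.DataFrame(
    fine,
    index=pd.Index(fine_dist, name="dist"),
    columns=pd.Index(fine_wind, name="wind"),
)
print(fine_df.loc[103.0, -35.0])
```

With the full dataframe, the same pattern should work with dist = df.index.to_numpy(), wind = df.columns.to_numpy() and arr = df.to_numpy(). Queries outside the table's range raise an error by default.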