Performing calculations on DataFrames of different lengths

Time:10-27

I have two different DataFrames that look something like this:

df:
Lat     Lon
28.13   -87.62
28.12   -87.65
...     ...

new_calc_dist:
Calculated_Dist_m
34.5
101.7
...

The first DataFrame (named df), with the Lat and Lon columns, has just over 1000 rows. The second DataFrame (named new_calc_dist), with the Calculated_Dist_m column, has over 30000 rows. I want to compute new latitude and longitude coordinates from the Lat, Lon, and Calculated_Dist_m columns. Here is the code I've tried:

import numpy as np

r_earth = 6371000
new_lat = df['Lat'] + (new_calc_dist['Calculated_Dist_m'] / r_earth) * (180/np.pi)
new_lon = df['Lon'] + (new_calc_dist['Calculated_Dist_m'] / r_earth) * (180/np.pi) / np.cos(df['Lat'] * np.pi/180)
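Written out, the two lines above apply a small-distance (flat-earth) approximation. With R = r_earth (6371000 m, the mean Earth radius), d = Calculated_Dist_m in metres and φ = Lat in degrees:

$$
\text{Lat}_{\text{new}} = \text{Lat} + \frac{d}{R}\cdot\frac{180}{\pi}, \qquad
\text{Lon}_{\text{new}} = \text{Lon} + \frac{d}{R}\cdot\frac{180}{\pi}\cdot\frac{1}{\cos\!\left(\varphi\,\pi/180\right)}
$$

Under this approximation each point is shifted d metres north and d metres east; the 1/cos φ factor accounts for a degree of longitude spanning fewer metres away from the equator.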

When I run the code, however, it only gives me new values for certain index labels and NaN for the rest. I'm not sure how to write the code so that new latitude and longitude points are calculated for each of the 30000+ distances, starting from each of the roughly 1000 initial latitude/longitude points. Any suggestions?
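The NaNs come from pandas index alignment: when two Series are combined with an arithmetic operator, values are matched by index label rather than by position, and any label present in only one of them yields NaN. The sketch below uses two tiny made-up frames (the values are illustrative only) to show the effect. Note that it pairs row i with row i only, so even without the NaNs it would not produce the "every coordinate with every distance" combinations being asked for.

```python
import pandas as pd

# Toy stand-ins for the question's frames (illustrative values only).
df = pd.DataFrame({"Lat": [28.13, 28.12]})                                # index 0, 1
new_calc_dist = pd.DataFrame({"Calculated_Dist_m": [34.5, 101.7, 28.6]})  # index 0, 1, 2

# Series arithmetic aligns on index labels, not on position.
# Label 2 exists only in new_calc_dist, so its result is NaN.
result = df["Lat"] + new_calc_dist["Calculated_Dist_m"]
print(result)
```

Assuming both real frames use the default 0-based index, only the first ~1000 labels overlap, which would explain why some rows get values and the rest are NaN.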

EDIT

Here is some sample data, followed by the kind of output I am after. The figures are not exact, but they give the idea.

df:
Lat     Lon
28.13   -87.62
28.12   -87.65
28.12   -87.63
...     ...

new_calc_dist:
Calculated_Dist_m
34.5
101.7
28.6
30.8
76.5
...

The sample output would then be:

Lat Lon
28.125 -87.625
28.15 -87.61
28.127 -87.623
28.128 -87.623
28.14 -87.615
28.115 -87.655
28.14 -87.64
28.117 -87.653
28.118 -87.653
28.15 -87.645
28.115 -87.635
28.14 -87.62
28.115 -87.613
28.117 -87.633
28.118 -87.633
...... .......

Again, these are just made-up values (I tried computing the exact figures but couldn't get it to work). The idea is to take each coordinate pair from the first DataFrame and calculate a new coordinate for each of the distances in the second DataFrame.

Answer:

If I understood correctly, and assuming df1 (the Lat/Lon frame) and df2 (the Calculated_Dist_m frame) as input, you can perform a cross merge to get all combinations of df1 and df2 rows, then apply your computation (here stored as new columns Lat2/Lon2):

import numpy as np

df = df1.merge(df2, how='cross')  # every row of df1 paired with every row of df2
r_earth = 6371000
df['Lat2'] = df['Lat'] + (df['Calculated_Dist_m'] / r_earth) * (180/np.pi)
df['Lon2'] = df['Lon'] + (df['Calculated_Dist_m'] / r_earth) * (180/np.pi) / np.cos(df['Lat'] * np.pi/180)

output:

      Lat    Lon  Calculated_Dist_m       Lat2       Lon2
0   28.13 -87.62               34.5  28.130310 -87.619648
1   28.13 -87.62              101.7  28.130915 -87.618963
2   28.13 -87.62               28.6  28.130257 -87.619708
3   28.13 -87.62               30.8  28.130277 -87.619686
4   28.13 -87.62               76.5  28.130688 -87.619220
5   28.12 -87.65               34.5  28.120310 -87.649648
6   28.12 -87.65              101.7  28.120915 -87.648963
7   28.12 -87.65               28.6  28.120257 -87.649708
8   28.12 -87.65               30.8  28.120277 -87.649686
9   28.12 -87.65               76.5  28.120688 -87.649220
10  28.12 -87.63               34.5  28.120310 -87.629648
11  28.12 -87.63              101.7  28.120915 -87.628963
12  28.12 -87.63               28.6  28.120257 -87.629708
13  28.12 -87.63               30.8  28.120277 -87.629686
14  28.12 -87.63               76.5  28.120688 -87.629220
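A self-contained version of the cross-merge approach, built from the sample rows in the question's EDIT so it can be run as-is. It is a sketch under two assumptions: pandas 1.2 or later (needed for how="cross"), and the same small-distance approximation as in the question, with np.degrees/np.radians standing in for the explicit 180/np.pi factors. Its rows should match the table above.

```python
import numpy as np
import pandas as pd

R_EARTH_M = 6371000  # mean Earth radius in metres, as in the question

# Sample rows taken from the question's EDIT section.
df1 = pd.DataFrame({"Lat": [28.13, 28.12, 28.12], "Lon": [-87.62, -87.65, -87.63]})
df2 = pd.DataFrame({"Calculated_Dist_m": [34.5, 101.7, 28.6, 30.8, 76.5]})

# how="cross" pairs every row of df1 with every row of df2 (3 x 5 = 15 rows).
df = df1.merge(df2, how="cross")

# Distance in metres -> angular offset in degrees, then shift lat and lon.
deg_offset = np.degrees(df["Calculated_Dist_m"] / R_EARTH_M)
df["Lat2"] = df["Lat"] + deg_offset
df["Lon2"] = df["Lon"] + deg_offset / np.cos(np.radians(df["Lat"]))

print(df.round(6))
```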

Answer:

If you just want the result as two 2D arrays (without repeating the inputs): this still needs O(m*n) memory, but only about 2/5 of what the cross-join result needs, since the cross join stores five columns per pair instead of two.

import numpy as np

r_earth = 6371000
# distances converted to angular offsets in degrees, shape (n,)
z = 180 / np.pi * new_calc_dist['Calculated_Dist_m'].values / r_earth
lat = df['Lat'].values
lon = df['Lon'].values

# lat[:, None] has shape (m, 1); broadcasting against z gives an (m, n) grid
new_lat = lat[:, None] + z
new_lon = lon[:, None] + z / np.cos(np.radians(lat))[:, None]

Example:

import pandas as pd

df = pd.DataFrame([[28.13, -87.62], [28.12, -87.65]], columns=['Lat', 'Lon'])
new_calc_dist = pd.DataFrame([[34.5], [101.7], [60.0]], columns=['Calculated_Dist_m'])

# result of above
>>> new_lat
array([[28.13031027, 28.13091461, 28.13053959],
       [28.12031027, 28.12091461, 28.12053959]])

>>> new_lon
array([[-87.61964818, -87.61896289, -87.61938813],
       [-87.64964821, -87.64896298, -87.64938819]])
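As a sanity check, the sketch below recomputes the toy example with the cosine-corrected longitude formula from the question and compares the 2D grids against a cross merge of the same two frames. It assumes pandas 1.2 or later for how="cross". Flattening each (m, n) grid row by row with ravel() should give the same order as the cross-merge rows (point i first, then each distance j).

```python
import numpy as np
import pandas as pd

r_earth = 6371000

df = pd.DataFrame([[28.13, -87.62], [28.12, -87.65]], columns=["Lat", "Lon"])
new_calc_dist = pd.DataFrame([[34.5], [101.7], [60.0]], columns=["Calculated_Dist_m"])

# Broadcasting: (m, 1) against (n,) gives an (m, n) grid; row i = point i, column j = distance j.
z = np.degrees(new_calc_dist["Calculated_Dist_m"].to_numpy() / r_earth)
lat = df["Lat"].to_numpy()
lon = df["Lon"].to_numpy()
new_lat = lat[:, None] + z
new_lon = lon[:, None] + z / np.cos(np.radians(lat))[:, None]

# Cross merge of the same frames for comparison (m * n rows).
cross = df.merge(new_calc_dist, how="cross")
cross_offset = np.degrees(cross["Calculated_Dist_m"] / r_earth)
cross_lat = cross["Lat"] + cross_offset
cross_lon = cross["Lon"] + cross_offset / np.cos(np.radians(cross["Lat"]))

print(np.allclose(new_lat.ravel(), cross_lat))
print(np.allclose(new_lon.ravel(), cross_lon))
```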

If you do want those results as DataFrames:

kwargs = dict(index=df.index, columns=new_calc_dist.index)
new_lat = pd.DataFrame(new_lat, **kwargs)
new_lon = pd.DataFrame(new_lon, **kwargs)
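To put the memory note into rough numbers for the sizes in the question (taking m = 1000 coordinates and n = 30000 distances as round figures), here is a back-of-the-envelope estimate. It assumes float64 values and ignores per-object overhead.

```python
import numpy as np

# Round figures from the question: ~1000 coordinates, ~30000 distances.
m, n = 1000, 30000
bytes_per_float = np.dtype(np.float64).itemsize  # 8 bytes

two_arrays = 2 * m * n * bytes_per_float  # new_lat + new_lon grids
cross_join = 5 * m * n * bytes_per_float  # Lat, Lon, Calculated_Dist_m, Lat2, Lon2

print(f"2D arrays:  {two_arrays / 1e6:.0f} MB")
print(f"cross join: {cross_join / 1e6:.0f} MB")
```

This gives roughly 480 MB for the two grids versus about 1.2 GB for the five-column cross-join frame, which is the 2/5 ratio mentioned above.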