My goal is to output an organized file from a tuple of geographic data, including new columns with transformations of the data. The input tuple includes the following arrays: lats (latitudes), lons (longitudes), els (elevations), and ts (times in seconds). The rows of the tuples correspond to individual samples of data. I converted this tuple into a dataframe. Now I am looking to add columns to the dataframe using functions I already have written.
The problem is that these functions take arguments from more than one row of the dataframe. Example:
def stepsize(lat1, long1, lat2, long2):
I want to add a column of "stepsizes" to my dataframe but that involves referencing both the current row and the previous row of data. How should I do this?
I have tried using df.apply(), passing a lambda function. In order to make that work, I defined local variables to hold previous latitude, longitude, and time:
def newdist_row(row):
    if row["time"] < 72889:
        rowprevlat = row['latitude']
        rowprevlong = row['longitude']
        rowprevtime = row['time']
        return np.nan
    else:
        dist = stepsize_feet(row['latitude'], row['longitude'], rowprevlat, rowprevlong)
        rowprevlat = row['latitude']
        rowprevlong = row['longitude']
        rowprevtime = row['time']
        return dist
The apply method I wrote:
contents.apply(lambda row: newdist_row(row), axis=1)
This doesn't work due to reference before assignment; I tinkered with it and couldn't get it to work. I also tried loops, down to simply trying to add a new column with the correct "last-row" data, but got a "value is trying to be set on a copy of a slice from a DataFrame" warning.
Answer:
You can create prev_latitude and prev_longitude columns as follows.
Given df:
   latitude  longitude
0    607385     435618
1    430603     435618
2    430603     435618
3    430603     435618
4    519445     435618
Doing:
df[['prev_latitude', 'prev_longitude']] = df[['latitude', 'longitude']].shift()
Output:
   latitude  longitude  prev_latitude  prev_longitude
0    607385     435618           <NA>            <NA>
1    430603     435618         607385          435618
2    430603     435618         430603          435618
3    430603     435618         430603          435618
4    519445     435618         430603          435618
Now, you should be able to use your existing function:
df.apply(lambda x: stepsize(x.latitude,
                            x.longitude,
                            x.prev_latitude,
                            x.prev_longitude), axis=1)
(after you decide how to handle the first row, which has no previous values...)
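Here is a minimal end-to-end sketch of that approach. The `stepsize` below is only a hypothetical stand-in (a haversine distance in feet, with an assumed mean Earth radius), since the real function isn't shown in the question; the sample coordinates and the `step_or_nan` helper are made up for illustration too. The first row is handled by returning NaN when there is no previous point.

```python
import math
import pandas as pd

# Assumed constant: mean Earth radius (~6371 km) expressed in feet.
EARTH_RADIUS_FEET = 20_902_231

def stepsize(lat1, long1, lat2, long2):
    """Hypothetical stand-in: haversine distance in feet, inputs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(long2 - long1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_FEET * math.asin(math.sqrt(a))

# Made-up sample data with the same column names as in the question.
df = pd.DataFrame({
    "latitude": [40.0, 40.0, 40.001],
    "longitude": [-105.0, -105.001, -105.001],
    "time": [72889, 72890, 72891],
})

# Put each row's predecessor values next to the current values.
df[["prev_latitude", "prev_longitude"]] = df[["latitude", "longitude"]].shift()

def step_or_nan(row):
    # The first row has no previous point, so report NaN explicitly.
    if pd.isna(row.prev_latitude):
        return float("nan")
    return stepsize(row.latitude, row.longitude, row.prev_latitude, row.prev_longitude)

df["stepsize"] = df.apply(step_or_nan, axis=1)
print(df)
```

Because the previous values live in ordinary columns, the function never has to remember state between calls, which is what the original attempt with local variables was trying to do.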
If you're not already using it, you should look into GeoPandas; you may be reinventing the wheel. I don't know what stepsize means for you, but if it's something like the distance between two consecutive points:
import geopandas as gp
geometry = gp.GeoSeries.from_xy(df.longitude, df.latitude)
df = gp.GeoDataFrame(df, geometry=geometry)
df['distance'] = df.geometry.distance(df.geometry.shift())
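Then this computes the distance from each point to the previous one. Note that GeoPandas measures planar distance in the units of the coordinates, so with raw latitude/longitude the result is in degrees; project to a suitable CRS first if you need feet or meters.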
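If adding GeoPandas isn't an option, the same shift idea also works without apply, using plain NumPy on whole columns. This is only a sketch: it swaps in a haversine (great-circle) formula with an assumed mean Earth radius, and `haversine_feet`, `track`, and the sample coordinates are hypothetical names and data, not anything from the question.

```python
import numpy as np
import pandas as pd

# Assumed constant: mean Earth radius (~6371 km) expressed in feet.
EARTH_RADIUS_FEET = 20_902_231

def haversine_feet(lat1, lon1, lat2, lon2):
    """Great-circle distance in feet; accepts scalars or whole columns in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_FEET * np.arcsin(np.sqrt(a))

# Made-up sample track.
track = pd.DataFrame({
    "latitude": [40.0, 40.0, 40.001],
    "longitude": [-105.0, -105.001, -105.001],
})

# shift() supplies the previous row's values; the NaN in the first row
# propagates through the arithmetic, so the first distance is NaN.
track["distance_ft"] = haversine_feet(
    track["latitude"].shift(), track["longitude"].shift(),
    track["latitude"], track["longitude"],
)
print(track)
```

Operating on whole columns avoids the per-row Python call that apply makes, which tends to matter on long tracks.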