I have the following csv file which I am reading using pandas dataframe:
Timestamp, UTC, id, loc, spd
001, 12z, q20, "52, 13", 320
002, 13z, a32, "53, 12", 321
003, 14z, q32, "54, 11", 321
004, 15`, a43, "55, 10", 330
I am extracting the data as follows:
import pandas as pd
import matplotlib.pyplot as plt
fname = "data.csv"
data = pd.read_csv(fname,sep=",", header=None, skiprows=1)
data.columns = ["Timestamp", "UTC", "Callsign", "Position", "Speed", "Direction"]
t = data["Timestamp"]
utc = data["UTC"]
acid = data["Callsign"]
pos = data["Position"]
spd = ["Speed"]
However, for the position column, this results in 2 values per row in this column. I would like to have the first value of position in a separate list as well as the second value in a separate list as follows:
latitude = [52,53,54,55]
longitude = [13,12,11,10]
How do I select this using the pandas dataframe?
CodePudding user response:
Use Series.str.strip
with Series.str.split
if need 2 new columns, then cast to floats:
data[['lat','lon']] = (data["Position"].str.strip('"')
.str.split(',\s ', expand=True)
.astype(float))
print (data)
Timestamp UTC Callsign Position Speed lat lon
0 1 12z q20 "52, 13" 320 52.0 13.0
1 2 13z a32 "53, 12" 321 53.0 12.0
2 3 14z q32 "54, 11" 321 54.0 11.0
3 4 15` a43 "55, 10" 330 55.0 10.0
If need 2 lists:
lat, lon = (data["Position"].str.strip('"')
.str.split(',\s ', expand=True)
.astype(float)
.to_numpy()
.T.tolist())
print (lat, lon)
[52.0, 53.0, 54.0, 55.0] [13.0, 12.0, 11.0, 10.0]
CodePudding user response:
We can use str.extract
here, followed by a cast:
data[["lat", "lng"]] = data["Position"].str.extract(r'(-?\d (?:\.\d )?),\s*(-?\d (?:\.\d )?)')
data["lat"] = data["lat"].astype(float)
data["lng"] = data["lng"].astype(float)