I am using this dataset: https://opendataportal-lasvegas.opendata.arcgis.com/datasets/restaurant-inspections-open-data/explore
In this dataset, there is a feature 'Location_1'
which involves the coordinates of locations of restaurants:
Location_1
0 (36.12976350, -115.31507320)
1 (36.10882500, -115.32125900)
2 (36.07411000, -115.08284000)
3 (36.07411000, -115.08284000)
4 (36.12734880, -115.14185020)
I want to create two new columns in my DataFrame, one for latitude and one for longitude.
I naively tried to generate the column vectors first with the commands
lat = df['Location_1'][:,0]
long = df['Location_1'][:,1]
but it did not work. It looks like that the entries in df['Location_1']
are strings.
For instance, I tried
In [5]: df['Location_1'][1][1:30]
Out[5]: '36.10882500, -115.32125900)'
But I want to get floats in the latitude and longitude columns.
CodePudding user response:
You can use .apply
to map pd.Series
to each column after processing the strings:
In [2]: df
Out[2]:
Location_1
0 (36.12976350, -115.31507320)
1 (36.10882500, -115.32125900)
2 (36.07411000, -115.08284000)
3 (36.07411000, -115.08284000)
4 (36.12734880, -115.14185020)
In [3]: df = df.Location_1.str.strip("()").str.split(", ")
In [4]: df = df.apply(pd.Series).astype(float).rename(columns={0: "lat", 1: "lng"})
In [5]: df
Out[5]:
lat lng
0 36.129764 -115.315073
1 36.108825 -115.321259
2 36.074110 -115.082840
3 36.074110 -115.082840
4 36.127349 -115.141850
Or in a single line:
df = df.Location_1.str.strip("()").str.split(", ").apply(pd.Series).astype(float)
Tack on the rename if you want: .rename(columns={0: "lat", 1: "lng"})
CodePudding user response:
This also works:
df=df['Location_1'].str.split(pat=None, n=-1, expand=True)
df[0] = df[0].str[1:]
df[0] = df[0].str[:-1]
df[1] = df[1].str[:-1]