For a university work, we have some geographic coordinates of world locations in a pandas dataframe:
df = pd.DataFrame({'NAME': ['Paris', 'New York', 'Rio', 'Airport GRU', 'ORLY'],
'GEO': ['POINT (2.31647 48.85)',
'POINT (-73.993457389558 40.731499671618)',
'POINT (-43.2 -22.9)',
'POINT (-46.47313507388693 -23.429382262746415)',
'GEOMETRYCOLLECTION EMPTY']})
print(df)
NAME GEO
Paris POINT (2.31647 48.85)
New York POINT (-73.993457389558 40.731499671618)
Rio POINT (-43.2 -22.9)
Airport GRU POINT (-46.47313507388693 -23.429382262746415)
ORLY GEOMETRYCOLLECTION EMPTY
I would like to edit the 'GEO' column. Initially, I would like to disregard the word 'POINT', then I would like to put it in (Latitude, Longitude) format, as the column is in POINT (Longitude Latitude) format (and it's also being a comma).
To solve this, I created two separate columns to store the LAT and LONG (this part is working):
df2 = df.join(df['GEO'].str.extract(r'(?P<LONG>-?\d \.\d ) (?P<LAT>-?\d \.\d )').astype(float))
print(df2)
NAME GEO LONG LAT
Paris POINT (2.31647 48.85) 2.316470 48.85000
New York POINT (-73.993457389558 40.731499671618) -73.993457 40.731500
Rio POINT (-43.2 -22.9) -43.200000 -22.900000
Airport GRU POINT (-46.47313507388693 -23.429382262746415) -46.473135 -23.429382
ORLY GEOMETRYCOLLECTION EMPTY NaN NaN
However, when I try to create a new column to receive the format: (LAT, LONG). The code doesn't work:
df2['Result'] = "(" df2['LAT'] "," df2['LONG'] ")"
The error appears: "UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U1'), dtype('float64')) -> None"
I would like the output to be:
NAME GEO
Paris (48.85, 2.31647)
New York (40.731499671618, -73.993457389558)
Rio (-22.9, -43.2)
Airport GRU (-23.429382262746415, -46.47313507388693)
ORLY GEOMETRYCOLLECTION EMPTY]})
CodePudding user response:
You can use str.replace
:
df['GEO'] = df['GEO'].str.replace('POINT\s \((.*)\s (.*)\)', r'(\2, \1)', regex=True)
print(df)
# Output
NAME GEO
0 Paris (48.85, 2.31647)
1 New York (40.731499671618, -73.993457389558)
2 Rio (-22.9, -43.2)
3 Airport GRU (-23.429382262746415, -46.47313507388693)
4 ORLY GEOMETRYCOLLECTION EMPTY
CodePudding user response:
We can try using str.extract
here:
df[["LONG", "LAT"]] = df["GEO"].str.extract(r'POINT \((-?\d (?:\.\d )?) (-?\d (?:\.\d )?)\)')
For entries which don't match the POINT
format, the above would assign a missing value. You may replace this with whatever placeholder you want.