formatting a dataframe with coordinate points-CodePudding

This is a beginner question, but I have a set of coordinate points formatted like

[ [ -75.526844, 39.655713 ], [ -75.526344, 39.656413 ], [ -75.522343, 39.660813 ], [ -75.518343, 39.663913 ], [ -75.514643, 39.668613 ], [ -75.511743, 39.674313 ], [ -75.509342, 39.685313 ], [ -75.509742, 39.686113 ], [ -75.509042, 39.694513 ], [ -75.507162, 39.696961 ], [ -75.504042, 39.698313 ], [ -75.496241, 39.701413 ], [ -75.491341, 39.711113 ], [ -75.488553, 39.714833 ], [ -75.485241, 39.715813 ], [ -75.483141, 39.715513 ], [ -75.481741, 39.714546 ], [ -75.478940, 39.713813 ], [ -75.477640, 39.715013 ], [ -75.476888, 39.718337 ], [ -75.477432, 39.720561 ], [ -75.477240, 39.724713 ], [ -75.475440, 39.728713 ], [ -75.475384, 39.731057 ], [ -75.474168, 39.735473 ], [ -75.469239, 39.743613 ], [ -75.466263, 39.750737 ], [ -75.466249, 39.750769 ], [ -75.463039, 39.758313 ], [ -75.463339, 39.761213 ]]

I want to make a dataframe that has one column for longitude and one for latitude for this data. How should I go about doing this?

CodePudding user response：

Well, using the basics of R, you can make use of the following implementation. I read the coordinates as a string. You can use R's readChar()

text = "[
[ -75.526844, 39.655713 ], [ -75.526344, 39.656413 ],
[ -75.522343, 39.660813 ], [ -75.518343, 39.663913 ],
[ -75.514643, 39.668613 ], [ -75.511743, 39.674313 ],
[ -75.509342, 39.685313 ], [ -75.509742, 39.686113 ],
[ -75.509042, 39.694513 ], [ -75.507162, 39.696961 ],
[ -75.504042, 39.698313 ], [ -75.496241, 39.701413 ],
[ -75.491341, 39.711113 ], [ -75.488553, 39.714833 ],
[ -75.485241, 39.715813 ], [ -75.483141, 39.715513 ],
[ -75.481741, 39.714546 ], [ -75.478940, 39.713813 ],
[ -75.477640, 39.715013 ], [ -75.476888, 39.718337 ],
[ -75.477432, 39.720561 ], [ -75.477240, 39.724713 ],
[ -75.475440, 39.728713 ], [ -75.475384, 39.731057 ],
[ -75.474168, 39.735473 ], [ -75.469239, 39.743613 ],
[ -75.466263, 39.750737 ], [ -75.466249, 39.750769 ],
[ -75.463039, 39.758313 ], [ -75.463339, 39.761213 ]]"

s = str_split(gsub('\n', ' ', text), ', ')[[1]]
s = gsub('\\[|\\]', '', s)
library(stringr)
s = str_trim(s)
df = data.frame(matrix(s, nc = 2, byrow = T))
colnames(df) = c('longitude', 'latitude')
head(df)

CodePudding user response：

Here's another possibility with tidyverse, where I read in the data as a string, then I extract only numeric, ., and -, then I make the values numeric and turn into a dataframe column. Next, I create an index, ind, that has the same value every other row (this will be the 2 columns). Next, I create a row number column, then pivot the data wide to get into two columns, then rename.

text <- "[
[ -75.526844, 39.655713 ], [ -75.526344, 39.656413 ],
[ -75.522343, 39.660813 ], [ -75.518343, 39.663913 ],
[ -75.514643, 39.668613 ], [ -75.511743, 39.674313 ],
[ -75.509342, 39.685313 ], [ -75.509742, 39.686113 ],
[ -75.509042, 39.694513 ], [ -75.507162, 39.696961 ],
[ -75.504042, 39.698313 ], [ -75.496241, 39.701413 ],
[ -75.491341, 39.711113 ], [ -75.488553, 39.714833 ],
[ -75.485241, 39.715813 ], [ -75.483141, 39.715513 ],
[ -75.481741, 39.714546 ], [ -75.478940, 39.713813 ],
[ -75.477640, 39.715013 ], [ -75.476888, 39.718337 ],
[ -75.477432, 39.720561 ], [ -75.477240, 39.724713 ],
[ -75.475440, 39.728713 ], [ -75.475384, 39.731057 ],
[ -75.474168, 39.735473 ], [ -75.469239, 39.743613 ],
[ -75.466263, 39.750737 ], [ -75.466249, 39.750769 ],
[ -75.463039, 39.758313 ], [ -75.463339, 39.761213 ]]"

library(tidyverse)

data.frame(Column = as.numeric(str_extract_all(text, "[0-9.-] ")[[1]])) %>%
  group_by(ind = rep(1:2, length.out = n())) %>%
  mutate(rn = row_number()) %>%
  ungroup %>%
  pivot_wider(names_from = ind, values_from = Column) %>%
  select(-rn) %>%
  rename("longitude" = 1, "latitude" = 2)

Output

   longitude latitude
1  -75.52684 39.65571
2  -75.52634 39.65641
3  -75.52234 39.66081
4  -75.51834 39.66391
5  -75.51464 39.66861
6  -75.51174 39.67431
7  -75.50934 39.68531
8  -75.50974 39.68611
9  -75.50904 39.69451
10 -75.50716 39.69696
11 -75.50404 39.69831
12 -75.49624 39.70141
13 -75.49134 39.71111
14 -75.48855 39.71483
15 -75.48524 39.71581
16 -75.48314 39.71551
17 -75.48174 39.71455
18 -75.47894 39.71381
19 -75.47764 39.71501
20 -75.47689 39.71834
21 -75.47743 39.72056
22 -75.47724 39.72471
23 -75.47544 39.72871
24 -75.47538 39.73106
25 -75.47417 39.73547
26 -75.46924 39.74361
27 -75.46626 39.75074
28 -75.46625 39.75077
29 -75.46304 39.75831
30 -75.46334 39.76121

If you have access to the GEOJson, then it is a little easier to convert. For example, if the data is hosted on a URL, then you could do something like below. You can also convert the excel file with the data into a .txt, then use that to bring in the data (e.g., geojsonsf::geojson_sf("~/Downloads/Alaska.txt"))

library(geojsonsf)
library(sf)

sf <- geojsonsf::geojson_sf("https://raw.githubusercontent.com/glynnbird/usstatesgeojson/master/california.geojson")
# Or if you have a local file, then you could put that here instead, e.g., geojsonsf::geojson_sf("~/Downloads/Alaska.geojson")

as.data.frame( sf::st_coordinates( sf ) ) %>% 
  select(1:2) %>% 
  rename("longitude" = 1, "latitude" = 2) %>% 
  head()

  longitude latitude
1 -120.2485 33.99933
2 -120.2474 34.00191
3 -120.2387 34.00759
4 -120.2300 34.01014
5 -120.2213 34.01037
6 -120.2085 34.00565