I have to convert coordinates from DMS (degrees, minutes and seconds) to DegDec (real numbers). However, the DMS separators are messed up, so I would like to split the columns on non-numeric characters excluding .
.
My data looks as follows:
lat | long |
---|---|
22ª29'56.06" | 105º21'37.27" |
22°29`53.14" | 105°21'29.48" |
22º30'00.43" | 105ª21'37.46'' |
105'29'27.17" | 105°21'39.68" |
The only pattern I can spot in the separators is that they are non-numeric and different from .
. I think I could solve this relatively easily if I could split these columns on non-numeric or period characters.
Once I have all three elements, I can concatenate them together to form a predictable pattern with the correct separators.
Here's a replicable sample from the data:
df <- tibble(lat=c("22ª29'56.06\"","22°29`53.14\"","22º30'00.43\"","105'29'27.17\""),
long=c("105º21'37.27\"","105°21'29.48\"","105°21'37.46\''","105°21'39.68"))
CodePudding user response:
You want to do something like this?
library(dplyr)
library(tidyr)
df %>%
separate(lat,into = paste0("lat",1:4),sep = "[^0-9]",remove = FALSE) %>%
separate(long,into = paste0("long",1:4),sep = "[^0-9]",remove = FALSE)
# A tibble: 4 x 10
lat lat1 lat2 lat3 lat4 long long1 long2 long3 long4
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 "22ª29'56.06\"" 22 29 56 06 "105º21'37.27\"" 105 21 37 27
2 "22°29`53.14\"" 22 29 53 14 "105°21'29.48\"" 105 21 29 48
3 "22º30'00.43\"" 22 30 00 43 "105°21'37.46''" 105 21 37 46
4 "105'29'27.17\"" 105 29 27 17 "105°21'39.68" 105 21 39 68
CodePudding user response:
Here is a way with cSplit
from splitstackshape
.
splitstackshape::cSplit(df, c('lat', 'long'), sep = '[^0-9]', fixed = FALSE)
# lat_1 lat_2 lat_3 lat_4 lat_5 long_1 long_2 long_3 long_4 long_5
#1: 22 29 56 6 NA 105 21 37 27 NA
#2: 22 29 53 14 NA 105 21 29 48 NA
#3: 22 30 0 43 NA 105 21 37 46 NA
#4: 105 29 27 17 NA 105 21 39 68 NA
Since the string ends with non-numeric data it splits and creates a 5th column with NA
. You can remove them with Remove columns from dataframe where ALL values are NA