I am trying to build a single dataframe from a set of KML files. I have 52 different files in my dataset, and I have already loaded them into R using the following code chunk:
#importing data
library(fs)
library(sf) # provides st_read()
file_paths = fs::dir_ls("C:/Users/JoaoArbache/Desktop/Mestrado/carbono/dados")
file_contents = list()
for(i in seq_along(file_paths)) {
  file_contents[[i]] = st_read(
    dsn = file_paths[[i]]
  )
}
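#renaming the lists
library(stringr) # str_extract()
library(dplyr)   # %>%, filter(), row_number()
library(purrr)   # set_names()
numeros = list()
for(i in file_paths) {
  # take the first run of digits in each file path as that file's id
  numeros[[i]] = str_extract(i, "\\d+") %>%
    as.numeric()
}
id = do.call(rbind.data.frame, numeros) %>%
  filter(!row_number() %in% c(53))
colnames(id)[1] = "id"
file_contents = set_names(file_contents, id$id)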
Ok, so far everything is alright. I have all of the 52 files loaded in the file_contents list.
Now, I need to take each of the 52 elements of file_contents, which contain one dataframe each, and build a single dataframe. So it should bind 52 different dataframes into a single one. I've tried lots of different ways to solve this problem, but I always failed.
Thanks for the support :)
I tried different loops, the do.call function, and some native R functions, but none of them worked. I'd either get an error message (e.g.
Error in `[[<-`(`*tmp*`, i, value = as.data.frame(i)) :
attempt to select more than one element in vectorIndex
) or just create a dataframe with the first element of the file_contents list. I was expecting to get a single dataframe with the 52 dataframes bound together.
CodePudding user response:
Have you tried rbindlist from data.table?
library(data.table)
rbindlist(file_contents, use.names = TRUE, fill = TRUE)
That assumes the column names are the same across the files. If they are not, set use.names = FALSE.
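Since the elements of file_contents come from st_read, they are sf objects, and sf provides its own rbind method that keeps the result an sf object. Here is a minimal sketch of that alternative, assuming every element has the same columns. The nc sample data stands in for the KML files, and the list names "1" and "2" are made up for illustration:

```r
library(sf)

# Stand-in for file_contents: a named list of sf objects with identical columns
# (assumption: the real list looks like this after set_names())
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
file_contents <- list("1" = nc[1:3, ], "2" = nc[55, ])

# Store each element's list name in an id column so rows stay traceable to their file
tagged <- Map(function(x, nm) {
  x$id <- nm
  x
}, file_contents, names(file_contents))

# rbind() dispatches to the sf method; unname() keeps the list names
# from being passed along as argument names
combined <- do.call(rbind, unname(tagged))

nrow(combined)            # total number of features across all elements
inherits(combined, "sf")  # the result is still an sf object
```

If the columns differ between files, rbind will stop with an error, so this only fits the "same columns everywhere" case.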
CodePudding user response:
You can use purrr::map_dfr on a list of files and build a single dataset if all of the files are regularly shaped (have the same columns). Below is an example using the nc dataset included with the sf package.
library(sf)
library(dplyr)
library(purrr)
# make a temporary directory for the example
temp_dir <- tempdir()
# read nc data
nc <- st_read(system.file("shape/nc.shp", package="sf"))
# create two datasets with all the same columns, but different data
one <- nc[1:3,]
two <- nc[55,]
# write two separate kml objects to disk
st_write(one, paste0(temp_dir, "/", "one.kml"))
#> Writing layer `one' to data source `/tmp/RtmpfRjSGc/one.kml' using driver `KML'
#> Writing 3 features with 14 fields and geometry type Multi Polygon.
st_write(two, paste0(temp_dir, "/", "two.kml"))
#> Writing layer `two' to data source `/tmp/RtmpfRjSGc/two.kml' using driver `KML'
#> Writing 1 features with 14 fields and geometry type Multi Polygon.
# show the files on disk, just for illustration
list.files(path = temp_dir, pattern = "\\.kml$", full.names = TRUE)
#> [1] "/tmp/RtmpfRjSGc/one.kml" "/tmp/RtmpfRjSGc/two.kml"
# read the two files & make them one dataframe:
together <- temp_dir %>%
  list.files(pattern = "\\.kml$", full.names = TRUE) %>%
  map_dfr(st_read)
#> Reading layer `one' from data source `/tmp/RtmpfRjSGc/one.kml' using driver `LIBKML'
#> Simple feature collection with 3 features and 24 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -81.74091 ymin: 36.23402 xmax: -80.43509 ymax: 36.58977
#> Geodetic CRS: WGS 84
#> Reading layer `two' from data source `/tmp/RtmpfRjSGc/two.kml' using driver `LIBKML'
#> Simple feature collection with 1 feature and 24 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -83.259 ymin: 35.29087 xmax: -82.74374 ymax: 35.79195
#> Geodetic CRS: WGS 84
head(together)
#> Simple feature collection with 4 features and 24 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -83.259 ymin: 35.29087 xmax: -80.43509 ymax: 36.58977
#> Geodetic CRS: WGS 84
#>        Name description timestamp begin  end altitudeMode tessellate extrude
#> 1      Ashe        <NA>      <NA>  <NA> <NA>         <NA>         -1       0
#> 2 Alleghany        <NA>      <NA>  <NA> <NA>         <NA>         -1       0
#> 3     Surry        <NA>      <NA>  <NA> <NA>         <NA>         -1       0
#> 4   Haywood        <NA>      <NA>  <NA> <NA>         <NA>         -1       0
#>   visibility drawOrder icon  AREA PERIMETER CNTY_ CNTY_ID  FIPS FIPSNO CRESS_ID
#> 1         -1        NA <NA> 0.114     1.442  1825    1825 37009  37009        5
#> 2         -1        NA <NA> 0.061     1.231  1827    1827 37005  37005        3
#> 3         -1        NA <NA> 0.143     1.630  1828    1828 37171  37171       86
#> 4         -1        NA <NA> 0.144     1.690  1996    1996 37087  37087       44
#>   BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79                       geometry
#> 1  1091     1      10  1364     0      19 MULTIPOLYGON (((-81.47258 3...
#> 2   487     0      10   542     3      12 MULTIPOLYGON (((-81.2397 36...
#> 3  3188     5     208  3616     6     260 MULTIPOLYGON (((-80.45612 3...
#> 4  2110     2      57  2463     8      62 MULTIPOLYGON (((-82.74374 3...
Created on 2022-11-17 by the reprex package (v2.0.1)
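If you also want to keep track of which file each row came from (the question names the list elements after the numbers in the file names), map_dfr can add that as a column through its .id argument. This is only a sketch under assumptions: the file names 1.kml and 2.kml and the column name id are hypothetical, and it relies on your GDAL build having a KML driver, as in the example above.

```r
library(sf)
library(purrr)

# Fresh temporary folder holding two small KML files (hypothetical names 1.kml, 2.kml)
temp_dir <- tempfile("kml_")
dir.create(temp_dir)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
st_write(nc[1:3, ], file.path(temp_dir, "1.kml"), quiet = TRUE)
st_write(nc[55, ], file.path(temp_dir, "2.kml"), quiet = TRUE)

# Name each path after its file name without the extension
files <- list.files(temp_dir, pattern = "\\.kml$", full.names = TRUE)
names(files) <- tools::file_path_sans_ext(basename(files))

# .id = "id" stores those names in a new column of the combined result
together <- map_dfr(files, st_read, quiet = TRUE, .id = "id")

table(together$id)  # number of features read from each file
```

With the 52 real files, pointing list.files at the data folder instead of temp_dir should be the only change needed, as long as all files share the same columns.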