Importing CSV with spatial data

Time:11-08

I am trying to convert spatial data from a CDC/HHS dataset on hospital strain, downloadable from here:

https://healthdata.gov/Hospital/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/anag-cw7u

Here's a snippet of the data:

hospital_name                hospital_pk   geocoded_hospital_address
TRIHEALTH EVENDALE HOSPITAL  360362        POINT (-84.420098 39.253934)
KANE COUNTY HOSPITAL         461309        POINT (-112.52859 37.054324)
CRAIG HOSPITAL               062011        POINT (-104.978247 39.654008)

For reproducibility, here is a sample as dput() output:

structure(list(hospital_name = c("TRIHEALTH EVENDALE HOSPITAL", 
"KANE COUNTY HOSPITAL", "CRAIG HOSPITAL", "JAY HOSPITAL", "HARRISON COUNTY COMMUNITY HOSPITAL"
), geocoded_hospital_address = c("POINT (-84.420098 39.253934)", 
"POINT (-112.52859 37.054324)", "POINT (-104.978247 39.654008)", 
"POINT (-87.151673 30.950024)", "POINT (-94.025425 40.26528)"
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))

I'm trying to import it as a CSV, transform it, and then turn it into a shapefile. The file has a field, geocoded_hospital_address, that I am trying to use to build the geometry. It is in WKT POINT (longitude latitude) format, e.g., "POINT (-100.01382 37.441504)". I am used to passing two separate longitude/latitude columns to the coords option, and I cannot get the sf_column_name option to work for me or figure out how to split the field into two parts:

test_sf<-COVID_19_Reported_Patient_Impact_and_Hospital_Capacity_by_Facility%>%
    st_as_sf(sf_column_name="geocoded_hospital_address", crs=4326)
Error in st_sf(x, ..., agr = agr, sf_column_name = sf_column_name) : 
  no simple features geometry column present

Any ideas?

Answer:

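I think the problem is twofold: geocoded_hospital_address is plain WKT text rather than an sf geometry column, so it has to be parsed with the wkt argument of st_as_sf() (sf_column_name only selects a geometry column that already exists), and the column contains NA values. Removing the NAs and using wkt should fix your problem.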

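library(sf)
library(dplyr)  # for filter() and %>%

# drop rows with no geocoded address, then parse the WKT text column into geometry
df_0 <- COVID_19_Reported_Patient_Impact_and_Hospital_Capacity_by_Facility %>% 
    filter(!is.na(geocoded_hospital_address))
test_sf <- st_as_sf(df_0, wkt = "geocoded_hospital_address", crs = 4326)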
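Before calling st_as_sf(), it can help to confirm how many rows would actually fail to parse. The sketch below is plain base R and assumes a small hypothetical data frame (the third hospital and its NA address are invented for illustration); is_wkt_point() is a hypothetical helper that treats anything other than a simple "POINT (lon lat)" string as unusable:

```r
# Hypothetical sample: two rows from the question plus one invented row
# with a missing address (NA), mimicking the problem described above
df <- data.frame(
  hospital_name = c("TRIHEALTH EVENDALE HOSPITAL", "KANE COUNTY HOSPITAL",
                    "HYPOTHETICAL UNGEOCODED HOSPITAL"),
  geocoded_hospital_address = c("POINT (-84.420098 39.253934)",
                                "POINT (-112.52859 37.054324)", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE only for non-missing strings shaped like
# "POINT (<number> <number>)"; NA input yields FALSE, not NA
is_wkt_point <- function(s) {
  !is.na(s) & grepl("^POINT \\((-?[0-9.]+) (-?[0-9.]+)\\)$", s)
}

bad <- !is_wkt_point(df$geocoded_hospital_address)
cat("rows that would not parse:", sum(bad), "\n")

# Keep only the parseable rows; these are what you would hand to
# st_as_sf(df_0, wkt = "geocoded_hospital_address", crs = 4326)
df_0 <- df[!bad, ]
```

On the real file the same check tells you whether NAs (or other malformed entries) are present before sf ever sees them.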

Answer:

This is a ridiculous solution, but it's the best I've got since the shapefile isn't downloadable.


library(tidyverse)
library(sf)

x <- read_csv('COVID-19_Reported_Patient_Impact_and_Hospital_Capacity_by_Facility.csv')

# alter geometry column to get just coordinates:
#  remove 'POINT', parentheses, and surrounding whitespace
x$coords <- x$geocoded_hospital_address %>%
  str_remove('POINT') %>%
  str_remove('\\(') %>%
  str_remove('\\)') %>%
  str_trim()

# remove NA coords, then separate 'coords' into x & y (convert = TRUE makes them numeric)
# and transform to an 'sf' object; no crs is given here, so the CRS prints as NA below
# (add crs = 4326 to st_as_sf() for WGS84 longitude/latitude)

x_sf <- x %>%
  filter(!is.na(coords)) %>%
  separate(coords, into = c('x','y'), sep = ' ', convert = TRUE) %>%
  st_as_sf(coords = c('x','y'))

head(x_sf)

#> Simple feature collection with 6 features and 128 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -108.616 ymin: 24.71104 xmax: -80.21099 ymax: 39.10636
#> CRS:           NA
#> # A tibble: 6 × 129
#>   hospital_pk collecti…¹ state ccn   hospi…² address city  zip   hospi…³ fips_…⁴
#>   <chr>       <date>     <chr> <chr> <chr>   <chr>   <chr> <chr> <chr>   <chr>  
#> 1 060054      2020-06-05 CO    0600… COMMUN… 2351 '… GRAN… 81505 Short … 08077  
#> 2 100156      2020-06-19 FL    1001… HCA FL… 340 NW… LAKE… 32055 Short … 12023  
#> 3 101312      2020-05-15 FL    1013… FISHER… 3301 O… MARA… 33050 Critic… 12087  
#> 4 102001      2020-06-12 FL    1020… SELECT… 955 NW… MIAMI 33128 Long T… 12086  
#> 5 102013      2020-06-26 FL    1020… KINDRE… 4801 N… TAMPA 33603 Long T… 12057  
#> 6 102028      2020-05-01 FL    1020… SELECT… 5050 C… OXFO… 34484 Long T… 12119  
#> # … with 119 more variables: is_metro_micro <lgl>, total_beds_7_day_avg <dbl>,
#> #   all_adult_hospital_beds_7_day_avg <dbl>,
#> #   all_adult_hospital_inpatient_beds_7_day_avg <dbl>,
#> #   inpatient_beds_used_7_day_avg <dbl>,
#> #   all_adult_hospital_inpatient_bed_occupied_7_day_avg <dbl>,
#> #   inpatient_beds_used_covid_7_day_avg <dbl>,
#> #   total_adult_patients_hospitalized_confirmed_and_suspected_covid_7_day_avg <dbl>, …
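If the tidyverse is not available, the same stripping-and-splitting idea as the pipeline above can be done in base R with one regular expression. A minimal sketch, assuming the addresses always look like the "POINT (lon lat)" strings in the question; parse_wkt_point() is a hypothetical helper, and the sample vector (including the NA) is illustrative:

```r
# Sample values from the question, plus an NA to show how gaps are handled
addr <- c("POINT (-84.420098 39.253934)", "POINT (-112.52859 37.054324)",
          "POINT (-104.978247 39.654008)", NA)

# Hypothetical helper: pull longitude (first number) and latitude (second
# number) out of WKT "POINT (lon lat)" strings; anything else becomes NA
parse_wkt_point <- function(s) {
  pattern <- "^\\s*POINT\\s*\\(\\s*(-?[0-9.]+)\\s+(-?[0-9.]+)\\s*\\)\\s*$"
  ok <- !is.na(s) & grepl(pattern, s, perl = TRUE)
  lon <- rep(NA_real_, length(s))
  lat <- rep(NA_real_, length(s))
  lon[ok] <- as.numeric(sub(pattern, "\\1", s[ok], perl = TRUE))
  lat[ok] <- as.numeric(sub(pattern, "\\2", s[ok], perl = TRUE))
  data.frame(lon = lon, lat = lat)
}

coords <- parse_wkt_point(addr)
# drop rows where either coordinate could not be parsed
coords <- coords[!is.na(coords$lon) & !is.na(coords$lat), ]
print(coords)
```

Bound back onto the original data frame (with the same NA rows dropped), the numeric lon/lat columns can go straight into st_as_sf(coords = c("lon", "lat"), crs = 4326). Dropping the NAs matters because st_as_sf() stops on missing coordinates by default (its na.fail argument).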
