Read file from Google Drive

Time:07-08

I have a spreadsheet uploaded as a CSV file to Google Drive and shared publicly so anyone can read it. This is the link to the file: https://docs.google.com/spreadsheets/d/170235QwbmgQvr0GWmT-8yBsC7Vk6p_dmvYxrZNfsKqk/edit?usp=sharing

I am trying to read it from R, but I get a long list of error messages. I am using:

id = "170235QwbmgQvr0GWmT-8yBsC7Vk6p_dmvYxrZNfsKqk"
read.csv(sprintf("https://docs.google.com/spreadsheets/d/uc?id=%s&export=download", id))

Could someone suggest how to read files from Google Drive directly into R?

Answer:

I would try publishing the sheet as a CSV file (see Google's "Publish to the web" documentation) and then reading it from the published URL.

Your file already seems to be published as a CSV, so this should work (note that the URL ends with /pub?output=csv):

read.csv("https://docs.google.com/spreadsheets/d/170235QwbmgQvr0GWmT-8yBsC7Vk6p_dmvYxrZNfsKqk/pub?output=csv")
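For comparison, the URL in the question mixes two formats: the `uc?id=...&export=download` pattern belongs to `drive.google.com` (plain uploaded files), while native Google Sheets live under `docs.google.com/spreadsheets/d/<id>/`. The mixed URL most likely returns an HTML page instead of CSV, which `read.csv()` then fails to parse. Below is a minimal sketch of helpers that build each kind of URL from the file ID, assuming the file is shared as "Anyone with the link"; the function names are hypothetical, and private files would need authentication instead.

```r
# Hypothetical helpers (not from the answers above): build download URLs
# from a Google file ID. Assumes sharing is set to "Anyone with the link".

# Native Google Sheet exported as CSV (first tab unless gid is given)
sheet_csv_url <- function(id, gid = NULL) {
  url <- sprintf("https://docs.google.com/spreadsheets/d/%s/export?format=csv", id)
  if (!is.null(gid)) url <- paste0(url, "&gid=", gid)
  url
}

# Plain file (e.g. an uploaded .csv) stored in Google Drive
drive_file_url <- function(id) {
  sprintf("https://drive.google.com/uc?export=download&id=%s", id)
}

# Read a shared Google Sheet into a data frame (requires network access)
read_sheet_csv <- function(id, gid = NULL) {
  read.csv(sheet_csv_url(id, gid))
}

id <- "170235QwbmgQvr0GWmT-8yBsC7Vk6p_dmvYxrZNfsKqk"
print(sheet_csv_url(id))
```

With network access, `df <- read_sheet_csv(id)` would then load the sheet without publishing it first.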

Answer:

To read the CSV file faster, you can use vroom, which according to its benchmarks is even faster than data.table's fread().

Using vroom:

library(vroom)

vroom("https://docs.google.com/spreadsheets/d/170235QwbmgQvr0GWmT-8yBsC7Vk6p_dmvYxrZNfsKqk/pub?output=csv")

#> Rows: 387048 Columns: 14
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (6): StationCode, SampleID, WeatherCode, OrganismCode, race, race2
#> dbl  (7): WaterTemperature, Turbidity, Velocity, ForkLength, Weight, Count, ...
#> date (1): SampleDate
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 387,048 × 14
#>    StationCode SampleDate SampleID WeatherCode WaterTemperature Turbidity
#>    <chr>       <date>     <chr>    <chr>                  <dbl>     <dbl>
#>  1 Gate 11     2000-04-25 116_00   CLD                    13.1       2   
#>  2 Gate 5      1995-04-26 117_95   CLR                    NA         2   
#>  3 Gate 2      1995-04-21 111_95   W                      10.4      12   
#>  4 Gate 6      2008-12-13 348_08   CLR                    49.9       1.82
#>  5 Gate 5      1999-12-10 344_99   CLR                     7.30      1.5 
#>  6 Gate 6      2012-05-25 146_12   CLR                    55.5       1.60
#>  7 Gate 10     2011-06-28 179_11   RAN                    57.3       3.99
#>  8 Gate 11     1996-04-25 116_96   CLR                    13.8      21   
#>  9 Gate 9      2007-07-02 183_07   CLR                    56.6       2.09
#> 10 Gate 6      2009-06-04 155_09   CLR                    58.6       3.08
#> # … with 387,038 more rows, and 8 more variables: Velocity <dbl>,
#> #   OrganismCode <chr>, ForkLength <dbl>, Weight <dbl>, Count <dbl>,
#> #   race <chr>, year <dbl>, race2 <chr>

Created on 2022-07-08 by the reprex package (v2.0.1)
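As the message printed above suggests, the column-specification report can be silenced with `show_col_types = FALSE`. A minimal sketch, assuming vroom is installed; the wrapper name `read_csv_quiet` is hypothetical:

```r
# Hypothetical wrapper: read a comma-delimited file (local path or URL)
# with vroom, without printing the column specification message.
read_csv_quiet <- function(file) {
  vroom::vroom(file, delim = ",", show_col_types = FALSE)
}

# Usage (requires network access):
# df <- read_csv_quiet("https://docs.google.com/spreadsheets/d/170235QwbmgQvr0GWmT-8yBsC7Vk6p_dmvYxrZNfsKqk/pub?output=csv")
# vroom::spec(df)  # full column specification, as the message mentions
```

Passing `delim = ","` explicitly also skips vroom's delimiter guessing, which the output above shows resolved to a comma anyway.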
