Download file from url R

Time:01-05

I am having trouble downloading the data at the link below directly into R from code:

kaggle.com/c/house-prices-advanced-regression-techniques/data

I tried this code:

data <- read.csv("https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data?select=test.csv", skip = 1)

I tried most of the options listed here: Access a URL and read Data with R

However, I only get an HTML table, not the tables with the house-price data from the website. I am not sure what I am doing wrong. Thanks.

CodePudding user response:

There is a simple example post on Kaggle showing how to do this; the code below is taken from it. You first need an API token:

  1. Create a verified account
  2. Log in
  3. Go to your account (click the top right -> account)
  4. Click "Create new API token" (this downloads a kaggle.json file)
  5. Place the kaggle.json file somewhere sensible that you can access from R
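
Before going further it can help to confirm that R can actually read the token file. The snippet below is only a sketch: read_kaggle_token is a hypothetical helper name that is not part of the Kaggle post, and the demo credentials are made up. It assumes the token file holds the "username" and "key" fields that the download code relies on.

```r
library(jsonlite)

# Hypothetical helper (not from the original post): read a Kaggle API token
# file and check that it has the two fields the download code relies on.
read_kaggle_token <- function(path = "~/.kaggle/kaggle.json") {
  path <- path.expand(path)
  if (!file.exists(path)) {
    stop("Token file not found: ", path)
  }
  creds <- fromJSON(path)
  missing_fields <- setdiff(c("username", "key"), names(creds))
  if (length(missing_fields) > 0) {
    stop("Token file is missing field(s): ", paste(missing_fields, collapse = ", "))
  }
  creds
}

# Demonstration with a throwaway file and made-up values
demo_path <- tempfile(fileext = ".json")
writeLines('{"username": "example_user", "key": "0123456789abcdef"}', demo_path)
creds <- read_kaggle_token(demo_path)
print(creds$username)
unlink(demo_path)
```

With a real token you would call read_kaggle_token() on wherever you placed kaggle.json; the default path is the same one the functions below use.
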
library(httr)
library(jsonlite)
kgl_credentials <- function(kgl_json_path="~/.kaggle/kaggle.json"){
    # returns user credentials (username, key) from the kaggle json at kgl_json_path
    user <- fromJSON(kgl_json_path, flatten = TRUE)
    return(user)
}
kgl_dataset <- function(ref, file_name, type="dataset", kgl_json_path="~/.kaggle/kaggle.json"){
    # ref: depends on 'type':
    # - dataset: "sudalairajkumar/novel-corona-virus-2019-dataset"
    # - competition: competition ID, e.g. 8587 for "competitive-data-science-predict-future-sales"
    # file_name: specific dataset wanted, e.g. "covid_19_data.csv"
    .kaggle_base_url <- "https://www.kaggle.com/api/v1"
    user <- kgl_credentials(kgl_json_path)
    if(type=="dataset"){
        # dataset
        url <- paste0(.kaggle_base_url, "/datasets/download/", ref, "/", file_name)
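    }else if(type=="competition"){
        # competition
        url <- paste0(.kaggle_base_url, "/competitions/data/download/", ref, "/", file_name)
    }else{
        # fail early instead of reaching GET() with an undefined url
        stop("type must be 'dataset' or 'competition'")
    }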
    # call
    rcall <- httr::GET(url, httr::authenticate(user$username, user$key, type="basic"))
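    # content type (looked up by name rather than by list position)
    content_type <- httr::headers(rcall)[["content-type"]]
    if( grepl("zip", content_type)){
        # download and unzip
        temp <- tempfile()
        # binary mode so the zip file is not corrupted on Windows
        download.file(rcall$url, temp, mode = "wb")
        data <- read.csv(unz(temp, file_name))
        unlink(temp)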
    }else{
        # else read as text -- note: code this better
        data <- content(rcall, type="text/csv", encoding = "ISO-8859-1")
    }
    return(data)
}

Then you can use the credentials to download the dataset as described in the post:

kgl_dataset(file_name = 'test.csv',
            type = 'competition',
            ref = 'house-prices-advanced-regression-techniques',
            kgl_json_path = 'kaggle.json')
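
If the call fails, it is worth checking which URL it requests. The following is a minimal sketch that rebuilds the URL the same way kgl_dataset does, without touching the network. kgl_download_url is a hypothetical name introduced here for illustration; the base URL and the two endpoint paths are copied from the function above.

```r
# Hypothetical helper (not in the original answer): build the download URL
# that kgl_dataset() requests, without making any network call.
kgl_download_url <- function(ref, file_name, type = c("dataset", "competition")) {
  type <- match.arg(type)  # errors on anything other than the two known types
  base_url <- "https://www.kaggle.com/api/v1"
  endpoint <- switch(type,
    dataset     = "/datasets/download/",
    competition = "/competitions/data/download/"
  )
  paste0(base_url, endpoint, ref, "/", file_name)
}

url <- kgl_download_url("house-prices-advanced-regression-techniques",
                        "test.csv", type = "competition")
print(url)
```

Using match.arg means a misspelled type stops with an error right away instead of producing a malformed request.
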

Alternatively, you can use the unofficial R API package, kaggler:

library(devtools)
install_github('mkearney/kaggler')
library(kaggler)
kgl_auth(creds_file = 'kaggle.json')
kgl_competitions_data_download('house-prices-advanced-regression-techniques', 'test.csv') 

However, this fails because of a mistake in the implementation of kgl_api_get:

function (path, ..., auth = kgl_auth()) 
{
    r <- httr::GET(kgl_api_call(path, ...), auth)
    httr::warn_for_status(r)
    if (r$status_code != 200) { # <== should be "==" 
    ...
}

CodePudding user response:

I downloaded the data (which you should just do too; it's quite easy), but in case you don't want to, I uploaded it to Pastebin so you can run the code below. This is their "train" dataset, downloaded from the link you provided above.

data <- read.delim("https://pastebin.com/raw/aGvwwdV0", header = TRUE)