I am having trouble reading the data from the link below directly into R:
kaggle.com/c/house-prices-advanced-regression-techniques/data
I tried this code:
data <- read.csv("https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data?select=test.csv", skip = 1)
I also tried most of the options listed here: Access a URL and read Data with R
However, I only get HTML content and not the tables with the relevant house-price data from the website. I am not sure what I am doing wrong. Thanks.
CodePudding user response:
Here is a simple example, taken from a post on Kaggle, that shows how to achieve your goal. First, get an API token:
- Create a verified account
- Log in
- Go to your account (click the top right -> Account)
- Click "Create new API token"
- Place the downloaded file (kaggle.json) somewhere sensible that you can access from R
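As a quick check that the token file from the steps above is readable from R, something like the following could be used. It is only a minimal sketch: the helper name `check_kaggle_token` is made up for illustration, and it assumes the token file is a JSON object with `username` and `key` fields, which are the two fields the download code in this answer reads from it.

```r
library(jsonlite)

# Hypothetical helper (not part of any package): checks that a Kaggle API
# token file exists and contains the two fields used for authentication.
check_kaggle_token <- function(path = "~/.kaggle/kaggle.json") {
  path <- path.expand(path)
  if (!file.exists(path)) {
    stop("Token file not found: ", path)
  }
  creds <- fromJSON(path)
  missing <- setdiff(c("username", "key"), names(creds))
  if (length(missing) > 0) {
    stop("Token file is missing field(s): ", paste(missing, collapse = ", "))
  }
  # Return the credentials invisibly so the key is not printed by accident
  invisible(creds)
}
```

Calling `check_kaggle_token("kaggle.json")` with the path where you saved the file should either return silently or stop with a message saying what is missing.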
library(httr)
library(jsonlite)
kgl_credentials <- function(kgl_json_path = "~/.kaggle/kaggle.json") {
  # returns user credentials from the Kaggle JSON token file
  # (reads the path passed in, rather than a hard-coded location)
  user <- fromJSON(kgl_json_path, flatten = TRUE)
  return(user)
}
kgl_dataset <- function(ref, file_name, type = "dataset", kgl_json_path = "~/.kaggle/kaggle.json") {
  # ref: depends on 'type':
  #   - dataset: "sudalairajkumar/novel-corona-virus-2019-dataset"
  #   - competition: competition ID, e.g. 8587 for "competitive-data-science-predict-future-sales"
  # file_name: specific dataset wanted, e.g. "covid_19_data.csv"
  .kaggle_base_url <- "https://www.kaggle.com/api/v1"
  user <- kgl_credentials(kgl_json_path)
  if (type == "dataset") {
    # dataset
    url <- paste0(.kaggle_base_url, "/datasets/download/", ref, "/", file_name)
  } else if (type == "competition") {
    # competition
    url <- paste0(.kaggle_base_url, "/competitions/data/download/", ref, "/", file_name)
  } else {
    stop("type must be 'dataset' or 'competition'")
  }
  # call
  rcall <- httr::GET(url, httr::authenticate(user$username, user$key, type = "basic"))
  # fail early on HTTP errors (e.g. bad credentials)
  httr::stop_for_status(rcall)
  # content type
  content_type <- httr::headers(rcall)[["content-type"]]
  if (grepl("zip", content_type)) {
    # download and unzip (binary mode so the zip is not corrupted on Windows)
    temp <- tempfile()
    download.file(rcall$url, temp, mode = "wb")
    data <- read.csv(unz(temp, file_name))
    unlink(temp)
  } else {
    # else read as text -- note: code this better
    data <- httr::content(rcall, type = "text/csv", encoding = "ISO-8859-1")
  }
  return(data)
}
Then you can use the credentials to download the dataset, as described in the post:
kgl_dataset(file_name = 'test.csv',
type = 'competition',
ref = 'house-prices-advanced-regression-techniques',
kgl_json_path = 'kaggle.json')
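Since the original problem was getting an HTML page back instead of CSV data, it can help to check a downloaded file before parsing it. The following is a rough sketch under stated assumptions: `looks_like_html` is a hypothetical helper, and it only inspects the first non-empty line of a file that has already been saved locally, so it is a heuristic and not a full content check.

```r
# Hypothetical helper: returns TRUE when the first non-empty line of a file
# looks like the start of an HTML document rather than CSV data.
looks_like_html <- function(path, n = 5) {
  first_lines <- readLines(path, n = n, warn = FALSE)
  first_lines <- trimws(first_lines)
  first_lines <- first_lines[nzchar(first_lines)]
  if (length(first_lines) == 0) {
    return(FALSE)
  }
  # Match "<!DOCTYPE ..." or "<html ..." at the start, ignoring case
  grepl("^<(!doctype|html)", first_lines[1], ignore.case = TRUE)
}
```

If this returns TRUE for a file you expected to be CSV, the request most likely hit a web page (for example a login page) rather than the data file itself.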
Alternatively, you can use the unofficial R API package, kaggler:
library(devtools)
install_github('mkearney/kaggler')
library(kaggler)
kgl_auth(creds_file = 'kaggle.json')
kgl_competitions_data_download('house-prices-advanced-regression-techniques', 'test.csv')
However, this fails because of a mistake in the implementation of kgl_api_get:
function (path, ..., auth = kgl_auth())
{
r <- httr::GET(kgl_api_call(path, ...), auth)
httr::warn_for_status(r)
if (r$status_code != 200) { # <== should be "=="
...
}
CodePudding user response:
I downloaded the data manually (which you should do too, as it is quite easy). In case you would rather not, I uploaded the data to Pastebin, and you can run the code below. This is their "train" dataset, downloaded from the link you provided above:
data <- read.delim("https://pastebin.com/raw/aGvwwdV0", header = TRUE)