I am attempting to read a TSV file into R. Using RStudio's file viewer, my raw file looks like this:
nzid | converted | logins_cnt | shootypes_cnt | galleries_cnt | photos_cnt | favorite_images_cnt | image_downloaded_cnt | gallery_visitors_cnt | storage_used | shared_gallery_cnt | password_set | site_created | site_published | pricelist_created | used_desktop | custom_domain | added_watermark | added_galley | added_logo | added_social_link
-------------------------------------- ------------ ------------ --------------- --------------- ------------ --------------------- ---------------------- ---------------------- --------------------- -------------------- -------------- -------------- ---------------- ------------------- -------------- --------------- ----------------- -------------- ------------ -------------------
abc123 | | 0 | 4 | 0 | 31 | 0.000000 | 0.000000 | 4.000000 | 278895839.000000 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0
jhgfdfghj543454 | | 1 | 9 | 0 | 140 | 2.000000 | 1127.000000 | 137.000000 | 1077768195.000000 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
ijhgfdrfgh765456 | | 0 | 4 | 0 | 30 | 0 | 0 | 0 | 278796703.000000 | 0 | 1 |
What I tried:
rawd <- read_tsv('training-data.tsv')
This runs but:
rawd %>% glimpse
Rows: 10,173
Columns: 1
$ `nzid | converted | logins_cnt | shootypes_cnt | galleries_cnt | photos_cnt | favorite_images_cnt | image_downloaded_cnt | gallery_visitors_cnt | storage_used | shared_gallery_cnt | password_set | site_created | site_published | pricelist_created | used_desktop | custom_domain | added_watermark | added_galley | added_logo | added_social_link` <chr> …
Everything is in one column.
Looking at the raw file, vertical bars appear to separate the fields, so I tried:
rawd <- read_tsv('training-data.tsv', delim = '|')
Error in read_tsv("training-data.tsv", delim = "|") :
unused argument (delim = "|")
This is unexpected, since delim is listed as a parameter on the help page ?read_tsv.
How can I read my 'tsv' file into R, assuming it is indeed a TSV file?
Answer:
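The file is not really tab-separated: the fields are delimited by vertical bars, and the second line is a ruler of dashes rather than data. read_tsv() has no delim argument; the help page it shares with read_delim() documents delim for read_delim() only, which explains the error. One option is to read the raw lines, drop the second one, and parse the rest with read.table() using sep = "|". Using the data in the Note below: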
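# Read the raw lines so the dashed ruler (line 2) can be dropped.
L <- readLines('training-data.tsv')
# Split on "|", trim the padding around each field, and pad short rows with NA.
DF <- read.table(text = L[-2], sep = "|", strip.white = TRUE,
header = TRUE, fill = TRUE)
str(DF)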
giving:
'data.frame': 3 obs. of 21 variables:
$ nzid : chr "abc123" "jhgfdfghj543454" "ijhgfdrfgh765456"
$ converted : logi NA NA NA
$ logins_cnt : int 0 1 0
$ shootypes_cnt : int 4 9 4
$ galleries_cnt : int 0 0 0
$ photos_cnt : int 31 140 30
$ favorite_images_cnt : num 0 2 0
$ image_downloaded_cnt: num 0 1127 0
$ gallery_visitors_cnt: num 4 137 0
$ storage_used : num 2.79e+08 1.08e+09 2.79e+08
$ shared_gallery_cnt : int 0 1 0
$ password_set : int 1 1 1
$ site_created : int 0 0 NA
$ site_published : int 0 0 NA
$ pricelist_created : int 0 0 NA
$ used_desktop : int 1 0 NA
$ custom_domain : int 0 0 NA
$ added_watermark : int 0 0 NA
$ added_galley : int 1 1 NA
$ added_logo : int 0 0 NA
$ added_social_link : int 0 0 NA
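Since the question uses readr, here is a minimal sketch of the same idea with read_delim(), which does accept a delim argument. It assumes readr 2.0 or later (for I() literal input and show_col_types). The four-column sample file and the names sample_lines, path and txt are made up for illustration and are not from the original data. Rows that are cut short, like the last row in the question, may additionally produce parsing warnings.

```r
library(readr)

# Hypothetical sample that mirrors the layout in the question (4 columns only).
sample_lines <- c(
  " nzid | converted | logins_cnt | photos_cnt",
  "------ ----------- ------------ ------------",
  " abc123 | | 0 | 31",
  " xyz789 | | 1 | 140"
)
path <- tempfile(fileext = ".tsv")
writeLines(sample_lines, path)

# Drop the dashed ruler (line 2) and pass the rest to readr as literal text.
L <- readLines(path)
txt <- paste0(paste(L[-2], collapse = "\n"), "\n")
rawd <- read_delim(I(txt), delim = "|", trim_ws = TRUE, show_col_types = FALSE)

# Trim the column names explicitly in case padding survives in the header.
names(rawd) <- trimws(names(rawd))
print(rawd)
```

With the real file, replacing path by 'training-data.tsv' should be the only change needed.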
Note
Lines <- " nzid | converted | logins_cnt | shootypes_cnt | galleries_cnt | photos_cnt | favorite_images_cnt | image_downloaded_cnt | gallery_visitors_cnt | storage_used | shared_gallery_cnt | password_set | site_created | site_published | pricelist_created | used_desktop | custom_domain | added_watermark | added_galley | added_logo | added_social_link
-------------------------------------- ------------ ------------ --------------- --------------- ------------ --------------------- ---------------------- ---------------------- --------------------- -------------------- -------------- -------------- ---------------- ------------------- -------------- --------------- ----------------- -------------- ------------ -------------------
abc123 | | 0 | 4 | 0 | 31 | 0.000000 | 0.000000 | 4.000000 | 278895839.000000 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0
jhgfdfghj543454 | | 1 | 9 | 0 | 140 | 2.000000 | 1127.000000 | 137.000000 | 1077768195.000000 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0
ijhgfdrfgh765456 | | 0 | 4 | 0 | 30 | 0 | 0 | 0 | 278796703.000000 | 0 | 1 | "
writeLines(Lines, "training-data.tsv")
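On the question of whether this is really a TSV file: a quick check is to count candidate delimiters on the first few lines. The sketch below uses only base R. count_delims is a hypothetical helper written for this answer, not a function from base R or readr, and the three-line sample file is made up.

```r
# Hypothetical helper: count candidate delimiters on each of the first n lines.
count_delims <- function(path, n = 5,
                         delims = c(tab = "\t", pipe = "|", comma = ",")) {
  lines <- readLines(path, n = n)
  # For each delimiter, count its literal occurrences on every line.
  sapply(delims, function(d) {
    lengths(regmatches(lines, gregexpr(d, lines, fixed = TRUE)))
  })
}

# Made-up sample with the same layout as the file in the question.
path <- tempfile(fileext = ".tsv")
writeLines(c(" nzid | converted | logins_cnt",
             "------ ----------- ------------",
             " abc123 | | 0"), path)

counts <- count_delims(path)
print(counts)
```

If the tab column is all zeros while the pipe column is not, the file is pipe-delimited despite its .tsv extension, and a tab-oriented reader will put everything in one column.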