read_tsv returns a 1-column df, expected many columns

Time:11-14

I am attempting to read a TSV file into R. In RStudio's file viewer, my raw file looks like this:

                 nzid                 | converted  | logins_cnt | shootypes_cnt | galleries_cnt | photos_cnt | favorite_images_cnt | image_downloaded_cnt | gallery_visitors_cnt |    storage_used     | shared_gallery_cnt | password_set | site_created | site_published | pricelist_created | used_desktop | custom_domain | added_watermark | added_galley | added_logo | added_social_link 
--------------------------------------+------------+------------+---------------+---------------+------------+---------------------+----------------------+----------------------+---------------------+--------------------+--------------+--------------+----------------+-------------------+--------------+---------------+-----------------+--------------+------------+-------------------
 abc123 |            |          0 |             4 |             0 |         31 |            0.000000 |             0.000000 |             4.000000 |    278895839.000000 |                  0 |            1 |            0 |              0 |                 0 |            1 |             0 |               0 |            1 |          0 |                 0
 jhgfdfghj543454 |            |          1 |             9 |             0 |        140 |            2.000000 |          1127.000000 |           137.000000 |   1077768195.000000 |                  1 |            1 |            0 |              0 |                 0 |            0 |             0 |               0 |            1 |          0 |                 0
 ijhgfdrfgh765456 |            |          0 |             4 |             0 |         30 |                   0 |                    0 |                    0 |    278796703.000000 |                  0 |            1 |       

What I tried:

rawd <- read_tsv('training-data.tsv')

This runs but:

rawd %>% glimpse
Rows: 10,173
Columns: 1
$ `nzid                 | converted  | logins_cnt | shootypes_cnt | galleries_cnt | photos_cnt | favorite_images_cnt | image_downloaded_cnt | gallery_visitors_cnt |    storage_used     | shared_gallery_cnt | password_set | site_created | site_published | pricelist_created | used_desktop | custom_domain | added_watermark | added_galley | added_logo | added_social_link` <chr>

Everything is in one column.

Looking at the raw file, vertical bars appear to be separating the fields, so I tried:

rawd <- read_tsv('training-data.tsv', delim = '|')
Error in read_tsv("training-data.tsv", delim = "|") : 
  unused argument (delim = "|")

This was unexpected, since delim is listed as a parameter on the help page ?read_tsv.

How can I read my 'tsv' file into R, assuming it is indeed a TSV file?

CodePudding user response:

Using the data in the Note at the end:

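# Read the raw lines so the dashed rule on line 2 can be dropped
L <- readLines('training-data.tsv')
# Split on "|", trim the padding, and fill short rows with NA
DF <- read.table(text = L[-2], sep = "|", strip.white = TRUE, 
  header = TRUE, fill = TRUE)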
str(DF)

giving:

'data.frame':   3 obs. of  21 variables:
 $ nzid                : chr  "abc123" "jhgfdfghj543454" "ijhgfdrfgh765456"
 $ converted           : logi  NA NA NA
 $ logins_cnt          : int  0 1 0
 $ shootypes_cnt       : int  4 9 4
 $ galleries_cnt       : int  0 0 0
 $ photos_cnt          : int  31 140 30
 $ favorite_images_cnt : num  0 2 0
 $ image_downloaded_cnt: num  0 1127 0
 $ gallery_visitors_cnt: num  4 137 0
 $ storage_used        : num  2.79e+08 1.08e+09 2.79e+08
 $ shared_gallery_cnt  : int  0 1 0
 $ password_set        : int  1 1 1
 $ site_created        : int  0 0 NA
 $ site_published      : int  0 0 NA
 $ pricelist_created   : int  0 0 NA
 $ used_desktop        : int  1 0 NA
 $ custom_domain       : int  0 0 NA
 $ added_watermark     : int  0 0 NA
 $ added_galley        : int  1 1 NA
 $ added_logo          : int  0 0 NA
 $ added_social_link   : int  0 0 NA
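
If you would rather stay within readr, a roughly equivalent approach is sketched below. Note that ?read_tsv opens a help page shared with read_delim, and delim is an argument of read_delim only, which is why read_tsv rejected it. The sketch assumes readr 2.0 or later (for I() and show_col_types). The three-column sample_lines data and the temporary file are made up for illustration and stand in for training-data.tsv.

```r
library(readr)

# Hypothetical stand-in for the pipe-delimited file (not the real data)
sample_lines <- c(
  " nzid   | logins_cnt | photos_cnt ",
  "-------- ------------ ------------",
  " abc123 |          0 |         31 ",
  " xyz789 |          1 |        140 "
)
path <- tempfile(fileext = ".tsv")
writeLines(sample_lines, path)

L <- readLines(path)

# The file contains no tab characters, so read_tsv() sees one field per line
has_tabs <- any(grepl("\t", L, fixed = TRUE))

# Drop the dashed rule on line 2, then split on "|" and trim the padding.
# I() marks the string as literal data rather than a file path.
rawd <- read_delim(I(paste(L[-2], collapse = "\n")), delim = "|",
                   trim_ws = TRUE, show_col_types = FALSE)

# Trim the column names explicitly in case any padding survives in the header
names(rawd) <- trimws(names(rawd))
```

With the real file, set path to 'training-data.tsv' and skip the writeLines step. Rows that are cut short, like the last one in the question, should come back with NA in the missing columns, and readr will probably warn about them.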

Note

Lines <- "                 nzid                 | converted  | logins_cnt | shootypes_cnt | galleries_cnt | photos_cnt | favorite_images_cnt | image_downloaded_cnt | gallery_visitors_cnt |    storage_used     | shared_gallery_cnt | password_set | site_created | site_published | pricelist_created | used_desktop | custom_domain | added_watermark | added_galley | added_logo | added_social_link 
--------------------------------------+------------+------------+---------------+---------------+------------+---------------------+----------------------+----------------------+---------------------+--------------------+--------------+--------------+----------------+-------------------+--------------+---------------+-----------------+--------------+------------+-------------------
 abc123 |            |          0 |             4 |             0 |         31 |            0.000000 |             0.000000 |             4.000000 |    278895839.000000 |                  0 |            1 |            0 |              0 |                 0 |            1 |             0 |               0 |            1 |          0 |                 0
 jhgfdfghj543454 |            |          1 |             9 |             0 |        140 |            2.000000 |          1127.000000 |           137.000000 |   1077768195.000000 |                  1 |            1 |            0 |              0 |                 0 |            0 |             0 |               0 |            1 |          0 |                 0
 ijhgfdrfgh765456 |            |          0 |             4 |             0 |         30 |                   0 |                    0 |                    0 |    278796703.000000 |                  0 |            1 |       "
writeLines(Lines, "training-data.tsv")