Read whitespace-delimited Stack Overflow data with row numbers directly into R

Time:07-30

Stack Overflow R questions often share sample data as printed data.frame output rather than as dput output:

      id cate  result
 1     1 yes       1
 2     1 yes      NA
 3     1 no       NA
 4     2 no       NA
 5     2 yes       1
 6     2 yes      NA
 7     2 no       NA
 8     3 no       NA
 9     3 yes      NA
10     3 no       NA
11     3 yes       1
12     3 yes      NA
13     3 no       NA
14     3 yes      NA
15     4 yes       1
16     4 yes      NA
17     4 yes      NA
18     4 no       NA
19     4 no       NA 

One way I found to read this into R while answering questions is to add a row_num column name to the header by hand, read the text with read_table, and then drop that column with select(-row_num).

readr::read_table("   row_num   id cate  result
 1     1 yes       1
 2     1 yes      NA
 3     1 no       NA
 4     2 no       NA
 5     2 yes       1
 6     2 yes      NA
 7     2 no       NA
 8     3 no       NA
 9     3 yes      NA
10     3 no       NA
11     3 yes       1
12     3 yes      NA
13     3 no       NA
14     3 yes      NA
15     4 yes       1
16     4 yes      NA
17     4 yes      NA
18     4 no       NA
19     4 no       NA ") |>
  dplyr::select(-row_num)

# # A tibble: 19 × 3
#       id cate  result
#    <dbl> <chr>  <dbl>
#  1     1 yes        1
#  2     1 yes       NA
#  3     1 no        NA
#  4     2 no        NA
#  5     2 yes        1
#  6     2 yes       NA
#  7     2 no        NA
#  8     3 no        NA
#  9     3 yes       NA
# 10     3 no        NA
# 11     3 yes        1
# 12     3 yes       NA
# 13     3 no        NA
# 14     3 yes       NA
# 15     4 yes        1
# 16     4 yes       NA
# 17     4 yes       NA
# 18     4 no        NA
# 19     4 no        NA

Are there any simpler packages/tricks to read data.frame or tibble output in just one step?

CodePudding user response:

We can use soread from the overflow package after copying the lines to the clipboard:

#source("http://news.mrdwab.com/install_github.R")
#install_github("mrdwab/overflow-mrdwab")
library(overflow)
# COPY THE DATA FROM STACK OVERFLOW, THEN RUN
df1 <- soread()

-output

> df1
   id cate result
1   1  yes      1
2   1  yes     NA
3   1   no     NA
4   2   no     NA
5   2  yes      1
6   2  yes     NA
7   2   no     NA
8   3   no     NA
9   3  yes     NA
10  3   no     NA
11  3  yes      1
12  3  yes     NA
13  3   no     NA
14  3  yes     NA
15  4  yes      1
16  4  yes     NA
17  4  yes     NA
18  4   no     NA
19  4   no     NA

CodePudding user response:

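Or use read.table. When the header line has one fewer field than the data rows, read.table treats the first field of each row as row names, so the row numbers are handled without any extra step: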

df <- read.table(text = "      id cate  result
 1     1 yes       1
 2     1 yes      NA
 3     1 no       NA
 4     2 no       NA
 5     2 yes       1
 6     2 yes      NA
 7     2 no       NA
 8     3 no       NA
 9     3 yes      NA
10     3 no       NA
11     3 yes       1
12     3 yes      NA
13     3 no       NA
14     3 yes      NA
15     4 yes       1
16     4 yes      NA
17     4 yes      NA
18     4 no       NA
19     4 no       NA", header = TRUE)
df
#>    id cate result
#> 1   1  yes      1
#> 2   1  yes     NA
#> 3   1   no     NA
#> 4   2   no     NA
#> 5   2  yes      1
#> 6   2  yes     NA
#> 7   2   no     NA
#> 8   3   no     NA
#> 9   3  yes     NA
#> 10  3   no     NA
#> 11  3  yes      1
#> 12  3  yes     NA
#> 13  3   no     NA
#> 14  3  yes     NA
#> 15  4  yes      1
#> 16  4  yes     NA
#> 17  4  yes     NA
#> 18  4   no     NA
#> 19  4   no     NA

Created on 2022-07-29 by the reprex package (v2.0.1)
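
To check that the row numbers really end up as row names and not as an extra column, here is a minimal sketch on a three-row subset of the data. The names txt and small are only illustrative.

```r
# Three rows in the same shape as printed data.frame output:
# the header has 3 names, each data row has 4 fields.
txt <- "      id cate  result
 1     1 yes       1
 2     1 yes      NA
 3     1 no       NA"

small <- read.table(text = txt, header = TRUE)

# Only the three real columns remain.
names(small)

# The leading 1, 2, 3 were consumed as row names.
rownames(small)

str(small)
```

Because this relies on the header being exactly one field short, it should stop working as soon as a column name is added for the row numbers; in that case the first column is read as ordinary data.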

CodePudding user response:

You could also use fread from data.table and drop the first column, which holds the row numbers:

library(data.table)
fread('      id cate  result
 1     1 yes       1
 2     1 yes      NA
 3     1 no       NA
 4     2 no       NA
 5     2 yes       1
 6     2 yes      NA
 7     2 no       NA
 8     3 no       NA
 9     3 yes      NA
10     3 no       NA
11     3 yes       1
12     3 yes      NA
13     3 no       NA
14     3 yes      NA
15     4 yes       1
16     4 yes      NA
17     4 yes      NA
18     4 no       NA
19     4 no       NA ')[,-1]
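
The question also asks about tibble output, which adds a "# A tibble" line and a row of type tags such as <dbl>. A minimal base R sketch for that case follows. It assumes the values contain no embedded spaces and that no columns were truncated in the printout. read_printed is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper, not part of any package: parse printed tibble or
# data.frame output held in a string and drop the row-number column.
read_printed <- function(txt) {
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  # Drop tibble metadata lines (those starting with "#") and blank lines.
  lines <- lines[!grepl("^\\s*(#|$)", lines)]
  # Drop the tibble type row, e.g. "  <dbl> <chr>  <dbl>".
  lines <- lines[!grepl("^\\s*(<[^>]+>\\s*)+$", lines)]
  # The header now has one fewer field than the data rows, so read.table()
  # uses the first field of each row as row names rather than as a column.
  read.table(text = paste(lines, collapse = "\n"), header = TRUE)
}

tbl_txt <- "# A tibble: 3 × 3
     id cate  result
  <dbl> <chr>  <dbl>
1     1 yes        1
2     1 yes       NA
3     1 no        NA"

parsed <- read_printed(tbl_txt)
str(parsed)
```

The same helper should also accept plain data.frame output, since neither filter matches any line there. The result is a data.frame and not a tibble, and column types are guessed by read.table instead of being taken from the type row.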