How to turn text data into data frame in R?

Time:09-05

Similar questions have been asked about converting a text string of data into a data frame (for example, here). However, I can't seem to adapt them to my problem.

I have a string of data that I'm trying to turn into a 4-column data frame. I managed to do it with the readr::read_table function (as shown below), but I'd like to do it in base R instead. I tried read.table (strictly speaking it is utils::read.table, not base, but I'm referring to it as base R), but I can't get it to work.

For example:

# text data
myText <- c("5 3 10\n3\n1 5 14 0.1005662213\n2 0 0 0.671371791\n3 0 0 0.3407034564\n3\n1 1 25 -0.5748688752\n2 0 0 -4.699291421\n3 0 0 -0.4393139217\n5\n1 5 35 0\n2 0 0 1.749283465\n3 0 67 0.1521562187\n6 0 0 -0.5545833321\n7 0 0 3.083556757\n1\n1 0 0 0.1563740906\n3\n1 1 25 -0.5748688752\n2 0 0 -4.352982824\n3 0 0 -0.05197710951\n5\n1 5 35 0\n2 0 0 2.425573501\n3 0 67 0.1521562187\n6 0 0 0.2505656058\n7 0 0 3.46201086\n3\n1 0 70 0.1563740906\n2 0 0 -0.8389369233\n3 0 0 -0.8127210366\n3\n1 1 25 -0.5748688752\n2 0 0 -4.125099073\n3 0 0 0.441967459\n5\n1 5 35 0\n2 0 0 1.337439399\n3 0 67 0.1521562187\n6 0 0 -0.03812773992\n7 0 0 2.488268982\n5\n1 0 70 0.1563740906\n2 0 0 -0.3505144781\n3 3 12 -0.8127210366\n6 0 0 -4.823541056\n7 0 0 1.200961188\n3\n1 1 25 -0.5748688752\n2 0 0 -4.615762984\n3 0 0 0.3397146156\n3\n1 5 35 0\n2 0 0 0.721465764\n3 0 0 0.4643481329\n5\n1 0 70 0.1563740906\n2 0 0 -1.004169113\n3 3 12 -0.8127210366\n6 0 0 -2.918580322\n7 0 0 2.114195803\n3\n1 1 25 -0.5748688752\n2 0 0 -4.894243443\n3 0 0 0.2303526511\n3\n1 5 35 0\n2 0 0 1.841081293\n3 0 0 1.204413054\n")

# turn into df using readr
df <- suppressWarnings(
  readr::read_table(
    file = myText,
    col_names = c("idNum", "varNum", "val1", "val2"),
    skip = 1,
    na = c("")
  )
)

> df
# A tibble: 68 × 4
   idNum varNum  val1   val2
   <dbl>  <dbl> <dbl>  <dbl>
 1     3     NA    NA NA    
 2     1      5    14  0.101
 3     2      0     0  0.671
 4     3      0     0  0.341
 5     3     NA    NA NA    
 6     1      1    25 -0.575
 7     2      0     0 -4.70 
 8     3      0     0 -0.439
 9     5     NA    NA NA    
10     1      5    35  0    
# … with 58 more rows

As you can see, this converts the string into a 4-column data frame (a tibble, in this case). But I'd like to avoid extra packages and do this in base R.

I tried read.table from base R, but it gives an error:

dfNew <- read.table(file = myText,
           col.names = c("idNum", "varNum", "val1", "val2"),
           skip = 1,
           na.strings = "NA")
> dfNew
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file '5 3 10

I'm not sure how to solve the error. The additional warning also seems to say that it is not skipping any lines before reading the data.

Any suggestions as to how I could solve this?

Answer:

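Because you are reading from a character vector rather than a file, use the text argument instead of file. The file argument expects a file name or a connection, which is why read.table tried to open a file called '5 3 10 ...'. Also, because not all rows contain 4 values, set fill = TRUE so that short rows are padded with blank fields, which are read as NA in numeric columns:

df <- read.table(text = myText, skip = 1, fill = TRUE, col.names = c("idNum", "varNum", "val1", "val2"))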

head(df)
#>   idNum varNum val1       val2
#> 1     3     NA   NA         NA
#> 2     1      5   14  0.1005662
#> 3     2      0    0  0.6713718
#> 4     3      0    0  0.3407035
#> 5     3     NA   NA         NA
#> 6     1      1   25 -0.5748689
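To see why fill is needed and how text relates to file, here is a minimal sketch on a short, made-up string (sample_text and cols are hypothetical names, with sample_text laid out like the start of myText). count.fields() shows the uneven number of fields per line, and the last read passes an explicit textConnection() to file, which is roughly what text does internally.

```r
# Hypothetical sample, laid out like the start of myText:
# a 3-field header line, a 1-field line, then 4-field data lines.
sample_text <- "5 3 10\n3\n1 5 14 0.10\n2 0 0 0.67\n3 0 0 0.34"
cols <- c("idNum", "varNum", "val1", "val2")

# count.fields() reports how many whitespace-separated fields each line has;
# the mix of 3-, 1- and 4-field lines is why read.table() needs fill = TRUE.
con <- textConnection(sample_text)
n_fields <- count.fields(con)
close(con)
print(n_fields)

# text = reads the string itself instead of treating it as a file path.
df_small <- read.table(
  text = sample_text,
  skip = 1,         # drop the "5 3 10" header line
  fill = TRUE,      # pad short lines with blank fields, read as NA
  col.names = cols
)
print(df_small)

# Same result by passing an explicit text connection to file =.
con <- textConnection(sample_text)
df_conn <- read.table(con, skip = 1, fill = TRUE, col.names = cols)
close(con)
stopifnot(isTRUE(all.equal(df_small, df_conn)))
```

With the default fill = FALSE, the same read.table() call would instead stop with a "did not have 4 elements" error from scan() on the first short line.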