Make table from text line in R

I have data in a text file that looks like the following:

[1] "JUP rs1126821 93 1371.51338998448 0.00114678899082569"

I want to split each line into the gene, the rs number, and the three cell values, and then calculate the sum of the three cell values.

Any help?

CodePudding user response:

Using scan,

scan(text=x, what=character(), quiet=TRUE) |> {\(.) data.frame(foo=.[1], bar=.[2], baz=sum(as.numeric(.[-(1:2)])))}()
#   foo       bar      baz
# 1 JUP rs1126821 1464.515

Or, if you have a text file, use read.table:

read.table('foo.txt') |> 
  apply(1, \(.) c(foo=.[[1]], bar=.[[2]], baz=sum(as.numeric(.[-(1:2)])))) |>
  t() |> as.data.frame() |> type.convert(as.is=TRUE)
#   foo       bar      baz
# 1 JUP rs1126821 1464.514
# 2 JUP rs1126821 1464.514
# 3 JUP rs1126821 1464.514
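
The 1464.514 here differs from the 1464.515 of the scan version, most likely because apply() first converts the data frame to a character matrix, which formats the numeric columns to about seven significant digits before they are summed. A minimal sketch that sums the numeric columns directly with rowSums() instead, assuming the same three-line layout (the data are written to a temporary file so the example is self-contained; the names tmp, dat, and res are made up for illustration):

```r
# Write the sample data to a temporary file so the sketch is self-contained
tmp <- tempfile(fileext = ".txt")
writeLines(rep("JUP rs1126821 93 1371.51338998448 0.00114678899082569", 3), tmp)

dat <- read.table(tmp)  # columns V1..V5; V3..V5 are read as numeric

# Sum the numeric columns row by row, without going through a character matrix
res <- data.frame(foo = dat[[1]],
                  bar = dat[[2]],
                  baz = unname(rowSums(dat[-(1:2)])))
res
# baz prints as 1464.515 in every row
```

This keeps the full precision of the input values and avoids the type.convert() step, at the cost of assuming that every column after the second is numeric.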

Data:

x <- "JUP rs1126821 93 1371.51338998448 0.00114678899082569"

writeLines('
JUP rs1126821 93 1371.51338998448 0.00114678899082569
JUP rs1126821 93 1371.51338998448 0.00114678899082569
JUP rs1126821 93 1371.51338998448 0.00114678899082569
', 'foo.txt')

CodePudding user response:

You could try:

some_txt <- "JUP rs1126821 93 1371.51338998448 0.00114678899082569"
# Select fields 3 to 5 before converting, so the two text fields are never
# coerced to NA (which would raise "NAs introduced by coercion")
some_thing <- sum(as.numeric(unlist(strsplit(some_txt, ' '))[3:5]))
some_thing
# [1] 1464.515

This works as long as every line has the same layout. The data frame approach in the answer above is likely more useful, or:

txt_df2 <- as.data.frame(t(unlist(strsplit(some_txt, ' '))))
txt_df2
   V1        V2 V3               V4                  V5
1 JUP rs1126821 93 1371.51338998448 0.00114678899082569

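# Columns 3 to 5 are stored as text, so flatten and convert them before summing
# (this sums across a single row only)
txt_df2$sum_cells <- sum(as.numeric(unlist(txt_df2[1, 3:5])))
txt_df2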
   V1        V2 V3               V4                  V5 sum_cells
1 JUP rs1126821 93 1371.51338998448 0.00114678899082569  1464.515

Use this version if you want to keep all intermediate values alongside the sum.
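
If several such lines are held in a character vector rather than a file, the same kind of table can be built in one step. A minimal sketch, assuming every line has two text fields followed by three numeric fields; the names lines_txt, tbl, gene, rs_id, cell1 to cell3, and sum_cells are made up for illustration:

```r
lines_txt <- c(
  "JUP rs1126821 93 1371.51338998448 0.00114678899082569",
  "JUP rs1126821 93 1371.51338998448 0.00114678899082569"
)

# read.table() can parse strings directly through its text argument
tbl <- read.table(text = lines_txt,
                  col.names = c("gene", "rs_id", "cell1", "cell2", "cell3"))

# Keep the three cell values and append their row-wise sum
tbl$sum_cells <- rowSums(tbl[, c("cell1", "cell2", "cell3")])
tbl
```

Because read.table() infers column types, the three cell columns stay numeric and no manual as.numeric() conversion is needed.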