I have data in a text file that looks like the following:
[1] "JUP rs1126821 93 1371.51338998448 0.00114678899082569"
I want to separate the gene, the rs number, and the three cell values, and then calculate the sum of the three values.
Any help?
CodePudding user response:
Using scan:
scan(text=x, what='A', quiet=TRUE) |> {\(.) data.frame(foo=.[1], bar=.[2], baz=sum(as.numeric(.[-(1:2)])))}()
# foo bar baz
# 1 JUP rs1126821 1464.515
Or, if you have a text file, use read.table:
read.table('foo.txt') |>
apply(1, \(.) c(foo=.[[1]], bar=.[[2]], baz=sum(as.numeric(.[-(1:2)])))) |>
t() |> as.data.frame() |> type.convert(as.is=TRUE)
# foo bar baz
# 1 JUP rs1126821 1464.514
# 2 JUP rs1126821 1464.514
# 3 JUP rs1126821 1464.514
Data:
x <- "JUP rs1126821 93 1371.51338998448 0.00114678899082569"
writeLines('
JUP rs1126821 93 1371.51338998448 0.00114678899082569
JUP rs1126821 93 1371.51338998448 0.00114678899082569
JUP rs1126821 93 1371.51338998448 0.00114678899082569
', 'foo.txt')
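Note that the read.table output above shows 1464.514 rather than 1464.515. This is most likely because apply() first turns the data frame into a character matrix, which formats the numeric columns to 7 significant digits before they are summed. A minimal sketch that avoids the character round trip, assuming every row has exactly three numeric columns after the first two (the column names gene, rs, cell1 to cell3, and total are made up for illustration):

```r
# Write a small example file with the same rows as in the Data section.
tmp <- tempfile(fileext = ".txt")
writeLines(rep("JUP rs1126821 93 1371.51338998448 0.00114678899082569", 3), tmp)

# read.table() already parses columns 3 to 5 as numeric.
# The column names are hypothetical labels, not taken from the question.
dat <- read.table(tmp, col.names = c("gene", "rs", "cell1", "cell2", "cell3"))

# rowSums() adds the three numeric columns per row at full precision.
dat$total <- rowSums(dat[, c("cell1", "cell2", "cell3")])

# Print with more digits than the default 7 to inspect the sums.
print(dat$total, digits = 10)
```

Because the columns stay numeric throughout, no type.convert step is needed afterwards.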
CodePudding user response:
You could try:
some_txt <- "JUP rs1126821 93 1371.51338998448 0.00114678899082569"
some_thing <- sum(as.numeric(unlist(strsplit(some_txt, ' '))[3:5]))
some_thing
[1] 1464.515
Selecting elements 3 to 5 before calling as.numeric avoids the "NAs introduced by coercion" warning that the gene and rs fields would otherwise trigger. This works as long as every line of your text has the same layout. The approach above is likely more useful, or:
txt_df2 <- as.data.frame(t(unlist(strsplit(some_txt, ' '))))
txt_df2
V1 V2 V3 V4 V5
1 JUP rs1126821 93 1371.51338998448 0.00114678899082569
txt_df2$sum_cells <- sum(as.numeric(unlist(txt_df2[, 3:5])))
txt_df2
V1 V2 V3 V4 V5 sum_cells
1 JUP rs1126821 93 1371.51338998448 0.00114678899082569 1464.515
if you want to keep all intermediate values.
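The strsplit idea handles a single string; for several lines at once it could be extended roughly as follows. This is only a sketch, assuming every line has the same five whitespace-separated fields. The names txt_lines, parts, vals, and res are hypothetical.

```r
# Hypothetical input: several lines with the same five-field layout.
txt_lines <- c(
  "JUP rs1126821 93 1371.51338998448 0.00114678899082569",
  "JUP rs1126821 93 1371.51338998448 0.00114678899082569"
)

# Split each line on whitespace and stack the pieces into a character matrix.
parts <- do.call(rbind, strsplit(txt_lines, "\\s+"))

# Convert only the three value columns to numeric, restoring the matrix shape
# (as.numeric() drops dimensions, so matrix() rebuilds them column by column).
vals <- matrix(as.numeric(parts[, 3:5]), nrow = nrow(parts))

# One row per input line, with the row-wise sum of the three values.
res <- data.frame(gene = parts[, 1], rs = parts[, 2], sum_cells = rowSums(vals))
res
```

If the lines differ in their number of fields, rbind would recycle values silently, so it is worth checking lengths(strsplit(txt_lines, "\\s+")) first.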