I apologize if the articulation of my question is confusing, I haven't been able to find similar threads which clarify the English of my question.
I am working with a sample of data which resembles that seen below:
label1 | label2 | label3 | label# |
---|---|---|---|
value1 | value4 | value7 | label2 |
value2 | value5 | value8 | label1 |
value3 | value6 | value9 | label3 |
I'm trying to create a new column, 'currentvalue', which reads in the value of label# in a certain row, then for that row populates the column with that row's value of whatever column is named in label#. In other words, I want my output to look like this:
label1 | label2 | label3 | label# | currentvalue |
---|---|---|---|---|
value1 | value4 | value7 | label2 | value4 |
value2 | value5 | value8 | label1 | value2 |
value3 | value6 | value9 | label3 | value9 |
The only solutions I can think of for this involve multiple for loops, which I imagine is very computationally inefficient. I've been searching stack overflow for threads which could help me write a vectorized solution to this, but I don't think I've been able to articulate the problem very well because none of my searches were helpful. Any help is appreciated (including help stating my question better).
CodePudding user response:
A solution using dplyr
and purrr
. imap_chr
can apply a function efficiently through each row. The first argument is the content in label#
, while the second argument is the row number.
Usually rowwise
operation is slow when the data frame is huge, so try to avoid rowwise
and use alternative if possible.
library(dplyr)
library(purrr)
dat2 <- dat %>%
mutate(currentvalue = imap_chr(`label#`, ~dat[.y, .x]))
dat2
# label1 label2 label3 label# currentvalue
# 1 value1 value4 value7 label2 value4
# 2 value2 value5 value8 label1 value2
# 3 value3 value6 value9 label3 value9
Data
dat <- read.table(text = "label1 label2 label3 label
value1 value4 value7 label2
value2 value5 value8 label1
value3 value6 value9 label3", header = TRUE) %>%
setnames(c("label1", "label2", "label3", "label#"))
CodePudding user response:
It's bit messy and I think there might a better way, but you may try
library(dplyr)
library(tibble)
df <- read.table(text = "label1 label2 label3 label#
value1 value4 value7 label2
value2 value5 value8 label1
value3 value6 value9 label3", h = T)
df %>%
rowwise %>%
rownames_to_column(., "row") %>%
mutate(currentvalue = .[[which(rownames(.) == row),which(names(.) == label)]])
row label1 label2 label3 label currentvalue
<chr> <chr> <chr> <chr> <chr> <chr>
1 1 value1 value4 value7 label2 value4
2 2 value2 value5 value8 label1 value2
3 3 value3 value6 value9 label3 value9
When I read your data with read.table
, label#
become label
.
column name label#
names(df)[4] <- "label#"
df %>%
rowwise %>%
rownames_to_column(., "row") %>%
mutate(currentvalue = .[[which(rownames(.) == row),which(names(.) == 'label#')]])
row label1 label2 label3 `label#` currentvalue
<chr> <chr> <chr> <chr> <chr> <chr>
1 1 value1 value4 value7 label2 label2
2 2 value2 value5 value8 label1 label1
3 3 value3 value6 value9 label3 label3
using base R
x <- match(df$label, names(df))
y <- 1:nrow(df)
z <- data.frame(y, x)
df$currentvalue <- apply(z,1, function(x) df[x[1],x[2]])
Time check
microbenchmark::microbenchmark(
a = {
df %>%
rowwise %>%
rownames_to_column(., "row") %>%
mutate(currentvalue = .[[which(rownames(.) == row),which(names(.) == label)]])
},
b = {
x <- match(df$label, names(df))
y <- 1:nrow(df)
z <- data.frame(y, x)
df$currentvalue <- apply(z,1, function(x) df[x[1],x[2]])
}
)
Unit: microseconds
expr min lq mean median uq max neval cld
a 6157.8 6861.95 8773.098 7465.75 9367.1 26232.8 100 b
b 360.6 399.75 692.073 488.40 666.9 4225.0 100 a