I apologize if the articulation of my question is confusing, I haven't been able to find similar threads which clarify the English of my question.

I am working with a sample of data which resembles that seen below:

label1	label2	label3	label#
value1	value4	value7	label2
value2	value5	value8	label1
value3	value6	value9	label3

I'm trying to create a new column, 'currentvalue', which reads in the value of label# in a certain row, then for that row populates the column with that row's value of whatever column is named in label#. In other words, I want my output to look like this:

label1	label2	label3	label#	currentvalue
value1	value4	value7	label2	value4
value2	value5	value8	label1	value2
value3	value6	value9	label3	value9

The only solutions I can think of for this involve multiple for loops, which I imagine is very computationally inefficient. I've been searching stack overflow for threads which could help me write a vectorized solution to this, but I don't think I've been able to articulate the problem very well because none of my searches were helpful. Any help is appreciated (including help stating my question better).

CodePudding user response：

A solution using dplyr and purrr. imap_chr can apply a function efficiently through each row. The first argument is the content in label#, while the second argument is the row number.

Usually rowwise operation is slow when the data frame is huge, so try to avoid rowwise and use alternative if possible.

library(dplyr)
library(purrr)

dat2 <- dat %>%
  mutate(currentvalue = imap_chr(`label#`, ~dat[.y, .x]))
dat2
#   label1 label2 label3 label# currentvalue
# 1 value1 value4 value7 label2       value4
# 2 value2 value5 value8 label1       value2
# 3 value3 value6 value9 label3       value9

Data

dat <- read.table(text = "label1 label2  label3  label
value1  value4  value7  label2
value2  value5  value8  label1
value3  value6  value9  label3", header = TRUE) %>%
  setnames(c("label1", "label2", "label3", "label#"))

CodePudding user response：

It's bit messy and I think there might a better way, but you may try

library(dplyr)
library(tibble)
    
df <- read.table(text = "label1 label2  label3  label#
value1  value4  value7  label2
value2  value5  value8  label1
value3  value6  value9  label3", h = T)

df %>%
  rowwise %>%
  rownames_to_column(., "row") %>%
  mutate(currentvalue = .[[which(rownames(.) == row),which(names(.) == label)]])

  row   label1 label2 label3 label  currentvalue
  <chr> <chr>  <chr>  <chr>  <chr>  <chr>       
1 1     value1 value4 value7 label2 value4      
2 2     value2 value5 value8 label1 value2      
3 3     value3 value6 value9 label3 value9

When I read your data with read.table, label# become label.

column name `label#`

names(df)[4] <- "label#"

df %>%
  rowwise %>%
  rownames_to_column(., "row") %>%
  mutate(currentvalue = .[[which(rownames(.) == row),which(names(.) == 'label#')]])

  row   label1 label2 label3 `label#` currentvalue
  <chr> <chr>  <chr>  <chr>  <chr>    <chr>       
1 1     value1 value4 value7 label2   label2      
2 2     value2 value5 value8 label1   label1      
3 3     value3 value6 value9 label3   label3

using base R

x <- match(df$label, names(df))
y <- 1:nrow(df)
z <- data.frame(y, x)
df$currentvalue <- apply(z,1, function(x) df[x[1],x[2]])

Time check

microbenchmark::microbenchmark(
  a = {
    df %>%
      rowwise %>%
      rownames_to_column(., "row") %>%
      mutate(currentvalue = .[[which(rownames(.) == row),which(names(.) == label)]])
  },
  b = {
    x <- match(df$label, names(df))
    y <- 1:nrow(df)
    z <- data.frame(y, x)
    df$currentvalue <- apply(z,1, function(x) df[x[1],x[2]])
  }
)

Unit: microseconds
 expr    min      lq     mean  median     uq     max neval cld
    a 6157.8 6861.95 8773.098 7465.75 9367.1 26232.8   100   b
    b  360.6  399.75  692.073  488.40  666.9  4225.0   100  a

column name label#

using base R

Time check

column name `label#`