Home > database >  How do I create a column based on values in another column which are the names of variables in my da
How do I create a column based on values in another column which are the names of variables in my da

Time:12-15

I apologize if the articulation of my question is confusing, I haven't been able to find similar threads which clarify the English of my question.

I am working with a sample of data which resembles that seen below:

label1 label2 label3 label#
value1 value4 value7 label2
value2 value5 value8 label1
value3 value6 value9 label3

I'm trying to create a new column, 'currentvalue', which reads in the value of label# in a certain row, then for that row populates the column with that row's value of whatever column is named in label#. In other words, I want my output to look like this:

label1 label2 label3 label# currentvalue
value1 value4 value7 label2 value4
value2 value5 value8 label1 value2
value3 value6 value9 label3 value9

The only solutions I can think of for this involve multiple for loops, which I imagine is very computationally inefficient. I've been searching stack overflow for threads which could help me write a vectorized solution to this, but I don't think I've been able to articulate the problem very well because none of my searches were helpful. Any help is appreciated (including help stating my question better).

CodePudding user response:

A solution using dplyr and purrr. imap_chr can apply a function efficiently through each row. The first argument is the content in label#, while the second argument is the row number.

Usually rowwise operation is slow when the data frame is huge, so try to avoid rowwise and use alternative if possible.

library(dplyr)
library(purrr)

dat2 <- dat %>%
  mutate(currentvalue = imap_chr(`label#`, ~dat[.y, .x]))
dat2
#   label1 label2 label3 label# currentvalue
# 1 value1 value4 value7 label2       value4
# 2 value2 value5 value8 label1       value2
# 3 value3 value6 value9 label3       value9

Data

dat <- read.table(text = "label1 label2  label3  label
value1  value4  value7  label2
value2  value5  value8  label1
value3  value6  value9  label3", header = TRUE) %>%
  setnames(c("label1", "label2", "label3", "label#"))

CodePudding user response:

It's bit messy and I think there might a better way, but you may try

library(dplyr)
library(tibble)
    
df <- read.table(text = "label1 label2  label3  label#
value1  value4  value7  label2
value2  value5  value8  label1
value3  value6  value9  label3", h = T)

df %>%
  rowwise %>%
  rownames_to_column(., "row") %>%
  mutate(currentvalue = .[[which(rownames(.) == row),which(names(.) == label)]])

  row   label1 label2 label3 label  currentvalue
  <chr> <chr>  <chr>  <chr>  <chr>  <chr>       
1 1     value1 value4 value7 label2 value4      
2 2     value2 value5 value8 label1 value2      
3 3     value3 value6 value9 label3 value9 

When I read your data with read.table, label# become label.

column name label#

names(df)[4] <- "label#"

df %>%
  rowwise %>%
  rownames_to_column(., "row") %>%
  mutate(currentvalue = .[[which(rownames(.) == row),which(names(.) == 'label#')]])

  row   label1 label2 label3 `label#` currentvalue
  <chr> <chr>  <chr>  <chr>  <chr>    <chr>       
1 1     value1 value4 value7 label2   label2      
2 2     value2 value5 value8 label1   label1      
3 3     value3 value6 value9 label3   label3  

using base R

x <- match(df$label, names(df))
y <- 1:nrow(df)
z <- data.frame(y, x)
df$currentvalue <- apply(z,1, function(x) df[x[1],x[2]])

Time check

microbenchmark::microbenchmark(
  a = {
    df %>%
      rowwise %>%
      rownames_to_column(., "row") %>%
      mutate(currentvalue = .[[which(rownames(.) == row),which(names(.) == label)]])
  },
  b = {
    x <- match(df$label, names(df))
    y <- 1:nrow(df)
    z <- data.frame(y, x)
    df$currentvalue <- apply(z,1, function(x) df[x[1],x[2]])
  }
)

Unit: microseconds
 expr    min      lq     mean  median     uq     max neval cld
    a 6157.8 6861.95 8773.098 7465.75 9367.1 26232.8   100   b
    b  360.6  399.75  692.073  488.40  666.9  4225.0   100  a 
  • Related