I have a big dataset in R, and one of the columns has specific column names (target) for each row, as you can see below. What I want to do is to check every row for the column name, copy the values from that specific column, and paste it back into the relevant row of the "target" column.
target col1 col2 col3
col1 green one dog
col1 pink two cat
col3 blue three spider
col2 black four pig
col3 purple five elephant
col2 yellow six lion
This is my desired outcome:
target col1 col2 col3
green green one dog
pink pink two cat
spider blue three spider
four black four pig
fox purple five fox
six yellow six lion
I tried the code below, but after first a few hundreds row, it copies the wrong values.
df$target[df$target=="[col1]"] <- df$col1
df$target[df$target=="[col2]"] <- df$col2
df$target[df$target=="[col3]"] <- df$col3
df$target[df$target=="[col4]"] <- df$col4
Any suggestions? Thanks in advance
CodePudding user response:
You can use sapply
:
df$test <- sapply(1:nrow(df),
function(x) df[x, 1] <- df[x, names(df) == df[x,1]])
Output:
# target col1 col2 col3 test
# 1 col1 green one dog green
# 2 col1 pink two cat pink
# 3 col3 blue three spider spider
# 4 col2 black four pig four
# 5 col3 purple five elephant elephant
# 6 col2 yellow six lion six
Note I created a test
column to check but you could just overwrite target
Data
df <- read.table(text = "target col1 col2 col3
col1 green one dog
col1 pink two cat
col3 blue three spider
col2 black four pig
col3 purple five elephant
col2 yellow six lion", header = TRUE)
CodePudding user response:
The base vectorized approach: If you use a two-column numeric matrix with "[" you can get values for the rows and columns.
dat <- read.table(text="target col1 col2 col3
col1 green one dog
col1 pink two cat
col3 blue three spider
col2 black four pig
col3 purple five elephant
col2 yellow six lion", head=TRUE)
# extract col number from first column and add one to offset it
# then use that as the column numer for
dat[ matrix( c( 1:nrow(dat), as.numeric(gsub("col", "", dat[[1]])) 1 ), ncol=2) ]
[1] "green" "pink" "spider" "four" "elephant" "six"
Then just assign it to column "target".
> dat[, 1] <- dat[ matrix( c( 1:nrow(dat), as.numeric(gsub("col", "", dat[[1]])) 1 ), ncol=2) ]
> dat
target col1 col2 col3
1 green green one dog
2 pink pink two cat
3 spider blue three spider
4 four black four pig
5 elephant purple five elephant
6 six yellow six lion