I have a dataframe that looks something like this:
P1 <- c('P1 -> (Normal,_)', 'P1 -> (Normal,_)', 'NA', 'P5 -> (UP,_)')
P2 <- c('P4 -> (UP,_)', 'NA', 'P2 -> (UP,_)', 'P4 -> (UP,_)')
P3 <- c('P2 -> (UP,_)', 'P3 -> (UP,_)', 'P1 -> (UP,_)', 'P2 -> (UP,_)')
P4 <- c('NA', 'P4 -> (UP,_)', 'P3 -> (UP,_)', 'P3 -> (UP,_)')
P5 <- c('P3 -> (UP,_)', 'NA', 'NA', 'NA')
df <- data.frame(P1, P2, P3, P4, P5)
I need to reorder the values within each row so that the P1 column contains only P1 values, the P2 column only P2 values, and so on. If a row has no value for a column, that cell should contain 'NA'.
So, the resulting dataframe should look like this:
CodePudding user response:
Having "NA" instead of NA probably isn't wise, but you can do this with a bit of indexing after matching up the P1/2/3/4/5 stem with the variable name:
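sel <- df != "NA"           ## use !is.na(df) instead if the data holds real NA values
out <- replace(df, , "NA")  ## use NA rather than "NA" if you want real NA values
## pair each value's row with the column whose name matches its two-character stem
out[cbind(row(df)[sel], match(substr(df[sel], 1, 2), names(df)))] <- df[sel]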
out
#                P1           P2           P3           P4           P5
#1 P1 -> (Normal,_) P2 -> (UP,_) P3 -> (UP,_) P4 -> (UP,_)           NA
#2 P1 -> (Normal,_)           NA P3 -> (UP,_) P4 -> (UP,_)           NA
#3     P1 -> (UP,_) P2 -> (UP,_) P3 -> (UP,_)           NA           NA
#4               NA P2 -> (UP,_) P3 -> (UP,_) P4 -> (UP,_) P5 -> (UP,_)
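To make the indexing step easier to follow, here is a minimal sketch of the same idea broken into named pieces. It assumes the data holds real NA values rather than the string "NA", and the intermediate names (m, vals, rows, cols) are only illustrative:

```r
# Example data rebuilt with real NA values (an assumption; the question uses "NA" strings)
df <- data.frame(
  P1 = c("P1 -> (Normal,_)", "P1 -> (Normal,_)", NA, "P5 -> (UP,_)"),
  P2 = c("P4 -> (UP,_)", NA, "P2 -> (UP,_)", "P4 -> (UP,_)"),
  P3 = c("P2 -> (UP,_)", "P3 -> (UP,_)", "P1 -> (UP,_)", "P2 -> (UP,_)"),
  P4 = c(NA, "P4 -> (UP,_)", "P3 -> (UP,_)", "P3 -> (UP,_)"),
  P5 = c("P3 -> (UP,_)", NA, NA, NA),
  stringsAsFactors = FALSE
)

m    <- as.matrix(df)     # character matrix with the same column names
sel  <- !is.na(m)         # TRUE where a cell holds a value
vals <- m[sel]            # the non-missing values, in column-major order
rows <- row(m)[sel]       # row number of each of those values
# target column: everything before the first space, matched against the column names
cols <- match(sub(" .*", "", vals), colnames(m))

out <- m
out[] <- NA                     # blank every cell, keeping dimensions and names
out[cbind(rows, cols)] <- vals  # a two-column matrix index addresses one cell per row
out <- as.data.frame(out, stringsAsFactors = FALSE)
out
```

Using sub() instead of substr(x, 1, 2) means stems longer than two characters (say, a hypothetical P10) would still match. If one row ever held two values with the same stem, only one of them would survive the assignment, so it may be worth checking for duplicates first.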
CodePudding user response:
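This code uses five different checks, one per column, but they all do the same thing, so you can pick one and repeat it for the other columns. Each check keeps the values that match its column and replaces everything else with NA; it does not move values between columns.
Maybe it will be useful ;)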
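library(stringr)  # str_starts(), str_detect()
library(dplyr)    # mutate(), %>%
# grepl() is part of base R, so it needs no library() call
df %>% mutate(P1 = ifelse(P1 == "P1 -> (Normal,_)", P1, NA),  # exact match
              P2 = ifelse(P2 != "P2 -> (UP,_)", NA, P2),      # negated exact match
              P3 = ifelse(grepl("P3", P3), P3, NA),           # base R pattern match
              P4 = ifelse(str_starts(P4, "P4"), P4, NA),      # stringr prefix test
              P5 = ifelse(str_detect(P5, "P5"), P5, NA))      # stringr pattern match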
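Rather than typing one check per column, a single check can be applied to every column. The following is a rough sketch, assuming dplyr 1.0 or later (for across() and cur_column()) and using base R's startsWith() as the check. Like the code above, it only blanks out values that sit in the wrong column and does not move them:

```r
library(dplyr)

# Example data from the question, with "NA" stored as a string
df <- data.frame(
  P1 = c("P1 -> (Normal,_)", "P1 -> (Normal,_)", "NA", "P5 -> (UP,_)"),
  P2 = c("P4 -> (UP,_)", "NA", "P2 -> (UP,_)", "P4 -> (UP,_)"),
  P3 = c("P2 -> (UP,_)", "P3 -> (UP,_)", "P1 -> (UP,_)", "P2 -> (UP,_)"),
  P4 = c("NA", "P4 -> (UP,_)", "P3 -> (UP,_)", "P3 -> (UP,_)"),
  P5 = c("P3 -> (UP,_)", "NA", "NA", "NA"),
  stringsAsFactors = FALSE
)

# cur_column() returns the name of the column currently being processed,
# so each value is kept only when it starts with its own column's name
masked <- df %>%
  mutate(across(everything(),
                ~ ifelse(startsWith(.x, cur_column()), .x, NA)))
masked
```

A column where nothing matches (P5 here) comes back as a logical NA column rather than a character one, which may matter if the result is combined with other data later.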