I have data frame as follows:
df <- data.frame(Resource_Code = c("B-T234","B-T234","B-T234"),
"Valid_To"= c("2021-10-17", "2021-11-28", "2021-10-31"),
"V1"= c("2021-10-17", "2021-10-17", "2021-10-24"),
"V2" = c("2021-10-24", "2021-10-31", "2021-10-31"),
"V3" = c("2021-10-31", "2021-11-07", "2021-11-14"),
"V4" = c("2021-11-14", "2021-11-21", "2021-11-28"),
"V5" = c("2021-12-05", "2021-11-28", "2021-12-12"),
"V6" = c("2021-12-12", "2021-12-19", "2021-12-26"),
"V7" = c("2022-01-02", "2022-01-09", "2022-01-16"),
"V8" = c("2022-01-23", "2022-01-30", "2022-02-06"),
"V9" = c("2022-02-13", "2022-02-20", "2022-02-27"),
"v10" = c("2022-02-27", "2022-03-06", "2022-03-13"))
In the desired output, if any row cell(column V1 onwards) is equal to corresponding "Valid_To" row cell then subsequent cell (column V1 onwards) should become blank
Actual DF has hundreads of rows and columns and column V1 to VN in date format
The desired output is as follows:
df1 <- data.frame(Resource_Code = c("B-T234","B-T234","B-
T234"),
"Valid_To"= c("2021-10-17", "2021-11-28", "2021-10-31"),
"V1"= c("2021-10-17", "2021-10-17", "2021-10-24"),
"V2" = c("", "2021-10-31", "2021-10-31"),
"V3" = c("", "2021-11-07", ""),
"V4" = c("", "2021-11-21", ""),
"V5" = c("", "2021-11-28", ""),
"V6" = c("", "", ""),
"V7" = c("", "", ""),
"V8" = c("", "", ""),
"V9" = c("", "", ""),
"v10" = c("", "", ""))
My code is as follows: My code is NOT giving desired output Please help
for (i in 1:nrow(df1)){
for (j in 3:ncol(df1) ){
if (df[i,j] == df1[1,2]){
df[i,j 1] <- ""
}
}
}
CodePudding user response:
Base R way using apply
.
match
returns the position where the Valid_To
value matches the date in columns V1
to V10
. If the match
exist then we turn the values to blank from the next position till end of the series.
df[-1] <- t(apply(df[-1], 1, function(x) {
inds <- match(x[1], x[-1])
if(length(inds)) x[(inds 2):length(x)] <- ''
x
}))
df
# Resource_Code Valid_To V1 V2 V3 V4 V5 V6 V7 V8 V9 v10
#1 B-T234 2021-10-17 2021-10-17
#2 B-T234 2021-11-28 2021-10-17 2021-10-31 2021-11-07 2021-11-21 2021-11-28
#3 B-T234 2021-10-31 2021-10-24 2021-10-31
CodePudding user response:
If you want to use loop,
for (i in 1:dim(df)[1]){
dummy <- c()
for (j in 3:dim(df)[2]){
if (df[i,j] == df[i,2]){
dummy <- c(dummy, j)
}
}
check <- min(dummy) 1
df[i,c(check: dim(df)[2])] <- NA
}
df
Resource_Code Valid_To V1 V2 V3 V4 V5 V6 V7 V8 V9 v10
1 B-T234 2021-10-17 2021-10-17 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 B-T234 2021-11-28 2021-10-17 2021-10-31 2021-11-07 2021-11-21 2021-11-28 <NA> <NA> <NA> <NA> <NA>
3 B-T234 2021-10-31 2021-10-24 2021-10-31 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
CodePudding user response:
A base vectorised solution
tmp=grepl("V[0-9] ",colnames(df))
df[,tmp][df[,"Valid_To",drop=T]<df[,tmp]]=NA
Resource_Code Valid_To V1 V2 V3 V4 V5 V6 V7
1 B-T234 2021-10-17 2021-10-17 <NA> <NA> <NA> <NA> <NA> <NA>
2 B-T234 2021-11-28 2021-10-17 2021-10-31 2021-11-07 2021-11-21 2021-11-28 <NA> <NA>
3 B-T234 2021-10-31 2021-10-24 2021-10-31 <NA> <NA> <NA> <NA> <NA>
V8 V9 v10
1 <NA> <NA> 2022-02-27
2 <NA> <NA> 2022-03-06
3 <NA> <NA> 2022-03-13