Home > OS >  subsetting dataframe depending on second dataframe R
subsetting dataframe depending on second dataframe R

Time:09-07

I have two dataframes (df1 & df2) with the same dimensions. df1 contains numerical values:

df1
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   1 11 21 31 41 51 61 71 81  91
2   2 12 22 32 42 52 62 72 82  92
3   3 13 23 33 43 53 63 73 83  93
4   4 14 24 34 44 54 64 74 84  94
5   5 15 25 35 45 55 65 75 85  95
6   6 16 26 36 46 56 66 76 86  96
7   7 17 27 37 47 57 67 77 87  97
8   8 18 28 38 48 58 68 78 88  98
9   9 19 29 39 49 59 69 79 89  99
10 10 20 30 40 50 60 70 80 90 100

and df2 contains strings:

      V1        V2    V3        V4        V5             V6    V7        V8             V9            V10
1  today     today today     today yesterday          today today     today          today tomorrow,today
2  today     today today     today     today      yesterday today     today          today          today
3  today yesterday today     today     today          today today yesterday          today          today
4  today     today today     today     today          today today     today tomorrow,today          today
5  today     today today yesterday yesterday          today today     today          today       tomorrow
6  today     today today yesterday     today tomorrow,today today     today          today          today
7  today     today today     today     today      yesterday today     today          today tomorrow,today
8  today yesterday today     today     today          today today     today          today          today
9  today     today today     today     today          today today     today tomorrow,today          today
10 today yesterday today yesterday     today          today today yesterday          today          today

please note that some cells contain more than one string (e.g. [1,V10])

Each cell of df1 corresponds to a cell in df2 (df1[1,1] <-> df2[1,1], df1[2,2] <-> df2[2,2], etc.).

Is there a smart way (preferably without using a for-loop) to only keep values in df1, where the string from the corresponding cell in df2 contains "today" and the rest is set to NA?

So the prefered output would be:

   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   1 11 21 31 NA 51 61 71 81  91
2   2 12 22 32 42 NA 62 72 82  92
3   3 NA 23 33 43 53 63 NA 83  93
4   4 14 24 34 44 54 64 74 84  94
5   5 15 25 NA NA 55 65 75 85  NA
6   6 16 26 NA 46 56 66 76 86  96
7   7 17 27 37 47 NA 67 77 87  97
8   8 NA 28 38 48 58 68 78 88  98
9   9 19 29 39 49 59 69 79 89  99
10 10 NA 30 NA 50 60 70 NA 90 100

Thanks in advance!

CodePudding user response:

Here is another option using a combination of tidyverse and base R:

library(tidyverse)

df1[!mutate(df2, across(everything(), ~ str_detect(.x, "today")))] <- NA

Output

   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   1 11 21 31 NA 51 61 71 81  91
2   2 12 22 32 42 NA 62 72 82  92
3   3 NA 23 33 43 53 63 NA 83  93
4   4 14 24 34 44 54 64 74 84  94
5   5 15 25 NA NA 55 65 75 85  NA
6   6 16 26 NA 46 56 66 76 86  96
7   7 17 27 37 47 NA 67 77 87  97
8   8 NA 28 38 48 58 68 78 88  98
9   9 19 29 39 49 59 69 79 89  99
10 10 NA 30 NA 50 60 70 NA 90 100

Data

df1 <- structure(list(V1 = 1:10, V2 = 11:20, V3 = 21:30, V4 = 31:40, 
    V5 = 41:50, V6 = 51:60, V7 = 61:70, V8 = 71:80, V9 = 81:90, 
    V10 = 91:100), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10"))

df2 <- structure(list(V1 = c("today", "today", "today", "today", "today", 
"today", "today", "today", "today", "today"), V2 = c("today", 
"today", "yesterday", "today", "today", "today", "today", "yesterday", 
"today", "yesterday"), V3 = c("today", "today", "today", "today", 
"today", "today", "today", "today", "today", "today"), V4 = c("today", 
"today", "today", "today", "yesterday", "yesterday", "today", 
"today", "today", "yesterday"), V5 = c("yesterday", "today", 
"today", "today", "yesterday", "today", "today", "today", "today", 
"today"), V6 = c("today", "yesterday", "today", "today", "today", 
"tomorrow,today", "yesterday", "today", "today", "today"), V7 = c("today", 
"today", "today", "today", "today", "today", "today", "today", 
"today", "today"), V8 = c("today", "today", "yesterday", "today", 
"today", "today", "today", "today", "today", "yesterday"), V9 = c("today", 
"today", "today", "tomorrow,today", "today", "today", "today", 
"today", "tomorrow,today", "today"), V10 = c("tomorrow,today", 
"today", "today", "today", "tomorrow", "today", "tomorrow,today", 
"today", "today", "today")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))

CodePudding user response:

We could use

replace(df1, !sapply(df2, grepl, pattern = "today"), NA)

-output

 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   1 11 21 31 NA 51 61 71 81  91
2   2 12 22 32 42 NA 62 72 82  92
3   3 NA 23 33 43 53 63 NA 83  93
4   4 14 24 34 44 54 64 74 84  94
5   5 15 25 NA NA 55 65 75 85  NA
6   6 16 26 NA 46 56 66 76 86  96
7   7 17 27 37 47 NA 67 77 87  97
8   8 NA 28 38 48 58 68 78 88  98
9   9 19 29 39 49 59 69 79 89  99
10 10 NA 30 NA 50 60 70 NA 90 100

Or slightly more compact

df1 * NA^!grepl("today", as.matrix(df2))
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   1 11 21 31 NA 51 61 71 81  91
2   2 12 22 32 42 NA 62 72 82  92
3   3 NA 23 33 43 53 63 NA 83  93
4   4 14 24 34 44 54 64 74 84  94
5   5 15 25 NA NA 55 65 75 85  NA
6   6 16 26 NA 46 56 66 76 86  96
7   7 17 27 37 47 NA 67 77 87  97
8   8 NA 28 38 48 58 68 78 88  98
9   9 19 29 39 49 59 69 79 89  99
10 10 NA 30 NA 50 60 70 NA 90 100

data

df1 <- structure(list(V1 = 1:10, V2 = 11:20, V3 = 21:30, V4 = 31:40, 
    V5 = 41:50, V6 = 51:60, V7 = 61:70, V8 = 71:80, V9 = 81:90, 
    V10 = 91:100), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9", "10"))

df2 <- structure(list(V1 = c("today", "today", "today", "today", "today", 
"today", "today", "today", "today", "today"), V2 = c("today", 
"today", "yesterday", "today", "today", "today", "today", "yesterday", 
"today", "yesterday"), V3 = c("today", "today", "today", "today", 
"today", "today", "today", "today", "today", "today"), V4 = c("today", 
"today", "today", "today", "yesterday", "yesterday", "today", 
"today", "today", "yesterday"), V5 = c("yesterday", "today", 
"today", "today", "yesterday", "today", "today", "today", "today", 
"today"), V6 = c("today", "yesterday", "today", "today", "today", 
"tomorrow,today", "yesterday", "today", "today", "today"), V7 = c("today", 
"today", "today", "today", "today", "today", "today", "today", 
"today", "today"), V8 = c("today", "today", "yesterday", "today", 
"today", "today", "today", "today", "today", "yesterday"), V9 = c("today", 
"today", "today", "tomorrow,today", "today", "today", "today", 
"today", "tomorrow,today", "today"), V10 = c("tomorrow,today", 
"today", "today", "today", "tomorrow", "today", "tomorrow,today", 
"today", "today", "today")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
  • Related