Subsetting a dataframe from another dataframe when we have an NA

Time:05-03

I have a dataframe (df1) which contains my entire dataset:

                    Measures                          Format
1            space and shape     Constructed Response Expert
2            space and shape     Constructed Response Manual
3            space and shape     Constructed Response Expert
4            space and shape          Simple Multiple Choice
5            space and shape Constructed Response Auto-coded
6   change and relationships     Constructed Response Expert
7   change and relationships     Constructed Response Expert
8   change and relationships     Constructed Response Expert
9   change and relationships         Complex Multiple Choice
10  change and relationships         Complex Multiple Choice
11           space and shape         Complex Multiple Choice
12           space and shape          Simple Multiple Choice
13           space and shape     Constructed Response Expert
14           space and shape     Constructed Response Expert
15      uncertainty and data         Complex Multiple Choice
16                  quantity     Constructed Response Manual
17      uncertainty and data          Simple Multiple Choice
18      uncertainty and data         Complex Multiple Choice
19      uncertainty and data          Simple Multiple Choice
20                  quantity     Constructed Response Manual
21  change and relationships     Constructed Response Manual
22  change and relationships     Constructed Response Expert
23           space and shape          Simple Multiple Choice
24           space and shape     Constructed Response Expert
25           space and shape Constructed Response Auto-coded
26                  quantity     Constructed Response Manual
27                  quantity         Complex Multiple Choice
28                  quantity     Constructed Response Manual
29                  quantity          Simple Multiple Choice
30                  quantity          Simple Multiple Choice
31      uncertainty and data          Simple Multiple Choice
32  change and relationships          Simple Multiple Choice
33                  quantity         Complex Multiple Choice
34                  quantity          Simple Multiple Choice
35      uncertainty and data Constructed Response Auto-coded
36  change and relationships     Constructed Response Expert
37      uncertainty and data     Constructed Response Manual
38                  quantity     Constructed Response Manual
39  change and relationships     Constructed Response Expert
40  change and relationships     Constructed Response Manual
41                  quantity         Complex Multiple Choice
42                  quantity     Constructed Response Expert
43                  quantity          Simple Multiple Choice
44                  quantity     Constructed Response Expert
45                  quantity     Constructed Response Manual
46                  quantity          Simple Multiple Choice
47  change and relationships     Constructed Response Expert
48      uncertainty and data          Simple Multiple Choice
49  change and relationships     Constructed Response Manual
50      uncertainty and data          Simple Multiple Choice
51      uncertainty and data          Simple Multiple Choice
52      uncertainty and data          Simple Multiple Choice
53                  quantity     Constructed Response Manual
54                  quantity     Constructed Response Manual
55                  quantity          Simple Multiple Choice
56           space and shape          Simple Multiple Choice
57  change and relationships     Constructed Response Expert
58                  quantity     Constructed Response Manual
59           space and shape     Constructed Response Manual
60           space and shape          Simple Multiple Choice
61  change and relationships     Constructed Response Manual
62  change and relationships     Constructed Response Expert
63      uncertainty and data          Simple Multiple Choice
64      uncertainty and data          Simple Multiple Choice
65                  quantity          Simple Multiple Choice
66  change and relationships     Constructed Response Expert
67                  quantity     Constructed Response Manual
68  change and relationships          Simple Multiple Choice
69           space and shape     Constructed Response Expert
70                  quantity          Simple Multiple Choice
71                  quantity     Constructed Response Manual
72                  quantity     Constructed Response Expert
73           space and shape         Complex Multiple Choice
74           space and shape         Complex Multiple Choice
75           space and shape     Constructed Response Expert
76      uncertainty and data     Constructed Response Expert
77      uncertainty and data     Constructed Response Manual
78      uncertainty and data     Constructed Response Expert
79  change and relationships     Constructed Response Manual
80  change and relationships     Constructed Response Expert
81  change and relationships     Constructed Response Expert
82      uncertainty and data     Constructed Response Manual
83      uncertainty and data     Constructed Response Expert
84      uncertainty and data     Constructed Response Expert
85  change and relationships          Simple Multiple Choice
86  change and relationships          Simple Multiple Choice
87  change and relationships     Constructed Response Manual
88  change and relationships     Constructed Response Expert
89  change and relationships          Simple Multiple Choice
90      uncertainty and data     Constructed Response Expert
91           space and shape     Constructed Response Manual
92           space and shape         Complex Multiple Choice
93      uncertainty and data     Constructed Response Manual
94      uncertainty and data     Constructed Response Manual
95      uncertainty and data         Complex Multiple Choice
96      uncertainty and data          Simple Multiple Choice
97      uncertainty and data          Simple Multiple Choice
98                  quantity          Simple Multiple Choice
99                  quantity     Constructed Response Manual
100          space and shape          Simple Multiple Choice
101          space and shape     Constructed Response Expert
102          space and shape     Constructed Response Manual
103          space and shape     Constructed Response Manual
104 change and relationships     Constructed Response Expert
105          space and shape     Constructed Response Manual
106          space and shape     Constructed Response Expert
107                 quantity          Simple Multiple Choice
108 change and relationships     Constructed Response Manual
109 change and relationships         Complex Multiple Choice

I have another dataframe, df2 (notice it has a 'Number' column), which I use to subset df1. The 'Number' column tells me how many rows of each type I want from my original dataset (df1):

         Measures                          Format Number
1 space and shape     Constructed Response Expert      2
2 space and shape     Constructed Response Manual      1
4 space and shape          Simple Multiple Choice      2
5 space and shape Constructed Response Auto-coded      1
6           asdaf                           asfas      0

I use the following code to do this

library(tidyverse)

inner_join(df1, df2, by = c("Measures", "Format")) %>% 
  group_by(Measures, Format) %>% 
  # keep the first `Number` rows of each Measures/Format group
  slice(seq_len(min(Number))) %>% 
  ungroup()

However, let's say df2 looked like this (notice the NA in Format). In this case I want 4 'space and shape' rows of any format. Of course, I don't want rows to repeat: rows 2, 4 and 5 of df2 also ask for 'space and shape', and the rows picked for them should not be picked again for the any-format request in row 1.

         Measures                          Format Number
1 space and shape                            <NA>      4
2 space and shape     Constructed Response Manual      1
4 space and shape          Simple Multiple Choice      2
5 space and shape Constructed Response Auto-coded      1
6           asdaf                           asfas      0
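
To make the problem concrete, here is a minimal sketch of why a plain join cannot serve the NA row. It uses base R's merge() as a stand-in for inner_join() and two tiny made-up data frames (df1_toy and df2_toy are hypothetical names, not part of the real data). An NA key can only match another NA, and df1 has no NA in Format, so the 'any format' request drops out of the join:

```r
# Hypothetical toy data, only to illustrate the join behaviour.
df1_toy <- data.frame(
  Measures = c("space and shape", "space and shape", "quantity"),
  Format   = c("Constructed Response Expert", "Simple Multiple Choice",
               "Simple Multiple Choice")
)
df2_toy <- data.frame(
  Measures = c("space and shape", "space and shape"),
  Format   = c(NA, "Simple Multiple Choice"),
  Number   = c(4, 2)
)

# Inner join on both key columns.
joined <- merge(df1_toy, df2_toy, by = c("Measures", "Format"))

# Only the exact-format request survives; the NA request matched nothing.
print(joined)
```

So the NA row has to be handled separately from the exact-format rows.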

How can I do this?

Data is as follows:

df1

df1<-structure(list(Measures = c("space and shape", "space and shape", 
    "space and shape", "space and shape", "space and shape", "change and relationships", 
    "change and relationships", "change and relationships", "change and relationships", 
    "change and relationships", "space and shape", "space and shape", 
    "space and shape", "space and shape", "uncertainty and data", 
    "quantity", "uncertainty and data", "uncertainty and data", "uncertainty and data", 
    "quantity", "change and relationships", "change and relationships", 
    "space and shape", "space and shape", "space and shape", "quantity", 
    "quantity", "quantity", "quantity", "quantity", "uncertainty and data", 
    "change and relationships", "quantity", "quantity", "uncertainty and data", 
    "change and relationships", "uncertainty and data", "quantity", 
    "change and relationships", "change and relationships", "quantity", 
    "quantity", "quantity", "quantity", "quantity", "quantity", "change and relationships", 
    "uncertainty and data", "change and relationships", "uncertainty and data", 
    "uncertainty and data", "uncertainty and data", "quantity", "quantity", 
    "quantity", "space and shape", "change and relationships", "quantity", 
    "space and shape", "space and shape", "change and relationships", 
    "change and relationships", "uncertainty and data", "uncertainty and data", 
    "quantity", "change and relationships", "quantity", "change and relationships", 
    "space and shape", "quantity", "quantity", "quantity", "space and shape", 
    "space and shape", "space and shape", "uncertainty and data", 
    "uncertainty and data", "uncertainty and data", "change and relationships", 
    "change and relationships", "change and relationships", "uncertainty and data", 
    "uncertainty and data", "uncertainty and data", "change and relationships", 
    "change and relationships", "change and relationships", "change and relationships", 
    "change and relationships", "uncertainty and data", "space and shape", 
    "space and shape", "uncertainty and data", "uncertainty and data", 
    "uncertainty and data", "uncertainty and data", "uncertainty and data", 
    "quantity", "quantity", "space and shape", "space and shape", 
    "space and shape", "space and shape", "change and relationships", 
    "space and shape", "space and shape", "quantity", "change and relationships", 
    "change and relationships"), Format = c("Constructed Response Expert", 
    "Constructed Response Manual", "Constructed Response Expert", 
    "Simple Multiple Choice", "Constructed Response Auto-coded", 
    "Constructed Response Expert", "Constructed Response Expert", 
    "Constructed Response Expert", "Complex Multiple Choice", "Complex Multiple Choice", 
    "Complex Multiple Choice", "Simple Multiple Choice", "Constructed Response Expert", 
    "Constructed Response Expert", "Complex Multiple Choice", "Constructed Response Manual", 
    "Simple Multiple Choice", "Complex Multiple Choice", "Simple Multiple Choice", 
    "Constructed Response Manual", "Constructed Response Manual", 
    "Constructed Response Expert", "Simple Multiple Choice", "Constructed Response Expert", 
    "Constructed Response Auto-coded", "Constructed Response Manual", 
    "Complex Multiple Choice", "Constructed Response Manual", "Simple Multiple Choice", 
    "Simple Multiple Choice", "Simple Multiple Choice", "Simple Multiple Choice", 
    "Complex Multiple Choice", "Simple Multiple Choice", "Constructed Response Auto-coded", 
    "Constructed Response Expert", "Constructed Response Manual", 
    "Constructed Response Manual", "Constructed Response Expert", 
    "Constructed Response Manual", "Complex Multiple Choice", "Constructed Response Expert", 
    "Simple Multiple Choice", "Constructed Response Expert", "Constructed Response Manual", 
    "Simple Multiple Choice", "Constructed Response Expert", "Simple Multiple Choice", 
    "Constructed Response Manual", "Simple Multiple Choice", "Simple Multiple Choice", 
    "Simple Multiple Choice", "Constructed Response Manual", "Constructed Response Manual", 
    "Simple Multiple Choice", "Simple Multiple Choice", "Constructed Response Expert", 
    "Constructed Response Manual", "Constructed Response Manual", 
    "Simple Multiple Choice", "Constructed Response Manual", "Constructed Response Expert", 
    "Simple Multiple Choice", "Simple Multiple Choice", "Simple Multiple Choice", 
    "Constructed Response Expert", "Constructed Response Manual", 
    "Simple Multiple Choice", "Constructed Response Expert", "Simple Multiple Choice", 
    "Constructed Response Manual", "Constructed Response Expert", 
    "Complex Multiple Choice", "Complex Multiple Choice", "Constructed Response Expert", 
    "Constructed Response Expert", "Constructed Response Manual", 
    "Constructed Response Expert", "Constructed Response Manual", 
    "Constructed Response Expert", "Constructed Response Expert", 
    "Constructed Response Manual", "Constructed Response Expert", 
    "Constructed Response Expert", "Simple Multiple Choice", "Simple Multiple Choice", 
    "Constructed Response Manual", "Constructed Response Expert", 
    "Simple Multiple Choice", "Constructed Response Expert", "Constructed Response Manual", 
    "Complex Multiple Choice", "Constructed Response Manual", "Constructed Response Manual", 
    "Complex Multiple Choice", "Simple Multiple Choice", "Simple Multiple Choice", 
    "Simple Multiple Choice", "Constructed Response Manual", "Simple Multiple Choice", 
    "Constructed Response Expert", "Constructed Response Manual", 
    "Constructed Response Manual", "Constructed Response Expert", 
    "Constructed Response Manual", "Constructed Response Expert", 
    "Simple Multiple Choice", "Constructed Response Manual", "Complex Multiple Choice"
    )), row.names = c(NA, -109L), class = "data.frame")

df2 (without NA)

df2<- structure(list(Measures = c("space and shape", "space and shape", 
    "space and shape", "space and shape", "asdaf"), Format = c("Constructed Response Expert", 
    "Constructed Response Manual", "Simple Multiple Choice", "Constructed Response Auto-coded", 
    "asfas"), Number = c(2, 1, 2, 1, 0)), row.names = c("1", "2", 
    "4", "5", "6"), class = "data.frame")

df2 (with NA)

df2<- structure(list(Measures = c("space and shape", "space and shape", "space and shape", "space and shape", "asdaf"), Format = c(NA, "Constructed Response Manual", "Simple Multiple Choice", "Constructed Response Auto-coded", "asfas"), Number = c(4, 1, 2, 1, 0)), row.names = c("1", "2", "4", "5", "6"), class = "data.frame")

Here is an example of expected output (it can be something else too). I ask for 4 'space and shape' rows, which can be of any format (because I have put NA):

CodePudding user response:

library(tidyverse)

df1 <- structure(list(Measures = c(
  "space and shape", "space and shape",
  "space and shape", "space and shape", "space and shape", "change and relationships",
  "change and relationships", "change and relationships", "change and relationships",
  "change and relationships", "space and shape", "space and shape",
  "space and shape", "space and shape", "uncertainty and data",
  "quantity", "uncertainty and data", "uncertainty and data", "uncertainty and data",
  "quantity", "change and relationships", "change and relationships",
  "space and shape", "space and shape", "space and shape", "quantity",
  "quantity", "quantity", "quantity", "quantity", "uncertainty and data",
  "change and relationships", "quantity", "quantity", "uncertainty and data",
  "change and relationships", "uncertainty and data", "quantity",
  "change and relationships", "change and relationships", "quantity",
  "quantity", "quantity", "quantity", "quantity", "quantity", "change and relationships",
  "uncertainty and data", "change and relationships", "uncertainty and data",
  "uncertainty and data", "uncertainty and data", "quantity", "quantity",
  "quantity", "space and shape", "change and relationships", "quantity",
  "space and shape", "space and shape", "change and relationships",
  "change and relationships", "uncertainty and data", "uncertainty and data",
  "quantity", "change and relationships", "quantity", "change and relationships",
  "space and shape", "quantity", "quantity", "quantity", "space and shape",
  "space and shape", "space and shape", "uncertainty and data",
  "uncertainty and data", "uncertainty and data", "change and relationships",
  "change and relationships", "change and relationships", "uncertainty and data",
  "uncertainty and data", "uncertainty and data", "change and relationships",
  "change and relationships", "change and relationships", "change and relationships",
  "change and relationships", "uncertainty and data", "space and shape",
  "space and shape", "uncertainty and data", "uncertainty and data",
  "uncertainty and data", "uncertainty and data", "uncertainty and data",
  "quantity", "quantity", "space and shape", "space and shape",
  "space and shape", "space and shape", "change and relationships",
  "space and shape", "space and shape", "quantity", "change and relationships",
  "change and relationships"
), Format = c(
  "Constructed Response Expert",
  "Constructed Response Manual", "Constructed Response Expert",
  "Simple Multiple Choice", "Constructed Response Auto-coded",
  "Constructed Response Expert", "Constructed Response Expert",
  "Constructed Response Expert", "Complex Multiple Choice", "Complex Multiple Choice",
  "Complex Multiple Choice", "Simple Multiple Choice", "Constructed Response Expert",
  "Constructed Response Expert", "Complex Multiple Choice", "Constructed Response Manual",
  "Simple Multiple Choice", "Complex Multiple Choice", "Simple Multiple Choice",
  "Constructed Response Manual", "Constructed Response Manual",
  "Constructed Response Expert", "Simple Multiple Choice", "Constructed Response Expert",
  "Constructed Response Auto-coded", "Constructed Response Manual",
  "Complex Multiple Choice", "Constructed Response Manual", "Simple Multiple Choice",
  "Simple Multiple Choice", "Simple Multiple Choice", "Simple Multiple Choice",
  "Complex Multiple Choice", "Simple Multiple Choice", "Constructed Response Auto-coded",
  "Constructed Response Expert", "Constructed Response Manual",
  "Constructed Response Manual", "Constructed Response Expert",
  "Constructed Response Manual", "Complex Multiple Choice", "Constructed Response Expert",
  "Simple Multiple Choice", "Constructed Response Expert", "Constructed Response Manual",
  "Simple Multiple Choice", "Constructed Response Expert", "Simple Multiple Choice",
  "Constructed Response Manual", "Simple Multiple Choice", "Simple Multiple Choice",
  "Simple Multiple Choice", "Constructed Response Manual", "Constructed Response Manual",
  "Simple Multiple Choice", "Simple Multiple Choice", "Constructed Response Expert",
  "Constructed Response Manual", "Constructed Response Manual",
  "Simple Multiple Choice", "Constructed Response Manual", "Constructed Response Expert",
  "Simple Multiple Choice", "Simple Multiple Choice", "Simple Multiple Choice",
  "Constructed Response Expert", "Constructed Response Manual",
  "Simple Multiple Choice", "Constructed Response Expert", "Simple Multiple Choice",
  "Constructed Response Manual", "Constructed Response Expert",
  "Complex Multiple Choice", "Complex Multiple Choice", "Constructed Response Expert",
  "Constructed Response Expert", "Constructed Response Manual",
  "Constructed Response Expert", "Constructed Response Manual",
  "Constructed Response Expert", "Constructed Response Expert",
  "Constructed Response Manual", "Constructed Response Expert",
  "Constructed Response Expert", "Simple Multiple Choice", "Simple Multiple Choice",
  "Constructed Response Manual", "Constructed Response Expert",
  "Simple Multiple Choice", "Constructed Response Expert", "Constructed Response Manual",
  "Complex Multiple Choice", "Constructed Response Manual", "Constructed Response Manual",
  "Complex Multiple Choice", "Simple Multiple Choice", "Simple Multiple Choice",
  "Simple Multiple Choice", "Constructed Response Manual", "Simple Multiple Choice",
  "Constructed Response Expert", "Constructed Response Manual",
  "Constructed Response Manual", "Constructed Response Expert",
  "Constructed Response Manual", "Constructed Response Expert",
  "Simple Multiple Choice", "Constructed Response Manual", "Complex Multiple Choice"
)), row.names = c(NA, -109L), class = "data.frame")

df2 <- structure(list(Measures = c("space and shape", "space and shape", "space and shape", "space and shape", "asdaf"), Format = c(NA, "Constructed Response Manual", "Simple Multiple Choice", "Constructed Response Auto-coded", "asfas"), Number = c(4, 1, 2, 1, 0)), row.names = c("1", "2", "4", "5", "6"), class = "data.frame")


set.seed(1337)

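# Process df2 one request (row) at a time.
# Note: each request samples from the full df1 independently.
df2 %>%
  nrow() %>%
  seq_len() %>%
  map(~ {
    # current request as a list: Measures, Format, Number
    row <- df2 %>%
      slice(.x) %>%
      as.list()
    if (is.na(row$Format)) {
      # any format: match on Measures only
      df1 %>%
        filter(Measures == row$Measures) %>%
        sample_n(row$Number) %>%
        mutate(Number = row$Number)
    } else {
      # specific format: match on both Measures and Format
      df1 %>%
        filter(Measures == row$Measures & Format == row$Format) %>%
        sample_n(row$Number) %>%
        mutate(Number = row$Number)
    }
  }) %>%
  bind_rows()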
#>          Measures                          Format Number
#> 1 space and shape          Simple Multiple Choice      4
#> 2 space and shape     Constructed Response Expert      4
#> 3 space and shape         Complex Multiple Choice      4
#> 4 space and shape         Complex Multiple Choice      4
#> 5 space and shape     Constructed Response Manual      1
#> 6 space and shape          Simple Multiple Choice      2
#> 7 space and shape          Simple Multiple Choice      2
#> 8 space and shape Constructed Response Auto-coded      1

Created on 2022-05-03 by the reprex package (v2.0.0)
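
The code above draws every request from the full df1, so the NA request can return rows that an exact-format request also returns. Below is a minimal sketch of one way to avoid that, not a definitive implementation: serve the exact-format requests first, mark the chosen df1 rows as used, and fill the NA request from whatever is left. It is written in base R so it runs without extra packages. The function name pick_without_repeats and the small df1 are hypothetical, made up for illustration; df2 is the NA version from the question.

```r
# Hypothetical toy stand-in for df1 (the real one has 109 rows).
df1 <- data.frame(
  Measures = c(rep("space and shape", 8), rep("quantity", 2)),
  Format = c(
    "Constructed Response Expert", "Constructed Response Manual",
    "Constructed Response Expert", "Simple Multiple Choice",
    "Constructed Response Auto-coded", "Complex Multiple Choice",
    "Simple Multiple Choice", "Constructed Response Expert",
    "Simple Multiple Choice", "Constructed Response Manual"
  )
)

# df2 with an NA Format, as in the question.
df2 <- data.frame(
  Measures = c("space and shape", "space and shape", "space and shape",
               "space and shape", "asdaf"),
  Format = c(NA, "Constructed Response Manual", "Simple Multiple Choice",
             "Constructed Response Auto-coded", "asfas"),
  Number = c(4, 1, 2, 1, 0)
)

pick_without_repeats <- function(df1, df2) {
  available <- rep(TRUE, nrow(df1))  # TRUE while a df1 row is still unused
  taken <- integer(0)                # df1 row indices chosen so far

  # Exact-format requests first, NA (any format) requests last,
  # so the wildcard cannot use up rows an exact request needs.
  requests <- df2[order(is.na(df2$Format)), ]

  for (i in seq_len(nrow(requests))) {
    req <- requests[i, ]
    ok <- available & df1$Measures == req$Measures
    if (!is.na(req$Format)) {
      ok <- ok & df1$Format == req$Format
    }
    idx <- which(ok)
    # Never ask for more rows than are left.
    k <- min(req$Number, length(idx))
    if (k > 0) {
      chosen <- idx[sample.int(length(idx), k)]
      available[chosen] <- FALSE
      taken <- c(taken, chosen)
    }
  }
  df1[taken, , drop = FALSE]
}

set.seed(1337)
result <- pick_without_repeats(df1, df2)
print(result)
```

The order of processing is a design choice: if the NA request were served first, it could take rows that a later exact-format request needs. When fewer rows are available than requested, this sketch returns what is left rather than raising an error.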
