I have a dataset datA, which is a sample of the population datB. In that sample, some groups are too small, so they have to be merged. Because the sample needs to be weighted (sample/pop), the same changes have to be made in the population as well. The issue is that simply applying the same code to datB does not work, because there are enough observations in the population, so the code would not merge the same groups there.

Example data

I have example data as follows:

library(data.table)
datA <- fread("
NA,0,2,NA,cat X, type 1
3,4,3,1,cat X, type 2
1,0,2,2,cat X, type 3
3,4,3,0,cat X, type 4
NA,0,2,NA,cat Y, type 1
NA,4,3,NA,cat Y, type 2
1,0,2,2,cat Y, type 3
3,4,3,35,cat Y, type 4
NA,0,2,NA,cat X, type 1
3,4,3,1,cat X, type 2
1,0,2,2,cat X, type 3
NA,4,3,NA,cat X, type 4
NA,0,2,NA,cat Y, type 1
NA,4,3,NA,cat Y, type 2
1,0,2,2,cat Y, type 3
3,4,3,1,cat Y, type 4
1,0,2,4,cat X, type 1
3,4,3,1,cat X, type 2
1,0,2,2,cat X, type 3
3,4,3,2,cat X, type 4
NA,0,2,NA,cat Y, type 1
NA,4,3,NA,cat Y, type 2
1,0,2,2,cat Y, type 3
3,4,3,2,cat Y, type 4
")

names(datA) <- c("A","B","C", "D", "cat", "type")

Changes made to sample/datA

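observations_grp <- function(x) {
  cumsum_i <- 0      # running total of observations in the current group
  nxtgrp <- FALSE    # TRUE once the current group has reached the minimum of 2
  n <- length(x)
  grp <- rep(0, n)
  grp_i <- 0
  for (i in seq_len(n)) {
    if (nxtgrp) {grp_i <- grp_i + 1; cumsum_i <- 0}
    nxtgrp <- !((cumsum_i + x[i]) < 2)
    cumsum_i <- cumsum_i + x[i]
    grp[i] <- grp_i
  }
  grp
}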

datA <- datA[, observations_D := sum(!is.na(D)), by = c("cat", "type")]
datA <- datA[, new_type := type]
datA[,`:=`(new_type = last(new_type), observations_D =sum(observations_D)),
        .(cat,observations_grp(observations_D))
][]
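
To illustrate what the grouping function does, here is a small sketch that restates it with comments and runs it on the observations_D values of the first eight rows of datA (1, 3, 3, 2, 0, 0, 3, 3). The min_obs argument is a hypothetical addition for readability; the original hard-codes the threshold 2.

```r
# Restatement of observations_grp; min_obs is an assumed parameter name,
# the original function hard-codes the value 2.
observations_grp <- function(x, min_obs = 2) {
  grp <- numeric(length(x))
  grp_i <- 0          # current group id
  running <- 0        # observations collected in the current group
  start_new <- FALSE  # TRUE once the current group is large enough
  for (i in seq_along(x)) {
    if (start_new) {
      grp_i <- grp_i + 1
      running <- 0
    }
    running <- running + x[i]
    start_new <- running >= min_obs
    grp[i] <- grp_i
  }
  grp
}

counts <- c(1, 3, 3, 2, 0, 0, 3, 3)
grp <- observations_grp(counts)
print(grp)
#> [1] 0 0 1 2 3 3 3 4
```

Rows that share a group id are merged: the first row (1 observation) joins the second, and rows five and six (0 observations each) join row seven. Taking last(new_type) within each group then gives the small groups the type of the row that completed the group.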

I can get the rows that have been changed by doing the following:

changed_rows <- datA[type!=new_type]

Making the same changes in the population/datB

Now I want to make these type changes in a different dataset with a similar structure but a different number of rows: the population data, represented by datB:

datB <- rbind(datA, datA)

I would like to use the information from changed_rows to make the same changes in datB. Something like:

if the cat and type of a row in datB match the cat and type of a row in changed_rows, change the type in datB to the new_type from changed_rows.

setDT(datB)[datB$cat==changed_rows$cat, new_type:=changed_rows$new_type]

However, I do not know how to write this correctly. Could someone give me some pointers?

Answer:

Create a data frame describing the type changes you computed:

type_changes <- unique(changed_rows[, .(cat, type, new_type)])
type_changes
#>      cat   type new_type
#> 1: cat X type 1   type 2
#> 2: cat Y type 1   type 3
#> 3: cat Y type 2   type 3
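
Before joining, it may be worth checking that each (cat, type) pair maps to exactly one new_type, since duplicate keys would make the update ambiguous. A minimal sketch, using a hypothetical table map in the same shape as type_changes:

```r
library(data.table)

# Hypothetical mapping table with the same columns as type_changes.
map <- data.table(
  cat      = c("cat X", "cat Y", "cat Y"),
  type     = c("type 1", "type 1", "type 2"),
  new_type = c("type 2", "type 3", "type 3")
)

# anyDuplicated() returns 0 when every (cat, type) key occurs only once.
dup_index <- anyDuplicated(map, by = c("cat", "type"))
stopifnot(dup_index == 0L)
```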

Then apply them to the other data frame with an update join:

datB <- rbind(datA, datA)[, 1:6]
head(datB)
#>     A B C  D   cat   type
#> 1: NA 0 2 NA cat X type 1
#> 2:  3 4 3  1 cat X type 2
#> 3:  1 0 2  2 cat X type 3
#> 4:  3 4 3  0 cat X type 4
#> 5: NA 0 2 NA cat Y type 1
#> 6: NA 4 3 NA cat Y type 2

datB[type_changes, type := new_type, on = .(cat, type)]
head(datB)
#>     A B C  D   cat   type
#> 1: NA 0 2 NA cat X type 2
#> 2:  3 4 3  1 cat X type 2
#> 3:  1 0 2  2 cat X type 3
#> 4:  3 4 3  0 cat X type 4
#> 5: NA 0 2 NA cat Y type 3
#> 6: NA 4 3 NA cat Y type 3
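
If you would rather keep the original type and store the result in a separate new_type column, as in the sample, a variation along these lines should work. This is a sketch on hypothetical miniature tables (pop and map are made-up names, not the data above); the i. prefix refers to the column of the table passed inside the brackets.

```r
library(data.table)

# Hypothetical miniature population table.
pop <- data.table(
  cat  = c("cat X", "cat X", "cat Y", "cat Y", "cat Y"),
  type = c("type 1", "type 2", "type 1", "type 2", "type 3")
)

# Hypothetical mapping with the same columns as type_changes.
map <- data.table(
  cat      = c("cat X", "cat Y", "cat Y"),
  type     = c("type 1", "type 1", "type 2"),
  new_type = c("type 2", "type 3", "type 3")
)

# Start from the original type, then overwrite only the matched rows.
pop[, new_type := type]
pop[map, new_type := i.new_type, on = .(cat, type)]
print(pop)
```

Rows of pop without a match in map keep their original type in new_type, and the type column itself is left untouched.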