I have a dataset datA, which is a sample of the population. In that sample, some groups are too small, so they have to be merged. Because the sample needs to be weighted (sample/pop), the same changes have to be made in the population as well. The issue is that simply applying the same code to datB does not work, because there are enough observations in the population, so no groups would be merged there.
Example data
I have example data as follows:
library(data.table)
datA <- fread("
NA,0,2,NA,cat X, type 1
3,4,3,1,cat X, type 2
1,0,2,2,cat X, type 3
3,4,3,0,cat X, type 4
NA,0,2,NA,cat Y, type 1
NA,4,3,NA,cat Y, type 2
1,0,2,2,cat Y, type 3
3,4,3,35,cat Y, type 4
NA,0,2,NA,cat X, type 1
3,4,3,1,cat X, type 2
1,0,2,2,cat X, type 3
NA,4,3,NA,cat X, type 4
NA,0,2,NA,cat Y, type 1
NA,4,3,NA,cat Y, type 2
1,0,2,2,cat Y, type 3
3,4,3,1,cat Y, type 4
1,0,2,4,cat X, type 1
3,4,3,1,cat X, type 2
1,0,2,2,cat X, type 3
3,4,3,2,cat X, type 4
NA,0,2,NA,cat Y, type 1
NA,4,3,NA,cat Y, type 2
1,0,2,2,cat Y, type 3
3,4,3,2,cat Y, type 4
")
names(datA) <- c("A","B","C", "D", "cat", "type")
Changes made to sample/datA
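observations_grp <- function(x) {
  # Give consecutive elements the same group id until their running total reaches 2.
  cumsum_i <- 0
  nxtgrp <- FALSE
  n <- length(x)
  grp <- rep(0, n)
  grp_i <- 0
  for (i in seq_len(n)) {
    # Start a new group once the previous one has reached the threshold.
    if (nxtgrp) {grp_i <- grp_i + 1; cumsum_i <- 0}
    nxtgrp <- !((cumsum_i + x[i]) < 2)
    cumsum_i <- cumsum_i + x[i]
    grp[i] <- grp_i
  }
  grp
}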
datA <- datA[, observations_D := sum(!is.na(D)), by = c("cat", "type")]
datA <- datA[, new_type := type]
datA[,`:=`(new_type = last(new_type), observations_D =sum(observations_D)),
.(cat,observations_grp(observations_D))
][]
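To make the grouping step easier to follow, here is a minimal self-contained sketch that applies the same logic to the per-(cat, type) counts of non-missing D values from the example. The name group_small and the min_obs argument are hypothetical, introduced only for illustration; the threshold of 2 is taken from the function above.

```r
# Hypothetical restatement of the helper above, with the threshold as an argument.
group_small <- function(x, min_obs = 2) {
  grp <- integer(length(x))
  grp_i <- 0L
  running <- 0
  for (i in seq_along(x)) {
    running <- running + x[i]
    grp[i] <- grp_i
    # Once the running total reaches the threshold, the next element opens a new group.
    if (running >= min_obs) {
      grp_i <- grp_i + 1L
      running <- 0
    }
  }
  grp
}

# Non-missing D counts for cat X (types 1 to 4) followed by cat Y (types 1 to 4).
counts <- c(1, 3, 3, 2, 0, 0, 3, 3)
grp <- group_small(counts)
print(grp)
#> [1] 0 0 1 2 3 3 3 4
```

Elements that share a group id are merged: the first two (cat X, types 1 and 2) and the fifth to seventh (cat Y, types 1 to 3), which is why those types end up with the last type of their group as new_type.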
I can get the rows that have been changed by doing the following:
changed_rows <- datA[type!=new_type]
Making the same changes in the population/datB
Now I want to make these type changes in a different dataset with a similar structure but a different number of rows: the population data, represented by datB:
datB <- rbind(datA, datA)
I would like to use the information from changed_rows to make changes in datB. Something like: if the cat of changed_rows is equal to the cat of datB, change the type in datB to the new_type from changed_rows.
setDT(datB)[datB$cat==changed_rows$cat, new_type:=changed_rows$new_type]
However, I have no idea how to write this syntax. Could someone give me some pointers?
Answer
Create a data frame describing the type changes you computed:
type_changes <- unique(changed_rows[, .(cat, type, new_type)])
type_changes
#> cat type new_type
#> 1: cat X type 1 type 2
#> 2: cat Y type 1 type 3
#> 3: cat Y type 2 type 3
Then apply them to the other data frame with an update join:
datB <- rbind(datA, datA)[, 1:6]
head(datB)
#> A B C D cat type
#> 1: NA 0 2 NA cat X type 1
#> 2: 3 4 3 1 cat X type 2
#> 3: 1 0 2 2 cat X type 3
#> 4: 3 4 3 0 cat X type 4
#> 5: NA 0 2 NA cat Y type 1
#> 6: NA 4 3 NA cat Y type 2
datB[type_changes, type := new_type, on = .(cat, type)]
head(datB)
#> A B C D cat type
#> 1: NA 0 2 NA cat X type 2
#> 2: 3 4 3 1 cat X type 2
#> 3: 1 0 2 2 cat X type 3
#> 4: 3 4 3 0 cat X type 4
#> 5: NA 0 2 NA cat Y type 3
#> 6: NA 4 3 NA cat Y type 3
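If the original type should be kept next to the merged one (the attempt in the question assigns to new_type rather than overwriting type), a variant of the same update join could look like the sketch below. The tables here are small toy stand-ins with illustrative values, and pop is a hypothetical name for the population data. Because both tables contain a new_type column in this variant, the lookup value is referenced with the i. prefix.

```r
library(data.table)

# Toy stand-in for the lookup table of type changes (illustrative values).
type_changes <- data.table(
  cat      = c("cat X", "cat Y", "cat Y"),
  type     = c("type 1", "type 1", "type 2"),
  new_type = c("type 2", "type 3", "type 3")
)

# Toy stand-in for the population data (hypothetical name and values).
pop <- data.table(
  cat  = c("cat X", "cat X", "cat Y", "cat Y", "cat Y"),
  type = c("type 1", "type 3", "type 1", "type 2", "type 4")
)

# Start from the original type, then overwrite only the rows matched by the join.
pop[, new_type := type]
pop[type_changes, new_type := i.new_type, on = .(cat, type)]
print(pop)
```

Rows without a match in type_changes keep their original type in new_type, so the column is never NA.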