Check if a column of a dataset contains a certain value and, if not, insert a new row

Given the following data set:

df <- data.frame(
  var2kreuz = c(rep("AAA", 8), rep("BBB", 8)),
  cohort = c(
    rep("2012/13", 3),
    rep("2013/14", 2),
    rep("2014/15", 3),
    rep("2012/13", 2),
    rep("2013/14", 3),
    rep("2014/15", 3)
  ),
  var2use = c(
    "negative", "neutral", "positive",  # AAA, 2012/13
    "neutral", "positive",              # AAA, 2013/14 (no "negative")
    "negative", "neutral", "positive",  # AAA, 2014/15
    "negative", "neutral",              # BBB, 2012/13 (no "positive")
    "negative", "neutral", "positive",  # BBB, 2013/14
    "negative", "neutral", "positive"   # BBB, 2014/15
  ),
  # 8 random counts, recycled by data.frame() to fill all 16 rows
  n = round(runif(8, min = 0, max = 101))
)

For each group in var2kreuz and each cohort in cohort, I want to check whether var2use contains all three values: negative, neutral and positive.

If one or more of these values are missing for a cohort, a row with n = 0 should be added to the data set. For example, AAA in cohort 2013/14 has no negative row and BBB in cohort 2012/13 has no positive row, so these two rows should be added:

AAA in var2kreuz, 2013/14 in cohort, negative in var2use and 0 in n; and BBB in var2kreuz, 2012/13 in cohort, positive in var2use and 0 in n.

User response:

You can use tidyr::complete:

library(tidyr)
df %>% 
  complete(nesting(var2kreuz, cohort), var2use,
           fill = list(n = 0))

Output:

# A tibble: 18 × 4
   var2kreuz cohort  var2use      n
   <chr>     <chr>   <chr>    <dbl>
 1 AAA       2012/13 negative    75
 2 AAA       2012/13 neutral     32
 3 AAA       2012/13 positive    88
 4 AAA       2013/14 negative     0
 5 AAA       2013/14 neutral     77
 6 AAA       2013/14 positive    73
 7 AAA       2014/15 negative    11
 8 AAA       2014/15 neutral     10
 9 AAA       2014/15 positive     5
10 BBB       2012/13 negative    75
11 BBB       2012/13 neutral     32
12 BBB       2012/13 positive     0
13 BBB       2013/14 negative    88
14 BBB       2013/14 neutral     77
15 BBB       2013/14 positive    73
16 BBB       2014/15 negative    11
17 BBB       2014/15 neutral     10
18 BBB       2014/15 positive     5
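
To illustrate what nesting() does in the call above, here is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not the question's data). With nesting(), only the group/cohort pairs that actually occur are completed; without it, all three columns are fully crossed, which also creates pairs that never appear in the data.

```r
library(tidyr)

# Hypothetical toy data: the pair BBB / 2013/14 does not occur at all
toy <- data.frame(
  var2kreuz = c("AAA", "AAA", "BBB"),
  cohort    = c("2012/13", "2013/14", "2012/13"),
  var2use   = c("negative", "neutral", "negative"),
  n         = c(4, 7, 2)
)

# nesting(): keep only the (var2kreuz, cohort) pairs present in the data and
# cross each with every var2use value seen: 3 pairs x 2 values = 6 rows
nested <- complete(toy, nesting(var2kreuz, cohort), var2use,
                   fill = list(n = 0))

# No nesting(): all three columns are fully crossed, so the absent pair
# BBB / 2013/14 is created as well: 2 x 2 x 2 = 8 rows
crossed <- complete(toy, var2kreuz, cohort, var2use,
                    fill = list(n = 0))
```

For the data in the question every group/cohort pair is present, so both variants should give the same 18 rows; the difference only matters when a whole cohort is missing for a group.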

CodePudding user response：

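In base R, you can merge it with an expand.grid() of the unique values in the first three columns. Note that missing combinations get NA in n rather than 0: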

merge(df, do.call(expand.grid, lapply(df[1:3], unique)), all=TRUE)
#    var2kreuz  cohort  var2use   n
# 1        AAA 2012/13 negative  44
# 2        AAA 2012/13  neutral 101
# 3        AAA 2012/13 positive  93
# 4        AAA 2013/14 negative  NA
# 5        AAA 2013/14  neutral  19
# 6        AAA 2013/14 positive  66
# 7        AAA 2014/15 negative  66
# 8        AAA 2014/15  neutral  50
# 9        AAA 2014/15 positive  98
# 10       BBB 2012/13 negative  44
# 11       BBB 2012/13  neutral 101
# 12       BBB 2012/13 positive  NA
# 13       BBB 2013/14 negative  93
# 14       BBB 2013/14  neutral  19
# 15       BBB 2013/14 positive  66
# 16       BBB 2014/15 negative  66
# 17       BBB 2014/15  neutral  50
# 18       BBB 2014/15 positive  98
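
Since the question asks for 0 rather than NA in the added rows, one possible follow-up step is to replace the NA values after merging. The sketch below uses a small made-up data frame (`toy` is hypothetical) and sets stringsAsFactors = FALSE in expand.grid(), which is an assumption made here to keep the key columns as character vectors.

```r
# Hypothetical toy data: AAA / 2013/14 has no "negative" row
toy <- data.frame(
  var2kreuz = rep("AAA", 5),
  cohort    = c("2012/13", "2012/13", "2012/13", "2013/14", "2013/14"),
  var2use   = c("negative", "neutral", "positive", "neutral", "positive"),
  n         = c(5, 3, 8, 2, 7)
)

# All combinations of the unique values in the three key columns
grid <- do.call(
  expand.grid,
  c(lapply(toy[1:3], unique), stringsAsFactors = FALSE)
)

# Full outer join: combinations missing from `toy` get NA in n
full <- merge(toy, grid, all = TRUE)

# Replace the NA counts with 0, as requested in the question
full$n[is.na(full$n)] <- 0
```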