R for loop not updating dataframe

Time:01-09

So I have this code with a for loop that doesn't actually update my data frame after the first iteration. If I do the two steps without the for loop, it works just fine. I'm probably overlooking something obvious, but I really don't see it.

Here is the code:

#preparation
data <- c("MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED", "DIVORCED", "MARRIED")
observed <- c(table(data))
n <- sum(observed)
k <- length(observed)
expProb <- rep(1/k, k)   # equal probabilities under the null hypothesis
pObs <- dmultinom(sort(observed, decreasing=TRUE), size=n, expProb)   # probability of the observed counts

counts <- seq(0, n, by = 1)                 # possible count per category: 0..n
kCounts <- matrix(nrow = n + 1, ncol = k)   # one column of 0..n per category
for (i in 1:k){
  kCounts[,i] <- counts
}

# all pairs of counts for the first two categories, keeping only pairs with sum <= n
all_perm <- merge(kCounts[,1], as.data.frame(kCounts[,2]),all=TRUE)
all_perm <- all_perm[rowSums(all_perm) <= n,]

#THE FOR LOOP THAT DOESN'T WORK
for (i in 3:k){
  print(i)
  all_perm <- merge(all_perm, as.data.frame(kCounts[,i]),all=TRUE)
  all_perm <- all_perm[rowSums(all_perm) <= n,]
  print(dim(all_perm))
}

This nicely prints the correct i (3 and 4), but at the end all_perm still has 3 columns instead of 4. The number of rows does change.

If I run the two steps (3 and 4) directly, it does work, i.e. replacing the #THE FOR LOOP THAT DOESN'T WORK part with:

all_perm <- merge(all_perm, as.data.frame(kCounts[,3]),all=TRUE)
all_perm <- all_perm[rowSums(all_perm) <= n,]
all_perm <- merge(all_perm, as.data.frame(kCounts[,4]),all=TRUE)
all_perm <- all_perm[rowSums(all_perm) <= n,]
dim(all_perm)

It correctly shows that all_perm now has 4 columns.

I really don't get why the for loop doesn't work. I also tried a while loop, but that doesn't work either. Any help would be appreciated.
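A likely explanation, sketched here under the assumptions noted in the comments: as.data.frame() names a vector's column after the expression it was given, so inside the loop every new column is called kCounts[, i]. On the first pass (i = 3) that name is new and merge() returns the Cartesian product; on the second pass (i = 4) both tables already contain a column named kCounts[, i], so merge() joins on it instead of adding a fourth column. In the unrolled version the names differ (kCounts[, 3] and kCounts[, 4]), which is why it works. The snippet reproduces this with a small hypothetical n = 2 and shows one possible fix: give every column a unique name (V1, V2, ... is an arbitrary choice) and pass by = NULL so merge() always builds the Cartesian product.

```r
# Hypothetical small example: n = 2, k = 4 (the question has n = 19).
n <- 2
k <- 4
kCounts <- matrix(0:n, nrow = n + 1, ncol = k)  # every column is 0..n

# The column name comes from the expression passed to as.data.frame():
names(as.data.frame(kCounts[, 3]))  # "kCounts[, 3]"
i <- 4
names(as.data.frame(kCounts[, i]))  # "kCounts[, i]", the same name on every pass

# Original loop: on the i = 4 pass both tables contain "kCounts[, i]",
# so merge() joins on that column instead of adding a new one.
buggy <- merge(kCounts[, 1], as.data.frame(kCounts[, 2]), all = TRUE)
buggy <- buggy[rowSums(buggy) <= n, ]
for (i in 3:k) {
  buggy <- merge(buggy, as.data.frame(kCounts[, i]), all = TRUE)
  buggy <- buggy[rowSums(buggy) <= n, ]
}
ncol(buggy)  # 3

# Fixed loop: unique column names plus by = NULL force a Cartesian product.
all_perm <- merge(data.frame(V1 = kCounts[, 1]), data.frame(V2 = kCounts[, 2]), by = NULL)
all_perm <- all_perm[rowSums(all_perm) <= n, ]
for (i in 3:k) {
  new_col <- setNames(data.frame(kCounts[, i]), paste0("V", i))
  all_perm <- merge(all_perm, new_col, by = NULL)
  all_perm <- all_perm[rowSums(all_perm) <= n, ]
}
dim(all_perm)  # 15 4
```

Either change alone (unique names, or by = NULL) should already avoid the accidental join; using both makes the intent explicit.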

Purpose: This code is part of a function I'm writing for myself to perform a multinomial test. It's just a theoretical exercise to understand how the test works. The easy way to perform a multinomial test is to use either the EMT package and its multinomial.test() function, or the XNomial package and its xmulti() function.
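For that purpose the enumeration is only a means to an end. Below is a minimal sketch of how an exact multinomial p-value could be computed from it, assuming the usual probability-ordering definition: sum the null probabilities of every outcome with total n that is no more likely than the observed one. It swaps expand.grid() in for the merge loop, keeps only rows that sum to exactly n, and uses an arbitrary 1e-7 relative tolerance for floating-point comparisons. It is not necessarily how the packages mentioned above compute their results. Sorting the observed counts, as in the question, makes no difference here because all null probabilities are equal.

```r
# Data from the question
data <- c("MARRIED", "DIVORCED", "MARRIED", "SEPARATED", "DIVORCED", "NEVER MARRIED",
          "DIVORCED", "DIVORCED", "NEVER MARRIED", "MARRIED", "MARRIED", "MARRIED",
          "SEPARATED", "DIVORCED", "NEVER MARRIED", "NEVER MARRIED", "DIVORCED",
          "DIVORCED", "MARRIED")
observed <- c(table(data))
n <- sum(observed)
k <- length(observed)
expProb <- rep(1 / k, k)                               # null hypothesis: equal probabilities
pObs <- dmultinom(observed, size = n, prob = expProb)  # probability of the observed counts

# Enumerate every possible outcome: k non-negative counts that sum to exactly n.
grid <- expand.grid(rep(list(0:n), k))
outcomes <- grid[rowSums(grid) == n, ]

# Null probability of each outcome (apply() passes each row as the counts vector).
probs <- apply(outcomes, 1, dmultinom, size = n, prob = expProb)

# Exact p-value: total probability of outcomes no more likely than the observed one.
# The relative tolerance (1e-7, an arbitrary choice) absorbs floating-point noise.
pValue <- sum(probs[probs <= pObs * (1 + 1e-7)])
pValue
```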

Answer:

Assuming you want a data frame of k columns, where each row is a distinct k-element sample (with replacement, order mattering) of the numbers 0:n whose row sum does not exceed n, here's one "loopless" approach:

1. get the combinations; combn() returns them as the columns of a k-row matrix, so transpose it to get one combination per row:
all_perms <- t(combn(rep(0:n, times = k), k))
> all_perms |> head(3)
     [,1] [,2] [,3] [,4]
[1,]    0    1    2    3
[2,]    0    1    2    4
[3,]    0    1    2    5

> nrow(all_perms)
[1] 1581580
2. keep only rows with row sums <= n:
all_perms <- 
  subset(all_perms,
         rowSums(all_perms) <= n
         )
> nrow(all_perms)
[1] 81769
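3. convert to a data frame, drop duplicate rows (the same tuple can be drawn from several positions of the repeated vector) and, if needed, sort (V1 changing fastest):
library(dplyr) ## for distinct() and a convenient multi-column sort

all_perms <- 
  all_perms |>
  as.data.frame() |>
  distinct() |>
  arrange(V4, V3, V2, V1) 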
> all_perms |> head(3)
  V1 V2 V3 V4
1  0  0  0  0
2  1  0  0  0
3  2  0  0  0
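One caveat, with a sketch that is an addition rather than part of the original answer: rep(0:n, times = k) contains every value k times and combn() picks positions rather than values, so the same tuple can be produced several times, and the row count printed after filtering includes those repeats. The first part of the snippet shows this on a hypothetical tiny case (n = 2, k = 2). The second part swaps in expand.grid(), a different technique from the answer's, which lists every tuple of 0:n exactly once with the first column changing fastest, so no de-duplication or sorting is needed (its columns are named Var1, Var2, ... instead of V1, V2, ...). By stars and bars (a slack cell turns "sum at most n" into "sum exactly n" over k + 1 cells), the expected number of rows is choose(n + k, k), which is 8855 for the question's n = 19 and k = 4.

```r
# Hypothetical tiny case: n = 2, k = 2.
n <- 2
k <- 2
small <- t(combn(rep(0:n, times = k), k))
nrow(small)          # 15 = choose(6, 2) position pairs
nrow(unique(small))  # 9 = 3^2 distinct ordered pairs

# Alternative enumeration with the question's n = 19 and k = 4.
n <- 19
k <- 4
grid <- expand.grid(rep(list(0:n), k))   # all (n + 1)^k tuples, Var1 changing fastest
all_perms <- grid[rowSums(grid) <= n, ]  # keep rows whose sum is at most n
nrow(all_perms)      # 8855 = choose(n + k, k)
head(all_perms, 3)
```

expand.grid() materialises all (n + 1)^k rows before filtering (160000 here), which is fine at this size but grows quickly with k.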