I am running a logistic regression model fitted with generalized estimating equations (GEEs) and keep hitting the error below, despite trying multiple solutions posted here on SO and elsewhere. I am unsure where the error arises. I am using the gee package, but the error also occurs with geepack.
Does anyone know why this error occurs even though the dataset contains no NA, Inf, or character variables? My suspicion is that I am missing something very simple, but after two days I have to throw it to better coders than me.
Minimal data and code to reproduce the error, attempts at solutions, and relevant SO questions are below.
Data
df <- structure(list(id = structure(c(7L, 1L, 20L, 15L, 14L, 6L, 8L, 24L, 21L, 19L, 5L, 4L, 18L,
13L, 23L, 16L, 25L, 12L, 10L, 9L, 22L, 17L, 11L, 3L, 2L, 2L),
levels = c("ALWA28M", "BOMA13M", "BOMA41M", "DAYA35M", "DEMB72M", "EDAB3WM", "EFCH52M",
"FASI6M", "FRRO35M", "GRAS35F", "GRKA48M", "JARA35M", "KABA27M", "KECH4WM",
"MAAD60M", "MACH33M", "MEBA29F", "MIGU42M", "MTSA10M", "NTMA22F", "RACA2M",
"STMA35M", "TOKE39M", "TRMA12M", "YOLU29M"), class = "factor"),
testres = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L),
levels = c("POS", "NEG"), class = "factor"),
agegrp = structure(c(5L, 3L, 3L, 5L, 1L, 1L, 2L, 2L, 1L, 2L, 6L, 4L, 4L,
3L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 5L, 4L, 2L, 2L),
levels = c("0", "1", "2", "3", "4", "5"), class = "factor")),
row.names = c(NA, 26L),
class = "data.frame")
Model
gee::gee(testres ~ agegrp, data = df,
id = id,
family = binomial,
corstr = "exchangeable")
Error
Error in gee::gee(testres ~ agegrp, data = df, id = id, family = binomial, :
  NA/NaN/Inf in foreign function call (arg 2)
In addition: Warning message:
In gee::gee(testres ~ agegrp, data = df, id = id, family = binomial, :
  NAs introduced by coercion
Checking the data to confirm there are no NA, Inf, or character variables. All columns are factors with no missing data:
# All factors
str(df)
# 'data.frame': 26 obs. of 3 variables:
# $ id : Factor w/ 25 levels "ALWA28M","BOMA13M",..: 7 1 20 15 14 6 8 24 21 19 ...
# $ testres: Factor w/ 2 levels "POS","NEG": 1 1 1 2 1 1 1 1 1 1 ...
# $ agegrp : Factor w/ 6 levels "0","1","2","3",..: 5 3 3 5 1 1 2 2 1 2 ...
# No NAs or Infinites
lapply(df, table, useNA = "always")
# 0 NAs
lapply(df, \(x) table(is.infinite(x)))
# All FALSE
Alternative approach using geepack
geepack::geeglm(testres ~ agegrp,
data = df, id = id,
corstr = "exchangeable",
family = "binomial")
geepack error:
Error in lm.fit(zsca, qlf(pr2), offset = soffset) : NA/NaN/Inf in 'y'
In addition: Warning messages:
1: In model.response(mf, "numeric") :
  using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
Changing the correlation structure yields the same error. Standard logistic regression converges:
summary(glm(testres ~ agegrp, data = df, family = binomial(link = "logit")))
SO questions that did not resolve the issue are listed below. While this error is common on the site, I could not find an answer that covers this case, hence the decision to post.
- How to eliminate "NA/NaN/Inf in foreign function call (arg 7)" running predict with randomForest
- R: NA/NaN/Inf in foreign function call (arg 1)
- Error in fitting a model with gee(): NA/NaN/Inf in foreign function call (arg 3)
- NA/NaN/Inf in foreign function call (arg 2)
- NA/NaN/Inf in foreign function call (arg 5)
- lme: NA/NaN/Inf in foreign function call (arg 3)
- NA/NaN/Inf in foreign function call (arg 1) when trying to run a PGLS (Pagel's lambda)
- How to eliminate “NA/NaN/Inf in foreign function call (arg 3)” in bigglm
- R error in glmnet: NA/NaN/Inf in foreign function call
Answer:
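The warning "NAs introduced by coercion" suggests the NAs are not in the data but are created when the factor response is converted to numbers. I have not checked the package internals, so treat that as an assumption. The sketch below uses a toy factor (y and y_num are illustrative names, not part of the question's data) to show how such a conversion behaves:

```r
# Toy factor response with the same labels as testres (illustrative only)
y <- factor(c("POS", "POS", "NEG", "POS"), levels = c("POS", "NEG"))

anyNA(y)        # the factor itself has no missing values
is.numeric(y)   # a factor is not a numeric vector

# Converting the labels "POS"/"NEG" to numbers yields NA for every element;
# R would normally warn "NAs introduced by coercion" here
y_num <- suppressWarnings(as.numeric(as.character(y)))
anyNA(y_num)
```

This is why checks on the data frame itself find nothing wrong: the missing values only appear after the conversion.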
Coding testres as 0 and 1 instead of a factor gets past this error:
df <- structure(list(id = structure(c(7L, 1L, 20L, 15L, 14L, 6L, 8L, 24L, 21L, 19L, 5L, 4L, 18L,
13L, 23L, 16L, 25L, 12L, 10L, 9L, 22L, 17L, 11L, 3L, 2L, 2L),
levels = c("ALWA28M", "BOMA13M", "BOMA41M", "DAYA35M", "DEMB72M", "EDAB3WM", "EFCH52M",
"FASI6M", "FRRO35M", "GRAS35F", "GRKA48M", "JARA35M", "KABA27M", "KECH4WM",
"MAAD60M", "MACH33M", "MEBA29F", "MIGU42M", "MTSA10M", "NTMA22F", "RACA2M",
"STMA35M", "TOKE39M", "TRMA12M", "YOLU29M"), class = "factor"),
testres = structure(c(1L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L)),
agegrp = structure(c(5L, 3L, 3L, 5L, 1L, 1L, 2L, 2L, 1L, 2L, 6L, 4L, 4L,
3L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 5L, 4L, 2L, 2L),
levels = c("0", "1", "2", "3", "4", "5"), class = "factor")),
row.names = c(NA, 26L),
class = "data.frame")
gee::gee(testres ~ agegrp, data = df,
id = id,
family = binomial,
corstr = "exchangeable")
#> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
#> running glm to get initial regression estimate
#> (Intercept) agegrp1 agegrp2 agegrp3 agegrp4
#> 1.956607e+01 -3.377525e-08 -1.817977e+01 -1.831331e+01 -1.887292e+01
#> agegrp5
#> -3.513736e-08
#> Error in gee::gee(testres ~ agegrp, data = df, id = id, family = binomial, : Cgee: error: logistic model for probability has fitted value very close to 1.
#> estimates diverging; iteration terminated.
There is now a different error because the model has fitted some probabilities very close to 0 or 1, but I think this is an unrelated problem (see the Details section of ?glm).
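The recoding can also be done without retyping the data, and a cross-tabulation shows where the extreme fitted probabilities may come from. This is a minimal sketch: it rebuilds only the testres and agegrp columns from the codes in the question so that it runs on its own, and testres01 and counts are names chosen for illustration.

```r
# Outcome and age-group codes copied from the question's data
testres <- factor(c(1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2,
                    1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1),
                  levels = 1:2, labels = c("POS", "NEG"))
agegrp <- factor(c(5, 3, 3, 5, 1, 1, 2, 2, 1, 2, 6, 4, 4,
                   3, 4, 4, 3, 4, 4, 4, 4, 3, 5, 4, 2, 2),
                 levels = 1:6, labels = as.character(0:5))

# Recode the factor to 0/1: 1 = POS, 0 = NEG (same coding as in the data above)
testres01 <- as.integer(testres == "POS")

# Cross-tabulate outcome by age group; a zero cell means that
# group contains only one outcome
counts <- table(agegrp, testres)
counts

# Age groups in which no NEG result was observed
names(which(counts[, "NEG"] == 0))
```

With an existing data frame, the equivalent recoding would be df$testres <- as.integer(df$testres == "POS"). Note that glm accepts a factor response and treats its first level (here POS) as failure, so the 0/1 coding above models the opposite outcome and the coefficient signs flip relative to the glm fit in the question.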