'''> seg.df.cut$email <- factor(ifelse(seg.df$email < median(seg.df$email), 1, 2))
'''
Warning message:
In mean.default(sort(x, partial = half + 0L:1L)[half + 0L:1L]) :
argument is not numeric or logical: returning NA
All of the other variables I'm converting work just fine. If I ignore this warning and move on to poLCA, I receive this follow-up message:
'''> seg.f<-with(seg.df.cut,cbind(age,credit.score,email,distance.to.store,online.visits,online.trans,online.spend,store.trans,store.spend)~1)
'''
'''> seg.LCA4<-poLCA(seg.f,data=seg.df.cut,nclass=4)
'''
Error in runif(R * K.j[j]) : invalid arguments
In addition: Warning messages:
1: In FUN(newX[, i], ...) :
no non-missing arguments to max; returning -Inf
2: In FUN(newX[, i], ...) :
no non-missing arguments to max; returning -Inf
3: In FUN(newX[, i], ...) :
no non-missing arguments to max; returning -Inf
4: In FUN(newX[, i], ...) :
no non-missing arguments to max; returning -Inf
5: In FUN(newX[, i], ...) :
no non-missing arguments to max; returning -Inf
6: In FUN(newX[, i], ...) :
no non-missing arguments to max; returning -Inf
7: In FUN(newX[, i], ...) :
no non-missing arguments to max; returning -Inf
8: In FUN(newX[, i], ...) :
no non-missing arguments to max; returning -Inf
9: In FUN(newX[, i], ...) :
no non-missing arguments to max; returning -Inf
10: In cbind(1, exb) :
number of rows of result is not a multiple of vector length (arg 1)
My dataset looks like this:
'''> str(seg.df)
'''
'data.frame': 1000 obs. of 9 variables:
$ age : num 22.9 28 35.9 30.5 38.7 ...
$ credit.score : num 631 749 733 830 734 ...
$ email : chr "yes" "yes" "yes" "yes" ...
$ distance.to.store: num 2.58 48.18 1.29 5.25 25.04 ...
$ online.visits : int 20 121 39 1 35 1 1 48 0 14 ...
$ online.trans : int 3 39 14 0 11 1 1 13 0 6 ...
$ online.spend : num 58.4 756.9 250.3 0 204.7 ...
$ store.trans : int 4 0 0 2 0 0 2 4 0 3 ...
$ store.spend : num 140.3 0 0 95.9 0 ...
CodePudding user response:
You are trying to take the median of character data, which simply doesn't make sense. That's not an R problem, it's a logic problem. But R can help: you could convert the column to numeric values using ifelse(), but first be aware of which values occur in seg.df$email.
If it's only "yes" and "no", this will work:
median(ifelse(seg.df$email=="yes", 1, 0), na.rm=TRUE)
Otherwise, a nested ifelse() may be useful, mapping anything unexpected to NA:
median(ifelse(seg.df$email=="yes", 1,
ifelse(seg.df$email=="no", 0, NA)),
na.rm=TRUE)
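
Since the goal in the question is a two-level factor for poLCA rather than a median, it may be simpler to recode email by label directly. The sketch below is only an illustration under assumptions: the toy data frame is made up (it is not the asker's data), `bad` is a hypothetical name, and the mapping "no" -> 1, "yes" -> 2 is an arbitrary choice. It also shows why the later poLCA call probably fails: the median of a character vector with an even number of elements is NA, so every comparison is NA and the resulting factor has no levels at all.

```r
# Toy stand-in for seg.df (hypothetical values, even number of rows)
seg.df <- data.frame(
  age   = c(22.9, 28.0, 35.9, 30.5, 38.7, 41.2),
  email = c("yes", "yes", "no", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)
seg.df.cut <- seg.df

# Reproduce the problem: the comparison against an NA median is all NA,
# so factor() ends up with zero levels.
bad <- suppressWarnings(
  factor(ifelse(seg.df$email < median(seg.df$email), 1, 2))
)
print(nlevels(bad))

# Check which values actually occur, including missing ones
print(table(seg.df$email, useNA = "ifany"))

# Recode by label: "no" -> 1, "yes" -> 2, anything else -> NA
seg.df.cut$email <- factor(
  ifelse(seg.df$email == "yes", 2,
         ifelse(seg.df$email == "no", 1, NA))
)
print(levels(seg.df.cut$email))
```

As far as I know, poLCA expects each manifest variable to be a factor or an integer coded from 1 upward, so a factor with levels "1" and "2" should fit alongside the other converted columns.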