When I tried to compute row sums across 24 specific columns of my data frame, R returned this error:
Error in rowSums(., na.rm = TRUE) : 'x' must be numeric
I tried various methods to determine whether the columns of interest were numeric.
x_isnum <- select_if(x2009, is.numeric)
names(x_isnum)
# Check data type of every variable in data frame
str(x2009)
All columns of interest were listed as numeric. Then I even opened the data frame and hovered over each column to verify they were numeric; they were. I acknowledge that since the df is so large, it's possible I overlooked something. So I subset the data to learn about just the columns in question.
p = x2009[,c(48,49, 70:91)]
is.numeric(p)
FALSE
Since it returned FALSE, I ran
str(p)
'data.frame': 17090 obs. of 24 variables:
$ poss_cannabis_female_over_64 : num 0 0 0 0 0 0 0 0 0 0 ...
$ poss_cannabis_female_under_10: num 0 0 0 0 0 0 0 0 0 0 ...
$ poss_cannabis_male_over_64 : num 0 0 0 0 0 0 0 0 0 0 ...
$ poss_cannabis_male_under_10 : num 0 0 0 0 0 0 0 0 0 0 ...
$ poss_cannabis_tot_10_12 : num 0 0 0 0 0 0 0 0 0 0 ...
$ poss_cannabis_tot_13_14 : num 0 1 0 0 0 0 1 0 0 0 ...
$ poss_cannabis_tot_15 : num 0 1 0 3 0 0 0 1 0 0 ...
$ poss_cannabis_tot_16 : num 1 0 3 2 1 0 2 2 2 1 ...
$ poss_cannabis_tot_17 : num 1 0 1 3 1 2 0 3 2 1 ...
$ poss_cannabis_tot_18 : num 0 0 1 2 2 1 1 1 0 0 ...
$ poss_cannabis_tot_19 : num 0 2 0 4 1 0 3 0 0 0 ...
$ poss_cannabis_tot_20 : num 0 1 0 2 0 0 2 1 1 3 ...
$ poss_cannabis_tot_21 : num 0 0 0 1 1 0 0 0 1 0 ...
$ poss_cannabis_tot_22 : num 0 2 0 1 0 0 2 0 1 0 ...
$ poss_cannabis_tot_23 : num 1 0 0 3 2 0 1 1 0 0 ...
$ poss_cannabis_tot_24 : num 1 0 0 0 1 0 0 0 0 0 ...
$ poss_cannabis_tot_25_29 : num 0 0 2 3 2 1 0 0 1 2 ...
$ poss_cannabis_tot_30_34 : num 0 0 0 1 0 1 0 1 0 0 ...
$ poss_cannabis_tot_35_39 : num 1 0 0 1 1 0 0 1 0 0 ...
$ poss_cannabis_tot_40_44 : num 0 1 0 0 0 0 0 1 0 0 ...
$ poss_cannabis_tot_45_49 : num 0 0 0 0 0 0 0 0 0 0 ...
$ poss_cannabis_tot_50_54 : num 0 0 0 0 0 0 0 0 0 0 ...
$ poss_cannabis_tot_55_59 : num 0 0 0 0 0 0 0 0 0 0 ...
$ poss_cannabis_tot_60_64 : num 0 0 0 0 1 0 0 0 0 0 ...
I also ran
sapply(p, is.numeric)
poss_cannabis_female_over_64
TRUE
poss_cannabis_female_under_10
TRUE
poss_cannabis_male_over_64
TRUE
poss_cannabis_male_under_10
TRUE
poss_cannabis_tot_10_12
TRUE
poss_cannabis_tot_13_14
TRUE
poss_cannabis_tot_15
TRUE
poss_cannabis_tot_16
TRUE
poss_cannabis_tot_17
TRUE
poss_cannabis_tot_18
TRUE
poss_cannabis_tot_19
TRUE
poss_cannabis_tot_20
TRUE
poss_cannabis_tot_21
TRUE
poss_cannabis_tot_22
TRUE
poss_cannabis_tot_23
TRUE
poss_cannabis_tot_24
TRUE
poss_cannabis_tot_25_29
TRUE
poss_cannabis_tot_30_34
TRUE
poss_cannabis_tot_35_39
TRUE
poss_cannabis_tot_40_44
TRUE
poss_cannabis_tot_45_49
TRUE
poss_cannabis_tot_50_54
TRUE
poss_cannabis_tot_55_59
TRUE
poss_cannabis_tot_60_64
TRUE
Finally, I ran sapply(p, class), which again displayed numeric for each variable. I again hovered over each column in the subsetted data frame, and again, each column said it was numeric.
There must be something I am missing if R is telling me it's not numeric. I doubt the code is the problem because I ran it on a smaller, made-up df with no issues, but just in case, here is what I ran to sum the rows of specific columns.
x2009 = x2009 %>%
mutate(poss_cannabis_juv_tot = select(., c(49,71:76))) %>%
rowSums(na.rm = TRUE) %>%
mutate(poss_cannabis_adult_tot = select(., c(48,70,77:91))) %>%
rowSums(na.rm = TRUE) %>%
relocate(poss_cannabis_juv_tot, .after = poss_cannabis_male_17) %>%
relocate(poss_cannabis_adult_tot, .after = poss_cannabis_male_over_64)
What is going on??
CodePudding user response:
The issue is in creating a column from select. Instead, select the columns within across and pass the result to rowSums:
library(dplyr)
x2009 %>%
mutate(poss_cannabis_juv_tot = rowSums(across(where(is.numeric)),
na.rm = TRUE))
Or, if the columns should be selected by index:
x2009 %>%
mutate(poss_cannabis_juv_tot = rowSums(across(c(49,71:76)), na.rm = TRUE),
poss_cannabis_adult_tot = rowSums(across(c(48,70,77:91)), na.rm = TRUE)) %>%
relocate(poss_cannabis_juv_tot, .after = poss_cannabis_male_17) %>%
relocate(poss_cannabis_adult_tot, .after = poss_cannabis_male_over_64)
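For comparison, the same row sum can be written without dplyr by subsetting the columns first and assigning the result. This is only a minimal sketch on a toy data frame; the names df, id, x1, x2, x3 and total are made up for illustration and are not from the OP's data.

```r
# Toy data frame; all names here are hypothetical.
df <- data.frame(id = c("a", "b", "c"),
                 x1 = c(1, 2, NA),
                 x2 = c(0, 1, 3),
                 x3 = c(2, 0, 1))

# Subset the numeric columns by index, then sum across each row,
# ignoring missing values.
df$total <- rowSums(df[, 2:4], na.rm = TRUE)
df
```

Because only columns 2 to 4 reach rowSums, the character column id never causes an error.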
In the OP's code, the pipe passes the entire data frame to rowSums, not just the selected columns. That data frame includes the column created with select, which is itself a data.frame, in addition to any other non-numeric columns, so rowSums fails:
head(iris) %>%
  mutate(new = select(., 2:4)) %>%
  str
'data.frame': 6 obs. of 6 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1
$ new :'data.frame': 6 obs. of 3 variables:
..$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
..$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
..$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
head(iris) %>%
mutate(new = select(., 2:4)) %>%
rowSums(na.rm = TRUE)
Error in rowSums(., na.rm = TRUE) : 'x' must be numeric
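To see which object actually reaches rowSums, the pipeline can be split into steps. The following is a small sketch on iris; the names d and res are made up for illustration, and tryCatch is used only so the error message can be inspected without stopping the script.

```r
library(dplyr)

# Step 1: the mutate call from above, stored instead of piped onward.
d <- head(iris) %>%
  mutate(new = select(., 2:4))

# Step 2: piping `d` into rowSums is the same as calling rowSums(d),
# so the factor column Species is included and the call fails.
res <- tryCatch(rowSums(d, na.rm = TRUE),
                error = function(e) conditionMessage(e))
res

# Summing only the nested data frame stored in `new` works.
rowSums(d$new, na.rm = TRUE)
```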
Instead, with across, only the selected columns are summed:
head(iris) %>%
mutate(new = rowSums(across(2:4), na.rm = TRUE))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species new
1 5.1 3.5 1.4 0.2 setosa 5.1
2 4.9 3.0 1.4 0.2 setosa 4.6
3 4.7 3.2 1.3 0.2 setosa 4.7
4 4.6 3.1 1.5 0.2 setosa 4.8
5 5.0 3.6 1.4 0.2 setosa 5.2
6 5.4 3.9 1.7 0.4 setosa 6.0
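As a side note on the checks in the question: is.numeric(p) returning FALSE does not mean a column is non-numeric. A data frame is a list, and is.numeric is FALSE for a list regardless of what its columns hold. A short sketch on iris, where p is just an illustrative name for a subset of numeric columns:

```r
# Subset of iris containing only numeric columns.
p <- iris[, 1:4]

is.numeric(p)               # FALSE: a data frame is a list
all(sapply(p, is.numeric))  # TRUE: every column is numeric
is.numeric(as.matrix(p))    # TRUE: an all-numeric data frame converts to a numeric matrix
```

So the column-wise check with sapply is the one that reflects the column types.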