I have factor levels that I wish to calculate the mean of. Do you have to use gsub
and replace characters, or is there another way?
# Reproduce data
x <- c("(-48.2,-47.8]", "(-61.9,-61.5]", "(-52.2,-51.8]", "(-43.7,-43.3]", "(-51.4,-51]", "(-43.3,-42.9]", "(-43.7,-43.3]", "(-47.4,-47]")
# My data has the form below
X <- as.factor(x)
# I want the mean of e.g X[1]
# mean(X[1]) = mean(-48.2 -47.8)
CodePudding user response:
You could also try this approach using dplyr and tidyr
to preserve all the numbers:
library(dplyr)
library(tidyr)
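data.frame(x) %>%
  # split each interval label at the comma into two columns
  separate(x, into = c("num1", "num2"), sep = ",") %>%
  # strip parentheses and square brackets, then convert to numeric
  mutate(num1 = as.numeric(gsub("[()]|[][]", "", num1)),
         num2 = as.numeric(gsub("[()]|[][]", "", num2)),
         mean = (num1 + num2) / 2)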
Output:
# num1 num2 mean
# 1 -48.2 -47.8 -48.0
# 2 -61.9 -61.5 -61.7
# 3 -52.2 -51.8 -52.0
# 4 -43.7 -43.3 -43.5
# 5 -51.4 -51.0 -51.2
# 6 -43.3 -42.9 -43.1
# 7 -43.7 -43.3 -43.5
# 8 -47.4 -47.0 -47.2
CodePudding user response:
I think a three-step process of gsub (to remove what we don't want/need), strsplit (to separate the numbers by comma), and mean(as.numeric(.)) (to actually calculate the numeric average) should work:
gsub("[^-0-9.,]", "", x)
# [1] "-48.2,-47.8" "-61.9,-61.5" "-52.2,-51.8" "-43.7,-43.3" "-51.4,-51" "-43.3,-42.9" "-43.7,-43.3" "-47.4,-47"
strsplit(gsub("[^-0-9.,]", "", x), ",")
# [[1]]
# [1] "-48.2" "-47.8"
# [[2]]
# [1] "-61.9" "-61.5"
# [[3]]
# [1] "-52.2" "-51.8"
# [[4]]
# [1] "-43.7" "-43.3"
# [[5]]
# [1] "-51.4" "-51"
# [[6]]
# [1] "-43.3" "-42.9"
# [[7]]
# [1] "-43.7" "-43.3"
# [[8]]
# [1] "-47.4" "-47"
sapply(strsplit(gsub("[^-0-9.,]", "", x), ","), function(z) mean(as.numeric(z)))
# [1] -48.0 -61.7 -52.0 -43.5 -51.2 -43.1 -43.5 -47.2
(I should note that I'm interpreting your mean(-48.2 -47.8) to really mean mean(c(-48.2, -47.8)), since as written -48.2 -47.8 is a subtraction that evaluates to -96, which does not seem to be what you want.)
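To illustrate the note above, here is a small sketch of the difference, followed by the three steps wrapped into one function. The name interval_mean is hypothetical and only used for illustration; the two sample labels are taken from the question.

```r
# Without c(), R parses this as a subtraction: -48.2 - 47.8
mean(-48.2 -47.8)
# [1] -96

# With c(), mean() receives both endpoints
mean(c(-48.2, -47.8))
# [1] -48

# Hypothetical helper wrapping the three steps (gsub, strsplit, mean)
interval_mean <- function(s) {
  # as.character() so that factors are handled as their labels
  cleaned <- gsub("[^-0-9.,]", "", as.character(s))
  sapply(strsplit(cleaned, ","), function(z) mean(as.numeric(z)))
}

interval_mean(factor(c("(-48.2,-47.8]", "(-51.4,-51]")))
# [1] -48.0 -51.2
```

Calling as.character() first should make the helper behave the same for the character vector x and the factor X.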
CodePudding user response:
1) Assuming that what is wanted is the mean of the two numbers in each component of X, remove the first and last character and read what is left using read.table, creating a data frame in which each row is formed from one component of X. Finally use rowMeans on that.
No packages are used.
rowMeans(read.table(text = sub(".(.*).", "\\1", X), sep = ","))
## [1] -48.0 -61.7 -52.0 -43.5 -51.2 -43.1 -43.5 -47.2
This can also be written as a pipeline:
X |>
sub(".(.*).", "\\1", x = _) |>
read.table(text = _, sep = ",") |>
rowMeans()
## [1] -48.0 -61.7 -52.0 -43.5 -51.2 -43.1 -43.5 -47.2
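If it helps to see what each stage produces, the same steps can be run one at a time. This is only a sketch on two of the labels from the question; the intermediate names inner and df are introduced here for illustration.

```r
X <- factor(c("(-48.2,-47.8]", "(-51.4,-51]"))

# Step 1: drop the first and last character of each label
inner <- sub(".(.*).", "\\1", X)
inner
# [1] "-48.2,-47.8" "-51.4,-51"

# Step 2: parse the comma-separated text into a two-column data frame
df <- read.table(text = inner, sep = ",")

# Step 3: average across each row
rowMeans(df)
# [1] -48.0 -51.2
```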
2) A similar approach using strapply also works. This applies the indicated function, expressed using formula notation, to the capture groups.
library(gsubfn)
# as.character() keeps the labels unpadded; format() would pad shorter labels with spaces
strapply(as.character(X), "^.(.*),(.*).$", ~ mean(as.numeric(c(x, y))), simplify = TRUE)
## [1] -48.0 -61.7 -52.0 -43.5 -51.2 -43.1 -43.5 -47.2
CodePudding user response:
sapply(regmatches(x, gregexpr('[-0-9.]+', x)), \(x) mean(as.numeric(x)))
# [1] -48.0 -61.7 -52.0 -43.5 -51.2 -43.1 -43.5 -47.2
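Since this one-liner comes without explanation, here is a sketch of what it appears to do, split into two steps on two of the labels from the question. The intermediate name nums is introduced here for illustration.

```r
x <- c("(-48.2,-47.8]", "(-51.4,-51]")

# gregexpr() locates every run of digits, minus signs and dots;
# regmatches() extracts those runs as a list of character vectors
nums <- regmatches(x, gregexpr("[-0-9.]+", x))
nums
# [[1]]
# [1] "-48.2" "-47.8"
#
# [[2]]
# [1] "-51.4" "-51"

# Convert each pair to numeric and average it
sapply(nums, function(p) mean(as.numeric(p)))
# [1] -48.0 -51.2
```

Because the pattern extracts the numbers instead of deleting the surrounding characters, no gsub is needed.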