I am a beginner at R so please bear with me.
A toy-example of my problem would be somewhat like this -
Fruits | Number |
---|---|
1_Fruit_A | 18 |
2_Fruit_A | 20 |
3_Fruit_A | 24 |
1_Fruit_B | 50 |
2_Fruit_B | 63 |
3_Fruit_B | 45 |
1_Fruit_C | 71 |
2_Fruit_C | 75 |
3_Fruit_C | 72 |
And from this table, I wish to create another table wherein I store data somewhat like this -
Fruits | Means | Standard Deviation |
---|---|---|
Fruit A - Average | 20.67 | 3 |
Fruit B - Average | 52.67 | 2 |
Fruit C - Average | 72.67 | 4 |
I need to write a reproducible code which I can use for say even coloured balls or flowers instead of fruits, but always a mean of triplicates which needs to be stored on another table with one column renamed and one column of means, I will then use ggplot to plot the resultant table. Any help would be greatly appreciated.
CodePudding user response:
You can extract the common value from Fruits
column and use it as group and then take average value in each group.
For your example, you can drop everything after a -
.
Using dplyr
:
library(dplyr)
df %>%
group_by(Fruits = sub('\\s-.*', '', Fruits)) %>%
summarise(Number = mean(Number), .groups = 'drop')
# Fruits Number
#1 Fruit A 20.66667
#2 Fruit B 52.66667
#3 Fruit C 72.66667
In base R,
aggregate(Number~Fruits, transform(df, Fruits = sub('\\s-.*', '', Fruits)), mean)
CodePudding user response:
Using data.table
library(data.table)
# create a new column which includes common string to aggregate upon
setDT(dfs)[,':='(
aggstr = substr(Fruits, 0, nchar(Fruits)-1)
# OR using regex
# aggstr = sub('\\s-.*', '', Fruits)
)]
result.dt <- dfs[,.(Avg = mean(Number) , Sd= sd(Number)), by="aggstr"]
EDIT: added standard deviation
CodePudding user response:
Very similar to @Ronak Shah's solution, but here I include a reproducible data and customized text manipulation on Fruits
column, based on your updated data:
mydf <- data.frame(Fruits =
c("1_Fruit_A",
"2_Fruit_A",
"3_Fruit_A",
"1_Fruit_B",
"2_Fruit_B",
"3_Fruit_B",
"1_Fruit_C",
"2_Fruit_C",
"3_Fruit_C"),
Number = c(18, 20, 24, 50, 63, 45, 71, 75, 72))
mydf$Fruits <- mydf %>%
pull(Fruits) %>%
gsub("^[[:digit:]] [_]", "", .) %>%
gsub("[_]", " ", .) %>% paste0(" - Average")
# Intermediate result
mydf
# Fruits Number
# 1 Fruit A - Average 18
# 2 Fruit A - Average 20
# 3 Fruit A - Average 24
# 4 Fruit B - Average 50
# 5 Fruit B - Average 63
# 6 Fruit B - Average 45
# 7 Fruit C - Average 71
# 8 Fruit C - Average 75
# 9 Fruit C - Average 72
newdf <- mydf %>% group_by(Fruits) %>% summarise(Means = mean(Number),
`Standard Deviation` = sd(Number))
# Final result
newdf
# # A tibble: 3 x 3
# Fruits Means `Standard Deviation`
# <chr> <dbl> <dbl>
# 1 Fruit A - Average 20.7 3.06
# 2 Fruit B - Average 52.7 9.29
# 3 Fruit C - Average 72.7 2.08
I think this code serves your purpose, but I strongly recommend to remove the words Average
in Fruits
column because it is misleading. It implies that the values in Mean
column is not the mean of values of each group, but instead the mean of the average of the values of each group.