Hello everyone,
I am trying to create a mean variable within my data. I would like a mean value of rp for every month and region (a region consists of several code values), times the number of years in my data frame.
This is a bit of the data I am working with:
structure(list(month = c("JAN", "FEV", "MAR", "ABR", "MAI", "JUN", "JUL", "AGO", "SET", "OUT", "NOV", "DEZ", "JAN", "FEV"), year = c(2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L), code = c("AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AM", "AM"), region = c("NE", "NE", "NE", "NE", "NE", "NE", "NE", "NE", "NE", "NE", "NE", "NE", "N", "N"), month_num = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L), rp = c(151.351257324219, 150.433822631836, 145.326431274414, 144.790817260742, 139.691024780273, 138.706481933594, 137.455856323242, 145.046249389648, 136.064834594727, 135.468658447266, 134.540267944336, 137.561904907227, 142.4482421875, 141.584777832031)), row.names = c(NA, 14L), class = "data.frame")
I am fairly new to R and haven't been able to find a solution. I would like to create a new column in my dataset and not simply calculate it. I want to use the solution to create a graph.
Any help is appreciated.
CodePudding user response:
Using the data.table library, I've provided syntax below to take the mean of rp by month and region. The general syntax is DT[i, j, by=], where DT is your data.table object, i is the subset of rows, j is what you'd like to return, and by= lists the variables you'd like to group by.
library(data.table)
data <- data.table(
  month = c("JAN","FEV","MAR","ABR","MAI","JUN","JUL","AGO","SET","OUT","NOV","DEZ","JAN","FEV"),
  year = c(rep(2004,12),rep(2005,2)),
  code = c(rep("AL",12),rep("AM",2)),
  region = c(rep("NE",12),rep("N",2)),
  month_num = c(1:12,1:2),  # named month_num so it does not duplicate the month column
  rp = c(151.351257324219, 150.433822631836, 145.326431274414, 144.790817260742, 139.691024780273, 138.706481933594, 137.455856323242, 145.046249389648, 136.064834594727, 135.468658447266, 134.540267944336, 137.561904907227, 142.4482421875, 141.584777832031)
)
data
    month year code region month_num rp
1: JAN 2004 AL NE 1 151.3513
2: FEV 2004 AL NE 2 150.4338
3: MAR 2004 AL NE 3 145.3264
4: ABR 2004 AL NE 4 144.7908
5: MAI 2004 AL NE 5 139.6910
6: JUN 2004 AL NE 6 138.7065
7: JUL 2004 AL NE 7 137.4559
8: AGO 2004 AL NE 8 145.0462
9: SET 2004 AL NE 9 136.0648
10: OUT 2004 AL NE 10 135.4687
11: NOV 2004 AL NE 11 134.5403
12: DEZ 2004 AL NE 12 137.5619
13: JAN 2005 AM N 1 142.4482
14: FEV 2005 AM N 2 141.5848
> data[,.(Mean = mean(rp)),by=.(month,region)]
month region Mean
1: JAN NE 151.3513
2: FEV NE 150.4338
3: MAR NE 145.3264
4: ABR NE 144.7908
5: MAI NE 139.6910
6: JUN NE 138.7065
7: JUL NE 137.4559
8: AGO NE 145.0462
9: SET NE 136.0648
10: OUT NE 135.4687
11: NOV NE 134.5403
12: DEZ NE 137.5619
13: JAN N 142.4482
14: FEV N 141.5848
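Since the question asks for a new column rather than a separate summary table, the same grouping can be combined with data.table's := operator, which adds a column to the existing table by reference. The following is a minimal sketch: the table dt, its values, and the column name mean_rp are made up for illustration and are not the asker's data.

```r
library(data.table)

# Stand-in data (illustrative values only, not the asker's dataset)
dt <- data.table(
  month  = c("JAN", "FEV", "JAN", "FEV", "JAN", "FEV"),
  code   = c("AL", "AL", "BA", "BA", "AM", "AM"),
  region = c("NE", "NE", "NE", "NE", "N", "N"),
  rp     = c(150, 148, 140, 144, 142, 141)
)

# := modifies dt in place: every row receives the mean of rp
# for its own (month, region) group, so the row count is unchanged
dt[, mean_rp := mean(rp), by = .(month, region)]

print(dt)
```

Because the row count stays the same, the resulting table can be passed straight to a plotting function with both rp and mean_rp available.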
CodePudding user response:
Here is a dplyr option. I assume your original dataset is called df.
library(dplyr)
df_mean <- df %>%
group_by(month, region) %>%
summarise(mean_rp = mean(rp), .groups = "drop")
Now join df_mean to the original dataset, naming the key columns explicitly:
df <- left_join(df, df_mean, by = c("month", "region"))
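The summarise-then-join steps can also be collapsed into one: with group_by() followed by mutate(), the group mean is written to every row directly. This is a sketch under the same assumption that the data frame is called df; the values below are made-up stand-ins, and mean_rp is just an illustrative column name.

```r
library(dplyr)

# Stand-in data (illustrative values only, not the asker's dataset)
df <- data.frame(
  month  = c("JAN", "FEV", "JAN", "FEV", "JAN", "FEV"),
  code   = c("AL", "AL", "BA", "BA", "AM", "AM"),
  region = c("NE", "NE", "NE", "NE", "N", "N"),
  rp     = c(150, 148, 140, 144, 142, 141)
)

# mutate() on a grouped data frame keeps all rows and fills each one
# with the mean of rp for its (month, region) group
df <- df %>%
  group_by(month, region) %>%
  mutate(mean_rp = mean(rp)) %>%
  ungroup()

print(df)
```

Calling ungroup() at the end avoids later operations being applied per group by accident.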