Creating mean variables within data - new column

Hello everyone,

I am trying to create a mean variable within my data.

I would like a mean value of rp for every month and region (a region consists of several code values), for each of the years in my data frame.

This is a bit of the data I am working with:

structure(list(month = c("JAN", "FEV", "MAR", "ABR", "MAI", "JUN", "JUL", "AGO", "SET", "OUT", "NOV", "DEZ", "JAN", "FEV"), year = c(2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L, 2004L), code = c("AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AM", "AM"), region = c("NE", "NE", "NE", "NE", "NE", "NE", "NE", "NE", "NE", "NE", "NE", "NE", "N", "N"), month_num = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L), rp = c(151.351257324219, 150.433822631836, 145.326431274414, 144.790817260742, 139.691024780273, 138.706481933594, 137.455856323242, 145.046249389648, 136.064834594727, 135.468658447266, 134.540267944336, 137.561904907227, 142.4482421875, 141.584777832031)), row.names = c(NA, 14L), class = "data.frame")

I am fairly new to R and haven't been able to find a solution. I would like to create a new column in my dataset and not simply calculate it. I want to use the solution to create a graph.

Any help is appreciated.

CodePudding user response：

Using the data.table library, I've provided syntax below to take the mean of rp by month and region. The general syntax is: DT[i, j, by=], where DT is your data.table object, i is the subset of data, j is what you'd like to return, and by= lists the variables you'd like to group by.

library(data.table)

# Rebuild the sample from the question as a data.table
data <- data.table(
  month = c("JAN","FEV","MAR","ABR","MAI","JUN","JUL","AGO","SET","OUT","NOV","DEZ","JAN","FEV"),
  year = rep(2004L, 14),  # every row in the question's sample is from 2004
  code = c(rep("AL",12),rep("AM",2)),
  region = c(rep("NE",12),rep("N",2)),
  month_num = c(1:12,1:2),  # named month_num, as in the question, so it does not clash with month
  rp = c(151.351257324219, 150.433822631836, 145.326431274414, 144.790817260742, 139.691024780273, 138.706481933594, 137.455856323242, 145.046249389648, 136.064834594727, 135.468658447266, 134.540267944336, 137.561904907227, 142.4482421875, 141.584777832031)
)

data
    month year code region month_num       rp
 1:   JAN 2004   AL     NE         1 151.3513
 2:   FEV 2004   AL     NE         2 150.4338
 3:   MAR 2004   AL     NE         3 145.3264
 4:   ABR 2004   AL     NE         4 144.7908
 5:   MAI 2004   AL     NE         5 139.6910
 6:   JUN 2004   AL     NE         6 138.7065
 7:   JUL 2004   AL     NE         7 137.4559
 8:   AGO 2004   AL     NE         8 145.0462
 9:   SET 2004   AL     NE         9 136.0648
10:   OUT 2004   AL     NE        10 135.4687
11:   NOV 2004   AL     NE        11 134.5403
12:   DEZ 2004   AL     NE        12 137.5619
13:   JAN 2004   AM      N         1 142.4482
14:   FEV 2004   AM      N         2 141.5848

data[, .(Mean = mean(rp)), by = .(month, region)]
    month region     Mean
 1:   JAN     NE 151.3513
 2:   FEV     NE 150.4338
 3:   MAR     NE 145.3264
 4:   ABR     NE 144.7908
 5:   MAI     NE 139.6910
 6:   JUN     NE 138.7065
 7:   JUL     NE 137.4559
 8:   AGO     NE 145.0462
 9:   SET     NE 136.0648
10:   OUT     NE 135.4687
11:   NOV     NE 134.5403
12:   DEZ     NE 137.5619
13:   JAN      N 142.4482
14:   FEV      N 141.5848
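
The call above returns a separate summary table. Since the question asks for a new column in the existing data, one possible variation is to assign the grouped mean by reference with `:=`. This is a minimal sketch, not part of the original answer: the column name `mean_rp` is chosen here for illustration, and the toy data (including the second code "BA") is hypothetical, made up so that a month/region group contains more than one row.

```r
library(data.table)

# Hypothetical toy data: two codes in region "NE" and one in "N",
# so that a month/region group can hold more than one row.
dt <- data.table(
  month  = c("JAN", "JAN", "FEV", "FEV", "JAN"),
  code   = c("AL", "BA", "AL", "BA", "AM"),
  region = c("NE", "NE", "NE", "NE", "N"),
  rp     = c(150, 140, 148, 142, 142)
)

# := adds the column in place: no rows are dropped or reordered, and
# each row receives the mean of its own month/region group.
dt[, mean_rp := mean(rp), by = .(month, region)]

print(dt)
```

To get one mean per year as well, `year` could be added to the `by = .( ... )` list, assuming the data has such a column.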

CodePudding user response：

Here is a dplyr option. I assume your original dataset is called df.

library(dplyr)

df_mean <- df %>%
  group_by(month, region) %>%
  summarise(mean_rp = mean(rp), .groups = "drop")

Now join your df_mean to the original dataset:

df <- left_join(df, df_mean, by = c("month", "region"))
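
The summarise-then-join steps can also be collapsed into one: `mutate()` after `group_by()` keeps every row and writes the group mean into a new column. The following is a sketch under assumptions rather than the answerer's own code: the toy data is hypothetical (the code "BA" is invented here so that a group has more than one row), and `mean_rp` is simply the column name reused from above.

```r
library(dplyr)

# Hypothetical toy data: two codes in region "NE" and one in "N".
df <- data.frame(
  month  = c("JAN", "JAN", "FEV", "FEV", "JAN"),
  code   = c("AL", "BA", "AL", "BA", "AM"),
  region = c("NE", "NE", "NE", "NE", "N"),
  rp     = c(150, 140, 148, 142, 142)
)

# mutate() on grouped data computes mean(rp) within each month/region
# group and repeats it on every row of that group; ungroup() removes the
# grouping so later operations are not affected by it.
df <- df %>%
  group_by(month, region) %>%
  mutate(mean_rp = mean(rp)) %>%
  ungroup()

print(df)
```

The result has the same number of rows as the input, which is usually what a plotting step expects.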