I have an R
dataframe
with a column titled 'Normalized Month' for different entities.
I am trying to evaluate average of the column 'Oil_Prod' grouped by 'Normalized Month'.
Hence, the final dataframe should have just two columns; 'Normalized Month' and associated mean of oil prod for that 'Normalized Month' value.
Here is the code:
mydf2 <- mydf %>% filter(`Normalized Month` > 0)
mydf2$`Normalized Month` <- factor(mydf2$`Normalized Month`)
levels(mydf2$`Normalized Month`)
GetAvg <- mydf2 %>%
group_by(mydf2$`Normalized Month`)%>%
arrange(mydf2, desc('Normalized Month'))%>%
dplyr::summarize(Mean = mean(mydf2$Oil_Prod, na.rm=TRUE))
Instead of getting mean for each Normalized Month, I am getting a single value of mean.
What I need is a dataframe with a below structure:
'Normalized Month' | 'Mean' |
1 value1
2 value2
3 value 3
dput
:
structure(list(Api_No14 = c("42227412150000", "42227412150000",
"42227412150000", "42227412150000", "42227412150000", "42227412040000",
"42227412040000", "42227412040000", "42227412040000", "42227412040000",
"42227412190000", "42227412190000", "42227412190000", "42227412190000",
"42227412190000", "42227412130000", "42227412130000", "42227412130000",
"42227412130000", "42227412130000"), Prod_Inj_Date = structure(c(1643587200,
1646006400, 1648684800, 1651276800, 1653955200, 1643587200, 1646006400,
1648684800, 1651276800, 1653955200, 1643587200, 1646006400, 1648684800,
1651276800, 1653955200, 1643587200, 1646006400, 1648684800, 1651276800,
1653955200), tzone = "UTC", class = c("POSIXct", "POSIXt")),
`Normalized Month` = structure(c(1L, 2L, 3L, 4L, 5L, 1L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("1",
"2", "3", "4", "5"), class = "factor"), Oil_Prod = c(4612.588,
25118.511, 39470.742, 22936.157, 17928.723, 2997.715, 16893.884,
26115.048, 18989.759, 16545.852, 4624.317, 18241.738, 30150.569,
21997.824, 18308.631, 52.696, 9375.176, 26108.609, 16889.377,
14688.064), `CD Oil Rate` = c(148.793161290323, 897.089678571429,
1273.24974193548, 764.538566666667, 578.345903225807, 96.7004838709677,
603.353, 842.420903225806, 632.991966666667, 533.737161290323,
149.171516129032, 651.490642857143, 972.599, 733.2608, 590.601,
1.69987096774194, 334.827714285714, 842.213193548387, 562.979233333333,
473.808516129032)), row.names = c(NA, -20L), SBDFImportTypes = c(Api_No14 = "String",
Prod_Inj_Date = "DateTime", `Normalized Month` = "Integer", Oil_Prod = "Real",
`CD Oil Rate` = "Real"), class = "data.frame")
CodePudding user response:
When you use mydf2$Oil_Prod
you are referencing the external data frame, not the one from the pipe. I think if you stop subsetting within your dplyr functions, things will work as expected:
# assigning your dput to mydf
mydf %>% group_by(`Normalized Month`) %>% summarise(mean = mean(Oil_Prod, na.rm = TRUE))
# output
`Normalized Month` mean
<fct> <dbl>
1 1 3072.
2 2 17407.
3 3 30461.
4 4 20203.
5 5 16868.