I'm stuck with dplyr (again!) and trying to solve my problem without dying in the attempt.
The first lines of my df look like this:
df <- structure(list(fecha = c(1990, 1990, 1990, 1990, 1990, 1990,
1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990, 1990), cientifico = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Argentina sphyraena", class = "factor"),
dem_sect = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("AB", "EP", "FE", "MF",
"PA"), class = "factor"), sector = c("EPb", "EPc", "EPc",
"EPb", "EPa", "EPa", "EPb", "EPc", "EPb", "EPb", "EPb", "EPb",
"EPb", "EPb", "EPa"), md_area = c(3010.44, 665.88, 665.88,
3010.44, 1273.65, 1273.65, 3010.44, 665.88, 3010.44, 3010.44,
3010.44, 3010.44, 3010.44, 3010.44, 1273.65), md_peso = c(1.42957605985037,
1.04499099099099, 1.04499099099099, 1.42957605985037, 1.24025925925926,
1.24025925925926, 1.42957605985037, 1.04499099099099, 1.42957605985037,
1.42957605985037, 1.42957605985037, 1.42957605985037, 1.42957605985037,
1.42957605985037, 1.24025925925926), dummy = c(4303.65295361596,
695.838601081081, 695.838601081081, 4303.65295361596, 1579.65620555556,
1579.65620555556, 4303.65295361596, 695.838601081081, 4303.65295361596,
4303.65295361596, 4303.65295361596, 4303.65295361596, 4303.65295361596,
4303.65295361596, 1579.65620555556)), row.names = c(NA, -15L
), class = "data.frame")
I'm trying to "translate" this into dplyr:
sumsect <- tapply(md_peso * md_area, as.factor(substr(names(sector), 1, 2)), sum)
I have tried many approaches without success. I added a column ("dem_sect") holding the result of as.factor(substr(names(sector), 1, 2)) in an attempt to solve the problem, but that did not work either.
The desired output is a data frame with a new column, "sumsect", holding the same value in every row: in this case 6579.148, the sum of md_peso * md_area over the sectors (1579.6562 + 4303.6530 + 695.8386).
fecha cientifico dem_sect sector md_area md_peso dummy sumsect
1 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
2 1990 Argentina sphyraena EP EPc 665.88 1.044991 695.8386 6579.148
3 1990 Argentina sphyraena EP EPc 665.88 1.044991 695.8386 6579.148
4 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
5 1990 Argentina sphyraena EP EPa 1273.65 1.240259 1579.6562 6579.148
6 1990 Argentina sphyraena EP EPa 1273.65 1.240259 1579.6562 6579.148
7 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
8 1990 Argentina sphyraena EP EPc 665.88 1.044991 695.8386 6579.148
9 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
10 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
11 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
12 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
13 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
14 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
15 1990 Argentina sphyraena EP EPa 1273.65 1.240259 1579.6562 6579.148
Any hint will be more than welcome. Thanks in advance
CodePudding user response:
You can just use mutate() and sum the unique values of dummy:
df |>
  mutate(sumsect = sum(unique(dummy)))
If you would rather rely on md_area and md_peso, you can use:
df |>
  mutate(sumsect = sum(unique(md_area * md_peso)))
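One caveat worth sketching: unique() works on the products themselves, so two different sectors that happen to share the same md_area * md_peso value would be counted only once. The snippet below is a minimal sketch of an alternative that keeps one row per sector instead, assuming md_area and md_peso are constant within a sector (as they are in the sample). The toy data frame and the column names by_value and by_sector are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: three sectors whose products all equal 20
toy <- data.frame(
  sector  = c("EPa", "EPa", "EPb", "EPc"),
  md_area = c(10, 10, 5, 20),
  md_peso = c(2, 2, 4, 1)
)

res <- toy %>%
  mutate(
    # collapses equal products, even when they come from different sectors
    by_value  = sum(unique(md_area * md_peso)),
    # keeps the first row of each sector, then sums those products
    by_sector = sum((md_area * md_peso)[!duplicated(sector)])
  )
```

On the sample df from the question both versions should agree, because the three sectors have three distinct products.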
CodePudding user response:
You don't need tapply if you are going to work with dplyr.
library(tidyverse)
df %>%                                  # target data frame
  cbind(                                # attach the total as a new column on every row
    df %>%                              # work with data frame df
      group_by(sector) %>%              # calculate by sector
      summarise(sumsect = unique(md_area * md_peso)) %>% # one md_area * md_peso value per sector
      ungroup() %>%                     # remove grouping
      summarise(sumsect = sum(sumsect)) # sum the 3 calculated values
  )
Output:
fecha cientifico dem_sect sector md_area md_peso dummy sumsect
1 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
2 1990 Argentina sphyraena EP EPc 665.88 1.044991 695.8386 6579.148
3 1990 Argentina sphyraena EP EPc 665.88 1.044991 695.8386 6579.148
4 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
5 1990 Argentina sphyraena EP EPa 1273.65 1.240259 1579.6562 6579.148
6 1990 Argentina sphyraena EP EPa 1273.65 1.240259 1579.6562 6579.148
7 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
8 1990 Argentina sphyraena EP EPc 665.88 1.044991 695.8386 6579.148
9 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
10 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
11 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
12 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
13 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
14 1990 Argentina sphyraena EP EPb 3010.44 1.429576 4303.6530 6579.148
15 1990 Argentina sphyraena EP EPa 1273.65 1.240259 1579.6562 6579.148
You may want sumsect calculated separately for each cientifico, each fecha, or both. Your example has only one fecha and one cientifico, but if sumsect should differ for each level of those columns, don't forget to group by them as well.
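As a rough sketch of that grouping, assuming the real data has several fecha values: the toy data frame below is invented for illustration, and it reuses the sum(unique(...)) idea from this thread inside a grouped mutate().

```r
library(dplyr)

# Hypothetical toy data with two years; values are made up
toy <- data.frame(
  fecha      = c(1990, 1990, 1990, 1991, 1991),
  cientifico = "Argentina sphyraena",
  sector     = c("EPa", "EPa", "EPb", "EPa", "EPb"),
  md_area    = c(10, 10, 20, 10, 20),
  md_peso    = c(1, 1, 2, 3, 4)
)

res <- toy %>%
  group_by(fecha, cientifico) %>%                       # one total per year and species
  mutate(sumsect = sum(unique(md_area * md_peso))) %>%  # sum of distinct products within the group
  ungroup()
```

Each row then carries the total of its own fecha/cientifico combination rather than one global value.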
CodePudding user response:
Update: As @Jahi Zamy's answer shows, this is also possible without grouping. Grouping would still be useful for handling different groups in the real data set:
df %>%
  mutate(sumsect = sum(unique(md_peso * md_area)))
First answer:
You can do it this way with dplyr. The trick is to use group_by and then ungroup(), and to sum the unique values. If you want the sum for specific groups, then instead of ungroup() use group_by with the desired group:
df %>%
  group_by(sector) %>%
  mutate(y = md_peso * md_area) %>%
  ungroup() %>%
  mutate(sumsect = sum(unique(y)), .keep = "unused")
fecha cientifico dem_sect sector md_area md_peso dummy sumsect
<dbl> <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl>
1 1990 Argentina sphyraena EP EPb 3010. 1.43 4304. 6579.
2 1990 Argentina sphyraena EP EPc 666. 1.04 696. 6579.
3 1990 Argentina sphyraena EP EPc 666. 1.04 696. 6579.
4 1990 Argentina sphyraena EP EPb 3010. 1.43 4304. 6579.
5 1990 Argentina sphyraena EP EPa 1274. 1.24 1580. 6579.
6 1990 Argentina sphyraena EP EPa 1274. 1.24 1580. 6579.
7 1990 Argentina sphyraena EP EPb 3010. 1.43 4304. 6579.
8 1990 Argentina sphyraena EP EPc 666. 1.04 696. 6579.
9 1990 Argentina sphyraena EP EPb 3010. 1.43 4304. 6579.
10 1990 Argentina sphyraena EP EPb 3010. 1.43 4304. 6579.
11 1990 Argentina sphyraena EP EPb 3010. 1.43 4304. 6579.
12 1990 Argentina sphyraena EP EPb 3010. 1.43 4304. 6579.
13 1990 Argentina sphyraena EP EPb 3010. 1.43 4304. 6579.
14 1990 Argentina sphyraena EP EPb 3010. 1.43 4304. 6579.
15 1990 Argentina sphyraena EP EPa 1274. 1.24 1580. 6579.