I have some data which looks like:
# A tibble: 10 × 4
RegionName `Año 2004_1` `Año 2004_2` `Año 2004_3`
<chr> <dbl> <dbl> <dbl>
1 ANDALUCÍA NA NA NA
2 Almería NA NA NA
3 Abla 58 61 54
4 Abrucena 6 2 1
5 Adra 146 211 101
6 ALBÁNCHEZ 12 3 3
7 Alboloduy 2 2 2
8 Albox 33 66 35
9 ALCOLEA 0 1 1
10 Alcóntar 1 1 2
What I am trying to do is to group_by
and compute the sum
of each capitalised RegionName
. i.e. mutate(across(where(is.numeric)...
then add the value next to each capitalised region.
For example:
Using this post here I can extract the capitalised words and store then in a new column using:
data %>%
group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
mutate(REGIONNAME = first(RegionName)) %>%
relocate(REGIONNAME, .before = RegionName)
So the data looks like:
# A tibble: 10 × 6
# Groups: grp [3]
REGIONNAME RegionName `Año 2004_1` `Año 2004_2` `Año 2004_3` grp
<chr> <chr> <dbl> <dbl> <dbl> <int>
1 ANDALUCÍA ANDALUCÍA NA NA NA 1
2 ANDALUCÍA Almería NA NA NA 1
3 ANDALUCÍA Abla 58 61 54 1
4 ANDALUCÍA Abrucena 6 2 1 1
5 ANDALUCÍA Adra 146 211 101 1
6 ALBÁNCHEZ ALBÁNCHEZ 12 3 3 2
7 ALBÁNCHEZ Alboloduy 2 2 2 2
8 ALBÁNCHEZ Albox 33 66 35 2
9 ALCOLEA ALCOLEA 0 1 1 3
10 ALCOLEA Alcóntar 1 1 2 3
**Ignoring the grp
column, I would like to group_by(REGIONNAME)
and mutate(across...
the Año...
columns in order to give me a sum
for each REGIONNAME
. Then I would like to fill the NA
value under each column.
Expected output (where the modifications are made next to the ***x***
):
REGIONNAME RegionName `Año 2004_1` `Año 2004_2` `Año 2004_3` grp
<chr> <chr> <dbl> <dbl> <dbl> <int>
1 ANDALUCÍA ANDALUCÍA ***212*** ***274*** ***155*** 1
2 ANDALUCÍA Almería NA NA NA 1
3 ANDALUCÍA Abla 58 61 54 1
4 ANDALUCÍA Abrucena 6 2 1 1
5 ANDALUCÍA Adra 146 211 101 1
6 ALBÁNCHEZ ALBÁNCHEZ ***35*** ***68*** ***37*** 2
7 ALBÁNCHEZ Alboloduy 2 2 2 2
8 ALBÁNCHEZ Albox 33 66 35 2
9 ALCOLEA ALCOLEA ***1*** ***1*** ***2*** 3
10 ALCOLEA Alcóntar 1 1 2 3
Data:
data <- structure(list(RegionName = c("ANDALUCÍA", "Almería", "Abla",
"Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA",
"Alcóntar"), `Año 2004_1` = c(NA, NA, 58, 6, 146, 12, 2, 33,
0, 1), `Año 2004_2` = c(NA, NA, 61, 2, 211, 3, 2, 66, 1, 1),
`Año 2004_3` = c(NA, NA, 54, 1, 101, 3, 2, 35, 1, 2)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
You can replace every capitalized rows with the sum of non-capitalized rows for each group:
#Data
data %>%
group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
mutate(REGIONNAME = first(RegionName)) %>%
relocate(REGIONNAME, .before = RegionName) %>%
# Here
mutate(across(starts_with("Año"),
~ ifelse(REGIONNAME == RegionName, sum(.x[REGIONNAME != RegionName], na.rm = T), .x)))
# A tibble: 10 x 6
# Groups: grp [3]
REGIONNAME RegionName `Año 2004_1` `Año 2004_2` `Año 2004_3` grp
<chr> <chr> <dbl> <dbl> <dbl> <int>
1 ANDALUCÍA ANDALUCÍA 210 274 156 1
2 ANDALUCÍA Almería NA NA NA 1
3 ANDALUCÍA Abla 58 61 54 1
4 ANDALUCÍA Abrucena 6 2 1 1
5 ANDALUCÍA Adra 146 211 101 1
6 ALBÁNCHEZ ALBÁNCHEZ 35 68 37 2
7 ALBÁNCHEZ Alboloduy 2 2 2 2
8 ALBÁNCHEZ Albox 33 66 35 2
9 ALCOLEA ALCOLEA 1 1 2 3
10 ALCOLEA Alcóntar 1 1 2 3