I have the following data frame in R, from which I'd like to create a new column containing the Nut for each municipality (see the second table). "Nut" simply refers to a higher hierarchy level above the municipalities in Portugal. For later analysis I need to group the data by Nut. The full data frame contains 308 municipalities and 25 Nuts.
Does anyone have a suggestion on how to approach this task? Since the number of municipalities differs from one Nut to the next, I'm not sure where to begin.
geo-group | nuts_municipal |
---|---|
Nut III | Alto Minho |
Municipal | Arcos de Valdevez |
Municipal | Caminha |
Municipal | Monção |
Municipal | Ponte da Barca |
Nuts III | Ponte da Barca |
Municipal | Amares |
Municipal | Barcelos |
Municipal | Braga |
Nuts III | Fafe |
Municipal | Ave |
This is what I'd like to have as a final result.
geo-group | nuts_municipal | Nut |
---|---|---|
Nut III | Alto Minho | |
Municipal | Arcos de Valdevez | Alto Minho |
Municipal | Caminha | Alto Minho |
Municipal | Monção | Alto Minho |
Municipal | Ponte da Barca | Alto Minho |
Nut III | Cávado | |
Municipal | Amares | Cávado |
Municipal | Barcelos | Cávado |
Municipal | Braga | Cávado |
Nut III | Ave | |
Municipal | Fafe | Ave |
Municipal | Mondim de Basto | Ave |
So far I haven't found a working approach.
Answer:
A second option would be to use `tidyr::fill()` and `if_else()`:
```r
library(tidyverse)

dat |>
  # Copy the Nut name onto its header row ("^Nut" matches "Nut III" and "Nuts III"); other rows get NA
  mutate(Nut = if_else(grepl("^Nut", `geo-group`), nuts_municipal, NA_character_)) |>
  # Fill the NAs downward with the most recent non-NA value, i.e. the current Nut
  tidyr::fill(Nut) |>
  # Blank out the header rows themselves, as in the desired output
  mutate(Nut = if_else(grepl("^Nut", `geo-group`), "", Nut))
#> geo-group nuts_municipal Nut
#> 1 Nut III Alto Minho
#> 2 Municipal Arcos de Valdevez Alto Minho
#> 3 Municipal Caminha Alto Minho
#> 4 Municipal Monção Alto Minho
#> 5 Municipal Ponte da Barca Alto Minho
#> 6 Nuts III Ponte da Barca
#> 7 Municipal Amares Ponte da Barca
#> 8 Municipal Barcelos Ponte da Barca
#> 9 Municipal Braga Ponte da Barca
#> 10 Nuts III Fafe
#> 11 Municipal Ave Fafe
```
DATA

```r
dat <- data.frame(
  check.names = FALSE,
  `geo-group` = c("Nut III", "Municipal",
                  "Municipal", "Municipal", "Municipal", "Nuts III", "Municipal",
                  "Municipal", "Municipal", "Nuts III", "Municipal"),
  nuts_municipal = c("Alto Minho",
                     "Arcos de Valdevez", "Caminha", "Monção", "Ponte da Barca",
                     "Ponte da Barca", "Amares", "Barcelos", "Braga", "Fafe", "Ave")
)
```
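If you'd rather not depend on the tidyverse, the same fill-down idea can be sketched in base R: `cumsum()` over an "is this a header row?" flag numbers the blocks, and each row then looks up the name on its block's header row. This is a minimal sketch, not part of the original answer; the function name `add_nut_base` is hypothetical, and it assumes, as the pipeline above does, that header rows are exactly those whose `geo-group` starts with "Nut".

```r
# Hypothetical base-R helper (not from the answers): gives every row the
# name of the most recent Nut header row above it.
# Assumption: header rows are exactly those whose `geo-group` starts with "Nut",
# which covers both the "Nut III" and "Nuts III" spellings in the sample data.
add_nut_base <- function(df) {
  is_header <- grepl("^Nut", df[["geo-group"]])
  header_rows <- which(is_header)
  names_col <- as.character(df[["nuts_municipal"]])
  # Running count of headers seen so far: 0 before the first header,
  # then 1, 2, ... so it can index into header_rows
  block <- cumsum(is_header)
  nut <- rep(NA_character_, nrow(df))
  # Rows after at least one header take that header's name
  nut[block > 0] <- names_col[header_rows[block[block > 0]]]
  # Header rows get an empty string, matching the desired output
  nut[is_header] <- ""
  df$Nut <- nut
  df
}
```

Calling `dat <- add_nut_base(dat)` on the DATA above should give the same `Nut` column as the `fill()` pipeline, including `NA` for any rows that appear before the first header.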
Answer:
You can `group_by()` a running count of the Nut header rows (each header starts a new group) and then use `first()`:
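```r
library(dplyr)
# A new group starts at each header row ("Nut III" or "Nuts III"); its first row is the header
dat %>%
  group_by(gp = cumsum(grepl("^Nut", `geo-group`))) %>%
  mutate(Nut = ifelse(row_number() == 1, "", first(nuts_municipal))) %>%
  ungroup()
```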
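The question mentions grouping by Nut for later analysis. Once either answer has added the `Nut` column, the header rows are usually just noise, so one option is to drop them and then tabulate municipalities per Nut. This is a minimal base-R sketch; the helper names `drop_nut_headers` and `count_per_nut` are hypothetical, and it assumes the same "header rows start with Nut" rule used above.

```r
# Hypothetical helper: drop the Nut header rows so that each remaining row
# is one municipality tagged with its Nut.
# Assumes `df` already has a `Nut` column and that header rows are the ones
# whose `geo-group` starts with "Nut".
drop_nut_headers <- function(df) {
  keep <- !grepl("^Nut", df[["geo-group"]])
  out <- df[keep, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Hypothetical helper: number of municipalities per Nut.
# Note: table() skips NA, so rows that appear before any header are not counted.
count_per_nut <- function(df) {
  counts <- table(drop_nut_headers(df)$Nut)
  data.frame(Nut = names(counts), n_municipalities = as.integer(counts))
}
```

With dplyr, the equivalent would be a `filter()` that drops the header rows followed by `count(Nut)`. On the full data set described in the question, `drop_nut_headers()` should return 308 rows and `count_per_nut()` 25 rows, which makes a quick sanity check that every municipality was assigned a Nut.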