Create new column based on pattern in other column

Time:11-03

I have the following data frame in R, from which I'd like to create a new column containing the Nut for each municipality (see the second table). "Nut" simply refers to a higher hierarchy level above the municipalities in Portugal. For later analysis I need to group the data by Nut. The entire data frame consists of 308 municipalities and 25 Nuts.

Does someone have a suggestion on how to approach this task? Since the number of municipalities in each Nut differs, I don't know where to begin.

geo-group   nuts_municipal
Nut III     Alto Minho
Municipal   Arcos de Valdevez
Municipal   Caminha
Municipal   Monção
Municipal   Ponte da Barca
Nuts III    Ponte da Barca
Municipal   Amares
Municipal   Barcelos
Municipal   Braga
Nuts III    Fafe
Municipal   Ave

This is what I'd like to have as a final result.

geo-group   nuts_municipal      Nut
Nut III     Alto Minho
Municipal   Arcos de Valdevez   Alto Minho
Municipal   Caminha             Alto Minho
Municipal   Monção              Alto Minho
Municipal   Ponte da Barca      Alto Minho
Nut III     Cávado
Municipal   Amares              Cávado
Municipal   Barcelos            Cávado
Municipal   Braga               Cávado
Nut III     Ave
Municipal   Fafe                Ave
Municipal   Mondim de Basto     Ave

I'm not sure where to begin and so far haven't found an approach.

CodePudding user response:

A second option would be to use tidyr::fill and if_else:

library(tidyverse)

dat |> 
  mutate(Nut = if_else(grepl("^Nut", `geo-group`), nuts_municipal, NA_character_)) |> 
  tidyr::fill(Nut) |> 
  mutate(Nut = if_else(grepl("^Nut", `geo-group`), "", Nut))
#>    geo-group    nuts_municipal            Nut
#> 1    Nut III        Alto Minho               
#> 2  Municipal Arcos de Valdevez     Alto Minho
#> 3  Municipal           Caminha     Alto Minho
#> 4  Municipal            Monção     Alto Minho
#> 5  Municipal    Ponte da Barca     Alto Minho
#> 6   Nuts III    Ponte da Barca               
#> 7  Municipal            Amares Ponte da Barca
#> 8  Municipal          Barcelos Ponte da Barca
#> 9  Municipal             Braga Ponte da Barca
#> 10  Nuts III              Fafe               
#> 11 Municipal               Ave           Fafe

DATA

dat <- data.frame(
       check.names = FALSE,
       `geo-group` = c("Nut III","Municipal",
                       "Municipal","Municipal","Municipal","Nuts III","Municipal",
                       "Municipal","Municipal","Nuts III","Municipal"),
    nuts_municipal = c("Alto Minho",
                       "Arcos de Valdevez","Caminha","Monção","Ponte da Barca",
                       "Ponte da Barca","Amares","Barcelos","Braga","Fafe","Ave")
)
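For readers who prefer to avoid extra packages, the same idea can be sketched in base R. This is only an illustration, not part of the original answer: the helper names `is_nut`, `block_id` and `nut_names` are made up here, and the sketch assumes the first row of the data is a Nut row (otherwise `block_id` would contain 0 and the lookup would fail).

```r
# Sample data copied from the DATA block above
dat <- data.frame(
  check.names = FALSE,
  stringsAsFactors = FALSE,
  `geo-group` = c("Nut III", "Municipal", "Municipal", "Municipal", "Municipal",
                  "Nuts III", "Municipal", "Municipal", "Municipal",
                  "Nuts III", "Municipal"),
  nuts_municipal = c("Alto Minho", "Arcos de Valdevez", "Caminha", "Monção",
                     "Ponte da Barca", "Ponte da Barca", "Amares", "Barcelos",
                     "Braga", "Fafe", "Ave")
)

# TRUE on Nut rows ("Nut III" or "Nuts III"), FALSE on municipality rows
is_nut <- grepl("^Nut", dat$`geo-group`)

# cumsum() numbers the blocks 1, 2, 3, ...: it increases by one at every Nut row
block_id <- cumsum(is_nut)

# Names of the Nut rows, in order of appearance
nut_names <- dat$nuts_municipal[is_nut]

# Look up each row's Nut name by block number; leave Nut rows themselves blank
# (assumes the first row is a Nut row, so block_id never contains 0)
dat$Nut <- ifelse(is_nut, "", nut_names[block_id])

print(dat)
```

With the sample data this should give the same `Nut` column as the `tidyr::fill` version, since both carry the most recent Nut name forward.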

CodePudding user response:

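You can group the rows with cumsum(), starting a new group at every Nut row, and then use first() (dat is the data frame from the DATA block above):

library(dplyr)

dat %>% 
  group_by(gp = cumsum(grepl("^Nut", `geo-group`))) %>%  # matches both "Nut III" and "Nuts III"
  mutate(Nut = ifelse(row_number() == 1, "", first(nuts_municipal))) %>% 
  ungroup()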
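Since the question mentions grouping by Nut for later analysis, here is one possible follow-up, sketched in base R. The data frame `res` is written out by hand to mirror the output shown in the first answer so that the snippet runs on its own. The names `res`, `municipalities`, `counts` and `by_nut` are illustrative only.

```r
# Hand-written copy of the result (sample data plus the new Nut column)
res <- data.frame(
  check.names = FALSE,
  stringsAsFactors = FALSE,
  `geo-group` = c("Nut III", "Municipal", "Municipal", "Municipal", "Municipal",
                  "Nuts III", "Municipal", "Municipal", "Municipal",
                  "Nuts III", "Municipal"),
  nuts_municipal = c("Alto Minho", "Arcos de Valdevez", "Caminha", "Monção",
                     "Ponte da Barca", "Ponte da Barca", "Amares", "Barcelos",
                     "Braga", "Fafe", "Ave"),
  Nut = c("", "Alto Minho", "Alto Minho", "Alto Minho", "Alto Minho",
          "", "Ponte da Barca", "Ponte da Barca", "Ponte da Barca",
          "", "Fafe")
)

# Drop the Nut header rows (blank Nut), keeping only municipality rows
municipalities <- res[res$Nut != "", ]

# Number of municipalities per Nut
counts <- table(municipalities$Nut)
print(counts)

# Municipality names collected per Nut, as a named list
by_nut <- split(municipalities$nuts_municipal, municipalities$Nut)
print(by_nut)
```

Filtering out the header rows first keeps the blank `Nut` value from showing up as a group of its own.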