I have some data which looks like:
RegionName
<chr>
1 ANDALUCÍA
2 Almería
3 Abla
4 Abrucena
5 Adra
6 ALBÁNCHEZ
7 Alboloduy
8 Albox
9 ALCOLEA
10 Alcóntar
Where some of the columns are uppercase
. I want to extract the uppercase columns into a new column and fill(down)
until the next uppercase column.
Expected output:
RegionName REGIONNAME
<chr> <chr>
1 ANDALUCÍA ANDALUCÍA -first result
2 Almería ANDALUCÍA
3 Abla ANDALUCÍA
4 Abrucena ANDALUCÍA
5 Adra ANDALUCÍA
6 ALBÁNCHEZ ALBÁNCHEZ - change here
7 Alboloduy ALBÁNCHEZ
8 Albox ALBÁNCHEZ
9 ALCOLEA ALCOLEA - change here
10 Alcóntar ALCOLEA
Data:
data = structure(list(RegionName = c("ANDALUCÍA", "Almería", "Abla",
"Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA",
"Alcóntar")), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
CodePudding user response:
An idea is to use grepl()
to recognise the [[:upper:]]
, convert the others to NA and fill()
, i.e.
library(dplyr)
library(tidyr)
data %>%
mutate(new = replace(RegionName, !grepl("^[[:upper:]] $", RegionName), NA)) %>%
fill(new)
# A tibble: 10 x 2
RegionName new
<chr> <chr>
1 ANDALUCÍA ANDALUCÍA
2 Almería ANDALUCÍA
3 Abla ANDALUCÍA
4 Abrucena ANDALUCÍA
5 Adra ANDALUCÍA
6 ALBÁNCHEZ ALBÁNCHEZ
7 Alboloduy ALBÁNCHEZ
8 Albox ALBÁNCHEZ
9 ALCOLEA ALCOLEA
10 Alcóntar ALCOLEA
CodePudding user response:
You can group the regions together based on if their name is ==
to their name in all upper case. Then set all names within the group to the first
RegionName
which is in all caps.
library(tidyverse)
df %>%
group_by(grp = cumsum(RegionName == toupper(RegionName))) %>%
mutate(REGIONNAME = first(RegionName))
Output
RegionName grp REGIONNAME
<chr> <int> <chr>
1 ANDALUCÍA 1 ANDALUCÍA
2 Almería 1 ANDALUCÍA
3 Abla 1 ANDALUCÍA
4 Abrucena 1 ANDALUCÍA
5 Adra 1 ANDALUCÍA
6 ALBÁNCHEZ 2 ALBÁNCHEZ
7 Alboloduy 2 ALBÁNCHEZ
8 Albox 2 ALBÁNCHEZ
9 ALCOLEA 3 ALCOLEA
10 Alcóntar 3 ALCOLEA
Data
df <- structure(list(RegionName = c("ANDALUCÍA", "Almería", "Abla",
"Abrucena", "Adra", "ALBÁNCHEZ", "Alboloduy", "Albox", "ALCOLEA",
"Alcóntar")), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10"))
CodePudding user response:
An alternative with ifelse
and fill
:
library(tidyverse)
df %>%
mutate(REGIONNAME = ifelse(RegionName == toupper(RegionName), RegionName, NA)) %>%
fill(REGIONNAME)
RegionName REGIONNAME
1 ANDALUCÍA ANDALUCÍA
2 Almería ANDALUCÍA
3 Abla ANDALUCÍA
4 Abrucena ANDALUCÍA
5 Adra ANDALUCÍA
6 ALBÁNCHEZ ALBÁNCHEZ
7 Alboloduy ALBÁNCHEZ
8 Albox ALBÁNCHEZ
9 ALCOLEA ALCOLEA
10 Alcóntar ALCOLEA