I have a dataframe where the colunm names have a structure as follows:
Barcelona.Standard.2012.True
Berlin.One.2013.True
London.One.2014.True
Barcelona.Standard.2015.True
Berlin.One.2016.True
As you can see, each columns specifies the City
, Type of bank account
, Year it was open
and if it's Active
or not.
I want to extract into a list, all the different possibilities for each of the categories. For example, for the first category, which is the city, we would get:
Barcelona
Berlin
London
What I though of doing is splitting by the symbol .
and then getting all the unique values for a given position for all the columns.
I could do this with a loop, but I would like to do it with a sapply, if that's possible.
Something like:
lapply(strsplit(colnames(dat), split = "\\.")[[]][1])
Where [[]]
would be all the columns.
This is just a toy example, the real dataset has thousands of columns.
CodePudding user response:
With sapply
:
sapply(transpose(strsplit(col, "\\.")), function(x) unlist(unique(x), recursive = F))
Or use data.table::transpose
instead of transpose to make it easier:
sapply(data.table::transpose(strsplit(col, "\\.")), unique)
Finally, use setNames
to set the names:
sapply(transpose(strsplit(col, "\\.")), function(x) unlist(unique(x), recursive = F)) |>
setNames(c("City", "Type", "Year", "Active"))
output:
$City
[1] "Barcelona" "Berlin" "London"
$Type
[1] "Standard" "One"
$Year
[1] "2012" "2013" "2014" "2015" "2016"
$Active
[1] "True"
data
col <- c("Barcelona.Standard.2012.True",
"Berlin.One.2013.True",
"London.One.2014.True",
"Barcelona.Standard.2015.True",
"Berlin.One.2016.True")
CodePudding user response:
Here is a way with tidyr::separate
and the new base R >= 4.1 pipe operator and lambda.
tidyr::separate(
dat, x,
into = c("City", "Type", "Year", "Active"),
sep = "[^[:alnum:]]"
) |>
as.list() |>
(\(x) Map(unique, x))()
#> $City
#> [1] "Barcelona" "Berlin" "London"
#>
#> $Type
#> [1] "Standard" "One"
#>
#> $Year
#> [1] "2012" "2013" "2014" "2015" "2016"
#>
#> $Active
#> [1] "True"
Created on 2022-02-14 by the reprex package (v2.0.1)
Edit
Simpler, with lapply
instead of Map
.
tidyr::separate(
dat, x,
into = c("City", "Type", "Year", "Active"),
sep = "[^[:alnum:]]"
) |>
(\(x) lapply(x, unique))()
#> $City
#> [1] "Barcelona" "Berlin" "London"
#>
#> $Type
#> [1] "Standard" "One"
#>
#> $Year
#> [1] "2012" "2013" "2014" "2015" "2016"
#>
#> $Active
#> [1] "True"
Created on 2022-02-14 by the reprex package (v2.0.1)
Data
x <- scan(text = "
Barcelona.Standard.2012.True
Berlin.One.2013.True
London.One|2014.True
Barcelona.Standard.2015.True
Berlin.One.2016.True", what = character())
dat <- data.frame(x)
Created on 2022-02-14 by the reprex package (v2.0.1)