I've seen examples here that show how to split a dataframe using column index ranges but how do I split my dataframe with dplyr using strings found in the column? I've purposely created my dataframes so the columns have certain strings in them for future splitting.
Example data:
Site A_Argas A_Arachnicea A_Brus B_Burkoll B_Brielle B_Bact
1 10 0 0 0 0 0
2 0 0 0 10 22 123
3 1 2 3 88 12 546
I want to split this dataframe up based on strings such as "A_" or "B_" and assign them to new dataframes.
For example the output would be:
dataframeA
Site A_Argas A_Arachnicea A_Brus
1 10 0 0
2 0 0 0
3 1 2 3
dataframeB
Site B_Burkoll B_Brielle B_Bact
1 0 0 0
2 10 22 123
3 88 12 546
Because this data is not in long format, I can't seem to figure how to change my old code that I used to split the longform dataframes (for a different analysis).
dataframeA <- data %>% filter("GroupID" == "Arachnids") # where "A_" in column headers signify arachnid species
dataframeB <- data %>% filter("GroupID" == "Bacteria") # where all "B_" in the column headers are bacterial species
CodePudding user response:
One base R option might be to grep
for the columns starting with A_
and B_
, then to subset your original data frame.
a_names <- grep("^A_", names(data), value=TRUE)
dataframeA <- data[ , c("Site", a_names)]
b_names <- grep("^B_", names(data), value=TRUE)
dataframeB <- data[ , c("Site", b_names)]
CodePudding user response:
Another possible solution, based on group_split
, pivot_longer
, pivot_wider
and map
:
library(tidyverse)
df <- structure(list(Site = c(1, 2, 3), A_Argas = c(10, 0, 1), A_Arachnicea = c(0,
0, 2), A_Brus = c(0, 0, 3), B_Burkoll = c(0, 10, 88), B_Brielle = c(0,
22, 12), B_Bact = c(0, 123, 546)), row.names = c(NA, -3L), class = "data.frame")
df %>%
pivot_longer(-1) %>%
group_split(str_extract(name, "^[A-Z]\\_")) %>%
map(~ pivot_wider(.x, id_cols = Site))
#> [[1]]
#> # A tibble: 3 × 4
#> Site A_Argas A_Arachnicea A_Brus
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 10 0 0
#> 2 2 0 0 0
#> 3 3 1 2 3
#>
#> [[2]]
#> # A tibble: 3 × 4
#> Site B_Burkoll B_Brielle B_Bact
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 0 0
#> 2 2 10 22 123
#> 3 3 88 12 546
CodePudding user response:
We may use split.default
for this in base R
, remove the first column ('Site'), split
the data into a list
of dataframe by removing the substring from the _
(using trimws
), cbind
the first column in the list
of data.frames and change the list
name
lst1 <- split.default(data[-1], trimws(names(data)[-1], whitespace = "_.*"))
lst2 <- Map(cbind, Site = data['Site'], lst1)
names(lst2) <- paste0("dataframe", names(lst1))
-output
> lst2
$dataframeA
Site A_Argas A_Arachnicea A_Brus
1 1 10 0 0
2 2 0 0 0
3 3 1 2 3
$dataframeB
Site B_Burkoll B_Brielle B_Bact
1 1 0 0 0
2 2 10 22 123
3 3 88 12 546
It may be better to keep it in a list and access the elements as
lst2[["dataframeA"]]
lst2$dataframeA
instead of creating multiple objects (although can be created with list2env
list2env(lst2, .GlobalEnv)
With tidyverse
, we could loop over the substring of the column names and select
the relevant columns with map
library(purrr)
library(dplyr)
library(stringr)
str_remove(names(data)[-1] , "_.*") %>%
unique %>%
map(~ data %>%
select(Site, starts_with(.x)))
-output
[[1]]
Site A_Argas A_Arachnicea A_Brus
1 1 10 0 0
2 2 0 0 0
3 3 1 2 3
[[2]]
Site B_Burkoll B_Brielle B_Bact
1 1 0 0 0
2 2 10 22 123
3 3 88 12 546
data
data <- structure(list(Site = 1:3, A_Argas = c(10L, 0L, 1L), A_Arachnicea = c(0L,
0L, 2L), A_Brus = c(0L, 0L, 3L), B_Burkoll = c(0L, 10L, 88L),
B_Brielle = c(0L, 22L, 12L), B_Bact = c(0L, 123L, 546L)),
class = "data.frame", row.names = c(NA,
-3L))