I want to convert data frame like this:

mre <- tibble::tribble(
  ~folder3, ~folder2, ~folder1,
    "V3=4",   "V2=1",   "V1=0",
    "V3=5",   "V2=1",   "V1=0",
    "V3=4",   "V2=2",   "V1=0",
    "V3=5",   "V2=2",   "V1=0",
    "V3=4",   "V2=1",   "V1=1",
    "V3=5",   "V2=1",   "V1=1",
    "V3=4",   "V2=2",   "V1=1",
    "V3=5",   "V2=2",   "V1=1"
)

to this:

folder3 folder2 folder1 V3  V2  V1
V3=4    V2=1    V1=0    4   1   0
V3=5    V2=1    V1=0    5   1   0
V3=4    V2=2    V1=0    4   2   0
V3=5    V2=2    V1=0    5   2   0
V3=4    V2=1    V1=1    4   1   1
V3=5    V2=1    V1=1    5   1   1
V3=4    V2=2    V1=1    4   2   1
V3=5    V2=2    V1=1    5   2   1

Basically, I want to extract the unique variable name from each folder column ("V3", "V2", "V1" here, but they could be any valid names such as "a", "b", "c"), use it as the new column name, and keep the numeric values in place.
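Per column, the transformation comes down to two string operations, sketched here in base R: the text before `=` in the first value gives the new column name, and the text after `=` in every value gives the number. This is a minimal sketch assuming each cell holds exactly one `name=value` pair with a numeric right-hand side; `x`, `new_name` and `values` are illustrative names only.

```r
# One "folder" column, as in the example data
x <- c("V3=4", "V3=5", "V3=4", "V3=5")

# Name: everything before the first "=" in the first value
new_name <- sub("=.*$", "", x[1])

# Values: everything after the "=", converted to numbers
values <- as.numeric(sub("^[^=]*=", "", x))

new_name  # "V3"
values    # 4 5 4 5
```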

I have the following for a single "folder" column, which takes the name from the value in the first row:

library(dplyr)
library(stringr)
mre %>% 
    tidyr::extract(folder1, into = .$folder1[1] |> word(1, sep = "="), "\\S+=(\\d+)", remove = FALSE)

But I don't know how to extend this to multiple "folder" columns (their number is not fixed). I tried to use map following the answers here, but could not figure out how to get the variable names from the first row.

Any suggestions?

Answer:

Instead of extract, we may create the new columns within across itself: mutate across all the columns (everything()), use str_extract to get the digits (\\d+) that follow the =, and build the new column names in .names with str_replace.

library(dplyr)
library(stringr)
mre %>%
    mutate(across(everything(),
        ~ as.numeric(str_extract(., "(?<=\\=)\\d+")),
        .names = "{str_replace(.col, 'folder', 'V')}"))


# A tibble: 8 × 6
  folder3 folder2 folder1    V3    V2    V1
  <chr>   <chr>   <chr>   <dbl> <dbl> <dbl>
1 V3=4    V2=1    V1=0        4     1     0
2 V3=5    V2=1    V1=0        5     1     0
3 V3=4    V2=2    V1=0        4     2     0
4 V3=5    V2=2    V1=0        5     2     0
5 V3=4    V2=1    V1=1        4     1     1
6 V3=5    V2=1    V1=1        5     1     1
7 V3=4    V2=2    V1=1        4     2     1
8 V3=5    V2=2    V1=1        5     2     1
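Note that the .names argument above builds the new names from the column names (folder3 becomes V3), not from the values, so it only yields the requested names when the two happen to line up; with values like "a=1" it would still produce V3. A minimal variant that reads each name from the first row instead is sketched below. It assumes every column uses one consistent name before the `=`; `new_names`, `values` and `out` are illustrative variables, and the numbers are parsed with `str_remove()` rather than a lookbehind.

```r
library(dplyr)
library(stringr)

mre <- tibble::tribble(
  ~folder3, ~folder2, ~folder1,
  "V3=4", "V2=1", "V1=0",
  "V3=5", "V2=1", "V1=0",
  "V3=4", "V2=2", "V1=0",
  "V3=5", "V2=2", "V1=0",
  "V3=4", "V2=1", "V1=1",
  "V3=5", "V2=1", "V1=1",
  "V3=4", "V2=2", "V1=1",
  "V3=5", "V2=2", "V1=1"
)

# New name for each column: the text before "=" in its first row
new_names <- vapply(mre, function(x) str_extract(x[1], "^[^=]+"), character(1))

# Numeric part of every value (text after "="), renamed with new_names
values <- mre %>%
  mutate(across(everything(), ~ as.numeric(str_remove(.x, "^[^=]*=")))) %>%
  setNames(new_names)

# Keep the original columns and append the parsed ones
out <- bind_cols(mre, values)
out
```

Because the names come from the data, this works unchanged for any number of folder columns and for names such as "a", "b", "c".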

Answer:

A base R option

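# Split every "name=value" cell into columns V1 (name) and V2 (value), tagged by row id
long <- do.call(rbind, Map(function(x) cbind(read.table(text = x, sep = "="), id = seq_along(x)), mre))
# Cross-tabulate values by row id and name (in order of first appearance), then bind back
cbind(mre, as.data.frame.matrix(xtabs(V2 ~ id + factor(V1, levels = unique(V1)), long)))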


  folder3 folder2 folder1 V3 V2 V1
1    V3=4    V2=1    V1=0  4  1  0
2    V3=5    V2=1    V1=0  5  1  0
3    V3=4    V2=2    V1=0  4  2  0
4    V3=5    V2=2    V1=0  5  2  0
5    V3=4    V2=1    V1=1  4  1  1
6    V3=5    V2=1    V1=1  5  1  1
7    V3=4    V2=2    V1=1  4  2  1
8    V3=5    V2=2    V1=1  5  2  1
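If the long-format reshape is not needed, a shorter base R sketch (a different technique from building a long table with read.table() and reshaping it) is to parse each column with `sub()` and take its name from the first row. This assumes a single `name=value` pair per cell; `vals` and `out` are illustrative names.

```r
mre <- data.frame(
  folder3 = c("V3=4", "V3=5", "V3=4", "V3=5", "V3=4", "V3=5", "V3=4", "V3=5"),
  folder2 = c("V2=1", "V2=1", "V2=2", "V2=2", "V2=1", "V2=1", "V2=2", "V2=2"),
  folder1 = c("V1=0", "V1=0", "V1=0", "V1=0", "V1=1", "V1=1", "V1=1", "V1=1")
)

# Numeric part of each value (everything after "=")
vals <- lapply(mre, function(x) as.numeric(sub("^[^=]*=", "", x)))

# Name each new column after the text before "=" in its first row
names(vals) <- vapply(mre, function(x) sub("=.*$", "", x[1]), character(1))

# Append the parsed columns to the original ones
out <- cbind(mre, as.data.frame(vals))
out
```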
