How to swap parts of strings in R?

I have a data frame with a column of character strings. I'm trying to swap the order of the parts within each string. For example, if I have a data frame that looks like this:

df <- data.frame(
  vars = c('Sepal.Width:Petal.Length' ,
           'Sepal.Length:Sepal.Length',
           'Sepal.Length:Petal.Length',
           'Petal.Length:Sepal.Length',
           'Petal.Width:Sepal.Length ',
           'Sepal.Length:Petal.Width ',
           'Petal.Width:Petal.Length ',
           'Sepal.Width:Sepal.Width  ',
           'Sepal.Width:Sepal.Length '),
  value = c(0.18750000,
            0.12500000,
            0.18750000,
            0.09791667,
            0.09791667,
            0.06666667,
            0.02500000,
            0.05625000,
            0.15625000)
)

We can see in the vars column that row 3 is Sepal.Length:Petal.Length and row 4 is Petal.Length:Sepal.Length. What I'm trying to do is rearrange the names within each string so that their order is consistent. So, taking rows 3 and 4, I want to reorder the names, say alphabetically, so that we get Petal.Length:Sepal.Length for both rows.

After rearranging the complete data frame, my desired output would look something like this:

                       vars      value
1 Petal.Length:Petal.Width  0.02500000
2 Petal.Length:Sepal.Length 0.09791667
3 Petal.Length:Sepal.Length 0.18750000
4 Petal.Length:Sepal.Width  0.18750000
5 Petal.Width:Sepal.Length  0.09791667
6 Petal.Width:Sepal.Length  0.06666667
7 Sepal.Length:Sepal.Length 0.12500000
8 Sepal.Length:Sepal.Width  0.15625000
9 Sepal.Width:Sepal.Width   0.05625000

CodePudding user response：

You could use tidyverse functions to pull the "vars" column apart, trim the stray whitespace, sort the two names alphabetically, and then paste them back together:

library(tidyverse)

result <- df %>% 
  rowwise() %>% 
  mutate(
    # split on ':', strip leading/trailing whitespace, then sort the two names
    separated = list(sort(trimws(strsplit(vars, ':')[[1]]))),
    # re-join the sorted names into a single string
    reordered = paste(separated, collapse = ':')
  ) %>% 
  ungroup() %>% 
  select(vars = reordered, value)

result

  vars                       value
  <chr>                      <dbl>
1 Petal.Length:Sepal.Width  0.188 
2 Sepal.Length:Sepal.Length 0.125 
3 Petal.Length:Sepal.Length 0.188 
4 Petal.Length:Sepal.Length 0.0979
5 Petal.Width:Sepal.Length  0.0979
6 Petal.Width:Sepal.Length  0.0667
7 Petal.Length:Petal.Width  0.025 
8 Sepal.Width:Sepal.Width   0.0562
9 Sepal.Length:Sepal.Width  0.156
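
The same split, trim, sort, paste idea also works without the tidyverse. What follows is a minimal base R sketch rather than part of the answer above; the helper name `normalise_pair` is hypothetical, and it assumes `vars` is a character column (hence the explicit `stringsAsFactors = FALSE`):

```r
# Hypothetical helper (not from the answer above): reorder the two
# ':'-separated names in each string alphabetically.
normalise_pair <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)  # list of character vectors
  vapply(
    parts,
    function(p) paste(sort(trimws(p)), collapse = ":"),  # trim, sort, re-join
    character(1)
  )
}

# A few rows taken from the question's data
df <- data.frame(
  vars  = c("Sepal.Length:Petal.Length", "Petal.Length:Sepal.Length",
            "Sepal.Width:Sepal.Length "),
  value = c(0.18750000, 0.09791667, 0.15625000),
  stringsAsFactors = FALSE
)

df$vars <- normalise_pair(df$vars)
df <- df[order(df$vars, df$value), ]  # sort rows by name, then by value
df$vars
#> [1] "Petal.Length:Sepal.Length" "Petal.Length:Sepal.Length"
#> [3] "Sepal.Length:Sepal.Width"
```

Trimming before sorting keeps stray trailing spaces from influencing the comparison.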

CodePudding user response：

Here is another solution that gives you the exact 'expected' output:

library(tidyverse)

df <- data.frame(
  vars = c('Sepal.Width:Petal.Length' ,
           'Sepal.Length:Sepal.Length',
           'Sepal.Length:Petal.Length',
           'Petal.Length:Sepal.Length',
           'Petal.Width:Sepal.Length ',
           'Sepal.Length:Petal.Width ',
           'Petal.Width:Petal.Length ',
           'Sepal.Width:Sepal.Width  ',
           'Sepal.Width:Sepal.Length '),
  value = c(0.18750000,
            0.12500000,
            0.18750000,
            0.09791667,
            0.09791667,
            0.06666667,
            0.02500000,
            0.05625000,
            0.15625000)
)

df %>%
  separate(vars, into = c("var1", "var2"), sep = ":") %>%
  # trim both halves so stray whitespace on either side cannot affect the comparison
  mutate(across(c(var1, var2), str_trim)) %>%
  mutate(vars = case_when(var1 < var2 ~ paste(var1, var2, sep = ":"),
                          var1 > var2 ~ paste(var2, var1, sep = ":"),
                          TRUE ~ paste(var1, var2, sep = ":"))) %>%
  select(vars, value) %>%
  arrange(vars, value)
#>                        vars      value
#> 1  Petal.Length:Petal.Width 0.02500000
#> 2 Petal.Length:Sepal.Length 0.09791667
#> 3 Petal.Length:Sepal.Length 0.18750000
#> 4  Petal.Length:Sepal.Width 0.18750000
#> 5  Petal.Width:Sepal.Length 0.06666667
#> 6  Petal.Width:Sepal.Length 0.09791667
#> 7 Sepal.Length:Sepal.Length 0.12500000
#> 8  Sepal.Length:Sepal.Width 0.15625000
#> 9   Sepal.Width:Sepal.Width 0.05625000

Created on 2022-03-21 by the reprex package (v2.0.1)
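
The three-branch comparison in this answer can also be written with the vectorised `pmin()` and `pmax()`, which accept character vectors. This is a base R sketch of that alternative, not the answerer's code, and it assumes every string contains exactly one `:`:

```r
# Sample strings taken from the question's data
x <- c("Sepal.Length:Petal.Length",
       "Petal.Width:Sepal.Length ",
       "Sepal.Width:Sepal.Width  ")

# Split each string at the ':' and strip surrounding whitespace
var1 <- trimws(sub(":.*$", "", x))     # everything before the ':'
var2 <- trimws(sub("^[^:]*:", "", x))  # everything after the ':'

# pmin()/pmax() pick the alphabetically smaller/larger name element-wise
swapped <- paste(pmin(var1, var2), pmax(var1, var2), sep = ":")
swapped
#> [1] "Petal.Length:Sepal.Length" "Petal.Width:Sepal.Length"
#> [3] "Sepal.Width:Sepal.Width"
```

When both names are equal, `pmin()` and `pmax()` return the same value, so no separate branch is needed for that case.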

CodePudding user response：

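With tidyverse, we can split each string on ":", then use map_chr to apply a function that trims the whitespace, sorts the two names, and pastes them back together. map_chr (rather than map) keeps vars a character column instead of turning it into a list column.

library(tidyverse)

df %>%
  mutate(vars = map_chr(
    str_split(vars, pattern = ":"),
    # trim first so stray spaces cannot affect the sort, then re-join
    ~ trimws(.x) %>% sort() %>% paste0(collapse = ':')
  ))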

Output

                       vars      value
1  Petal.Length:Sepal.Width 0.18750000
2 Sepal.Length:Sepal.Length 0.12500000
3 Petal.Length:Sepal.Length 0.18750000
4 Petal.Length:Sepal.Length 0.09791667
5  Petal.Width:Sepal.Length 0.09791667
6  Petal.Width:Sepal.Length 0.06666667
7  Petal.Length:Petal.Width 0.02500000
8   Sepal.Width:Sepal.Width 0.05625000
9  Sepal.Length:Sepal.Width 0.15625000

Or, if you want the exact expected output, sort the rows as well, by first name, then second name, then value:

df %>%
  mutate(vars = map_chr(
    str_split(vars, pattern = ":"),
    ~ trimws(.x) %>% sort() %>% paste0(collapse = ':')
  )) %>%
  # split into two columns so rows can be ordered by each name, then by value
  separate(vars, c("vars1", "vars2"), sep = ":") %>% 
  arrange(vars1, vars2, value) %>% 
  unite("vars", vars1:vars2, sep = ":")

Output

                       vars      value
1  Petal.Length:Petal.Width 0.02500000
2 Petal.Length:Sepal.Length 0.09791667
3 Petal.Length:Sepal.Length 0.18750000
4  Petal.Length:Sepal.Width 0.18750000
5  Petal.Width:Sepal.Length 0.06666667
6  Petal.Width:Sepal.Length 0.09791667
7 Sepal.Length:Sepal.Length 0.12500000
8  Sepal.Length:Sepal.Width 0.15625000
9   Sepal.Width:Sepal.Width 0.05625000