I have a data frame with a character column, and I'm trying to reorder the names within each string. For example, if I have a data frame that looks like this:
df <- data.frame(
  vars = c('Sepal.Width:Petal.Length',
           'Sepal.Length:Sepal.Length',
           'Sepal.Length:Petal.Length',
           'Petal.Length:Sepal.Length',
           'Petal.Width:Sepal.Length ',
           'Sepal.Length:Petal.Width ',
           'Petal.Width:Petal.Length ',
           'Sepal.Width:Sepal.Width ',
           'Sepal.Width:Sepal.Length '),
  value = c(0.18750000,
            0.12500000,
            0.18750000,
            0.09791667,
            0.09791667,
            0.06666667,
            0.02500000,
            0.05625000,
            0.15625000)
)
We can see in the vars column that row 3 is Sepal.Length:Petal.Length and row 4 is Petal.Length:Sepal.Length. What I'm trying to do is rearrange each entry so that the names always appear in the same order. Taking rows 3 and 4, I want to reorder the names, say alphabetically, so that both rows read Petal.Length:Sepal.Length.
After rearranging the complete data frame, my desired output would look something like this:
                       vars      value
1  Petal.Length:Petal.Width 0.02500000
2 Petal.Length:Sepal.Length 0.09791667
3 Petal.Length:Sepal.Length 0.18750000
4  Petal.Length:Sepal.Width 0.18750000
5  Petal.Width:Sepal.Length 0.09791667
6  Petal.Width:Sepal.Length 0.06666667
7 Sepal.Length:Sepal.Length 0.12500000
8  Sepal.Length:Sepal.Width 0.15625000
9   Sepal.Width:Sepal.Width 0.05625000
CodePudding user response:
You could use tidyverse functions to pull the "vars" column apart, sort the two names alphabetically, and then glue them back together:
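library(tidyverse)

result <- df %>%
  rowwise() %>%  # work on one row at a time
  mutate(
    # split on ':' and sort the two names alphabetically
    separated = list(sort(strsplit(vars, ':')[[1]])),
    # strip a stray leading or trailing space
    trimmed   = list(gsub('(^ | $)', '', separated)),
    # glue the names back together
    reordered = paste(trimmed, collapse = ':')
  ) %>%
  ungroup() %>%
  select(vars = reordered, value)

result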
vars value
<chr> <dbl>
1 Petal.Length:Sepal.Width 0.188
2 Sepal.Length:Sepal.Length 0.125
3 Petal.Length:Sepal.Length 0.188
4 Petal.Length:Sepal.Length 0.0979
5 Petal.Width:Sepal.Length 0.0979
6 Petal.Width:Sepal.Length 0.0667
7 Petal.Length:Petal.Width 0.025
8 Sepal.Width:Sepal.Width 0.0562
9 Sepal.Length:Sepal.Width 0.156
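For comparison, the same split, trim, sort, and paste idea can be sketched in base R without rowwise(). This is only an illustration: normalise_pair() is a hypothetical helper name, not something from the answer above, and the two input strings are a small subset of the question's data.

```r
# Hypothetical helper (an assumption for illustration): normalise one "a:b" string
normalise_pair <- function(x) {
  # split on the literal ':' and trim whitespace before sorting
  parts <- trimws(strsplit(x, ":", fixed = TRUE)[[1]])
  # sort the names alphabetically and glue them back together
  paste(sort(parts), collapse = ":")
}

# Two entries taken from the question's data (note the trailing space)
vars <- c("Sepal.Length:Petal.Length", "Petal.Width:Sepal.Length ")

# Apply the helper to every element; vapply() guarantees a character vector
normalised <- vapply(vars, normalise_pair, character(1), USE.NAMES = FALSE)
print(normalised)
```

Trimming before sorting means the stray spaces cannot influence the ordering.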
CodePudding user response:
Here is another solution that gives you the exact 'expected' output:
library(tidyverse)
df <- data.frame(
  vars = c('Sepal.Width:Petal.Length',
           'Sepal.Length:Sepal.Length',
           'Sepal.Length:Petal.Length',
           'Petal.Length:Sepal.Length',
           'Petal.Width:Sepal.Length ',
           'Sepal.Length:Petal.Width ',
           'Petal.Width:Petal.Length ',
           'Sepal.Width:Sepal.Width ',
           'Sepal.Width:Sepal.Length '),
  value = c(0.18750000,
            0.12500000,
            0.18750000,
            0.09791667,
            0.09791667,
            0.06666667,
            0.02500000,
            0.05625000,
            0.15625000)
)

df %>%
  separate(vars, into = c("var1", "var2"), sep = ":") %>%
  mutate(var2 = str_trim(var2)) %>%
  mutate(vars = case_when(var1 < var2 ~ paste(var1, var2, sep = ":"),
                          var1 > var2 ~ paste(var2, var1, sep = ":"),
                          TRUE ~ paste(var1, var2, sep = ":"))) %>%
  select(vars, value) %>%
  arrange(vars, value)
#> vars value
#> 1 Petal.Length:Petal.Width 0.02500000
#> 2 Petal.Length:Sepal.Length 0.09791667
#> 3 Petal.Length:Sepal.Length 0.18750000
#> 4 Petal.Length:Sepal.Width 0.18750000
#> 5 Petal.Width:Sepal.Length 0.06666667
#> 6 Petal.Width:Sepal.Length 0.09791667
#> 7 Sepal.Length:Sepal.Length 0.12500000
#> 8 Sepal.Length:Sepal.Width 0.15625000
#> 9 Sepal.Width:Sepal.Width 0.05625000
Created on 2022-03-21 by the reprex package (v2.0.1)
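As a possible shortcut for the case_when() step, base R's pmin() and pmax() also accept character vectors, so the alphabetically smaller name can be put first in one call. This is a minimal sketch, assuming the two names are already split and trimmed; the var1 and var2 vectors below are illustrative values taken from the question's data.

```r
# Names already split and trimmed (illustrative subset of the question's data)
var1 <- c("Sepal.Width", "Sepal.Length", "Petal.Width")
var2 <- c("Petal.Length", "Sepal.Length", "Sepal.Length")

# pmin() picks the alphabetically smaller name per row, pmax() the larger
vars <- paste(pmin(var1, var2), pmax(var1, var2), sep = ":")
print(vars)
```

Like the < and > comparisons in case_when(), the string ordering here follows the current locale's collation.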
CodePudding user response:
With tidyverse, we can split the string on :, then use map to apply a function to sort, remove the whitespace, and then paste back together.
library(tidyverse)

df %>%
  # map_chr() returns a character vector; plain map() would leave a list column
  mutate(vars = map_chr(
    str_split(vars, pattern = ":"),
    ~ sort(.x) %>% trimws(.) %>% paste0(., collapse = ':')
  ))
Output
vars value
1 Petal.Length:Sepal.Width 0.18750000
2 Sepal.Length:Sepal.Length 0.12500000
3 Petal.Length:Sepal.Length 0.18750000
4 Petal.Length:Sepal.Length 0.09791667
5 Petal.Width:Sepal.Length 0.09791667
6 Petal.Width:Sepal.Length 0.06666667
7 Petal.Length:Petal.Width 0.02500000
8 Sepal.Width:Sepal.Width 0.05625000
9 Sepal.Length:Sepal.Width 0.15625000
Or, if you want the exact expected output, we could do something like this:
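df %>%
  # sort and trim the two names, keeping them as a list column
  mutate(vars = map(
    str_split(vars, pattern = ":"),
    ~ sort(.x) %>% trimws(.)
  )) %>%
  # gsub() coerces each list element to text like c("a", "b");
  # strip the c( ... ) wrapper, the quotes, and the space after the comma
  mutate(vars = gsub("^c\\(|\\)$", "", vars),
         vars = gsub(", ", ",", gsub("\"", "", vars))) %>%
  separate(vars, c("vars1", "vars2"), sep = ",") %>%
  # order rows by first name, then second name, then value
  arrange(vars1, vars2, value) %>%
  unite("vars", vars1:vars2, sep = ":")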
Output
vars value
1 Petal.Length:Petal.Width 0.02500000
2 Petal.Length:Sepal.Length 0.09791667
3 Petal.Length:Sepal.Length 0.18750000
4 Petal.Length:Sepal.Width 0.18750000
5 Petal.Width:Sepal.Length 0.06666667
6 Petal.Width:Sepal.Length 0.09791667
7 Sepal.Length:Sepal.Length 0.12500000
8 Sepal.Length:Sepal.Width 0.15625000
9 Sepal.Width:Sepal.Width 0.05625000
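If the gsub() clean-up feels fragile, one alternative is to pull the two sorted names out of the list column directly with map_chr(). This is only a sketch on a three-row subset of the question's data; the column names parts, first, and second are assumptions made for illustration.

```r
library(tidyverse)

# Three rows taken from the question's data (note the trailing spaces)
df_small <- data.frame(
  vars  = c("Sepal.Width:Petal.Length",
            "Petal.Width:Sepal.Length ",
            "Sepal.Length:Petal.Width "),
  value = c(0.18750000, 0.09791667, 0.06666667)
)

sorted <- df_small %>%
  mutate(
    # trim, then sort the two names; keep them as a list column
    parts  = map(str_split(vars, ":"), ~ sort(str_trim(.x))),
    # extract the first and second name by position
    first  = map_chr(parts, 1),
    second = map_chr(parts, 2)
  ) %>%
  # order rows by first name, second name, then value
  arrange(first, second, value) %>%
  # rebuild the combined name and keep only the two final columns
  transmute(vars = paste(first, second, sep = ":"), value)

print(sorted)
```

This avoids relying on how a list column is converted to text, at the cost of two extra temporary columns.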