I want to count the frequency of paired values across two columns, but I want to ignore the order of the values within each pair. In the example below, the usual aggregate or table functions would report three distinct pairs, (-0.25, 0.9), (0.9, -0.25) and (-0.77, 2.9), but what I want is only two: (-0.25, 0.9) and (-0.77, 2.9). How should I modify my code to count the frequency of each pair regardless of which column each value is in?
data <- data.frame(col1 = c(-.25, 0.9, -.25, -.77, -.25),
                   col2 = c(0.9, -.25, 0.9, 2.9, 0.9))
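To make the problem concrete, here is a small sketch of the order-sensitive count the question describes. It pastes each row into a single key with base R's paste() and counts the keys with table(); the name ordered_counts is just an illustrative choice. Because the key keeps the column order, (-0.25, 0.9) and (0.9, -0.25) end up as separate entries.

```r
data <- data.frame(col1 = c(-.25, 0.9, -.25, -.77, -.25),
                   col2 = c(0.9, -.25, 0.9, 2.9, 0.9))

# Counting ordered pairs directly: (a, b) and (b, a) become separate keys
ordered_counts <- table(paste(data$col1, data$col2, sep = ", "))
print(ordered_counts)
```

This gives three keys instead of the two unordered pairs the question asks for.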
Answer:
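Try this. pmax() and pmin() put each pair into a canonical order (larger value, smaller value), so (0.9, -0.25) and (-0.25, 0.9) become the same row, and duplicated() then drops the repeats: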
> data[!duplicated(cbind(do.call(pmax, data), do.call(pmin, data))), ]
   col1 col2
1 -0.25  0.9
4 -0.77  2.9
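The line above returns the distinct unordered pairs but not how often each occurs. A minimal sketch that reuses the same pmin()/pmax() canonical ordering and then counts rows with aggregate() could look like this; the column names low, high and n are assumptions made for the example.

```r
data <- data.frame(col1 = c(-.25, 0.9, -.25, -.77, -.25),
                   col2 = c(0.9, -.25, 0.9, 2.9, 0.9))

# Canonical order: smaller value in 'low', larger value in 'high'
canon <- data.frame(low  = pmin(data$col1, data$col2),
                    high = pmax(data$col1, data$col2))

# Add a column of ones and sum it per unordered pair to get the frequency
canon$n <- 1
pair_freq <- aggregate(n ~ low + high, data = canon, FUN = sum)
print(pair_freq)
```

Ordering each pair numerically means the counting step can treat low and high as an ordinary two-column grouping, with no string conversion.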
Answer:
One solution. First, paste the two columns together into one string per row:
paste(data$col1, data$col2)
[1] "-0.25 0.9" "0.9 -0.25" "-0.25 0.9" "-0.77 2.9" "-0.25 0.9"
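Then split each string back into its two values, which gives a list (str_split() comes from the stringr package):
library(stringr)
str_split(paste(data$col1, data$col2), " ")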
[[1]]
[1] "-0.25" "0.9"
[[2]]
[1] "0.9" "-0.25"
[[3]]
[1] "-0.25" "0.9"
[[4]]
[1] "-0.77" "2.9"
[[5]]
[1] "-0.25" "0.9"
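If you would rather not load stringr, base R's strsplit() can stand in for str_split() at this step; this is a substitution, not the answer's original code. For a plain space separator it returns the same list of character vectors:

```r
data <- data.frame(col1 = c(-.25, 0.9, -.25, -.77, -.25),
                   col2 = c(0.9, -.25, 0.9, 2.9, 0.9))

# Base R alternative to stringr::str_split() for splitting on a single space
pieces <- strsplit(paste(data$col1, data$col2), " ")
print(pieces[[2]])
```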
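Write a custom function that sorts the two values and pastes them back together, then sapply() it over the list. The values are still character strings here, so sort() orders them as text rather than numerically; that is fine for this purpose, because both orderings of a pair still produce the same string: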
count_function <- function(x) {
  x <- sort(x)
  paste(x, collapse = ", ")
}
sapply(str_split(paste(data$col1, data$col2), " "), count_function)
[1] "-0.25, 0.9" "-0.25, 0.9" "-0.25, 0.9" "-0.77, 2.9" "-0.25, 0.9"
Then take the unique values of this vector:
unique(sapply(str_split(paste(data$col1, data$col2), " "), count_function))
[1] "-0.25, 0.9" "-0.77, 2.9"
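Since the question asks for frequencies, passing the same kind of keys to table() instead of unique() gives the count for each unordered pair. The sketch below swaps the paste/split round trip for mapply() over the two numeric columns, so the values are sorted as numbers before being pasted; pair_key() is a hypothetical helper playing the role of count_function.

```r
data <- data.frame(col1 = c(-.25, 0.9, -.25, -.77, -.25),
                   col2 = c(0.9, -.25, 0.9, 2.9, 0.9))

# Hypothetical helper: build an order-independent key for one pair,
# sorting the numbers themselves before converting them to text
pair_key <- function(a, b) paste(sort(c(a, b)), collapse = ", ")

keys <- mapply(pair_key, data$col1, data$col2)

# table() counts how many rows share each unordered pair
pair_counts <- table(keys)
print(pair_counts)
```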