My input data df is:
Action          Difficulty  strings  characters  POS  NEG  NEU
Field                0.635        7          59    0    0    7
Field or Catch       0.768       28         193    0    0   28
Field or Ball       -0.591      108         713    6    0  101
Ball                -0.717       61         382    3    0   57
Catch               -0.145       89         521    1    0   88
Field                0.280      208        1214    2    3  178
Field and run        1.237       18         138    1    0   17
I am interested in group-based correlations of Difficulty with the remaining variables: strings, characters, POS, NEG, and NEU. The grouping variable is Action. If I am interested only in the group Field, I can use filter(str_detect(Action, 'Field')).
I can compute the correlation between Difficulty and each remaining variable one at a time, but is there a faster way to do it for multiple variables in one command? My partial solution is:
df %>%
  filter(str_detect(Action, 'Field')) %>%
  na.omit %>% # Original data had multiple NA
  group_by(Action) %>%
  summarise_all(funs(cor))
But this results in an error.
Some relevant SO posts that I looked at are:
Quite relevant for generating a correlation matrix, but does not address my question: Find correlation coefficient of two columns in a dataframe by group.
Useful for computing different types of correlations, and introduces a different way of ignoring NAs: Check the correlation of two columns in a dataframe (in R).
Any help or guidance on this would be greatly appreciated!
For reference, this is the sample dput():
structure(list(
  Action = c("Field", "Field or Catch", "Field or Ball", "Ball", "Catch", "Field", "Field and run"),
  Difficulty = c(0.635, 0.768, -0.591, -0.717, -0.145, 0.28, 1.237),
  strings = c(7L, 28L, 108L, 61L, 89L, 208L, 18L),
  characters = c(59L, 193L, 713L, 382L, 521L, 1214L, 138L),
  POS = c(0L, 0L, 6L, 3L, 1L, 2L, 1L),
  NEG = c(0L, 0L, 0L, 0L, 0L, 3L, 0L),
  NEU = c(7L, 28L, 101L, 57L, 88L, 178L, 17L)),
  class = "data.frame", row.names = c(NA, -7L))
Answer:
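You may try the following. Your attempt fails because summarise_all(funs(cor)) calls cor() on one column at a time, with no second vector to correlate against; across() with a lambda lets you pass Difficulty as the second argument:
library(dplyr)
library(stringr)

df %>%
  filter(str_detect(Action, 'Field')) %>%
  na.omit %>% # Original data had multiple NA
  group_by(Action) %>%
  # Correlate every non-grouping column except Difficulty with Difficulty;
  # groups with a single row return NA
  summarise(across(-Difficulty, ~cor(.x, Difficulty)))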
If you don't want to group_by Action:
df %>%
  filter(str_detect(Action, 'Field')) %>%
  na.omit %>%
  summarise(across(-c(Difficulty, Action), ~cor(.x, Difficulty)))
#    strings characters        POS        NEG        NEU
#1 -0.557039 -0.5983826 -0.8733465 -0.1520684 -0.5899733
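If you want to cross-check the ungrouped numbers without dplyr, here is a minimal base-R sketch of the same idea. It rebuilds df from the dput() in the question; the names field, other_cols, pooled and rows_per_group are just illustrative and not part of the original code. It also counts the rows per exact Action value, since cor() needs at least two rows to return anything other than NA.

```r
# Sample data copied from the dput() in the question
df <- data.frame(
  Action = c("Field", "Field or Catch", "Field or Ball", "Ball", "Catch",
             "Field", "Field and run"),
  Difficulty = c(0.635, 0.768, -0.591, -0.717, -0.145, 0.28, 1.237),
  strings = c(7L, 28L, 108L, 61L, 89L, 208L, 18L),
  characters = c(59L, 193L, 713L, 382L, 521L, 1214L, 138L),
  POS = c(0L, 0L, 6L, 3L, 1L, 2L, 1L),
  NEG = c(0L, 0L, 0L, 0L, 0L, 3L, 0L),
  NEU = c(7L, 28L, 101L, 57L, 88L, 178L, 17L)
)

# Keep rows whose Action contains "Field", then drop rows with any NA
field <- na.omit(df[grepl("Field", df$Action), ])

# Every column except the grouping column and Difficulty itself
other_cols <- setdiff(names(field), c("Action", "Difficulty"))

# Correlate each remaining column with Difficulty (all "Field" rows pooled)
pooled <- sapply(field[other_cols], function(col) cor(col, field$Difficulty))
print(pooled)

# Rows per exact Action value: a group with one row cannot yield a correlation
rows_per_group <- table(field$Action)
print(rows_per_group)
```

In the sample data only the exact group Field has two rows, so the grouped version is mostly NA there; it should become informative on your full data, where each Action presumably has more rows.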