I have the following:
a <- c("ID_A", "ID_B", "Count")
b <- c("1", "1015", "19389")
c <- c("1", "1019", "206")
d <- c("1", "1027", "1210")
e <- c("2", "1019", "206")
f <- c("2", "1027", "1210")
df <- data.frame(a, b, c, d, e, f)
I want first to pivot_wider, that is get rid off the abcdef.
Then, I want to extract the maximum value of count, create a new variable (mutate) that has the ID_B of the highest value.
Something like this, with pivot_wider to get rid of abcdef:
a <- c("ID_A", "ID_B", "Count", "ID_B of max value")
b <- c("1", "1015", "19389", "1015")
c <- c("1", "1019", "206", "1015")
d <- c("1", "1027", "1210", "1015")
e <- c("2", "1019", "206", "1027")
f <- c("2", "1027", "1210", "1027")
new_df <- data.frame(a, b, c, d, e, f)
I tried group_by, sum, mutate. It doesn't work. Help please! Thanks a lot
CodePudding user response:
You could attempt this in a number of ways...however, it sounded like you might be interested in a tidyverse
approach. If so, you can use both pivot_longer
and pivot_wider
to evaluate the max Count
. Based on your example, it sounds like for each ID_A
you want the ID_B
that has the maximum Count
value (unclear how you want to handle ties, if that is possible...in this case will just use the first ID_B
). After you indicate which ID_B
has the maximum Count
, you can do the same to revert back to your original format.
library(tidyverse)
df %>%
pivot_longer(cols = -a, names_to = "old_vars", values_transform = list(value = as.numeric)) %>%
pivot_wider(names_from = a) %>%
group_by(ID_A) %>%
mutate(ID_B_Count_Max = ID_B[Count == max(Count)[1]]) %>%
pivot_longer(cols = -old_vars) %>%
pivot_wider(names_from = old_vars)
Output
name b c d e f
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ID_A 1 1 1 2 2
2 ID_B 1015 1019 1027 1019 1027
3 Count 19389 206 1210 206 1210
4 ID_B_Count_Max 1015 1015 1015 1027 1027
CodePudding user response:
I have made a couple of assumptions about your data, first that you intended the numbers to be numerics, and second that the data frame should be transposed so that it has columns ID_A, ID_B, and Count.
a <- c("ID_A", "ID_B", "Count")
b <- c(1, 1015, 19389)
c <- c(1, 1019, 206)
d <- c(1, 1027, 1210)
e <- c(2, 1019, 206)
f <- c(2, 1027, 1210)
I am using the data.table
library, so make a data.table
with your data:
library(data.table)
df <- as.data.table(rbind(b,c,d,e,f))
names(df) <- a
Now I can make a new variable called ID_B of max value
which corresponds to the value of ID_B
at the value that maximises Count.
df[, `ID_B of max value` := .SD$ID_B[which.max(.SD$Count)] , by=ID_A]
df
ID_A ID_B Count ID_B of max value
1: 1 1015 19389 1015
2: 1 1019 206 1015
3: 1 1027 1210 1015
4: 2 1019 206 1027
5: 2 1027 1210 1027
A few aspects of data.table
to note here. First, the :=
operator is a special assignment operator. .SD
means the current subset of the data, so with by=ID_A
it corresponds to the subset defined by each value of ID_A
.