I have the following dataset:
structure(list(Patient_ID = c("1234", "1234", "1234", "1234",
"1234", "1234", "1234", "1234", "1234"), Unit_Type = c("ABC",
"ABC", "ABC", "ABC", "ABC", "DEF", "DEF", "DEF", "GHI"), Status = c("Returned",
"Returned", "Returned", "Returned", "Transfused", "Transfused",
"Transfused", "Transfused", "Transfused")), class = "data.frame", row.names = c(NA,
-9L))
and have used the following calculation on it:
df <- df %>%
count(Patient_ID, Unit_Type, Status) %>%
pivot_wider(names_from = c(Unit_Type, Status), values_from = n)
I want to sum 'ABC_Returned' and 'ABC_Transfused' by Patient_ID (I know the example dataset only has one unique patient ID, but my real dataset has many more), but I keep getting the following error message:
> aggregate(df, by=list(df$ABC_Transfused, df$ABC_Returned), FUN=sum, na.rm = TRUE)
Error in FUN(X[[i]], ...) : invalid 'type' (character) of argument
Does anyone know how to get around this? Ideally, I would like to create a new column called ABC_Ordered that is the sum of 'ABC_Returned' and 'ABC_Transfused', grouped by Patient ID.
CodePudding user response:
I think this is what you are looking for.
The error occurs because your code also tries to sum the first column, Patient_ID, which is a character column. Dropping that column with df[,-1] should work:
aggregate(df[,-1], by=list(df$ABC_Transfused, df$ABC_Returned), FUN=sum, na.rm = TRUE)
  Group.1 Group.2 ABC_Returned ABC_Transfused DEF_Transfused GHI_Transfused
1       1       4            4              1              3              1
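If the goal is only the new ABC_Ordered column, a base R sketch along these lines may be enough, since the pivoted table already has one row per patient. The `wide` data frame below is a hypothetical stand-in for the pivoted df, and the second patient ("5678") is invented to show how a missing count is handled:

```r
# Hypothetical wide table mimicking the pivoted data; patient "5678" is made up.
wide <- data.frame(
  Patient_ID     = c("1234", "5678"),
  ABC_Returned   = c(4L, NA),
  ABC_Transfused = c(1L, 2L),
  stringsAsFactors = FALSE
)

# Add the two ABC columns within each row; na.rm = TRUE treats a missing count as 0.
wide$ABC_Ordered <- rowSums(wide[, c("ABC_Returned", "ABC_Transfused")], na.rm = TRUE)

print(wide)
```

Selecting the two columns by name avoids touching the character column Patient_ID, which is what triggered the original error.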
CodePudding user response:
We could use
library(dplyr)
df %>%
  mutate(ABC_ordered = ABC_Returned + ABC_Transfused)
-output
# A tibble: 1 × 6
  Patient_ID ABC_Returned ABC_Transfused DEF_Transfused GHI_Transfused ABC_ordered
  <chr>             <int>          <int>          <int>          <int>       <int>
1 1234                  4              1              3              1           5
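One caveat: with several patients, pivot_wider leaves NA wherever a patient has no rows for a given Unit_Type/Status pair, and `+` then returns NA for that patient. A possible workaround, sketched here on a hypothetical two-patient table (patient "5678" is invented), is to replace NA with 0 via coalesce() before adding:

```r
library(dplyr)

# Hypothetical wide table; patient "5678" is invented and has no returned ABC units.
wide <- tibble(
  Patient_ID     = c("1234", "5678"),
  ABC_Returned   = c(4L, NA),
  ABC_Transfused = c(1L, 2L)
)

# coalesce() swaps each NA for 0L, so the sum is defined for every patient.
result <- wide %>%
  mutate(ABC_ordered = coalesce(ABC_Returned, 0L) + coalesce(ABC_Transfused, 0L))

print(result)
```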
CodePudding user response:
Maybe this is what you're looking for?
I added a second data frame with another patient ID to make this clearer, but I think you want a row-wise operation.
Since the data is already aggregated in the count step, I don't think you need any additional grouping: after pivoting, each row is one patient.
data <- structure(list(Patient_ID = c("1234", "1234", "1234", "1234",
"1234", "1234", "1234", "1234", "1234"), Unit_Type = c("ABC",
"ABC", "ABC", "ABC", "ABC", "DEF", "DEF", "DEF", "GHI"), Status = c("Returned",
"Returned", "Returned", "Returned", "Transfused", "Transfused",
"Transfused", "Transfused", "Transfused")), class = "data.frame", row.names = c(NA,
-9L))
data2 <- structure(list(Patient_ID = c("1235", "1235", "1235", "1235",
"1235", "1235", "1235", "1235", "1235"), Unit_Type = c("ABC",
"ABC", "ABC", "ABC", "ABC", "DEF", "DEF", "DEF", "GHI"), Status = c("Returned",
"Returned", "Returned", "Returned", "Transfused", "Transfused",
"Transfused", "Transfused", "Transfused")), class = "data.frame", row.names = c(NA,
-9L))
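library(dplyr)
library(tidyr)

# Stack the two patients into one long data frame
data_full <- rbind(data, data2)

# Count units per patient, unit type and status, then spread to one row per patient
data_full2 <- data_full %>%
  count(Patient_ID, Unit_Type, Status) %>%
  pivot_wider(names_from = c(Unit_Type, Status), values_from = n)

# Sum the two ABC columns within each row, treating missing counts as 0
data_full3 <- data_full2 %>%
  rowwise() %>%
  mutate(ABC_Ordered = sum(c(ABC_Returned, ABC_Transfused), na.rm = TRUE)) %>%
  ungroup()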
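As an alternative to rowwise(), the missing combinations could be filled with 0 at the pivot step, after which a plain vectorized `+` is enough. This is only a sketch: the `long` data frame is a small hypothetical example (patient "5678" is invented and has no returned ABC units), and it relies on the `values_fill` argument of pivot_wider:

```r
library(dplyr)
library(tidyr)

# Hypothetical long data; patient "5678" is invented and has no "Returned" rows.
long <- data.frame(
  Patient_ID = c("1234", "1234", "1234", "5678"),
  Unit_Type  = c("ABC", "ABC", "ABC", "ABC"),
  Status     = c("Returned", "Returned", "Transfused", "Transfused")
)

wide <- long %>%
  count(Patient_ID, Unit_Type, Status) %>%
  # values_fill = 0 writes 0 instead of NA for combinations a patient never had
  pivot_wider(names_from = c(Unit_Type, Status), values_from = n, values_fill = 0) %>%
  mutate(ABC_Ordered = ABC_Returned + ABC_Transfused)

print(wide)
```

On large tables this tends to be faster than rowwise(), because the addition is done on whole columns rather than one row at a time.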