Home > front end >  How to make the expected value of the difference in the values in paired data using ggplot2
How to make the expected value of the difference in the values in paired data using ggplot2

Time:07-21

I have a pair data as below and I want to make the expected value of the difference in the value (column called value) of pairs. In all the pairs, one has disease and the other one does not have disease as you can see from the data. In other words, the expected value of the difference of the value in one sibling compare to his/her sibling. The description of the variable in the data are:

id = individual ID
family ID = family ID showing their dependency
status = 1 means disease and status = 0 means no-disease

Any guidance is appreciated.

d <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20), 
                    familyID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10), 
                    status = c(0,1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1), 
                    value = c(29,26, 39, 22.3, 24, 41, 29.7, 24, 25.9, 21, 29,24,26,29, 15.2, 11, 35, 15.4,16, 13.4)), 
            class = c("tbl_df","tbl", "data.frame"), row.names = c(NA, -20L))

CodePudding user response:

I'm not certain if this is what you are looking for, but I used pivot_wider from tidyr to spread the values into two columns, though with status 0 and those with status 1. Then I used mutate to take a difference between the two columns, then plotted the familyID by the newly created difference with ggplot. Note that I removed the id column for the pivot_wider to work.

d <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20), 
                    familyID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10), 
                    status = c(0,1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1), 
                    value = c(29,26, 39, 22.3, 24, 41, 29.7, 24, 25.9, 21, 29,24,26,29, 15.2, 11, 35, 15.4,16, 13.4)), 
               class = c("tbl_df","tbl", "data.frame"), row.names = c(NA, -20L))

library(dplyr)
library(tidyr)
library(ggplot2)
d%>%
  select(-id)%>%
  pivot_wider(values_from = value, names_from = status)%>%
  mutate("Diff" = (`0`-`1`))%>%
  ggplot() 
  aes(as.character(familyID), Diff) 
  geom_point()

CodePudding user response:

You can group by familyID, then use summarize() from the dplyr package to find the differences. Also note the conversion of id, familyID, and status to factors, which may make life easier so they aren't confused with being integers.

library(dplyr)
library(forcats)
library(ggplot2)

d <- structure(list(id = as.factor(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)), 
                    familyID = as.factor(c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10)), 
                    status = as.factor(c(0,1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1)), 
                    value = c(29,26, 39, 22.3, 24, 41, 29.7, 24, 25.9, 21, 29,24,26,29, 15.2, 11, 35, 15.4,16, 13.4)), 
               class = c("tbl_df","tbl", "data.frame"), row.names = c(NA, -20L))
diffs <- group_by(d, familyID) %>%
         summarize(., diff = (value[status == 0] - value[status == 1]))

Reordering the families by difference can help get a sense of the distribution of differences

diffs$familyID <- fct_reorder(diffs$familyID, diffs$diff, .desc = TRUE)

ggplot(diffs, aes(x = familyID, y = diff))  
  geom_bar(stat="identity")

A ggplot bar chart with familyID on the x-axis and differences on the y-axis. Families are sorted from largest to smallest difference.

If you really have a lot of families you may want to display a summary of the differences.

One option is with a histogram (modifying binwidth can control how fine the bins are):

ggplot(diffs, aes(x = diff))  
  geom_histogram(binwidth = 3)

A ggplot histogram with difference magnitude along the x-axis and counts of families on the y-axis.

Similar to a histogram is a density plot:

ggplot(diffs, aes(x = diff))  
  geom_density()

A ggplot density graph, with difference magnitude along the x-axis and family density on the y-axis.

Finally, a boxplot is also a familiar summary. They're mostly meant for comparing multiple groups, but it works okay with just one. I've added the individual points using the geom_jitter() function.

ggplot(diffs, aes(y = diff))   #If using multiple groups add x=group inside the aes() function.
  geom_boxplot()  
  geom_jitter(aes(x = 0))

A ggplot boxplot with a single category. The y-axis is treatment difference. Individual points have been added as jitter.

  • Related