Creating a boxplot from two dataframes

Time:08-17

I have two separate data frames, each representing a feature (activity and sleep) and the number of days on which that feature was recorded for each id. The number of days should go on the y-axis and the feature itself on the x-axis. I managed to draw the boxplots separately, and the outliers show clearly, especially for one of the sets. However, when I place the two boxplots next to each other, the outliers no longer show up clearly. Also, how do I get the names of the two features (activity and sleep) on my x-axis?

The data frame for the "sleep" feature:

head(idday)

A tibble: 6 x 2

      id  days
   <dbl> <int>
1 1503960366    25
2 1644430081     4
3 1844505072     3
4 1927972279     5
5 2026352035    28
6 2320127002     1

The data frame for the "activity" feature:

head(iddaya)

A tibble: 6 x 2

      id  days
   <dbl> <int>
1 1503960366    31
2 1624580081    31
3 1644430081    30
4 1844505072    31
5 1927972279    31
6 2022484408    31 

My attempt for sleep:

ggplot(idday, aes(y = days), boxwex = 0.05) +
  stat_boxplot(geom = "errorbar",
               width = 0.2) +
  geom_boxplot(alpha = 0.9, outlier.color = "red")

and for activity:

ggplot(iddaya, aes(y = days), boxwex = 0.05) +
  stat_boxplot(geom = "errorbar",
               width = 0.2) +
  geom_boxplot(alpha = 0.9, outlier.color = "red")

I then combined them:

boxplot(summary(idday$days), summary(iddaya$days))

In this final image the outliers do not show clearly, and I want to name my x-axis and y-axis.

CodePudding user response:

If you want to compare these two boxplots with each other, I recommend using the same y-axis range for both. To achieve this, you first have to combine the two data frames. You can do this with inner_join() from the dplyr package:

library(dplyr)

data_combined <- inner_join(idday, iddaya,
                            by = "id",
                            suffix = c("_sleep", "_activity"))
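
One caveat worth checking: inner_join() keeps only the ids that occur in both data frames, so an id with activity records but no sleep records (or the reverse) is dropped before plotting. A minimal sketch, using only the six rows of each data frame shown in the question (so the counts are illustrative, not the real totals), compares it with full_join(), which keeps every id and fills the gaps with NA:

```r
library(dplyr)

# First six rows of each data frame, copied from the question
idday <- data.frame(
  id   = c(1503960366, 1644430081, 1844505072, 1927972279, 2026352035, 2320127002),
  days = c(25L, 4L, 3L, 5L, 28L, 1L)
)
iddaya <- data.frame(
  id   = c(1503960366, 1624580081, 1644430081, 1844505072, 1927972279, 2022484408),
  days = c(31L, 31L, 30L, 31L, 31L, 31L)
)

# Keeps only ids present in BOTH data frames
inner <- inner_join(idday, iddaya, by = "id", suffix = c("_sleep", "_activity"))

# Keeps every id; values missing on one side become NA
full <- full_join(idday, iddaya, by = "id", suffix = c("_sleep", "_activity"))

nrow(inner)  # 4 ids appear in both samples
nrow(full)   # 8 distinct ids across both samples
```

If the dropped ids matter for the comparison, full_join() followed by removing the NA rows after reshaping may be the safer choice.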

Then you need to transform your data frame into long format with pivot_longer() from the tidyr package:

library(tidyr)

data_combined_long <- data_combined %>%
                        pivot_longer(days_sleep:days_activity,
                                     names_to = "features",
                                     names_prefix = "days_",
                                     values_to = "days")

After that you can again use ggplot() to create your boxplot. This time, map the x-axis to the features column so that each feature gets its own box:

library(ggplot2)

ggplot(data_combined_long, aes(y = days, x = features)) +
  stat_boxplot(geom = "errorbar",
               width = 0.5) +
  geom_boxplot(alpha = 0.9, outlier.color = "red")

Your plot should then show the two boxplots side by side on a shared y-axis, with the feature names on the x-axis.
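
The question also asks how to name the axes, which labs() can handle. A minimal sketch that repeats the steps above on the six-row samples from the question; the axis titles are placeholders, and the object names data_long and p are only for illustration:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# First six rows of each data frame, copied from the question
idday <- data.frame(
  id   = c(1503960366, 1644430081, 1844505072, 1927972279, 2026352035, 2320127002),
  days = c(25L, 4L, 3L, 5L, 28L, 1L)
)
iddaya <- data.frame(
  id   = c(1503960366, 1624580081, 1644430081, 1844505072, 1927972279, 2022484408),
  days = c(31L, 31L, 30L, 31L, 31L, 31L)
)

# Join on id, then reshape so each row is one (id, feature, days) record
data_long <- inner_join(idday, iddaya, by = "id",
                        suffix = c("_sleep", "_activity")) %>%
  pivot_longer(c(days_sleep, days_activity),
               names_to = "features",
               names_prefix = "days_",
               values_to = "days")

# One box per feature, shared y-axis, red outliers, custom axis titles
p <- ggplot(data_long, aes(x = features, y = days)) +
  stat_boxplot(geom = "errorbar", width = 0.5) +
  geom_boxplot(alpha = 0.9, outlier.colour = "red") +
  labs(x = "Feature", y = "Days recorded")  # placeholder titles

p
```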

CodePudding user response:

There are several ways to achieve your task. One way could be:

If your data frames are called df_sleep and df_activity, we can combine them in a named list, add a new column feature, and then plot:

df_sleep
df_activity

library(tidyverse)

bind_rows(list(sleep = df_sleep, activity = df_activity), .id = 'feature') %>%
  ggplot(aes(x = feature, y = days, fill = feature)) +
  geom_boxplot()
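
A self-contained sketch of this approach, assuming df_sleep and df_activity hold the six-row samples from the question. Because bind_rows() stacks the rows instead of joining on id, every id is kept even if it appears in only one data frame. The red outlier colour and the labs() titles are optional additions, and the titles are placeholders:

```r
library(dplyr)
library(ggplot2)

# First six rows of each data frame, copied from the question
df_sleep <- data.frame(
  id   = c(1503960366, 1644430081, 1844505072, 1927972279, 2026352035, 2320127002),
  days = c(25L, 4L, 3L, 5L, 28L, 1L)
)
df_activity <- data.frame(
  id   = c(1503960366, 1624580081, 1644430081, 1844505072, 1927972279, 2022484408),
  days = c(31L, 31L, 30L, 31L, 31L, 31L)
)

# Stack both data frames; the list names become the values of `feature`
combined <- bind_rows(list(sleep = df_sleep, activity = df_activity),
                      .id = "feature")

count(combined, feature)  # six rows per feature in this sample

p <- ggplot(combined, aes(x = feature, y = days, fill = feature)) +
  geom_boxplot(outlier.colour = "red") +
  labs(x = "Feature", y = "Days recorded")  # placeholder titles

p
```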
