Merging dataframes based on different subsets

DF <- mtcars

## subsets

df1 <- subset(DF, carb == 1)
df2 <- subset(DF, am == 1)

## table

table1 <- prop.table(with(df1, table(cyl, gear)), margin = 1)
table2 <- prop.table(with(df2, table(cyl, vs)), margin = 1)

## total

total <- cbind(table1 , table2)

which results in:

Error in cbind(table1, table2) : 
  number of rows of matrices must match (see arg 2)

I'm trying to merge (preferably column-wise) those 2 data frames (table1, table2), which are built from two different subsets (df1, df2) of the mtcars dataset. So far I've tried converting them to data.frame and using dplyr::bind_cols, as well as plyr::rbind.fill.matrix(table1, table2); neither worked. Is there a way to fix this?

The preferred output should look like this:

cyl   3   4     0     1
  4 0.2 0.8 0.125 0.875
  6 1.0 0.0 1.000 0.000
  8         1.000 0.000

Thanks!

CodePudding user response：

You could use the cbindX function from the gdata package:

cbindX column-binds objects with different number of rows.

Code:

DF <- mtcars
df1 <- subset(DF, carb == 1)
df2 <- subset(DF, am == 1)

table1 <- prop.table(with(df1, table(cyl, gear)), margin = 1)
table2 <- prop.table(with(df2, table(cyl, vs)), margin = 1)

library(gdata)
cbindX(table2, table1)
#>       0     1   3   4
#> 4 0.125 0.875 0.2 0.8
#> 6 1.000 0.000 1.0 0.0
#> 8 1.000 0.000  NA  NA

Created on 2022-07-05 by the reprex package (v2.0.1)
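One caveat, as far as I understand the gdata documentation: cbindX pads the shorter object with NA rows at the bottom and does not match on row names. That works here only because the missing cyl level (8) happens to be the last row. If you want the rows aligned by their cyl label, a minimal base R sketch could look like this (wide1, wide2 and total are just illustrative names, not part of the original answer):

```r
df1 <- subset(mtcars, carb == 1)
df2 <- subset(mtcars, am == 1)

table1 <- prop.table(with(df1, table(cyl, gear)), margin = 1)
table2 <- prop.table(with(df2, table(cyl, vs)), margin = 1)

# The tables have different numbers of rows, which is why cbind() fails
dim(table1)
dim(table2)

# as.data.frame.matrix() keeps the wide layout (one column per gear / vs level)
wide1 <- as.data.frame.matrix(table1)
wide2 <- as.data.frame.matrix(table2)

# by = "row.names" joins on the cyl labels; all = TRUE keeps unmatched rows as NA
total <- merge(wide1, wide2, by = "row.names", all = TRUE)
names(total)[1] <- "cyl"
total
```

Because the join is done on the row names, it should still line up correctly if the missing level were in the middle of the table rather than at the end.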

CodePudding user response：

You could also construct your frequency tables as actual tibbles/data.frames* (I am using janitor's tabyl here) and then do a full join (I am using dplyr).

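library(dplyr)
library(janitor)

## same subsets as in the question
df1 <- subset(mtcars, carb == 1)
df2 <- subset(mtcars, am == 1)

## row-wise proportions as data.frames (adorn_percentages() defaults to rows)
table1 <- df1 |> tabyl(cyl, gear) |> adorn_percentages()
table2 <- df2 |> tabyl(cyl, vs) |> adorn_percentages()

## keep every cyl value from either table; unmatched cells become NA
table1 |> full_join(table2, by = "cyl")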

Output:

 cyl   3   4     0     1
   4 0.2 0.8 0.125 0.875
   6 1.0 0.0 1.000 0.000
   8  NA  NA 1.000 0.000

(*) When you write "2 data frames (table1, table2)", that's not quite right. Your objects are tables (class "table"), not data frames, and the two are not the same. Calling as.data.frame() on a table gives you the same data in long format (one row per combination of levels), which is better suited to other kinds of analysis.
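To make the difference concrete, here is a small sketch using table1 from the question (long1 is just an illustrative name):

```r
df1 <- subset(mtcars, carb == 1)
table1 <- prop.table(with(df1, table(cyl, gear)), margin = 1)

# Still a contingency table, not a data.frame
class(table1)

# Long format: one row per (cyl, gear) combination, proportions in "Freq"
long1 <- as.data.frame(table1)
names(long1)
long1
```

If you want to keep the wide layout instead, as.data.frame.matrix(table1) should give you one column per gear level.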