Add/complete fixed number of columns on dataframe in R

Time:04-27

I'm processing multiple files in R. Some have 3 value columns:

id v_1 v_2 v_3
 1 1   2   3     

Others have more, some fewer:

id v_1 v_2
 1 1  7       

Is it possible to "force" every file to have 10 columns (not counting the id column), with the extra ones left empty, so that you get the following?

For the first example

id v_1 v_2 v_3 v_4 v_5 v_6 v_7 v_8 v_9 v_10
1  1   2   3

And for the second example

id v_1 v_2 v_3 v_4 v_5 v_6 v_7 v_8 v_9 v_10
1  1  7

So basically, regardless of the number of columns in the input file, you always get 10 columns.

CodePudding user response:

Here is one possibility using bind_rows: first create an empty dataframe with the desired columns, then bind it to your data. Any column that is missing from your data is filled with NA.

library(dplyr)

# Empty template: 0 rows, columns id, v_1, ..., v_10
df_cols <- setNames(data.frame(matrix(ncol = 11, nrow = 0)), c("id", paste0("v_", 1:10)))

# Columns missing from df are filled with NA
bind_rows(df_cols, df)

Another (probably faster) option is to use data.table:

library(data.table)

rbindlist(list(df_cols, df), fill = TRUE)

Another option is to use plyr:

plyr::rbind.fill(df_cols, df)

Output

  id v_1 v_2 v_3 v_4 v_5 v_6 v_7 v_8 v_9 v_10
1  1   1   2   3  NA  NA  NA  NA  NA  NA   NA
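
If you would rather avoid extra packages, the same padding can be sketched in base R. This is not one of the approaches above but a swapped-in alternative: the helper name pad_columns is hypothetical, and it assumes the target layout is id plus v_1 to v_10. Columns outside that layout are dropped by the final reordering step.

```r
# Hypothetical helper (not from the answer above): pad a data frame
# to a fixed set of columns using base R only.
pad_columns <- function(df, cols = c("id", paste0("v_", 1:10))) {
  # Add each column the input lacks, filled with NA
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- rep(NA, nrow(df))
  }
  # Reorder to the target layout; columns not in `cols` are dropped
  df[cols]
}

df <- read.table(text = "id v_1 v_2 v_3
 1 1   2   3", header = TRUE)

padded <- pad_columns(df)
print(padded)
```

Existing values are left untouched, so only the added columns contain NA.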

If you have a list of dataframes, you could use map to update all of them at once:

library(tidyverse)

# Put df_cols first so the result always follows the template's column order
map(list(df, df2), ~ bind_rows(df_cols, .x))

#[[1]]
#  id v_1 v_2 v_3 v_4 v_5 v_6 v_7 v_8 v_9 v_10
#1  1   1   2   3  NA  NA  NA  NA  NA  NA   NA

#[[2]]
#  id v_1 v_2 v_3 v_4 v_5 v_6 v_7 v_8 v_9 v_10
#1  1   1   2  NA  NA  NA  NA  NA  NA  NA   NA
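
Since the question is about several input files, here is a rough base R sketch of the whole round trip. It assumes whitespace-delimited files with a header row. The helper read_padded and the temporary files are made up for illustration, standing in for your real inputs. Instead of binding rows, it copies each file's columns into an all-NA template.

```r
# Column layout every file should end up with
cols <- c("id", paste0("v_", 1:10))

# Hypothetical helper: read one file and copy its columns into an
# all-NA template that already has the full layout.
read_padded <- function(path, cols) {
  df <- read.table(path, header = TRUE)
  out <- as.data.frame(matrix(NA, nrow = nrow(df), ncol = length(cols),
                              dimnames = list(NULL, cols)))
  shared <- intersect(cols, names(df))  # columns present in both
  out[shared] <- df[shared]
  out
}

# Two throwaway files standing in for the real inputs
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("id v_1 v_2 v_3", "1 1 2 3"), f1)
writeLines(c("id v_1 v_2", "1 1 7"), f2)

# One padded data frame per file
padded <- lapply(c(f1, f2), read_padded, cols = cols)
```

With real data you would replace c(f1, f2) with your own vector of file paths, for example from list.files().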

Data

df <- read.table(text = "id v_1 v_2 v_3
 1 1   2   3", header = TRUE)

df2 <- read.table(text = "id v_1 v_2
 1 1   2", header = TRUE)