Home > database >  How to select columns of a data frame based on the columns of another data frame
How to select columns of a data frame based on the columns of another data frame

Time:07-19

I have a data frame, df_1_2017, with 38 columns. I have another data frame, df_2_2018, with 43 columns. I want the same number of columns/header names so I can easily cbind the two data frames.

I have tried the below with out any luck

col_names_2017 <- colnames(df_1_2017)

selected_cols_df_2_2018 <- df_2_2018 %>%
      select(col_names_2017)

Error in `select()`:
! Can't subset columns that don't exist.
✖ Column `Canopy_cover_mean` doesn't exist.

How can I incorporate a select where if the colnames from df_1_2017 are present in df_2_2018 then to select all the columns the two data frames share.

CodePudding user response:

You can use

common_colsnms <- intersect(colnames(df_1_2017) , colnames(df_2_2018))

# apply

selected_cols_df_2_2018 <- df_2_2018 %>%
      select(common_colsnms)

CodePudding user response:

Please see https://dplyr.tidyverse.org/reference/dplyr_tidy_select.html for future reference. Let me know if this works.

col_names_2017 <- colnames(df_1_2017)

selected_cols_df_2_2018 <- df_2_2018 %>%
      select(all_of(col_names_2017))

CodePudding user response:

If you really mean cbind, then it has nothing to do with the numbers or names of columns (well, duplicate names are discouraged but possible). In this case, you should be looking at the number of rows in each, and if they align row-wise; normally either they are perfectly a match (same number of rows, each row means the same thing) or they have shared ID fields that require a join/merge operation.

However, in case you mean rbind instead, where you feel you need the columns to match, by-name, then you have a couple of options.

base R

mt2 <- mtcars[1:3,]
mt3 <- mtcars[4:6,]
names(mt2)[3:5] <- paste(names(mt2)[3:5], "_2")
names(mt2)[6:8] <- paste(names(mt2)[3:5], "_3")
mt2 <- mtcars[1:3,]
mt3 <- mtcars[4:6,]
names(mt2)[3:5] <- paste(names(mt2)[3:5], "_2")
names(mt3)[6:8] <- paste(names(mt3)[6:8], "_3")
mt2
#                mpg cyl disp_2 hp_2 drat_2    wt  qsec vs am gear carb
# Mazda RX4     21.0   6    160  110   3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag 21.0   6    160  110   3.90 2.875 17.02  0  1    4    4
# Datsun 710    22.8   4    108   93   3.85 2.320 18.61  1  1    4    1
mt3
#                    mpg cyl disp  hp drat  wt_3 qsec_3 vs_3 am gear carb
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215  19.44    1  0    3    1
# Hornet Sportabout 18.7   8  360 175 3.15 3.440  17.02    0  0    3    2
# Valiant           18.1   6  225 105 2.76 3.460  20.22    1  0    3    1

common <- intersect(names(mt2), names(mt3))
mt2[,common]
#                mpg cyl am gear carb
# Mazda RX4     21.0   6  1    4    4
# Mazda RX4 Wag 21.0   6  1    4    4
# Datsun 710    22.8   4  1    4    1
mt3[,common]
#                    mpg cyl am gear carb
# Hornet 4 Drive    21.4   6  0    3    1
# Hornet Sportabout 18.7   8  0    3    2
# Valiant           18.1   6  0    3    1
rbind(mt2[,common], mt3[,common])
#                    mpg cyl am gear carb
# Mazda RX4         21.0   6  1    4    4
# Mazda RX4 Wag     21.0   6  1    4    4
# Datsun 710        22.8   4  1    4    1
# Hornet 4 Drive    21.4   6  0    3    1
# Hornet Sportabout 18.7   8  0    3    2
# Valiant           18.1   6  0    3    1

dplyr, limiting names

library(dplyr)
rbind(select(mt2, any_of(names(mt3))), select(mt3, any_of(names(mt2))))
#                    mpg cyl am gear carb
# Mazda RX4         21.0   6  1    4    4
# Mazda RX4 Wag     21.0   6  1    4    4
# Datsun 710        22.8   4  1    4    1
# Hornet 4 Drive    21.4   6  0    3    1
# Hornet Sportabout 18.7   8  0    3    2
# Valiant           18.1   6  0    3    1

Or using the more-flexible bind_rows:

select(mt2, any_of(names(mt3))) %>%
  bind_rows(select(mt3, any_of(names(mt2))))
#                    mpg cyl am gear carb
# Mazda RX4         21.0   6  1    4    4
# Mazda RX4 Wag     21.0   6  1    4    4
# Datsun 710        22.8   4  1    4    1
# Hornet 4 Drive    21.4   6  0    3    1
# Hornet Sportabout 18.7   8  0    3    2
# Valiant           18.1   6  0    3    1

dplyr, accept all columns

If you are less concerned about extra columns, then you can use bind_rows and its innate ability to align columns by name and create columns in one frame where it is not found.

bind_rows(mt2, mt3)
#                    mpg cyl disp_2 hp_2 drat_2    wt  qsec vs am gear carb disp  hp drat  wt_3 qsec_3 vs_3
# Mazda RX4         21.0   6    160  110   3.90 2.620 16.46  0  1    4    4   NA  NA   NA    NA     NA   NA
# Mazda RX4 Wag     21.0   6    160  110   3.90 2.875 17.02  0  1    4    4   NA  NA   NA    NA     NA   NA
# Datsun 710        22.8   4    108   93   3.85 2.320 18.61  1  1    4    1   NA  NA   NA    NA     NA   NA
# Hornet 4 Drive    21.4   6     NA   NA     NA    NA    NA NA  0    3    1  258 110 3.08 3.215  19.44    1
# Hornet Sportabout 18.7   8     NA   NA     NA    NA    NA NA  0    3    2  360 175 3.15 3.440  17.02    0
# Valiant           18.1   6     NA   NA     NA    NA    NA NA  0    3    1  225 105 2.76 3.460  20.22    1
  • Related