Indexing / identifying columns adjacent to a column of interest in R dataframe-CodePudding

I'm looking for a way to select data/columns adjacent to a particular column. For example, let's say I want to select the two columns to the left and to the right of 'cat_weight'

df <- data.frame(dog_height = 1:5,
                 dog_weight = 2:6,
                 cat_height = 3:7,
                 cat_weight = 4:8,
                 bird_height = 5:9,
                 bird_weight = 6:10
)

So in this case, that would be 'dog_weight', 'cat_height', 'bird_height', and 'bird_weight'. So far what I've tried is simply creating variables to use as index values based on the index number of the column I'm interested in, e.g.

IDvar <- which(colnames(df)=="cat_weight")
IDvar_left_1 <- IDvar-2
IDvar_left_2 <- IDvar-1
IDvar_right_1 <- IDvar 1
IDvar_right_2 <- IDvar 2

cols_left <- df[,IDvar_left_1:IDvar_left_2]
cols_right <- df[,IDvar_right_1:IDvar_right_2]

This works but I can't help but think there must be a more elegant solution?

My real data is, of course, much larger and hence would require some more complex indexing (e.g. excluding columns immediately adjacent to the column of interest), so just trying to work out a simple method to apply!

CodePudding user response：

Here is a base R solution:

After getting the index of the desired column with which() we could subset with simple maths:

x <- which(colnames(df)=="cat_weight")

df[,c(x-2, x-2, x 1, x 2 )]

 dog_weight dog_weight.1 bird_height bird_weight
1          2            2           5           6
2          3            3           6           7
3          4            4           7           8
4          5            5           8           9
5          6            6           9          10

CodePudding user response：

You had a good idea, here's a solution using dplyr:

library(dplyr)

df <- data.frame(dog_height = 1:5,
                 dog_weight = 2:6,
                 cat_height = 3:7,
                 cat_weight = 4:8,
                 bird_height = 5:9,
                 bird_weight = 6:10
)


get_range <- function(central_column,
                      max_range){
  range <- seq(which(colnames(df)==central_column)[[1]]-max_range,
               which(colnames(df)==central_column)[[1]] max_range)
  
  range[! range %in% which(colnames(df)==central_column)]
}



df |> 
  select(all_of(get_range("cat_weight",2)))

Output:

  dog_weight cat_height bird_height bird_weight
1          2          3           5           6
2          3          4           6           7
3          4          5           7           8
4          5          6           8           9
5          6          7           9          10

CodePudding user response：

Just for variety.

Using base R.

myfunc <- function(df, window, column_name){
  
  colInd <- which(colnames(df) %in% column_name)
  
  rng <- range(colInd - window, colInd   window)
  
  subset(df, , ifelse(min(rng) < 1, 1, min(rng)):ifelse(max(rng) > ncol(df), ncol(df), max(rng)))
  
}

myfunc(data, 3, "cat_weight")

 dog_height dog_weight cat_height cat_weight bird_height bird_weight
1          1          2          3          4           5           6
2          2          3          4          5           6           7
3          3          4          5          6           7           8
4          4          5          6          7           8           9
5          5          6          7          8           9          10

Edit:

I may have misinterpreted the question. I thought that it included the initial column. Using setdiff, we can remove the column and and get the range to subset


myfunc <- function(df, window, column_name){
  
  colInd <- which(colnames(df) %in% column_name)
  
  diff <- setdiff((colInd - window):(colInd   window), colInd)
  
  df[,diff]
}

myfunc(df, 1, "cat_weight")

df[myfunc(df, 1, "cat_weight")]
  cat_height bird_height
1          3           5
2          4           6
3          5           7
4          6           8
5          7           9

CodePudding user response：

Anther solution using data.table library

library(data.table)

i <- which(colnames(df)=="cat_weight")
setDT(df)
df[,(i-2):(i 2)][ , -c("cat_weight")]

output

   dog_weight cat_height bird_height bird_weight
1:          2          3           5           6
2:          3          4           6           7
3:          4          5           7           8
4:          5          6           8           9
5:          6          7           9          10