Extract specific columns and others columns containing certain characters in a for loop

Time:01-30

Suppose we have a data frame df as follows:

df <- structure(list(date = c("2021-1-1", "2021-1-2", "2021-1-3", "2021-1-4", 
"2021-1-5", "2021-1-6"), buy_price_actual = 1:6, call_price_actual = 2:7, 
    sell_price_actual = 3:8, buy_price_pred = 4:9, call_price_pred = 5:10, 
    sell_price_pred = 6:11), class = "data.frame", row.names = c(NA, 
-6L))

Out:

      date buy_price_actual call_price_actual sell_price_actual buy_price_pred call_price_pred sell_price_pred
1 2021-1-1                1                 2                 3              4               5               6
2 2021-1-2                2                 3                 4              5               6               7
3 2021-1-3                3                 4                 5              6               7               8
4 2021-1-4                4                 5                 6              7               8               9
5 2021-1-5                5                 6                 7              8               9              10
6 2021-1-6                6                 7                 8              9              10              11

I want to extract the date column together with the actual and predicted buy and sell prices in a for loop:

cols <- list(
   c("date", "buy_price_actual", "buy_price_pred"),
   c("date", "sell_price_actual", "sell_price_pred")
   )

for (col in cols){
   print(col)
}

for (col in cols){
   df1 <- df %>%
     select(col) %>%
   print(df1)
}

Out:

Error in print.default(m, ..., quote = quote, right = right, max = max) : 
  invalid printing digits -2147483648

Another approach is to search the column names for keywords with grepl() and add the date column:

price_types <- c('buy', 'sell')
for (price_type in price_types){
   df1 <- df %>%
     select_if(grepl('date'|price_type, names(.)))
   print(df1)
}

However, both of the above attempts still have bugs. How can I fix them? Thanks!

Answer:

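The first loop fails because of an extra pipe: the last %>% in df1 <- df %>% select(col) %>% print(df1) means the expression is evaluated as df1 <- print(select(df, col), df1), which you probably don't want. print() for data frames passes that extra df1 argument on to print.default(), where it ends up as the digits argument, hence the "invalid printing digits" error. Try this instead: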

library(dplyr)

for (col in cols){
  # all_of() marks col as an external character vector of column names
  df1 <- df %>%
    select(all_of(col))
  print(df1)
}
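To see why the trailing pipe matters, here is a minimal sketch of the rule magrittr's %>% (re-exported by dplyr) follows for a simple call: the left-hand side becomes the first argument, and whatever is written inside the parentheses comes after it. The numbers are purely illustrative.

```r
library(dplyr)

# `lhs %>% f(arg)` is evaluated as `f(lhs, arg)`
piped  <- c(1, 2, 3) %>% sum(10)
direct <- sum(c(1, 2, 3), 10)

print(piped)                     # [1] 16
print(identical(piped, direct))  # [1] TRUE
```

By the same rule, `... %>% print(df1)` calls print() with the selected columns first and df1 second.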

In the second loop, 'date'|price_type applies R's logical OR operator to two strings, which is an error. You have to build a valid pattern string to use as the first argument of grepl() instead, for example with paste0():

price_types <- c('buy', 'sell')
for (price_type in price_types){
  df1 <- df %>%
    select_if(grepl(paste0('date|',price_type), names(.)))
  print(df1)
}
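A small base-R sketch of what goes wrong and what paste0() builds instead; col_names simply copies the column names of df so the example runs on its own.

```r
col_names <- c("date", "buy_price_actual", "call_price_actual", "sell_price_actual",
               "buy_price_pred", "call_price_pred", "sell_price_pred")
price_type <- "buy"

# The original attempt: `|` is logical OR, which is not defined for strings
or_attempt <- tryCatch("date" | price_type, error = function(e) conditionMessage(e))
print(or_attempt)

# paste0() builds the regular expression "date|buy": match "date" OR "buy"
pattern <- paste0("date|", price_type)
print(pattern)  # [1] "date|buy"

# Keeps date, buy_price_actual and buy_price_pred
print(col_names[grepl(pattern, col_names)])
```

Note that the pattern matches anywhere in a name, so a column such as a hypothetical "buyer_id" would also be picked up; anchoring the pattern (for example "^date$|^buy_") is one way to be stricter.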

That said, I'd rather use something like this:

library(dplyr)

# add names
cols <- list(
  "buy"  = c("date", "buy_price_actual", "buy_price_pred"),
  "sell" = c("date", "sell_price_actual", "sell_price_pred")
)
lapply(cols, \(x) select(df, all_of(x)))
#> $buy
#>       date buy_price_actual buy_price_pred
#> 1 2021-1-1                1              4
#> 2 2021-1-2                2              5
#> 3 2021-1-3                3              6
#> 4 2021-1-4                4              7
#> 5 2021-1-5                5              8
#> 6 2021-1-6                6              9
#> 
#> $sell
#>       date sell_price_actual sell_price_pred
#> 1 2021-1-1                 3               6
#> 2 2021-1-2                 4               7
#> 3 2021-1-3                 5               8
#> 4 2021-1-4                 6               9
#> 5 2021-1-5                 7              10
#> 6 2021-1-6                 8              11
price_types <- c('buy', 'sell')
lapply(setNames(price_types, price_types), \(x) select(df, date, contains(x)))
#> $buy
#>       date buy_price_actual buy_price_pred
#> 1 2021-1-1                1              4
#> 2 2021-1-2                2              5
#> 3 2021-1-3                3              6
#> 4 2021-1-4                4              7
#> 5 2021-1-5                5              8
#> 6 2021-1-6                6              9
#> 
#> $sell
#>       date sell_price_actual sell_price_pred
#> 1 2021-1-1                 3               6
#> 2 2021-1-2                 4               7
#> 3 2021-1-3                 5               8
#> 4 2021-1-4                 6               9
#> 5 2021-1-5                 7              10
#> 6 2021-1-6                 8              11

Input:

df <- structure(list(
  date = c(
    "2021-1-1", "2021-1-2", "2021-1-3", "2021-1-4","2021-1-5", "2021-1-6"
  ), buy_price_actual = 1:6, call_price_actual = 2:7, sell_price_actual = 3:8, 
  buy_price_pred = 4:9, call_price_pred = 5:10,sell_price_pred = 6:11
), class = "data.frame", row.names = c(
  NA,
  -6L
))

Created on 2023-01-30 with reprex v2.0.2
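If you would rather not depend on dplyr, roughly the same two lists can be built with base R subsetting. This is a sketch of an alternative rather than part of the answer above; the names by_name and by_keyword are illustrative only.

```r
df <- data.frame(
  date = c("2021-1-1", "2021-1-2", "2021-1-3", "2021-1-4", "2021-1-5", "2021-1-6"),
  buy_price_actual = 1:6, call_price_actual = 2:7, sell_price_actual = 3:8,
  buy_price_pred = 4:9, call_price_pred = 5:10, sell_price_pred = 6:11
)

# Explicit column lists: df[x] with a character vector keeps just those columns
cols <- list(
  buy  = c("date", "buy_price_actual", "buy_price_pred"),
  sell = c("date", "sell_price_actual", "sell_price_pred")
)
by_name <- lapply(cols, function(x) df[x])

# Keyword search: "date" plus every column whose name contains the keyword
# (fixed = TRUE treats the keyword as a literal string, not a regex)
price_types <- c("buy", "sell")
by_keyword <- lapply(setNames(price_types, price_types), function(x) {
  df[c("date", grep(x, names(df), value = TRUE, fixed = TRUE))]
})

print(identical(by_name, by_keyword))  # [1] TRUE
```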

Answer:

You can create two data frames named df_buy and df_sell by looping over the two strings and selecting the columns whose names match either that string or 'date'. We use assign() to name each data frame after the string:

library(dplyr)

for (string in c('buy','sell')) {
  assign(paste0("df_",string), df %>%
           select(matches(paste0("date|",string))))
}
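To make the assign() step concrete: assign() binds a value to a variable whose name is supplied as a string, and get() looks that variable up again. A minimal base-R sketch, where the small value data frame is a hypothetical stand-in for the selected columns:

```r
# Hypothetical stand-in for the result of select() in the loop above
value <- data.frame(x = 1:3)

string <- "buy"
# Same effect as writing df_buy <- value, but the name is built at run time
assign(paste0("df_", string), value)
print(identical(df_buy, value))                      # [1] TRUE

# get() does the reverse: fetch an object by a constructed name
print(identical(get(paste0("df_", string)), value))  # [1] TRUE
```

Keeping the results in a named list instead, as the lapply() answer does, avoids creating loosely named variables in the global environment and is usually easier to loop over afterwards.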