Home > database >  How can i select specific columns that start with a word according to a condition of another column
How can i select specific columns that start with a word according to a condition of another column

Time:10-25

I have a data frame that looks like this :

date var cat_low dog_low cat_high dog_high Love Friend
2022-01-01 A 1 7 13 19 NA friend
2022-01-01 A 2 8 14 20 NA friend
2022-01-01 A 3 9 15 21 NA friend
2022-02-01 B 4 10 16 22 love NA
2022-02-01 B 5 11 17 23 love NA
2022-02-01 B 6 12 18 24 love NA

I want to select the columns related to columns Love and Friend. If the column Love is love to give the columns that starts with cat and if the column Friend is friend to give me the columns that start with dog.

ideally i want to look like this :

date var a b
2022-01-01 A 7 19
2022-01-01 A 8 20
2022-01-01 A 9 21
2022-02-01 B 4 16
2022-02-01 B 5 17
2022-02-01 B 6 18
library(lubridate)
date = c(rep(as.Date("2022-01-01"),3),rep(as.Date("2022-02-01"),3))
var = c(rep("A",3),rep("B",3))
cat_low = seq(1,6,1)
dog_low = seq(7,12,1)
cat_high = seq(13,18,1)
dog_high = seq(19,24,1)
Friend = c(rep("friend",3),rep(NA,3))
Love = c(rep(NA,3),rep("love",3))
df = tibble(date,var,cat_low,dog_low,cat_high,dog_high,Love,Friend);df


Any help? How i can do that in R using dplyr ?

CodePudding user response:

There might be better ways, but here's one:

library(tidyr)
library(dplyr)
df %>% 
  pivot_longer(cols = starts_with(c("cat", "dog")),
               names_to = c("animal", ".value"),
               names_pattern = "(cat|dog)_(low|high)") %>% 
  filter((is.na(Love) & animal == "dog") |
         (is.na(Friend) & animal == "cat")) %>% 
  select(date, var, low, high)

output

# A tibble: 6 × 4
  date       var     low  high
  <date>     <chr> <dbl> <dbl>
1 2022-01-01 A         7    19
2 2022-01-01 A         8    20
3 2022-01-01 A         9    21
4 2022-02-01 B         4    16
5 2022-02-01 B         5    17
6 2022-02-01 B         6    18

CodePudding user response:

With dplyr try this. The first summarise filters for dog or cat, the second renames and puts the variables together.

library(dplyr)

df %>% 
  summarise(date, var, 
    across(starts_with("dog"), ~ .x[Friend == "friend"]), 
    across(starts_with("cat"), ~ .x[Love == "love"])) %>% 
  rename(a = dog_low, b = dog_high) %>% 
  summarise(date, var, a = ifelse(is.na(a), cat_low, a), 
    b = ifelse(is.na(b), cat_high, b))
        date var a  b
1 2022-01-01   A 7 19
2 2022-01-01   A 8 20
3 2022-01-01   A 9 21
4 2022-02-01   B 4 16
5 2022-02-01   B 5 17
6 2022-02-01   B 6 18
  • Related