Home > front end >  Is there a way R can find the first instance of a number in a row and then return the column value?
Is there a way R can find the first instance of a number in a row and then return the column value?

Time:10-07

I have an issue trying to find a way to automate a process that takes me forever to do by hand in Excel, and I am seeing if there is a way to do it in R.

My data set relates to the development of a plants growth stage over time. Each plant is measured across multiple days in a season, and each plant (being its own row) has a score, from 0-5, that is associated with the day that it was scored (each day being its own column). Below is an example of how the data looks like for one plant.

This is one example of what a plant's growth over time is recorded in my dataset

What I would like to find is the associated day in which the plant went from a '0' score to a '1' score, the day in which the plant went from a '1' score to a '2' score, and so on and so forth through the 5th and final score.

If there is any way you can imagine this being done in R, please advise. Thank you and have a nice day.

CodePudding user response:

Given some data like you describe:

df1 <- structure(list(ID = c("A", "B"), `2022-10-06` = c(0, 1), `2022-10-07` = c(0,2), 
                `2022-10-08` = c(0, 3), `2022-10-09` = c(1, 3), `2022-10-10` = c(2,4),
                `2022-10-11` = c(3, 4), `2022-10-12` = c(3, 4), `2022-10-13` = c(4,5), 
                `2022-10-14` = c(4, 5), `2022-10-15` = c(5, 5)), class = c("tbl_df","tbl", "data.frame"), 
                row.names = c(NA, -2L))
df1
# A tibble: 2 × 11
  ID    `2022-10-06` `2022-10-07` `2022-10-08` `2022-10-09` `2022-10-10` `2022-10-11` `2022-10-12` `2022-10-13` `2022-10-14` `2022-10-15`
  <chr>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>        <dbl>
1 A                0            0            0            1            2            3            3            4            4            5
2 B                1            2            3            3            4            4            4            5            5            5

I'd use the following to reshape the data into long format, group by each plant, and filter to only keep observations that are changes from the prior date.

library(tidyverse)
df1 %>%
  pivot_longer(-ID) %>% 
  group_by(ID) %>%
  filter(value != lag(value, default = 0)) %>%
  ungroup() 

# A tibble: 10 × 3
   ID    name       value
   <chr> <chr>      <dbl>
 1 A     2022-10-09     1
 2 A     2022-10-10     2
 3 A     2022-10-11     3
 4 A     2022-10-13     4
 5 A     2022-10-15     5
 6 B     2022-10-06     1
 7 B     2022-10-07     2
 8 B     2022-10-08     3
 9 B     2022-10-10     4
10 B     2022-10-13     5

It might be possible for a plant to grow more than 1 between observations, in which case you might want the new observation date to be recorded for all the heights that are new. Below, I remove the "height = 3" observations to simulate this situation and show one way to add those in with the next date:

df1 %>%
  pivot_longer(-ID) %>% 
  group_by(ID) %>%
  filter(value != lag(value, default = 0)) %>%
  filter(row_number() != 3) %>%
  complete(value = 1:5) %>%
  fill(name, .direction = "up") %>% 
  ungroup()


# A tibble: 10 × 3
   ID    value name      
   <chr> <dbl> <chr>     
 1 A         1 2022-10-09
 2 A         2 2022-10-10
 3 A         3 2022-10-13
 4 A         4 2022-10-13
 5 A         5 2022-10-15
 6 B         1 2022-10-06
 7 B         2 2022-10-07
 8 B         3 2022-10-10
 9 B         4 2022-10-10
10 B         5 2022-10-13
  • Related