Home > OS >  How to get the value in the largest date of each name?
How to get the value in the largest date of each name?

Time:08-24

I need to use dplyr mutate to create a new column that can show me (perhaps repeated values) of each maximum date what the respective value is.

library(dplyr)

set.seed(1234)
df <- data.frame(
  dates = seq.Date(as.Date("2022-08-23"), 
                   as.Date("2022-09-11"),
                   by = "days"),
  names = c(
    "a", 
    "a",
    "b",
    "c",
    "a",
    "a",
    "c",
    "b",
    "a",
    "a",
    "b",
    "c",
    "a",
    "b",
    "c",
    "d",
    "b",
    "a",
    "b",
    "c"   
    ),
  value = rnorm(20,0,1)
)

glimpse(df)

df

How could I use mutate to insert a column that repeats the largest date value for each name until the name changes?

CodePudding user response:

library(dplyr)
df %>%
  arrange(names, dates) %>% 
  group_by(names) %>% 
  mutate(last_val = last(value))

# A tibble: 20 × 4
# Groups:   names [4]
   dates      names   value max_val
   <date>     <chr>   <dbl>   <dbl>
 1 2022-08-23 a     -1.21    -0.911
 2 2022-08-24 a      0.277   -0.911
 3 2022-08-27 a      0.429   -0.911
 4 2022-08-28 a      0.506   -0.911
 5 2022-08-31 a     -0.564   -0.911
 6 2022-09-01 a     -0.890   -0.911
 7 2022-09-04 a     -0.776   -0.911
 8 2022-09-09 a     -0.911   -0.911
 9 2022-08-25 b      1.08    -0.837
10 2022-08-30 b     -0.547   -0.837
11 2022-09-02 b     -0.477   -0.837
12 2022-09-05 b      0.0645  -0.837
13 2022-09-08 b     -0.511   -0.837
14 2022-09-10 b     -0.837   -0.837
15 2022-08-26 c     -2.35     2.42 
16 2022-08-29 c     -0.575    2.42 
17 2022-09-03 c     -0.998    2.42 
18 2022-09-06 c      0.959    2.42 
19 2022-09-11 c      2.42     2.42 
20 2022-09-07 d     -0.110   -0.110

Or if you don't care about the ordering

df %>% 
  group_by(names) %>% 
  mutate(last_val = last(value, order_by = dates))

CodePudding user response:

Is this what you're looking for ? :

Select the row with the maximum value in each group

Or maybe your question is to first select the names with max date then compute the max value for the selected date ? :

I first select the right time :

df_date = df %>% group_by(names) %>%  mutate(group_max_dates = max(dates)) %>% ungroup()

Then I selected the right time from first step :

df_date_selected = df[df$dates %in% unique(df_date$group_max_dates),]

Then I computed the max values :

results = df_date_selected  %>% group_by(dates, names) %>%  mutate(group_max_dates_names = max(value)) %>% ungroup()
df %>% group_by(dates, names) %>%  mutate(group_max_dates_names = max(value)) %>% ungroup()

I'm not sure because with your data there is only one value for name in max date ! But if they were multiples of them it will be more crystal clear :)

Hope it helps !

  • Related