Home > Mobile >  Calculate % Change Avoiding NA's, By Group
Calculate % Change Avoiding NA's, By Group

Time:05-10

I have the following sample data:

df <- structure(list(person = c("a", "a", "a", "a", "a", "a", "a", 
"b", "b", "b", "b", "c", "c", "c", "c", "c", "c", "c", "d", "d", 
"d", "e", "f"), data = c(4.51646448148296, 6.50396548317118, 
NA, 9.84425670640439, NA, NA, 1.67850554309375, NA, 2.06950894895984, 
4.69548965710482, 5.12795074939009, 0.209845957108408, 6.47290211779118, 
6.26481009615426, NA, NA, 5.24543093106204, 4.75242871394851, 
6.8182202925863, NA, 5.3162641705813, 6.05862341995543, NA)), row.names = c(NA, 
-23L), class = "data.frame")

Which looks like this

  person     data
1       a 4.516464
2       a 6.503965
3       a       NA
4       a 9.844257
5       a       NA
6       a       NA
7       a 1.678506
8       b       NA
9       b 2.069509
10      b 4.695490
11      b 5.127951
12      c 0.209846
13      c 6.472902
14      c 6.264810
15      c       NA
16      c       NA
17      c 5.245431
18      c 4.752429
19      d 6.818220
20      d       NA
21      d 5.316264
22      e 6.058623
23      f       NA

I'm trying to be able to calculate the percent change from the latest and second latest entry in the data column based on person. The entries are logged from first to latest by person (i.e., row 1 was person A's first entry, and row 7 was their latest entry).

I have tried various combinations of the following code with no luck:

df %>%
  group_by(person) %>%
  mutate(percChange = c(rep(NA, ((df %>% count(person))[2]) - 
                              length(((na.omit(data) - lag(na.omit(data)))/na.omit(data)*100))), 
                        (na.omit(data) - lag(na.omit(data)))/na.omit(data)*100)) %>%
  summarise(max = max(na.omit(data)),
            avg = mean(na.omit(data)),
            lastPercChange = last(na.omit(percChange)))

The desired output would look like this, with example calculation of the last % change:

 person     max    avg        lastPercChange                                                                       
1 a         9.84   5.64       33.93138    (1.678506-9.844257)/1.678506 * 100 
2 b         5.13   3.96       8.433407    (5.127951-4.695490)/5.127951 * 100 
3 c         6.47   4.59       -10.37368   (4.752429-5.245431)/4.752429 * 100
4 d         6.82   6.07       14.66658    (5.316264-6.818220)/5.316264 * 100 
5 e         6.06   6.06       NA          (since only 1 observation)                 
6 f      -Inf    NaN          NA          (since only 1 observation)

Any help would be greatly appreciated.

CodePudding user response:

For simplicity I would drop all the NA rows first then it is easy to calculate the values that you want. Function nth(..) or head/tail can help us provide last and second last values.

If a person has all the NA values it would be dropped completely from the output to keep it intact we do a right_join with the original data.

library(dplyr)

df %>%
  filter(!is.na(data)) %>%
  group_by(person) %>%
  summarise(max = max(data), 
            avg = mean(data), 
            lastPercChange = (nth(data,-1) - nth(data,-2))/nth(data, -1) *100)%>%
  right_join(df %>% distinct(person), by = 'person')

# person   max   avg lastPercChange
#  <chr>  <dbl> <dbl>          <dbl>
#1 a       9.84  5.64        -486.  
#2 b       5.13  3.96           8.43
#3 c       6.47  4.59         -10.4 
#4 d       6.82  6.07         -28.3 
#5 e       6.06  6.06          NA   
#6 f      NA    NA             NA   
  • Related