I have the following sample data:
df <- structure(list(person = c("a", "a", "a", "a", "a", "a", "a",
"b", "b", "b", "b", "c", "c", "c", "c", "c", "c", "c", "d", "d",
"d", "e", "f"), data = c(4.51646448148296, 6.50396548317118,
NA, 9.84425670640439, NA, NA, 1.67850554309375, NA, 2.06950894895984,
4.69548965710482, 5.12795074939009, 0.209845957108408, 6.47290211779118,
6.26481009615426, NA, NA, 5.24543093106204, 4.75242871394851,
6.8182202925863, NA, 5.3162641705813, 6.05862341995543, NA)), row.names = c(NA,
-23L), class = "data.frame")
Which looks like this
person data
1 a 4.516464
2 a 6.503965
3 a NA
4 a 9.844257
5 a NA
6 a NA
7 a 1.678506
8 b NA
9 b 2.069509
10 b 4.695490
11 b 5.127951
12 c 0.209846
13 c 6.472902
14 c 6.264810
15 c NA
16 c NA
17 c 5.245431
18 c 4.752429
19 d 6.818220
20 d NA
21 d 5.316264
22 e 6.058623
23 f NA
I'm trying to be able to calculate the percent change from the latest and second latest entry in the data
column based on person
. The entries are logged from first to latest by person (i.e., row 1 was person A's first entry, and row 7 was their latest entry).
I have tried various combinations of the following code with no luck:
df %>%
group_by(person) %>%
mutate(percChange = c(rep(NA, ((df %>% count(person))[2]) -
length(((na.omit(data) - lag(na.omit(data)))/na.omit(data)*100))),
(na.omit(data) - lag(na.omit(data)))/na.omit(data)*100)) %>%
summarise(max = max(na.omit(data)),
avg = mean(na.omit(data)),
lastPercChange = last(na.omit(percChange)))
The desired output would look like this, with example calculation of the last % change:
person max avg lastPercChange
1 a 9.84 5.64 33.93138 (1.678506-9.844257)/1.678506 * 100
2 b 5.13 3.96 8.433407 (5.127951-4.695490)/5.127951 * 100
3 c 6.47 4.59 -10.37368 (4.752429-5.245431)/4.752429 * 100
4 d 6.82 6.07 14.66658 (5.316264-6.818220)/5.316264 * 100
5 e 6.06 6.06 NA (since only 1 observation)
6 f -Inf NaN NA (since only 1 observation)
Any help would be greatly appreciated.
CodePudding user response:
For simplicity I would drop all the NA
rows first then it is easy to calculate the values that you want. Function nth(..)
or head
/tail
can help us provide last and second last values.
If a person
has all the NA
values it would be dropped completely from the output to keep it intact we do a right_join
with the original data.
library(dplyr)
df %>%
filter(!is.na(data)) %>%
group_by(person) %>%
summarise(max = max(data),
avg = mean(data),
lastPercChange = (nth(data,-1) - nth(data,-2))/nth(data, -1) *100)%>%
right_join(df %>% distinct(person), by = 'person')
# person max avg lastPercChange
# <chr> <dbl> <dbl> <dbl>
#1 a 9.84 5.64 -486.
#2 b 5.13 3.96 8.43
#3 c 6.47 4.59 -10.4
#4 d 6.82 6.07 -28.3
#5 e 6.06 6.06 NA
#6 f NA NA NA