I have an issue with for looping with dplyrs mutate. Basically, all I want to do is to fill the first row in the mutated column with the first row from column B and then keep the mutated values from the second row onwards.
- Group data
- Mutate number of occurrences in each group and filter n > 1
- Mutate new column by filling the first row with the first row from column B and filling row 2 and onwards with the mutated values
Itemnumber column_A column_B column_C column_D column_E
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 80126 106. 6 0 5 0
2 80380 131 8.38 0 0.45 2.5
3 80676 95.4 -246 0 40 0
4 80142 138. -41 0 14 10
5 80824 128. 3 0 1 0
6 80045 124. 22 0 0 70
7 81532 147. 5 0 7 5
8 80610 109. 4.71 0 0 11
9 82720 107. 39.7 0 0 10
10 82623 112. 51 0 0 14
This can be done if I subset one group and do it in a basic for loop. But I don't know how to do it for each group and also follow my requirements above.
The code below is what I've done so far but it doesn't fill the first row of new_col with the first row from column_B and it doesn't fill it from the 2nd row onwards.
require(dplyr)
df %>% group_by(Itemnumber) %>% mutate(n = n()) %>% filter(n > 1) %>%
mutate(new_col = column_B lead(column_A, n=1L) - (lead(column_D, n=1L) - lead(column_E, n=1L)))
If we look at one group it should put the value 23 in the first row of new_col on the 2nd row, whereas the first row should be filled with the value 6 from column_B to avoid NA in the last row. Essentially, it should be shifted downwards by 1 row.
Itemnumber column_A column_B column_C column_D column_E n new_col
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 80126 96.3 6 0 5 0 15 23
2 80126 123. 17 0 3 20 15 26
3 80126 96.7 24 0 1 10 15 18
4 80126 129. 10 0 6 0 15 20
5 80126 117. 18 0 0 10 15 16
6 80126 112. 12 0 2 0 15 22
7 80126 106. 13 0 0 10 15 23
8 80126 108. 21 0 0 10 15 29
9 80126 124. 28 0 2 10 15 37
10 80126 99.9 24 0 11 20 15 22
11 80126 127. 10 0 2 0 15 18
12 80126 111. 9 0 2 10 15 19
13 80126 115. 10 0 0 10 15 19
14 80126 124. 17 0 1 10 15 37
15 80126 111. 29 0 0 20 15 NA
Below is what I would like to do with new_col (I know it doesn't work) =(
df %>% group_by(Itemnumber) %>%
mutate(n = n()) %>% filter(n > 1) %>%
mutate(new_col[i 1] = column_B lead(column_A, n=1L) - (lead(column_D, n=1L) - lead(column_E, n=1L))) %>%
fill(ifelse(is.na(new_col[1]),
column_B[1],
fill(new_col, .direction="downup"))
)
I appreciate all help that I can get!
Edit:
Adding sample dput() of three groups (Itemnumber column) from the dataset.
structure(list(Itemnumber = c("80126", "81532", "82181", "80126",
"80126", "82181", "81532", "80126", "82181", "80126", "81532",
"81532", "82181", "80126", "81532", "81532", "81532", "82181",
"80126", "81532", "81532", "80126", "82181", "80126", "82181",
"82181", "82181", "80126", "82181", "80126", "80126", "82181",
"80126", "82181", "80126", "82181", "82181", "81532", "80126"
), column_A = c(96.3, 107.1, 122.8, 122.9, 96.7, 117.1, 88, 129.3,
115, 117.2, 123.5, 99.5, 120.6, 111.9, 117.5, 135.5, 132.6, 120.4,
105.6, 127.7, 133.1, 107.8, 119.1, 124.5, 129.7, 110.2, 92.8,
99.9, 71.6, 126.7, 110.8, 110.1, 114.9, 124.8, 124.4, 115.7,
108.1, 89.3, 110.8), column_B = c(6, 5, 49, 17, 24, 36, 4, 10,
44, 18, 10, 10, 29, 12, 10, 12, 8, 29, 13, 12, 13, 21, 156, 28,
263, 240, 200, 24, 129, 10, 9, 47, 10, 65, 17, 69, 79, 18, 29
), column_C = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0), column_D = c(5, 7, 0, 3, 1, 8, 0, 6, 0, 0, 1, 0, 0,
2, 1, 0, 2, 0, 0, 0, 0, 0, 2, 2, 0, 17, 20, 11, 52, 2, 2, 126,
0, 0, 1, 0, 0, 0, 0), column_E = c(0, 5, 14, 20, 10, 14, 5, 0,
14, 10, 5, 0, 14, 0, 0, 5, 0, 14, 10, 5, 5, 10, 140, 10, 126,
0, 0, 20, 0, 0, 10, 84, 10, 42, 10, 14, 14, 5, 20)), row.names = c(NA,
-39L), class = c("tbl_df", "tbl", "data.frame"))
And the expected output should look like this
Itemnumber column_A column_B column_C column_D column_E n new_col
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 80126 96.3 6 0 5 0 15 6
2 80126 123. 17 0 3 20 15 23
3 80126 96.7 24 0 1 10 15 26
4 80126 129. 10 0 6 0 15 18
5 80126 117. 18 0 0 10 15 20
6 80126 112. 12 0 2 0 15 16
7 80126 106. 13 0 0 10 15 22
8 80126 108. 21 0 0 10 15 23
9 80126 124. 28 0 2 10 15 29
10 80126 99.9 24 0 11 20 15 37
11 80126 127. 10 0 2 0 15 22
12 80126 111. 9 0 2 10 15 18
13 80126 115. 10 0 0 10 15 19
14 80126 124. 17 0 1 10 15 19
15 80126 111. 29 0 0 20 15 37
CodePudding user response:
It is difficult to answer without a proper expected output (especially as your example with 80126 doesn't match your dput()
dataset), but here is an answer based on what I understood from the text:
df %>%
group_by(Itemnumber) %>%
filter(n() > 1) %>%
mutate(new_col = column_B lead(column_A, n=1L) - (lead(column_D, n=1L) - lead(column_E, n=1L))) %>%
mutate(new_col2 = ifelse(row_number()==1, column_B, lag(new_col))) %>%
ungroup() %>%
arrange(Itemnumber)
# # A tibble: 39 x 8
# Itemnumber column_A column_B column_C column_D column_E new_col new_col2
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 80126 96.3 6 0 5 0 146. 6
# 2 80126 123. 17 0 3 20 123. 146.
# 3 80126 96.7 24 0 1 10 147. 123.
# 4 80126 129. 10 0 6 0 137. 147.
# 5 80126 117. 18 0 0 10 128. 137.
# 6 80126 112. 12 0 2 0 128. 128.
# 7 80126 106. 13 0 0 10 131. 128.
# 8 80126 108. 21 0 0 10 154. 131.
# 9 80126 124. 28 0 2 10 137. 154.
#10 80126 99.9 24 0 11 20 149. 137.
new_col2
is new_col
stepped downward with its first row (groupwise) replaced by the first value of column_B
. Condition row_number()==1
checks that this is the first row for each group, but you might want to replace it with is.na(lag(new_col))
.
Remember that using lead()
and lag()
only make sense if you are 100% sure of the order of your dataset. I would highly recommend using arrange()
beforehand.
Also, note that you can use n()
inside filter()
, you don't have to use mutate()
.