Home > Net >  How do I select row index to iterate over with mutate in dplyr by skipping n-rows?
How do I select row index to iterate over with mutate in dplyr by skipping n-rows?

Time:10-02

I have an issue with for looping with dplyrs mutate. Basically, all I want to do is to fill the first row in the mutated column with the first row from column B and then keep the mutated values from the second row onwards.

  • Group data
  • Mutate number of occurrences in each group and filter n > 1
  • Mutate new column by filling the first row with the first row from column B and filling row 2 and onwards with the mutated values
   Itemnumber column_A column_B column_C column_D column_E
   <chr>         <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
 1 80126         106.      6           0     5         0  
 2 80380         131       8.38        0     0.45      2.5
 3 80676          95.4  -246           0    40         0  
 4 80142         138.    -41           0    14        10  
 5 80824         128.      3           0     1         0  
 6 80045         124.     22           0     0        70  
 7 81532         147.      5           0     7         5  
 8 80610         109.      4.71        0     0        11  
 9 82720         107.     39.7         0     0        10  
10 82623         112.     51           0     0        14 

This can be done if I subset one group and do it in a basic for loop. But I don't know how to do it for each group and also follow my requirements above.

The code below is what I've done so far but it doesn't fill the first row of new_col with the first row from column_B and it doesn't fill it from the 2nd row onwards.

require(dplyr)

df %>% group_by(Itemnumber) %>% mutate(n = n()) %>% filter(n > 1) %>%
  mutate(new_col = column_B   lead(column_A, n=1L) - (lead(column_D, n=1L) - lead(column_E, n=1L)))

If we look at one group it should put the value 23 in the first row of new_col on the 2nd row, whereas the first row should be filled with the value 6 from column_B to avoid NA in the last row. Essentially, it should be shifted downwards by 1 row.

  Itemnumber column_A column_B column_C column_D column_E     n new_col
   <chr>         <dbl>    <dbl>    <dbl>    <dbl>    <dbl> <int>   <dbl>
 1 80126          96.3        6        0        5        0    15      23
 2 80126         123.        17        0        3       20    15      26
 3 80126          96.7       24        0        1       10    15      18
 4 80126         129.        10        0        6        0    15      20
 5 80126         117.        18        0        0       10    15      16
 6 80126         112.        12        0        2        0    15      22
 7 80126         106.        13        0        0       10    15      23
 8 80126         108.        21        0        0       10    15      29
 9 80126         124.        28        0        2       10    15      37
10 80126          99.9       24        0       11       20    15      22
11 80126         127.        10        0        2        0    15      18
12 80126         111.         9        0        2       10    15      19
13 80126         115.        10        0        0       10    15      19
14 80126         124.        17        0        1       10    15      37
15 80126         111.        29        0        0       20    15      NA
   

Below is what I would like to do with new_col (I know it doesn't work) =(


df %>% group_by(Itemnumber) %>% 
  mutate(n = n()) %>% filter(n > 1) %>%
  mutate(new_col[i 1] = column_B   lead(column_A, n=1L) - (lead(column_D, n=1L) - lead(column_E, n=1L))) %>% 
  fill(ifelse(is.na(new_col[1]), 
              column_B[1], 
              fill(new_col, .direction="downup"))
       )

I appreciate all help that I can get!

Edit:

Adding sample dput() of three groups (Itemnumber column) from the dataset.

structure(list(Itemnumber = c("80126", "81532", "82181", "80126", 
"80126", "82181", "81532", "80126", "82181", "80126", "81532", 
"81532", "82181", "80126", "81532", "81532", "81532", "82181", 
"80126", "81532", "81532", "80126", "82181", "80126", "82181", 
"82181", "82181", "80126", "82181", "80126", "80126", "82181", 
"80126", "82181", "80126", "82181", "82181", "81532", "80126"
), column_A = c(96.3, 107.1, 122.8, 122.9, 96.7, 117.1, 88, 129.3, 
115, 117.2, 123.5, 99.5, 120.6, 111.9, 117.5, 135.5, 132.6, 120.4, 
105.6, 127.7, 133.1, 107.8, 119.1, 124.5, 129.7, 110.2, 92.8, 
99.9, 71.6, 126.7, 110.8, 110.1, 114.9, 124.8, 124.4, 115.7, 
108.1, 89.3, 110.8), column_B = c(6, 5, 49, 17, 24, 36, 4, 10, 
44, 18, 10, 10, 29, 12, 10, 12, 8, 29, 13, 12, 13, 21, 156, 28, 
263, 240, 200, 24, 129, 10, 9, 47, 10, 65, 17, 69, 79, 18, 29
), column_C = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0), column_D = c(5, 7, 0, 3, 1, 8, 0, 6, 0, 0, 1, 0, 0, 
2, 1, 0, 2, 0, 0, 0, 0, 0, 2, 2, 0, 17, 20, 11, 52, 2, 2, 126, 
0, 0, 1, 0, 0, 0, 0), column_E = c(0, 5, 14, 20, 10, 14, 5, 0, 
14, 10, 5, 0, 14, 0, 0, 5, 0, 14, 10, 5, 5, 10, 140, 10, 126, 
0, 0, 20, 0, 0, 10, 84, 10, 42, 10, 14, 14, 5, 20)), row.names = c(NA, 
-39L), class = c("tbl_df", "tbl", "data.frame"))

And the expected output should look like this

   Itemnumber column_A column_B column_C column_D column_E     n new_col
   <chr>         <dbl>    <dbl>    <dbl>    <dbl>    <dbl> <int>   <dbl>
 1 80126          96.3        6        0        5        0    15       6
 2 80126         123.        17        0        3       20    15      23
 3 80126          96.7       24        0        1       10    15      26
 4 80126         129.        10        0        6        0    15      18
 5 80126         117.        18        0        0       10    15      20
 6 80126         112.        12        0        2        0    15      16
 7 80126         106.        13        0        0       10    15      22
 8 80126         108.        21        0        0       10    15      23
 9 80126         124.        28        0        2       10    15      29
10 80126          99.9       24        0       11       20    15      37
11 80126         127.        10        0        2        0    15      22
12 80126         111.         9        0        2       10    15      18
13 80126         115.        10        0        0       10    15      19
14 80126         124.        17        0        1       10    15      19
15 80126         111.        29        0        0       20    15      37

CodePudding user response:

It is difficult to answer without a proper expected output (especially as your example with 80126 doesn't match your dput() dataset), but here is an answer based on what I understood from the text:

df %>% 
  group_by(Itemnumber) %>% 
  filter(n() > 1) %>%
  mutate(new_col = column_B   lead(column_A, n=1L) - (lead(column_D, n=1L) - lead(column_E, n=1L))) %>%
  mutate(new_col2 = ifelse(row_number()==1, column_B, lag(new_col))) %>% 
  ungroup() %>% 
  arrange(Itemnumber)

  # # A tibble: 39 x 8
  # Itemnumber column_A column_B column_C column_D column_E new_col new_col2
  # <chr>         <dbl>    <dbl>    <dbl>    <dbl>    <dbl>   <dbl>    <dbl>
  # 1 80126          96.3        6        0        5        0    146.       6
  # 2 80126         123.        17        0        3       20    123.     146.
  # 3 80126          96.7       24        0        1       10    147.     123.
  # 4 80126         129.        10        0        6        0    137.     147.
  # 5 80126         117.        18        0        0       10    128.     137.
  # 6 80126         112.        12        0        2        0    128.     128.
  # 7 80126         106.        13        0        0       10    131.     128.
  # 8 80126         108.        21        0        0       10    154.     131.
  # 9 80126         124.        28        0        2       10    137.     154.
  #10 80126          99.9       24        0       11       20    149.     137.

new_col2 is new_col stepped downward with its first row (groupwise) replaced by the first value of column_B. Condition row_number()==1 checks that this is the first row for each group, but you might want to replace it with is.na(lag(new_col)) .

Remember that using lead() and lag() only make sense if you are 100% sure of the order of your dataset. I would highly recommend using arrange() beforehand.

Also, note that you can use n() inside filter(), you don't have to use mutate().

  • Related