How to handle a number of rows conditional statement using dplyr?

The reduced code below concatenates the columns of the myData data frame into a finalCode column. The exception is element R: its columns are concatenated only if R appears more than once in the Element column. If R appears exactly once, that single R is shown as-is, without concatenation. The code should also work when there are no R elements in the Element column of the data frame, but it doesn't.

Below are the correct outputs when the code is run with two R elements present and only one R element present --

With two R elements:

  Element Group ElementCnt finalCode
1       C     4          1     C.4.1
2       R     0          1     R.0.1
3       C     1          2     C.1.2
4       D     3          1     D.3.1
5       C     8          3     C.8.3
6       R     5          2     R.5.2

With one R element:

  Element Group ElementCnt finalCode
1       C     4          1     C.4.1
2       R     0          1         R
3       C     1          2     C.1.2
4       D     3          1     D.3.1
5       C     8          3     C.8.3

The rCount object created in the code holds the number of times the R element appears in the Element column of myData. When I run the code with no R elements present in the data frame, the code crashes. I tried to handle this scenario with the condition Element == "R" & nrow(rCount) > 0 & rCount$counted == 1 inside ifelse(), but it doesn't work.

How do I correctly test for the number of rows in my conditional statement, so the code runs correctly for scenarios where there are no R elements?

Reduced code, with various myData dataframes shown for easily running the 2/1/0 R element scenarios:

library(dplyr)

# Two R elements present
myData <-
  data.frame(
    Element = c("C","R","C","D","C","R"),
    Group = c(4,0,1,3,8,5)
  )

# One R element present
myData <-
  data.frame(
    Element = c("C","R","C","D","C"),
    Group = c(4,0,1,3,8)
  )

# No R elements present
myData <-
  data.frame(
    Element = c("C","C","D","C"),
    Group = c(4,1,3,8)
  )

# Code:
rCount <- myData %>% filter(Element == 'R') %>% count(Element, name = 'counted')
seqLabel <- myData %>%
  group_by(Element) %>% 
  mutate(ElementCnt = row_number()) %>%
  ungroup() %>%
  mutate(finalCode = 
           ifelse(Element == "R" & nrow(rCount) > 0 & rCount$counted == 1,
                  Element,
                  paste(Element, Group, ElementCnt,sep = '.')
                  )
        )

print.data.frame(seqLabel)

So in summary, here are the desired outputs for the above dataframes --

Two instances of the "R" element in the Element column of the myData data frame (above code does this fine):

  Element Group ElementCnt finalCode
1       C     4          1     C.4.1
2       R     0          1     R.0.1
3       C     1          2     C.1.2
4       D     3          1     D.3.1
5       C     8          3     C.8.3
6       R     5          2     R.5.2

One instance of the "R" element in the Element column of the myData data frame (above code does this fine):

  Element Group ElementCnt finalCode
1       C     4          1     C.4.1
2       R     0          1         R
3       C     1          2     C.1.2
4       D     3          1     D.3.1
5       C     8          3     C.8.3

No "R" element in the Element column of the myData data frame (above code crashes, but here is the output I would like to see):

  Element Group ElementCnt finalCode
1       C     4          1     C.4.1
2       C     1          2     C.1.2
3       D     3          1     D.3.1
4       C     8          3     C.8.3

Answer:

You could do the calculations inside the grouping instead. The problem with your approach is that when there's no R, rCount has zero rows, so rCount$counted is a zero-length vector and the whole ifelse() condition becomes zero-length as well. You could check for that to avoid the error, but I believe this is clearer and more efficient:

myData |>
    group_by(Element) |>
    mutate(ElementCnt = row_number(),
           finalCode = ifelse(Element == "R" & n() == 1, Element, paste(Element, Group, ElementCnt, sep = "."))) |>
    ungroup()

Output for the two-R, one-R and no-R data frames, in that order:

# A tibble: 6 × 4
  Element Group ElementCnt finalCode
  <chr>   <dbl>      <int> <chr>    
1 C           4          1 C.4.1    
2 R           0          1 R.0.1    
3 C           1          2 C.1.2    
4 D           3          1 D.3.1    
5 C           8          3 C.8.3    
6 R           5          2 R.5.2    

# A tibble: 5 × 4
  Element Group ElementCnt finalCode
  <chr>   <dbl>      <int> <chr>    
1 C           4          1 C.4.1    
2 R           0          1 R        
3 C           1          2 C.1.2    
4 D           3          1 D.3.1    
5 C           8          3 C.8.3    

# A tibble: 4 × 4
  Element Group ElementCnt finalCode
  <chr>   <dbl>      <int> <chr>    
1 C           4          1 C.4.1    
2 C           1          2 C.1.2    
3 D           3          1 D.3.1    
4 C           8          3 C.8.3
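
The alternative mentioned in the answer, guarding against the empty case while keeping the original structure, could look roughly like the sketch below. It assumes the only problem is the zero-length rCount$counted and swaps it for a scalar count. The name rTotal is introduced here for illustration and is not part of the original code.

```r
library(dplyr)

# "No R elements" data frame from the question
myData <- data.frame(
  Element = c("C", "C", "D", "C"),
  Group   = c(4, 1, 3, 8)
)

# Why the original condition fails: rCount has zero rows, so rCount$counted
# is a zero-length vector, and `&` with a zero-length operand gives a
# zero-length result, which mutate() cannot recycle to 4 rows.
rCount <- myData %>% filter(Element == "R") %>% count(Element, name = "counted")
length(myData$Element == "R" & nrow(rCount) > 0 & rCount$counted == 1)

# Scalar count instead: sum() over a logical vector always has length 1,
# even when nothing matches. (rTotal is an illustrative name.)
rTotal <- sum(myData$Element == "R")

seqLabel <- myData %>%
  group_by(Element) %>%
  mutate(ElementCnt = row_number()) %>%
  ungroup() %>%
  mutate(finalCode = ifelse(Element == "R" & rTotal == 1,
                            Element,
                            paste(Element, Group, ElementCnt, sep = ".")))

print.data.frame(seqLabel)
```

Because rTotal is a single number, the condition keeps the length of Element in all three scenarios, so the same code should also handle the one-R and two-R data frames.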

Update (30 Jul): corrected the output and removed a part that was based on a misunderstanding of the question.