Returning/identify values based on conditions in R-CodePudding

Help me out! I Have a column named N_AIH, and I have another column named IDENT, it contains the values 1 and 5.

I need to create A new column 'LP' with the following condition: if the same N_AIH value repeats on IDENT 1 and 5.

data:

N_AIH           IDENT
122010            1
122010            5
123123            1
123412            1
123434            5

NOTE that 122010 repeats, in the case i need it to be identified

N_AIH           IDENT   LP 
122010            1     YES
122010            5     YES
123123            1     NO
123412            1     NO
123434            5     NO

my csv has 23 milion lines, so I cant use excel.

Im trying to use if and sql sintaxes but with no success.

CodePudding user response：

You can try ave var to check if duplicates of N_AIH exist across different IDENTs

> transform(df, LP = ifelse(is.na(ave(IDENT, N_AIH, FUN = var)), "NO", "YES"))
   N_AIH IDENT  LP
1 122010     1 YES
2 122010     5 YES
3 123123     1  NO
4 123412     1  NO
5 123434     5  NO

Data

> dput(df)
structure(list(N_AIH = c(122010L, 122010L, 123123L, 123412L, 
123434L), IDENT = c(1L, 5L, 1L, 1L, 5L)), class = "data.frame", row.names = c(NA,
-5L))

CodePudding user response：

You could group and see if all numbers are there:

library(tidyverse)

df |>
  group_by(N_AIH) |>
  mutate(LP = c("NO", "YES")[all( c(1,5) %in% IDENT) 1])
#> # A tibble: 5 x 3
#> # Groups:   N_AIH [4]
#>    N_AIH IDENT LP   
#>    <dbl> <dbl> <chr>
#> 1 122010     1 YES  
#> 2 122010     5 YES  
#> 3 123123     1 NO   
#> 4 123412     1 NO   
#> 5 123434     5 NO

or with base R:

df$LP <- by(df, df$N_AIH, \(x) c("NO", "YES")[all( c(1,5) %in% x$IDENT) 1]) |>
rep(times = table(df$N_AIH))

df
#> # A tibble: 5 x 3
#>    N_AIH IDENT LP   
#>    <dbl> <dbl> <chr>
#> 1 122010     1 YES  
#> 2 122010     5 YES  
#> 3 123123     1 NO   
#> 4 123412     1 NO   
#> 5 123434     5 NO