Help me out! I Have a column named N_AIH, and I have another column named IDENT, it contains the values 1 and 5.
I need to create A new column 'LP' with the following condition: if the same N_AIH value repeats on IDENT 1 and 5.
data:
N_AIH IDENT
122010 1
122010 5
123123 1
123412 1
123434 5
NOTE that 122010 repeats, in the case i need it to be identified
N_AIH IDENT LP
122010 1 YES
122010 5 YES
123123 1 NO
123412 1 NO
123434 5 NO
my csv has 23 milion lines, so I cant use excel.
Im trying to use if and sql sintaxes but with no success.
CodePudding user response:
You can try ave
var
to check if duplicates of N_AIH
exist across different IDENT
s
> transform(df, LP = ifelse(is.na(ave(IDENT, N_AIH, FUN = var)), "NO", "YES"))
N_AIH IDENT LP
1 122010 1 YES
2 122010 5 YES
3 123123 1 NO
4 123412 1 NO
5 123434 5 NO
Data
> dput(df)
structure(list(N_AIH = c(122010L, 122010L, 123123L, 123412L,
123434L), IDENT = c(1L, 5L, 1L, 1L, 5L)), class = "data.frame", row.names = c(NA,
-5L))
CodePudding user response:
You could group and see if all numbers are there:
library(tidyverse)
df |>
group_by(N_AIH) |>
mutate(LP = c("NO", "YES")[all( c(1,5) %in% IDENT) 1])
#> # A tibble: 5 x 3
#> # Groups: N_AIH [4]
#> N_AIH IDENT LP
#> <dbl> <dbl> <chr>
#> 1 122010 1 YES
#> 2 122010 5 YES
#> 3 123123 1 NO
#> 4 123412 1 NO
#> 5 123434 5 NO
or with base R:
df$LP <- by(df, df$N_AIH, \(x) c("NO", "YES")[all( c(1,5) %in% x$IDENT) 1]) |>
rep(times = table(df$N_AIH))
df
#> # A tibble: 5 x 3
#> N_AIH IDENT LP
#> <dbl> <dbl> <chr>
#> 1 122010 1 YES
#> 2 122010 5 YES
#> 3 123123 1 NO
#> 4 123412 1 NO
#> 5 123434 5 NO