Hello Everyone I hope you guys are having a great week,
I have a small dataset of 4 variables one is subject
the second is key
which is a code that a subject uses to log onto a system, third is order
which will keep track of chronological year and finally the variable Period
that indicates whether the key was used in a previous time "past
" or the current month "current
"
This is the dataset:
subjects<-c(rep("James",3),
rep("Alex",2),
rep("Mila",8),
rep("Mark",1))
keys<-c(rep("IX08-8",2),"IX08-8",
"UX-007","HH-011",rep("PO_85",7),"UJ_8","785_PO")
order<-c(1:14)
period<-c("past","past","current","past","current",rep("past",6),"current","current","current")
df<-cbind(subjects,keys,period,order)
> head(df)
subjects keys period order
[1,] "James" "IX08-8" "past" "1"
[2,] "James" "IX08-8" "past" "2"
[3,] "James" "IX08-8" "current" "3"
[4,] "Alex" "UX-007" "past" "4"
[5,] "Alex" "HH-011" "current" "5"
[6,] "Mila" "PO_85" "past" "6"
ultimately I have to be able to tell if a subject is using a previously used key
to log onto the system in the current
Period, if a subject
uses a new key
to log onto the system in the current period
then I will assign the value "1" to a column named "result
", if the user did not use a previously used key
to log onto the system during the current period
the value assigned should be "0" and "NA" otherwise
My desired output would look like this:
subjects keys period order result
[1,] "James" "IX08-8" "past" "1" NA
[2,] "James" "IX08-8" "past" "2" NA
[3,] "James" "IX08-8" "current" "3" "0"
[4,] "Alex" "UX-007" "past" "4" NA
[5,] "Alex" "HH-011" "current" "5" "1"
[6,] "Mila" "PO_85" "past" "6" NA
[7,] "Mila" "PO_85" "past" "7" NA
[8,] "Mila" "PO_85" "past" "8" NA
[9,] "Mila" "PO_85" "past" "9" NA
[10,] "Mila" "PO_85" "past" "10" NA
[11,] "Mila" "PO_85" "past" "11" NA
[12,] "Mila" "PO_85" "current" "12" "0"
[13,] "Mila" "UJ_8" "current" "13" "1"
[14,] "Mark" "785_PO" "current" "14" "1"
for instance, in row # 3 James has assigned value 0 in result because he has used a previously used key to log on to the system in the current month which was key "IX08-8", but Mark has a value of 1 in the result column because the system has only tracked one key and that happens to be the key that he used to log on this current period which is technically a "new key"
What have I done to solve this?
I can group the dataset by subject
and make sure this is arrange by order
in a descending manner, but I can only think about creating a vector of keys (vector.of.previous.keys
) per subject based on (period="past") and then evaluate if the current key is %in%
vector.of.previous.keys
, but if there is a way to only check whithin groups this criteria it would be a lot more efficient thank you so much guys for your help
CodePudding user response:
Another dplyr
method here, where all of your conditions are coded in a case_when
statement.
Code
library(dplyr)
df %>%
group_by(subjects) %>%
mutate(result = case_when(period == "current" & n() == 1 ~ "1",
period == "current" & keys == first(keys) ~ "0",
period == "current" & keys != first(keys) & n() > 1 ~ "1",
period == "past" ~ NA_character_,
TRUE == "past" ~ NA_character_))
# A tibble: 14 × 5
# Groups: subjects [4]
subjects keys period order result
<chr> <chr> <chr> <int> <chr>
1 James IX08-8 past 1 NA
2 James IX08-8 past 2 NA
3 James IX08-8 current 3 0
4 Alex UX-007 past 4 NA
5 Alex HH-011 current 5 1
6 Mila PO_85 past 6 NA
7 Mila PO_85 past 7 NA
8 Mila PO_85 past 8 NA
9 Mila PO_85 past 9 NA
10 Mila PO_85 past 10 NA
11 Mila PO_85 past 11 NA
12 Mila PO_85 current 12 0
13 Mila UJ_8 current 13 1
14 Mark 785_PO current 14 1
data
Note that I've changed your cbind()
to data.frame
(data frame is easier to work with compared to matrix).
subjects<-c(rep("James",3),
rep("Alex",2),
rep("Mila",8),
rep("Mark",1))
keys<-c(rep("IX08-8",2),"IX08-8",
"UX-007","HH-011",rep("PO_85",7),"UJ_8","785_PO")
order<-c(1:14)
period<-c("past","past","current","past","current",rep("past",6),"current","current","current")
df<-data.frame(subjects,keys,period,order)
CodePudding user response:
Assuming your data is stored in a data.frame
df <- data.frame(subjects,keys,period,order)
you could use
library(dplyr)
df %>%
group_by(subjects, keys) %>%
mutate(count = row_number()) %>%
group_by(subjects) %>%
mutate(result = case_when(period == "current" & count == 1 ~ 1,
period == "current" & count >= 1 ~ 0,
TRUE ~ NA_real_)) %>%
ungroup() %>%
select(-count)
to get
# A tibble: 14 x 5
subjects keys period order result
<chr> <chr> <chr> <int> <dbl>
1 James IX08-8 past 1 NA
2 James IX08-8 past 2 NA
3 James IX08-8 current 3 0
4 Alex UX-007 past 4 NA
5 Alex HH-011 current 5 1
6 Mila PO_85 past 6 NA
7 Mila PO_85 past 7 NA
8 Mila PO_85 past 8 NA
9 Mila PO_85 past 9 NA
10 Mila PO_85 past 10 NA
11 Mila PO_85 past 11 NA
12 Mila PO_85 current 12 0
13 Mila UJ_8 current 13 1
14 Mark 785_PO current 14 1