Home > database >  Check whether the content in one row equals to the first appeared value, if so, record ID: how to ac
Check whether the content in one row equals to the first appeared value, if so, record ID: how to ac

Time:03-04

I have people's trip records as a data frame in R. Which looks as below:

   t_participant_id t_destination_PostCode tripReasonString tripSequence
1               304                   1082             work            1
2               304                   1000             work            2
3               304                   1000             work            3
4               304                   1000            other            4
5               304                   1000            other            5
6               304                   1082            other            6
7               304                   1082             home            7
8               362                   1070         shopping            1
9               362                   1070             work            2
10              362                   1070             work            3
11              362                   1070         shopping            4
12              362                   1180             home            5
13              362                   1070          leisure            6
14              362                   1180             home            7
15              482                   2800             work            1
16              482                   2800             work            2
17              482                   1020             home            3

Basically, t_participant_id is a person's unique ID, tripReason is the reason for people's trip, and the t_destination_postcode is the area that they are going to (for instance, in row 1, person 304 go to area 1082 for work).

Different people have different numbers of trips during one day (person 304 and 362has 7 trips in total, whereas person 482 has 3). I would like to filter out the person that the t_destination_PostCode for the trip purpose work is the same as other work trips under this person's activity chain.

In short, for person 304, his first work trip is at line 1, and going to the area 1082. The next work trip is at line 2, and going to the area 1000. So I know he is not the person I would like to filter out. However, for person 362, he has the first work activity at tripSequence == 2, heading to 1070. The next work trip is at tripSequence 3, and also to 1070. No other work trips in the activity chain anymore. So in this case, we would like to filter out person 362, and record the first work tripSequence, which is 2.

So the final result should be a data frame that says

t_participant_id    firstWorkTrip
             362                2
             482                1

I really don't want to do this manually but not sure how can I achieve this in R. Thanks very much for your help in advance!

Here is the reprex for the dataframe

test <- data.frame(list(t_participant_id = c(304L, 304L, 304L, 304L, 304L, 
                                             304L, 304L, 362L, 362L, 362L, 362L, 362L, 362L, 362L, 482L, 482L, 
                                             482L), t_destination_PostCode = c(1082L, 1000L, 1000L, 1000L, 
                                                                               1000L, 1082L, 1082L, 1070L, 1070L, 1070L, 1070L, 1180L, 1070L, 
                                                                               1180L, 2800L, 2800L, 1020L), tripReasonString = c("work", "work", 
                                                                                                                                 "work", "other", "other", "other", "home", "shopping", "work", 
                                                                                                                                 "work", "shopping", "home", "leisure", "home", "work", "work", 
                                                                                                                                 "home"), tripSequence = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 
                                                                                                                                                           3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L)))

         

CodePudding user response:

df %>%
  filter(tripReasonString == "work") %>%
  group_by(t_participant_id) %>%
  filter(n_distinct(t_destination_PostCode) == 1) %>%
  summarize(first_work_trip = min(tripSequence))
# # A tibble: 2 × 2
#   t_participant_id first_work_trip
#              <int>           <int>
# 1              362               2
# 2              482               1
  • Related