Mutating a binary variable based on a continuous variable (dplyr)

Time:12-21

I have a dataset of Reddit users and their posts, and I am trying to create an indicator variable that is coded 1 if the user's number of posts is at or above the 80th percentile, and 0 otherwise. Essentially, I want to categorize users as "active" versus "passive".

I have created a variable that counts the number of posts by username:

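library(dplyr)
df <- df %>%
  group_by(username) %>%
  mutate(count = n()) %>%
  ungroup()  # drop grouping so later quantile() calls span all rows
# count(df, username, sort = TRUE) would instead give one row per user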
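To compare the two counting styles, here is a minimal sketch on a small made-up data frame (the usernames and rows are hypothetical). add_count() attaches each user's total to every post row, like group_by() + mutate(n()) followed by ungroup(), while count() collapses the data to one row per user.

```r
library(dplyr)

# Hypothetical toy data: one row per post
posts <- data.frame(username = c("cyz", "cyz", "cyz", "crash", "crash", "xyz"),
                    stringsAsFactors = FALSE)

# One row per post, with that user's total post count attached
per_post <- posts %>% add_count(username, name = "count")

# One row per user, sorted by number of posts (column is named n by default)
per_user <- posts %>% count(username, sort = TRUE)

print(per_post)
print(per_user)
```

The per-post version keeps the original rows, which is what the indicator in this question is attached to; the per-user version is handy when the percentile should be taken over users rather than over posts.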

Here is a data example:

df %>% 
  select(username, count) %>% 
  head(., 4)

output:

  username count
  <chr>    <int>
1 cyz         14
2 crash       12
3 conan        7
4 xyz         13

I have tried the following to identify users whose number of posts is in the top 20 percent (at or above the 80th percentile):

df %>% 
  group_by(username) %>% 
    do(tidy(t(quantile(.$count))))
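Because that attempt groups by username, each quantile is computed within a single user, whose rows all share the same count, so it cannot tell which users fall in the top 20 percent. Below is a minimal sketch of one alternative (not the approach used in the answer), assuming one row per post and a username column; the data are hypothetical. It takes the 80th percentile across users and then joins the flag back onto the post rows.

```r
library(dplyr)

# Hypothetical toy data: one row per post; user uK has K posts (55 rows total)
posts <- data.frame(username = rep(paste0("u", 1:10), times = 1:10),
                    stringsAsFactors = FALSE)

# One row per user with that user's post count
users <- posts %>% count(username, name = "count")

# 80th percentile computed across users, not across post rows
threshold <- quantile(users$count, 0.8)

# 1 = active, 0 = passive
users <- users %>% mutate(active = as.integer(count >= threshold))

# Attach the user-level count and flag back to every post row
posts <- posts %>% left_join(users, by = "username")
```

Taking the percentile over users counts each user once. Taking it over post rows counts a heavy poster once per post, which can shift the threshold.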

Here is a data example for the variable "count", which holds, on each row, the total number of posts by that row's user.

dput(df$count)

output:

c(15L, 9L, 1L, 1L, 1L, 1L, 1L, 1L, 15L, 15L, 15L, 1L, 15L, 1L, 
1L, 15L, 1L, 1L, 15L, 2L, 15L, 1L, 15L, 1L, 15L, 2L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 15L, 191L, 3L, 191L, 
191L, 1L, 191L, 191L, 2L, 191L, 191L, 1L, 191L, 1L, 191L, 191L, 
191L, 3L, 191L, 98L, 191L, 1L, 191L, 2L, 191L, 9L, 1L, 191L, 
1L, 1L, 3L, 191L, 191L, 191L, 2L, 3L, 1L, 1L, 2L, 2L, 191L, 191L, 
191L, 191L, 17L, 1L, 3L, 4L, 3L, 22L, 2L, 3L, 3L, 191L)

Answer:

You can use mutate() to add a new column that codes activity as 1 when count is at or above its 80th percentile and 0 otherwise.

EDIT: updated the data frame to use the dput() output supplied for the count variable.

library(dplyr)
# `count` is the 92-element vector printed by dput(df$count) in the question
df <- data.frame(ID = as.character(1:92),
                 count = count)


df_with_activity <- df %>% 
  mutate(active = ifelse(count >= quantile(count, 0.8), 1, 0))

   ID count active
1   1    15      0
2   2     9      0
3   3     1      0
4   4     1      0
5   5     1      0
6   6     1      0
7   7     1      0
8   8     1      0
9   9    15      0
10 10    15      0
11 11    15      0
12 12     1      0
13 13    15      0
14 14     1      0
15 15     1      0
16 16    15      0
17 17     1      0
18 18     1      0
19 19    15      0
20 20     2      0
21 21    15      0
22 22     1      0
23 23    15      0
24 24     1      0
25 25    15      0
26 26     2      0
27 27     1      0
28 28     1      0
29 29     1      0
30 30     1      0
31 31     1      0
32 32     1      0
33 33     1      0
34 34     1      0
35 35     1      0
36 36     1      0
37 37     1      0
38 38     3      0
39 39    15      0
40 40   191      1
41 41     3      0
42 42   191      1
43 43   191      1
44 44     1      0
45 45   191      1
46 46   191      1
47 47     2      0
48 48   191      1
49 49   191      1
50 50     1      0
51 51   191      1
52 52     1      0
53 53   191      1
54 54   191      1
55 55   191      1
56 56     3      0
57 57   191      1
58 58    98      0
59 59   191      1
60 60     1      0
61 61   191      1
62 62     2      0
63 63   191      1
64 64     9      0
65 65     1      0
66 66   191      1
67 67     1      0
68 68     1      0
69 69     3      0
70 70   191      1
71 71   191      1
72 72   191      1
73 73     2      0
74 74     3      0
75 75     1      0
76 76     1      0
77 77     2      0
78 78     2      0
79 79   191      1
80 80   191      1
81 81   191      1
82 82   191      1
83 83    17      0
84 84     1      0
85 85     3      0
86 86     4      0
87 87     3      0
88 88    22      0
89 89     2      0
90 90     3      0
91 91     3      0
92 92   191      1

And these are the rows flagged as active:

df_with_activity %>% 
  filter(active == 1)
   ID count active
1  40   191      1
2  42   191      1
3  43   191      1
4  45   191      1
5  46   191      1
6  48   191      1
7  49   191      1
8  51   191      1
9  53   191      1
10 54   191      1
11 55   191      1
12 57   191      1
13 59   191      1
14 61   191      1
15 63   191      1
16 66   191      1
17 70   191      1
18 71   191      1
19 72   191      1
20 79   191      1
21 80   191      1
22 81   191      1
23 82   191      1
24 92   191      1
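Every flagged row has count = 191. To see why, you can compute the threshold directly from the question's dput() vector. This is a check using quantile()'s default type = 7 interpolation, not part of the original answer. It also shows that as.integer() on the comparison gives the same 0/1 coding as ifelse(..., 1, 0), stored as integers.

```r
# The count vector printed by dput(df$count) in the question
count <- c(15L, 9L, 1L, 1L, 1L, 1L, 1L, 1L, 15L, 15L, 15L, 1L, 15L, 1L,
           1L, 15L, 1L, 1L, 15L, 2L, 15L, 1L, 15L, 1L, 15L, 2L, 1L, 1L,
           1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 15L, 191L, 3L, 191L,
           191L, 1L, 191L, 191L, 2L, 191L, 191L, 1L, 191L, 1L, 191L, 191L,
           191L, 3L, 191L, 98L, 191L, 1L, 191L, 2L, 191L, 9L, 1L, 191L,
           1L, 1L, 3L, 191L, 191L, 191L, 2L, 3L, 1L, 1L, 2L, 2L, 191L, 191L,
           191L, 191L, 17L, 1L, 3L, 4L, 3L, 22L, 2L, 3L, 3L, 191L)

# 80th percentile with the default type = 7 interpolation
threshold <- unname(quantile(count, 0.8))
print(threshold)     # 191

# Same 0/1 coding as ifelse(count >= threshold, 1, 0), but integer-valued
active <- as.integer(count >= threshold)
print(sum(active))   # 24 rows flagged
print(mean(active))  # share of rows flagged, about 0.26
```

Because 24 of the 92 rows tie at exactly 191, the flagged share is about 26% rather than 20%. With heavily tied counts, a percentile cut-off cannot split the rows at exactly 20%.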