How to use head() and tail() with case_when to categorize data

Time:11-26

I am trying to categorize rows simply by whether they fall in the first or last ten rows of the data, using the head() and tail() functions:

My dataframe:

df <- structure(list(x = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 
22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 
33.9, 21.5, 15.5, 15.2, 13.3, 19.2, 27.3, 26, 30.4, 15.8, 19.7, 
15, 21.4), y = c(160, 160, 108, 258, 360, 225, 360, 146.7, 140.8, 
167.6, 167.6, 275.8, 275.8, 275.8, 472, 460, 440, 78.7, 75.7, 
71.1, 120.1, 318, 304, 350, 400, 79, 120.3, 95.1, 351, 145, 301, 
121)), row.names = c(NA, -32L), class = c("tbl_df", "tbl", "data.frame"))


       x     y
   <dbl> <dbl>
 1  21    160 
 2  21    160 
 3  22.8  108 
 4  21.4  258 
 5  18.7  360 
 6  18.1  225 
 7  14.3  360 
 8  24.4  147.
 9  22.8  141.
10  19.2  168.
# ... with 22 more rows

Desired output:

      x     y      z
1  21.0 160.0  top_n
2  21.0 160.0  top_n
3  22.8 108.0  top_n
4  21.4 258.0  top_n
5  18.7 360.0  top_n
6  18.1 225.0  top_n
7  14.3 360.0  top_n
8  24.4 146.7  top_n
9  22.8 140.8  top_n
10 19.2 167.6  top_n
11 17.8 167.6 middle
12 16.4 275.8 middle
13 17.3 275.8 middle
14 15.2 275.8 middle
15 10.4 472.0 middle
16 10.4 460.0 middle
17 14.7 440.0 middle
18 32.4  78.7 middle
19 30.4  75.7 middle
20 33.9  71.1 middle
21 21.5 120.1 middle
22 15.5 318.0 middle
23 15.2 304.0 last_n
24 13.3 350.0 last_n
25 19.2 400.0 last_n
26 27.3  79.0 last_n
27 26.0 120.3 last_n
28 30.4  95.1 last_n
29 15.8 351.0 last_n
30 19.7 145.0 last_n
31 15.0 301.0 last_n
32 21.4 121.0 last_n

I have tried:

df %>%   
  mutate(category = case_when(head(10) ~ "top_10",
                              tail(10) ~ "last_10",
                              TRUE ~ "middle"))

I am aware of other possible solutions like using row_number() etc...

I specifically want to learn whether it is possible to use the head() and tail() functions within a case_when() statement.

CodePudding user response:

This may not be exactly what you wanted, but you can use any of the following approaches to get the same result.

as.data.frame() is only there to make the full output easy to see. It is not required.

library(dplyr)
library(xts)

row_number()

df %>%
  mutate(category = case_when(
    row_number() %in% c(1:10) ~ "top_10",
    row_number() %in% c((max(row_number())-9) : max(row_number()) ) ~ "last_10",
    TRUE ~ "middle"
  )) %>%
  as.data.frame()

xts::first and xts::last

df %>%
  mutate(cat = 1:n()) %>%
  mutate(category = case_when(
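    cat %in% xts::first(cat, 10) ~ "top_10",   # explicit namespace: dplyr also exports first()
    cat %in% xts::last(cat, 10) ~ "last_10",   # explicit namespace: dplyr also exports last()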
    TRUE ~ "middle"
  )) %>%
  as.data.frame() %>%
  select(-cat)

head and tail

df %>%
  mutate(cat = 1:n()) %>%
  mutate(category = case_when(
    cat %in% head(cat,10) ~ "top_10",
    cat %in% tail(cat,10) ~ "last_10",
    TRUE ~ "middle"
  )) %>%
  as.data.frame() %>%
  select(-cat)

Result

      x     y category
1  21.0 160.0   top_10
2  21.0 160.0   top_10
3  22.8 108.0   top_10
4  21.4 258.0   top_10
5  18.7 360.0   top_10
6  18.1 225.0   top_10
7  14.3 360.0   top_10
8  24.4 146.7   top_10
9  22.8 140.8   top_10
10 19.2 167.6   top_10
11 17.8 167.6   middle
12 16.4 275.8   middle
13 17.3 275.8   middle
14 15.2 275.8   middle
15 10.4 472.0   middle
16 10.4 460.0   middle
17 14.7 440.0   middle
18 32.4  78.7   middle
19 30.4  75.7   middle
20 33.9  71.1   middle
21 21.5 120.1   middle
22 15.5 318.0   middle
23 15.2 304.0  last_10
24 13.3 350.0  last_10
25 19.2 400.0  last_10
26 27.3  79.0  last_10
27 26.0 120.3  last_10
28 30.4  95.1  last_10
29 15.8 351.0  last_10
30 19.7 145.0  last_10
31 15.0 301.0  last_10
32 21.4 121.0  last_10
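
As a side note on why the original attempt fails: case_when() needs a logical vector on the left-hand side of each formula, whereas head(10) simply returns the number 10. The head and tail variant above can also be written without the temporary cat column by calling row_number() inside head() and tail(). This is a minimal sketch; the data frame demo is a hypothetical stand-in for df, not part of the question.

```r
library(dplyr)

# Hypothetical stand-in for df; any data frame with at least 20 rows behaves the same way
demo <- tibble(x = seq(1, 32))

result <- demo %>%
  mutate(category = case_when(
    # head()/tail() return row positions; %in% turns them into a logical vector
    row_number() %in% head(row_number(), 10) ~ "top_10",
    row_number() %in% tail(row_number(), 10) ~ "last_10",
    TRUE ~ "middle"
  ))

# Count how many rows landed in each category
table(result$category)
```

Because case_when() uses the first matching condition, a data frame with fewer than 20 rows would label the overlapping rows "top_10".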

CodePudding user response:

Not advisable, but possible.

Alternatively, since it wasn't mentioned among the known approaches, you can create the vector directly, relying on magrittr's dot placeholder:

df %>%
  mutate(category = c(rep("top_n", 10), rep("middle", nrow(.)-20), rep("last_n", 10)))
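
The rep() approach assumes the data frame has at least 20 rows. With fewer rows, the count passed to the middle rep() becomes negative and rep() throws an error. Here is a rough sketch of how the same idea could be wrapped in a function with an explicit check. The helper name label_ends and its argument k are hypothetical and do not come from dplyr or magrittr.

```r
library(dplyr)

# Hypothetical helper: label the first and last k rows, everything else "middle"
label_ends <- function(data, k = 10) {
  # rep() errors on a negative count, so make sure the two ends do not overlap
  stopifnot(nrow(data) >= 2 * k)
  data %>%
    mutate(category = c(
      rep("top_n", k),
      rep("middle", n() - 2 * k),  # n() is dplyr's row count for the current data
      rep("last_n", k)
    ))
}

# Usage on a small made-up data frame
out <- label_ends(tibble(x = 1:25), k = 5)
```

Using n() instead of nrow(.) keeps the expression independent of the dot placeholder, so it works the same way with the native |> pipe.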