Home > Enterprise >  Transform or remove excess number of rows in data frame
Transform or remove excess number of rows in data frame

Time:01-06

I have a dataframe that contains personal data like this:

data2 <- data.frame(Name = c("Peter Parker", "Peter Parker", "Peter Parker", 
"Peter Parker", "Peter Parker", "Peter Parker", "Peter Parker", "Benny Thompson", "Benny Thompson", 
"Benny Thompson", "Benny Thompson", "Benny Thompson", "Mylo Thony"
), Result = c(150L, 155L, 156L, 154L, 158L, 159L, 159L, 130L, 132L, 138L, 
140L, 139L, 177L), Date = c("2018-03-03", "2018-03-04", "2018-03-05", 
"2018-03-06", "2018-03-07", "2018-03-09", "2018-03-13", "2018-03-03", "2018-03-04", "2018-03-05", 
"2018-03-07", "2018-03-09", "2018-03-11"))
> data2
             Name Result        Date
1    Peter Parker    150  2018-03-03
2    Peter Parker    155  2018-03-04
3    Peter Parker    156  2018-03-05
4    Peter Parker    154  2018-03-06
5    Peter Parker    158  2018-03-07
6    Peter Parker    159  2018-03-09
7    Peter Parker    159  2018-03-13
8  Benny Thompson    130  2018-03-03
9  Benny Thompson    132  2018-03-04
10 Benny Thompson    138  2018-03-05
11 Benny Thompson    140  2018-03-07
12 Benny Thompson    139  2018-03-09
13     Mylo Thony    177  2018-03-11

It has a lot more columns which I omitted for brevity.

I want to remove the excessive entries of everyone with more than 5 entries. In this case Benny has exaclty 5 entries, so he can stay "as is". Mylo does only have one entry, so his entry will eventually be dealt with at a later stage in my code (will be removed). For Peter my preferred solution would be to create 3 sets of 5 entries, with "Peter Parker1", "Peter Parker2" and "Peter Parker3" as names. I don't really care about the exact names, since only the results and eventually the dates are used for some predictions. I want it to look something like this:

            Name Result        Date
   Peter Parker1    150  2018-03-03
   Peter Parker1    155  2018-03-04
   Peter Parker1    156  2018-03-05
   Peter Parker1    154  2018-03-06
   Peter Parker1    158  2018-03-07
   Peter Parker2    155  2018-03-04
   Peter Parker2    156  2018-03-05
   Peter Parker2    154  2018-03-06
   Peter Parker2    158  2018-03-07
   Peter Parker2    159  2018-03-09
   Peter Parker3    156  2018-03-05
   Peter Parker3    154  2018-03-06
   Peter Parker3    158  2018-03-07
   Peter Parker3    159  2018-03-09
   Peter Parker3    159  2018-03-13
  Benny Thompson    130  2018-03-03
  Benny Thompson    132  2018-03-04
  Benny Thompson    138  2018-03-05
  Benny Thompson    140  2018-03-07
  Benny Thompson    139  2018-03-09
      Mylo Thony    177  2018-03-11

I have no clue how to achieve that. In case this is not possible I just want to remove all excessive entries of Peter and just keep the first 5 so I can make at least good use of those first 5

CodePudding user response:

library(tidyverse)

data2 <- data.frame(Name = c("Peter Parker", "Peter Parker", "Peter Parker", 
                             "Peter Parker", "Peter Parker", "Peter Parker", "Peter Parker", "Benny Thompson", "Benny Thompson", 
                             "Benny Thompson", "Benny Thompson", "Benny Thompson", "Mylo Thony"
), Result = c(150L, 155L, 156L, 154L, 158L, 159L, 159L, 130L, 132L, 138L, 
              140L, 139L, 177L), Date = c("2018-03-03", "2018-03-04", "2018-03-05", 
                                          "2018-03-06", "2018-03-07", "2018-03-09", "2018-03-13", "2018-03-03", "2018-03-04", "2018-03-05", 
                                          "2018-03-07", "2018-03-09", "2018-03-11"))

data2 %>%
  group_by(Name) %>%
  mutate(
    n = n(),
    group = ceiling(row_number()/5),
    Name = if_else(n>5, paste0(Name, group), Name)
  ) %>%
  select(-n,-group)
#> # A tibble: 13 × 3
#> # Groups:   Name [4]
#>    Name           Result Date      
#>    <chr>           <int> <chr>     
#>  1 Peter Parker1     150 2018-03-03
#>  2 Peter Parker1     155 2018-03-04
#>  3 Peter Parker1     156 2018-03-05
#>  4 Peter Parker1     154 2018-03-06
#>  5 Peter Parker1     158 2018-03-07
#>  6 Peter Parker2     159 2018-03-09
#>  7 Peter Parker2     159 2018-03-13
#>  8 Benny Thompson    130 2018-03-03
#>  9 Benny Thompson    132 2018-03-04
#> 10 Benny Thompson    138 2018-03-05
#> 11 Benny Thompson    140 2018-03-07
#> 12 Benny Thompson    139 2018-03-09
#> 13 Mylo Thony        177 2018-03-11

Created on 2023-01-06 by the reprex package (v2.0.1)

Is something like this what you're after?

  •  Tags:  
  • r
  • Related