I am using the "storms" tibble from the dplyr package in R. I want to know:
- Are there storms that occurred in different years but were given the same name?
- If so, which storm names were reused, and in which years?
For example:
name year
-------- -----------
Alberto 1997
Alberto 2001
Gordon 1993
Felix 2000
So Alberto appears in different years under the same name.
CodePudding user response:
This code returns every storm whose name was used in more than one year, with one row per name and year, rather than just the number of rows recorded for each name in each year.
library(dplyr)

storms %>%
  select(name, year) %>%
  distinct() %>%                 # one row per name/year pair
  group_by(name, year) %>%
  summarise(Count = n()) %>%     # result stays grouped by name
  group_by(name) %>%
  filter(n() > 1) %>%            # keep names that appear in more than one year
  select(-Count)
--- Output
# Groups: name [106]
name year
<chr> <dbl>
1 Alberto 1982
2 Alberto 1988
3 Alberto 1994
4 Alberto 2000
5 Alberto 2006
6 Alberto 2012
7 Alberto 2018
8 Alex 1998
9 Alex 2004
10 Alex 2010
To get just the list of reused names:
storms %>%
  select(name, year) %>%
  distinct() %>%
  group_by(name, year) %>%
  summarise(Count = n()) %>%
  group_by(name) %>%
  filter(n() > 1) %>%
  select(-Count) %>%
  pull(name) %>%
  unique()
--- Output
`summarise()` has grouped output by 'name'. You can override using the `.groups` argument.
[1] "Alberto" "Alex" "Allison" "Ana" "Andrew" "Arthur" "Barry" "Beryl" "Beta" "Bill" "Bob"
[12] "Bonnie" "Cesar" "Chantal" "Charley" "Chris" "Claudette" "Colin" "Cristobal" "Danielle" "Danny" "Dean"
[23] "Debby" "Diana" "Don" "Dorian" "Edouard" "Eight" "Emily" "Epsilon" "Erika" "Erin" "Ernesto"
[34] "Fabian" "Fay" "Felix" "Fernand" "Fifteen" "Fiona" "Floyd" "Franklin" "Fred" "Gabrielle" "Gamma"
[45] "Gaston" "Georges" "Gert" "Gloria" "Gonzalo" "Gordon" "Gustav" "Hanna" "Harvey" "Henri" "Hermine"
[56] "Hortense" "Humberto" "Ida" "Ingrid" "Iris" "Isaac" "Isabel" "Isidore" "Ivan" "Jeanne" "Jerry"
[67] "Josephine" "Joyce" "Juan" "Julia" "Karen" "Karl" "Kate" "Katia" "Katrina" "Keith" "Kirk"
[78] "Klaus" "Kyle" "Lee" "Leslie" "Lili" "Lisa" "Lorenzo" "Marco" "Maria" "Matthew" "Melissa"
[89] "Michael" "Nadine" "Nana" "Nate" "Nicole" "Noel" "Olga" "Omar" "Ophelia" "Oscar" "Otto"
[100] "Pablo" "Philippe" "Rina" "Sebastien" "Ten" "Two" "Zeta"
CodePudding user response:
A simple aggregate() solution in base R:
storms_by_year <- aggregate(year~name, data=storms, \(y) paste(unique(y), collapse="|"))
> tail(storms_by_year)
name year
209 Two 2010|2014
210 Vicky 2020
211 Vince 2005
212 Wilfred 2020
213 Wilma 2005
214 Zeta 2005|2006|2020
The names used in multiple years are simply those whose year string is longer than four characters (a single year is exactly four characters):
> tail(storms_by_year[nchar(storms_by_year$year)>4,])
name year
187 Philippe 2005|2011|2017
191 Rina 2011|2017
197 Sebastien 1995|2019
204 Ten 2005|2007|2011|2020
209 Two 2010|2014
214 Zeta 2005|2006|2020
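The string-length check above works because every year has four digits. A minimal sketch of an alternative that counts distinct years directly, assuming only base R, is shown here. The toy data frame and the names toy, n_years and reused are hypothetical and stand in for storms so that the example runs on its own.

```r
# Hypothetical toy data standing in for dplyr::storms (several rows per storm)
toy <- data.frame(
  name = c("Alberto", "Alberto", "Alberto", "Gordon", "Felix", "Felix"),
  year = c(1997, 1997, 2001, 1993, 2000, 2000)
)

# Number of distinct years in which each name was used
n_years <- vapply(
  split(toy$year, toy$name),
  function(y) length(unique(y)),
  integer(1)
)

# Names used in more than one year
reused <- names(n_years)[n_years > 1]
reused
```

Replacing toy with storms should give the reused names without relying on how the years are pasted together.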
CodePudding user response:
dplyr::storms %>%
count(name, year) %>%
arrange(name)
# A tibble: 512 x 3
name year n
<chr> <dbl> <int>
1 AL011993 1993 8
2 AL012000 2000 4
3 AL021992 1992 5
4 AL021994 1994 6
5 AL021999 1999 4
6 AL022000 2000 12
7 AL022001 2001 5
8 AL022003 2003 4
9 AL022006 2006 5
10 AL031987 1987 32
# ... with 502 more rows
For Alberto, for example:
dplyr::storms %>%
count(name, year) %>%
arrange(name) %>%
filter(name == "Alberto")
# A tibble: 7 x 3
name year n
<chr> <dbl> <int>
1 Alberto 1982 17
2 Alberto 1988 11
3 Alberto 1994 32
4 Alberto 2000 79
5 Alberto 2006 18
6 Alberto 2012 13
7 Alberto 2018 14
Or, with distinct(), which lists each name/year pair once:
dplyr::storms %>%
distinct(name, year) %>%
arrange(name)
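The distinct() version lists every name/year pair, including names that were used only once. A minimal sketch of one way to keep only the reused names follows, assuming dplyr is installed. The toy tibble and the column name n_years are hypothetical and are used only so the example is self-contained.

```r
library(dplyr)

# Hypothetical toy data standing in for dplyr::storms (several rows per storm)
toy <- tibble(
  name = c("Alberto", "Alberto", "Alberto", "Gordon", "Felix", "Felix"),
  year = c(1997, 1997, 2001, 1993, 2000, 2000)
)

reused <- toy %>%
  distinct(name, year) %>%               # one row per name/year pair
  add_count(name, name = "n_years") %>%  # number of distinct years per name
  filter(n_years > 1) %>%                # keep names used in more than one year
  arrange(name, year)

reused
```

Swapping toy for dplyr::storms should apply the same steps to the real data. The n_years column can be dropped afterwards with select(-n_years) if it is not needed.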
CodePudding user response:
A simple way would be to count by name and year, then take the result and group by name, filtering out those groups with only a single entry:
storms %>%
count(name, year) %>%
group_by(name) %>%
filter(n() > 1) %>%
select(-n)
#> # A tibble: 404 x 2
#> # Groups: name [106]
#> name year
#> <chr> <dbl>
#> 1 Alberto 1982
#> 2 Alberto 1988
#> 3 Alberto 1994
#> 4 Alberto 2000
#> 5 Alberto 2006
#> 6 Alberto 2012
#> 7 Alberto 2018
#> 8 Alex 1998
#> 9 Alex 2004
#> 10 Alex 2010
#> # ... with 394 more rows
Created on 2022-11-14 with reprex v2.0.2
CodePudding user response:
Reduce the data to distinct name/year pairs, keep the names that appear more than once, then arrange:
library(dplyr)
storms %>%
distinct(name, year) %>%
filter(duplicated(name) | duplicated(name, fromLast = TRUE)) %>%
arrange(name)
# # A tibble: 404 × 2
# name year
# <chr> <dbl>
# 1 Alberto 1982
# 2 Alberto 1988
# 3 Alberto 1994
# 4 Alberto 2000
# 5 Alberto 2006
# 6 Alberto 2012
# 7 Alberto 2018
# 8 Alex 1998
# 9 Alex 2004
# 10 Alex 2010
# # … with 394 more rows
# # ℹ Use `print(n = ...)` to see more rows
@Ottie's summarized format:
library(dplyr)
storms %>%
distinct(name, year) %>%
group_by(name) %>%
filter(n() > 1) %>%
summarize(year = paste(year, collapse = "|"))
# # A tibble: 106 × 2
# name year
# <chr> <chr>
# 1 Alberto 1982|1988|1994|2000|2006|2012|2018
# 2 Alex 1998|2004|2010|2016
# 3 Allison 1989|1995|2001
# 4 Ana 1979|1985|1991|1997|2003|2009|2015
# 5 Andrew 1986|1992
# 6 Arthur 1984|1990|1996|2002|2008|2014|2020
# 7 Barry 1983|1989|1995|2001|2007|2013|2019
# 8 Beryl 1982|1988|1994|2000|2006|2012|2018
# 9 Beta 2005|2020
# 10 Bill 1997|2003|2009|2015
# # … with 96 more rows
# # ℹ Use `print(n = ...)` to see more rows