I have a dataframe of annual values at multiple sites for two species. I need to add a character variable (we'll call this position) that designates the position of each row in the timeseries. The issue is that I have a number of sites (grouping variable), and the lengths of the timeseries of each site differ. If it helps, I would like to split each timeseries into thirds, so that the resulting data looks something like this (simplified):
Species Site Year Position
1 1 2000 Early
1 1 2001 Mid
1 1 2002 Late
1 2 2000 Early
1 2 2001 Early
1 2 2002 Mid
1 2 2003 Mid
1 2 2004 Late
1 2 2005 Late
2 1 2000 Early
2 1 2001 Mid
2 1 2002 Late
2 2 2000 Early
2 2 2001 Early
2 2 2002 Mid
2 2 2003 Mid
2 2 2004 Late
2 2 2005 Late
Where:
- Early = Position (or year) <= 1/3 of total timeseries (or number of rows)
- Mid = Position >= 1/3 and <= 2/3
- Late = Position >= 2/3 and <= 3/3
Here is a sample of my data (cols 1-3 of first 200 of 5,000 rows):
structure(list(Site = c("CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW",
"CT_CW", "CT_CW", "CT_CW", "IN_PM", "IN_PM", "IN_PM", "IN_PM",
"IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM",
"IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM",
"IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM",
"IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM",
"IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM"), Year = c(1901L,
1901L, 1902L, 1902L, 1903L, 1903L, 1904L, 1904L, 1905L, 1905L,
1906L, 1906L, 1907L, 1907L, 1908L, 1908L, 1909L, 1909L, 1910L,
1910L, 1911L, 1911L, 1912L, 1912L, 1913L, 1913L, 1914L, 1914L,
1915L, 1915L, 1916L, 1916L, 1917L, 1917L, 1918L, 1918L, 1919L,
1919L, 1920L, 1920L, 1921L, 1921L, 1922L, 1922L, 1923L, 1923L,
1924L, 1924L, 1925L, 1925L, 1926L, 1926L, 1927L, 1927L, 1928L,
1928L, 1929L, 1929L, 1930L, 1930L, 1931L, 1931L, 1932L, 1932L,
1933L, 1933L, 1934L, 1934L, 1935L, 1935L, 1936L, 1936L, 1937L,
1937L, 1938L, 1938L, 1939L, 1939L, 1940L, 1940L, 1941L, 1941L,
1942L, 1942L, 1943L, 1943L, 1944L, 1944L, 1945L, 1945L, 1946L,
1946L, 1947L, 1947L, 1948L, 1948L, 1949L, 1949L, 1950L, 1950L,
1951L, 1951L, 1952L, 1952L, 1953L, 1953L, 1954L, 1954L, 1955L,
1955L, 1956L, 1956L, 1957L, 1957L, 1958L, 1958L, 1959L, 1959L,
1960L, 1960L, 1961L, 1961L, 1962L, 1962L, 1963L, 1963L, 1964L,
1964L, 1965L, 1965L, 1966L, 1966L, 1967L, 1967L, 1968L, 1968L,
1969L, 1969L, 1970L, 1970L, 1971L, 1971L, 1972L, 1972L, 1973L,
1973L, 1974L, 1974L, 1975L, 1975L, 1976L, 1976L, 1977L, 1977L,
1978L, 1978L, 1979L, 1979L, 1980L, 1980L, 1981L, 1981L, 1901L,
1901L, 1902L, 1902L, 1903L, 1903L, 1904L, 1904L, 1905L, 1905L,
1906L, 1906L, 1907L, 1907L, 1908L, 1908L, 1909L, 1909L, 1910L,
1910L, 1911L, 1911L, 1912L, 1912L, 1913L, 1913L, 1914L, 1914L,
1915L, 1915L, 1916L, 1916L, 1917L, 1917L, 1918L, 1918L, 1919L,
1919L), Species = c("QA", "QR", "QA", "QR", "QA", "QR", "QA",
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR",
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA",
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR",
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA",
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR",
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA",
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR",
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA",
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR",
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA",
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR",
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA",
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR",
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA",
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR",
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA",
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR",
"QA", "QR", "QA", "QR", "QA", "QR")), row.names = c(NA, 200L), class = "data.frame")
CodePudding user response:
We could group by 'Species', 'Site' and then create the Position
(either with quantile/cut
) or use case_when
(for custom cuts) to create the values based on logical expression
library(dplyr)
df1 %>%
arrange(Species, Site, Year) %>%
select(Species, Site, Year) %>%
group_by(Species, Site) %>%
mutate(Position =
case_when(row_number() <= n()/3 ~ 'Early',
between(row_number(), n()/3, 2*n()/3) ~ "Mid",
TRUE ~ "Late")) %>%
ungroup