Home > Mobile >  Add character variable to dataframe designating the portion of a timeseries each row falls within ov
Add character variable to dataframe designating the portion of a timeseries each row falls within ov

Time:02-13

I have a dataframe of annual values at multiple sites for two species. I need to add a character variable (we'll call this position) that designates the position of each row in the timeseries. The issue is that I have a number of sites (grouping variable), and the lengths of the timeseries of each site differ. If it helps, I would like to split each timeseries into thirds, so that the resulting data looks something like this (simplified):

Species  Site Year Position
   1       1  2000  Early
   1       1  2001  Mid
   1       1  2002  Late
   1       2  2000  Early
   1       2  2001  Early
   1       2  2002  Mid
   1       2  2003  Mid
   1       2  2004  Late
   1       2  2005  Late
   2       1  2000  Early
   2       1  2001  Mid
   2       1  2002  Late
   2       2  2000  Early
   2       2  2001  Early
   2       2  2002  Mid
   2       2  2003  Mid
   2       2  2004  Late
   2       2  2005  Late

Where:

  • Early = Position (or year) <= 1/3 of total timeseries (or number of rows)
  • Mid = Position >= 1/3 and <= 2/3
  • Late = Position >= 2/3 and <= 3/3

Here is a sample of my data (cols 1-3 of first 200 of 5,000 rows):

structure(list(Site = c("CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", "CT_CW", 
"CT_CW", "CT_CW", "CT_CW", "IN_PM", "IN_PM", "IN_PM", "IN_PM", 
"IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", 
"IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", 
"IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", 
"IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", 
"IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM", "IN_PM"), Year = c(1901L, 
1901L, 1902L, 1902L, 1903L, 1903L, 1904L, 1904L, 1905L, 1905L, 
1906L, 1906L, 1907L, 1907L, 1908L, 1908L, 1909L, 1909L, 1910L, 
1910L, 1911L, 1911L, 1912L, 1912L, 1913L, 1913L, 1914L, 1914L, 
1915L, 1915L, 1916L, 1916L, 1917L, 1917L, 1918L, 1918L, 1919L, 
1919L, 1920L, 1920L, 1921L, 1921L, 1922L, 1922L, 1923L, 1923L, 
1924L, 1924L, 1925L, 1925L, 1926L, 1926L, 1927L, 1927L, 1928L, 
1928L, 1929L, 1929L, 1930L, 1930L, 1931L, 1931L, 1932L, 1932L, 
1933L, 1933L, 1934L, 1934L, 1935L, 1935L, 1936L, 1936L, 1937L, 
1937L, 1938L, 1938L, 1939L, 1939L, 1940L, 1940L, 1941L, 1941L, 
1942L, 1942L, 1943L, 1943L, 1944L, 1944L, 1945L, 1945L, 1946L, 
1946L, 1947L, 1947L, 1948L, 1948L, 1949L, 1949L, 1950L, 1950L, 
1951L, 1951L, 1952L, 1952L, 1953L, 1953L, 1954L, 1954L, 1955L, 
1955L, 1956L, 1956L, 1957L, 1957L, 1958L, 1958L, 1959L, 1959L, 
1960L, 1960L, 1961L, 1961L, 1962L, 1962L, 1963L, 1963L, 1964L, 
1964L, 1965L, 1965L, 1966L, 1966L, 1967L, 1967L, 1968L, 1968L, 
1969L, 1969L, 1970L, 1970L, 1971L, 1971L, 1972L, 1972L, 1973L, 
1973L, 1974L, 1974L, 1975L, 1975L, 1976L, 1976L, 1977L, 1977L, 
1978L, 1978L, 1979L, 1979L, 1980L, 1980L, 1981L, 1981L, 1901L, 
1901L, 1902L, 1902L, 1903L, 1903L, 1904L, 1904L, 1905L, 1905L, 
1906L, 1906L, 1907L, 1907L, 1908L, 1908L, 1909L, 1909L, 1910L, 
1910L, 1911L, 1911L, 1912L, 1912L, 1913L, 1913L, 1914L, 1914L, 
1915L, 1915L, 1916L, 1916L, 1917L, 1917L, 1918L, 1918L, 1919L, 
1919L), Species = c("QA", "QR", "QA", "QR", "QA", "QR", "QA", 
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", 
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", 
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", 
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", 
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", 
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", 
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", 
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", 
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", 
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", 
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", 
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", 
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", 
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", 
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", 
"QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", 
"QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", "QA", "QR", 
"QA", "QR", "QA", "QR", "QA", "QR")), row.names = c(NA, 200L), class = "data.frame")

CodePudding user response:

We could group by 'Species', 'Site' and then create the Position (either with quantile/cut) or use case_when (for custom cuts) to create the values based on logical expression

library(dplyr)
df1 %>% 
 arrange(Species, Site, Year) %>% 
 select(Species, Site, Year) %>%
 group_by(Species, Site) %>% 
 mutate(Position = 
    case_when(row_number() <= n()/3 ~ 'Early', 
              between(row_number(), n()/3, 2*n()/3) ~ "Mid",
             TRUE ~ "Late")) %>%
 ungroup
  •  Tags:  
  • r
  • Related