I have 11 dataframes with various observations from seagrass surveys in the Chesapeake. Each dataframe contains the following variables (with example values included). There are 11 dataframes as each one represents observations from a single SAMPYR. So:
> head(density.2007)
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS
1 HI 2 1.0 50 2006 2007 1 6.0
2 HI 5 0.5 100 2006 2007 1 11.6
3 HI 7 0.5 50 2006 2007 1 6.0
4 HI 9 0.5 100 2006 2007 1 9.6
5 HI 10 1.0 100 2006 2007 1 30.0
6 HI 23 1.0 50 2006 2007 1 40.4
> head(density.2008)
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS NOTES id
29 HI 1 1.0 100 2007 2008 1 39.6 29
30 HI 2 1.0 50 2006 2008 2 54.8 30
31 HI 3 0.5 100 2007 2008 1 11.2 31
32 HI 4 1.0 100 2007 2008 1 8.8 32
33 HI 5 0.5 100 2006 2008 2 24.0 33
34 HI 7 0.5 50 2006 2008 2 0.0 34
I would like to write a for loop that takes the number of unique characters from the PLOT column, and calculates the frequency of each one (so I can then filter so only those that appear more than once are listed).
What I have so far is:
density.names <- c("density.2007",
"density.2008",
"density.2009",
"density.2010",
"density.2011",
"density.2012",
"density.2013",
"density.2014",
"density.2015",
"density.2016",
"density.2017"
)
for(i in 1:length(density.names)) {
get(density.names[i]) %>%
count(PLOT) %>%
print()
}
This code outputs
print()
PLOT n
1 HI 1 1
2 HI 10 1
3 HI 100 1
4 HI 103 1
5 HI 104 1
6 HI 11 1
7 HI 13 1
8 HI 14 1
9 HI 15 1
10 HI 17 1
11 HI 18 1
12 HI 2 1
13 HI 20 1
14 HI 21 1
15 HI 23 1
16 HI 25 1
17 HI 27 1
18 HI 29 1
19 HI 3 1
20 HI 31 1
21 HI 32 1
22 HI 36 1
23 HI 37 1
24 HI 38 1
25 HI 39 1
26 HI 4 1
27 HI 40 1
But I can't do anything further with that. Is there a way for me to filter rows so only those with a n=2 show up? Or to print 11 dataframes from the for loop, so I can further manipulate them but at least I'll have a copy of them in the global environment?
Thank you! I can provide additional details if that helps.
CodePudding user response:
Don't do it in a loop !! It is done completely different. I'll show you step by step. My first step is to prepare a function that will generate data similar to yours.
library(tidyverse)
dens = function(year, n) tibble(
PLOT = paste("HI", sample(1:(n/7), n, replace = T)),
SIZE = runif(n, 0.1, 3),
DENSITY = sample(seq(50,200, by=50), n, replace = T),
SEEDYR = year-1,
SAMPYR = year,
AGE = sample(1:5, n, replace = T),
SHOOTS = runif(n, 0.1, 3)
)
Let's see how it works and generate some sample data frames
set.seed(123)
density.2007 = dens(2007, 120)
density.2008 = dens(2008, 88)
density.2009 = dens(2009, 135)
density.2010 = dens(2010, 156)
The density.2007
data frame looks like this
# A tibble: 120 x 7
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 HI 15 1.67 200 2006 2007 4 1.80
2 HI 14 0.270 150 2006 2007 2 2.44
3 HI 3 0.856 50 2006 2007 3 0.686
4 HI 10 1.25 200 2006 2007 5 1.43
5 HI 11 0.673 50 2006 2007 5 1.40
6 HI 5 2.51 150 2006 2007 3 2.23
7 HI 14 0.543 150 2006 2007 2 2.17
8 HI 5 2.43 200 2006 2007 5 2.51
9 HI 9 1.69 100 2006 2007 4 2.67
10 HI 3 2.02 50 2006 2007 2 2.86
# ... with 110 more rows
Now they need to be combined into one frame
df = density.2007 %>%
bind_rows(density.2008) %>%
bind_rows(density.2009) %>%
bind_rows(density.2010)
output
# A tibble: 499 x 7
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 HI 15 1.67 200 2006 2007 4 1.80
2 HI 14 0.270 150 2006 2007 2 2.44
3 HI 3 0.856 50 2006 2007 3 0.686
4 HI 10 1.25 200 2006 2007 5 1.43
5 HI 11 0.673 50 2006 2007 5 1.40
6 HI 5 2.51 150 2006 2007 3 2.23
7 HI 14 0.543 150 2006 2007 2 2.17
8 HI 5 2.43 200 2006 2007 5 2.51
9 HI 9 1.69 100 2006 2007 4 2.67
10 HI 3 2.02 50 2006 2007 2 2.86
# ... with 489 more rows
In the next step, count how many times each value of the PLOT
variable occurs
PLOT.count = df %>%
group_by(PLOT) %>%
summarise(PLOT.n = n()) %>%
arrange(PLOT.n)
ouptut
# A tibble: 22 x 2
PLOT PLOT.n
<chr> <int>
1 HI 20 3
2 HI 22 5
3 HI 21 7
4 HI 18 12
5 HI 2 19
6 HI 1 20
7 HI 15 20
8 HI 17 21
9 HI 6 22
10 HI 11 23
# ... with 12 more rows
In the penultimate step, let's append these counters to the original data frame
df = df %>% left_join(PLOT.count, by="PLOT")
output
# A tibble: 499 x 8
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS PLOT.n
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int>
1 HI 15 1.67 200 2006 2007 4 1.80 20
2 HI 14 0.270 150 2006 2007 2 2.44 32
3 HI 3 0.856 50 2006 2007 3 0.686 27
4 HI 10 1.25 200 2006 2007 5 1.43 25
5 HI 11 0.673 50 2006 2007 5 1.40 23
6 HI 5 2.51 150 2006 2007 3 2.23 38
7 HI 14 0.543 150 2006 2007 2 2.17 32
8 HI 5 2.43 200 2006 2007 5 2.51 38
9 HI 9 1.69 100 2006 2007 4 2.67 26
10 HI 3 2.02 50 2006 2007 2 2.86 27
# ... with 489 more rows
Now filter it at will
df %>% filter(PLOT.n > 30)
ouptut
# A tibble: 139 x 8
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS PLOT.n
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int>
1 HI 14 0.270 150 2006 2007 2 2.44 32
2 HI 5 2.51 150 2006 2007 3 2.23 38
3 HI 14 0.543 150 2006 2007 2 2.17 32
4 HI 5 2.43 200 2006 2007 5 2.51 38
5 HI 8 0.598 50 2006 2007 1 1.70 34
6 HI 7 1.94 50 2006 2007 4 1.61 35
7 HI 14 2.91 50 2006 2007 4 0.215 32
8 HI 7 0.846 150 2006 2007 4 0.506 35
9 HI 7 2.38 150 2006 2007 3 1.34 35
10 HI 7 2.62 100 2006 2007 3 0.167 35
# ... with 129 more rows
Or this way
df %>% filter(PLOT.n == min(PLOT.n))
df %>% filter(PLOT.n == median(PLOT.n))
df %>% filter(PLOT.n == max(PLOT.n))
output
# A tibble: 3 x 8
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS PLOT.n
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int>
1 HI 20 0.392 200 2009 2010 1 0.512 3
2 HI 20 0.859 150 2009 2010 5 2.62 3
3 HI 20 0.882 200 2009 2010 5 1.06 3
> df %>% filter(PLOT.n == median(PLOT.n))
# A tibble: 26 x 8
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS PLOT.n
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int>
1 HI 9 1.69 100 2006 2007 4 2.67 26
2 HI 9 2.20 50 2006 2007 4 1.49 26
3 HI 9 0.587 200 2006 2007 3 1.13 26
4 HI 9 1.27 50 2006 2007 1 2.55 26
5 HI 9 1.56 150 2006 2007 3 2.01 26
6 HI 9 0.198 100 2006 2007 3 2.08 26
7 HI 9 2.72 150 2007 2008 3 0.421 26
8 HI 9 0.251 200 2007 2008 2 0.328 26
9 HI 9 1.83 50 2007 2008 1 0.192 26
10 HI 9 1.97 100 2007 2008 1 0.900 26
# ... with 16 more rows
> df %>% filter(PLOT.n == max(PLOT.n))
# A tibble: 38 x 8
PLOT SIZE DENSITY SEEDYR SAMPYR AGE SHOOTS PLOT.n
<chr> <dbl> <dbl> <dbl> <dbl> <int> <dbl> <int>
1 HI 5 2.51 150 2006 2007 3 2.23 38
2 HI 5 2.43 200 2006 2007 5 2.51 38
3 HI 5 2.06 100 2006 2007 5 1.93 38
4 HI 5 1.25 150 2006 2007 4 2.29 38
5 HI 5 2.29 200 2006 2007 1 2.97 38
6 HI 5 0.789 150 2006 2007 2 1.59 38
7 HI 5 1.11 100 2007 2008 4 2.61 38
8 HI 5 2.38 150 2007 2008 4 2.95 38
9 HI 5 2.67 200 2007 2008 3 1.77 38
10 HI 5 2.63 100 2007 2008 1 1.90 38
# ... with 28 more rows