I have an origin dataframe (oridf) with remote sensing sampling data as below. The reflection-values of blue, green, red were measured at about 1700 points on several days. (In the 'real' experimental setup the points were arranged in multiple rectangles - called 'spots' - each spot consisting of 6 rows with 18 points per row. On the whole there were 16 spots distributed among four sites.)
Each single sampling point can be identified by a combination of the following variables:
Site_ID ranging from A ot D
Spot_Nr ranging from 1 to 4
Row_Nr ranging from 1 to 6 and
Point_Nr ranging from 1 to 18
> str(oridf)
'data.frame': 16530 obs. of 8 variables:
$ Date : int 20190124 20190124 20190124 20190124 20190124 20190124 20190124 20190124 20190124 20190124 ...
$ Site_ID : chr "A" "A" "A" "A" ...
$ Spot_Nr : chr "1" "1" "1" "1" ...
$ Row_Nr : chr "1" "1" "1" "2" ...
$ Point_Nr: chr "3" "4" "12" "1" ...
$ BLUE : num 244 538 1145 68 523 ...
$ GREEN : num 355 702 1298 218 665 ...
$ RED : num 483 641 1184 360 658 ...
The number of observations differs from point to point (e.g. because of cloud state). So, some points have more Date-entries than others. This led me to figured out
the three most frequently sampled points for each row at each spot at each site
by creating a ranking with dplyr
rankedNumbofObs_df <- oridf %>%
group_by(Site_ID, Spot_Nr, Row_Nr, Point_Nr) %>%
summarize(n = n()) %>%
group_by(Site_ID, Spot_Nr, Row_Nr) %>%
mutate(rank = rank(-n, ties.method = 'random'))%>%
filter(rank %in% 1:3) %>%
arrange(Site_ID, Spot_Nr, Row_Nr, rank)
with this result:
> rankedNumbofObs_df
# A tibble: 288 x 6
# Groups: Site_ID, Spot_Nr, Row_Nr [96]
Site_ID Spot_Nr Row_Nr Point_Nr n rank
<chr> <chr> <chr> <chr> <int> <int>
1 A 1 1 7 45 1
2 A 1 1 6 45 2
3 A 1 1 5 34 3
4 A 1 2 7 46 1
5 A 1 2 8 46 2
6 A 1 2 6 45 3
7 A 1 3 8 46 1
8 A 1 3 7 45 2
9 A 1 3 9 44 3
10 A 1 4 12 45 1
# ... with 278 more rows
Now I would like to filter oridf by rankedNumbofObs_df in order to get a subset containing all observations for only the top-3-points of each row according to the ranking result.
If there was just a single 'identifier' variable something like oridf %>% filter(idvar %in% rankedNumbofObs_df$idvar)
should be the solution.
But in my case it would need a combination of four variables (Site_ID Spot_Nr Row_Nr Point_Nr) for addressing the points right.
And that's the point I'm struggling now. My research for any according posts had no results.
May anybody help here? (Preferabely I would like just using dplyr
/ (tidyverse
))
Edit: dput
of oridf according to @deschen's comment:
> dput(oridf %>% slice_sample(n=250))
structure(list(Date = c(20190208L, 20190215L, 20180424L, 20190419L,
20180526L, 20190325L, 20180424L, 20180926L, 20190419L, 20181117L,
20191210L, 20180419L, 20190305L, 20180718L, 20181117L, 20190213L,
20180921L, 20180822L, 20180506L, 20190903L, 20180414L, 20180519L,
20190218L, 20180519L, 20180401L, 20190401L, 20180414L, 20181031L,
20181115L, 20190322L, 20180506L, 20190501L, 20190903L, 20180526L,
20190302L, 20181013L, 20181031L, 20190625L, 20180414L, 20191110L,
20190506L, 20190330L, 20190730L, 20190208L, 20190421L, 20190124L,
20190205L, 20190524L, 20180421L, 20180521L, 20180610L, 20181115L,
20191110L, 20190223L, 20190421L, 20190419L, 20180419L, 20190330L,
20190320L, 20191110L, 20190302L, 20180725L, 20181021L, 20180827L,
20181013L, 20190916L, 20181031L, 20190320L, 20190524L, 20180715L,
20180620L, 20191130L, 20180824L, 20180725L, 20180615L, 20181031L,
20190506L, 20180926L, 20180610L, 20180620L, 20191130L, 20180926L,
20190501L, 20190213L, 20190705L, 20191210L, 20180506L, 20190710L,
20191210L, 20191210L, 20190401L, 20180506L, 20190330L, 20191110L,
20180325L, 20190501L, 20180928L, 20180401L, 20180610L, 20191023L,
20190215L, 20180414L, 20190401L, 20180414L, 20180918L, 20180419L,
20180526L, 20181031L, 20180401L, 20180419L, 20190325L, 20180615L,
20180506L, 20180526L, 20190416L, 20181003L, 20190705L, 20180819L,
20181031L, 20180406L, 20180421L, 20191130L, 20180419L, 20190921L,
20190501L, 20190705L, 20180521L, 20180322L, 20190305L, 20181115L,
20191026L, 20180730L, 20180506L, 20190325L, 20190903L, 20180424L,
20180419L, 20181031L, 20190320L, 20180519L, 20181115L, 20190215L,
20180725L, 20180521L, 20180401L, 20180414L, 20180615L, 20190625L,
20180421L, 20190218L, 20180521L, 20190208L, 20190401L, 20181117L,
20190506L, 20190218L, 20190228L, 20180623L, 20180421L, 20190827L,
20180521L, 20180526L, 20190228L, 20191023L, 20190421L, 20191210L,
20181031L, 20180827L, 20191130L, 20191026L, 20180322L, 20190330L,
20180419L, 20190322L, 20180715L, 20190723L, 20191130L, 20181117L,
20180521L, 20180506L, 20180401L, 20190730L, 20180414L, 20180421L,
20180424L, 20180401L, 20180526L, 20180804L, 20190205L, 20180715L,
20180401L, 20180506L, 20180414L, 20180322L, 20180526L, 20190325L,
20180713L, 20190531L, 20190322L, 20180819L, 20180421L, 20180426L,
20190531L, 20190320L, 20180414L, 20180414L, 20180526L, 20191023L,
20190305L, 20180531L, 20180715L, 20180506L, 20180401L, 20180509L,
20180414L, 20180630L, 20190305L, 20180817L, 20190208L, 20181115L,
20190809L, 20180908L, 20180424L, 20190302L, 20180630L, 20180419L,
20180730L, 20190330L, 20181115L, 20180419L, 20180804L, 20180419L,
20180414L, 20180421L, 20180424L, 20191110L, 20180421L, 20180421L,
20180406L, 20190921L, 20180526L, 20190630L, 20180421L, 20180419L,
20180605L, 20190223L, 20181115L, 20190620L, 20180424L, 20190223L
), Site_ID = c("A", "A", "D", "A", "C", "A", "D", "B", "A", "B",
"A", "B", "A", "D", "D", "A", "B", "D", "C", "A", "C", "D", "A",
"D", "D", "A", "B", "C", "B", "A", "B", "A", "A", "D", "A", "D",
"B", "A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B",
"D", "D", "D", "A", "A", "A", "A", "D", "A", "A", "A", "A", "D",
"D", "B", "B", "A", "B", "A", "A", "D", "C", "A", "B", "D", "D",
"B", "A", "D", "D", "B", "A", "B", "A", "A", "A", "A", "D", "A",
"A", "A", "A", "C", "A", "A", "D", "A", "B", "D", "D", "A", "A",
"D", "A", "D", "B", "C", "C", "B", "D", "B", "A", "B", "B", "D",
"A", "D", "A", "D", "C", "B", "C", "A", "B", "A", "A", "A", "B",
"D", "A", "B", "A", "B", "D", "A", "A", "C", "B", "B", "A", "D",
"D", "A", "D", "B", "D", "C", "B", "A", "C", "A", "B", "A", "A",
"D", "A", "A", "A", "D", "D", "A", "C", "D", "A", "A", "A", "A",
"B", "B", "A", "A", "B", "A", "C", "A", "D", "A", "A", "D", "D",
"D", "D", "A", "B", "D", "C", "D", "C", "D", "A", "D", "D", "B",
"B", "D", "C", "A", "D", "A", "A", "D", "C", "D", "A", "A", "D",
"C", "C", "A", "A", "B", "D", "C", "D", "D", "D", "D", "A", "B",
"A", "D", "A", "B", "C", "A", "D", "D", "B", "A", "B", "D", "D",
"B", "C", "D", "D", "A", "C", "B", "D", "A", "C", "A", "D", "D",
"C", "A", "B", "A", "C", "A"), Spot_Nr = c("1", "1", "2", "4",
"4", "1", "2", "2", "2", "4", "3", "1", "4", "4", "3", "3", "2",
"2", "4", "1", "3", "2", "3", "3", "2", "4", "4", "4", "2", "1",
"2", "3", "3", "4", "4", "4", "4", "1", "1", "4", "4", "3", "4",
"3", "1", "4", "2", "1", "1", "3", "4", "1", "4", "1", "4", "2",
"1", "4", "2", "3", "2", "4", "3", "2", "1", "2", "4", "1", "3",
"4", "2", "4", "2", "2", "4", "1", "3", "4", "2", "2", "2", "2",
"1", "3", "4", "1", "2", "1", "1", "1", "1", "1", "4", "2", "2",
"4", "2", "3", "3", "2", "1", "3", "1", "3", "2", "1", "3", "4",
"2", "4", "3", "1", "1", "4", "1", "3", "1", "2", "2", "3", "2",
"1", "1", "3", "1", "1", "1", "2", "3", "3", "4", "2", "1", "1",
"1", "1", "4", "2", "1", "4", "2", "4", "2", "1", "3", "2", "2",
"3", "3", "3", "3", "2", "1", "1", "3", "4", "4", "2", "1", "1",
"2", "4", "4", "1", "2", "4", "4", "1", "4", "4", "3", "4", "1",
"2", "2", "4", "3", "1", "1", "1", "1", "1", "1", "1", "3", "1",
"2", "2", "3", "3", "2", "3", "4", "2", "1", "1", "4", "2", "1",
"4", "1", "1", "1", "1", "3", "1", "2", "1", "4", "4", "2", "3",
"1", "2", "2", "2", "1", "2", "4", "4", "1", "2", "3", "4", "4",
"4", "1", "2", "1", "3", "2", "3", "1", "3", "3", "4", "3", "2",
"2", "2", "3", "4", "4", "3", "1", "3", "2", "4", "4", "1"),
Row_Nr = c("4", "4", "5", "5", "5", "3", "6", "1", "2", "2",
"3", "3", "4", "5", "4", "1", "5", "1", "6", "3", "2", "1",
"6", "5", "1", "2", "6", "2", "5", "6", "5", "5", "2", "4",
"1", "5", "2", "2", "2", "1", "6", "3", "3", "6", "4", "3",
"4", "4", "6", "6", "6", "3", "5", "5", "3", "1", "5", "6",
"5", "4", "1", "4", "4", "6", "3", "6", "6", "5", "4", "2",
"6", "6", "2", "6", "3", "1", "2", "4", "6", "6", "4", "6",
"5", "6", "4", "5", "5", "6", "6", "5", "6", "1", "1", "5",
"1", "2", "5", "6", "1", "4", "3", "2", "3", "2", "6", "3",
"2", "3", "2", "6", "3", "5", "4", "5", "6", "3", "6", "1",
"4", "2", "5", "2", "5", "1", "4", "3", "5", "4", "1", "1",
"2", "4", "3", "1", "6", "4", "4", "1", "2", "2", "3", "1",
"3", "5", "3", "4", "2", "4", "1", "5", "2", "3", "1", "4",
"1", "5", "6", "5", "5", "3", "2", "5", "3", "4", "6", "6",
"5", "4", "6", "3", "3", "5", "6", "1", "6", "6", "4", "1",
"4", "3", "2", "4", "3", "5", "4", "1", "5", "6", "1", "3",
"2", "2", "6", "5", "4", "4", "4", "1", "5", "5", "2", "5",
"3", "4", "5", "4", "1", "5", "3", "1", "3", "3", "3", "1",
"1", "6", "1", "4", "4", "6", "6", "6", "1", "3", "3", "2",
"4", "4", "2", "2", "4", "6", "6", "4", "4", "4", "2", "1",
"4", "5", "1", "6", "5", "3", "1", "4", "3", "3", "3", "4"
), Point_Nr = c("14", "5", "18", "4", "11", "10", "10", "10",
"2", "5", "8", "7", "11", "10", "7", "17", "13", "7", "11",
"8", "18", "10", "12", "7", "15", "10", "8", "15", "14",
"5", "11", "7", "8", "11", "10", "10", "9", "8", "12", "2",
"6", "12", "12", "12", "9", "8", "3", "14", "13", "9", "14",
"4", "5", "7", "3", "3", "13", "6", "5", "6", "4", "9", "4",
"12", "13", "8", "15", "11", "5", "6", "10", "7", "10", "13",
"8", "12", "10", "10", "11", "14", "1", "10", "10", "16",
"13", "12", "13", "15", "7", "6", "8", "12", "1", "6", "11",
"12", "12", "9", "10", "7", "13", "6", "7", "12", "12", "10",
"6", "14", "7", "8", "13", "12", "6", "10", "5", "10", "10",
"7", "15", "15", "9", "7", "9", "10", "12", "15", "14", "11",
"17", "7", "10", "11", "9", "8", "16", "9", "10", "3", "5",
"5", "7", "10", "8", "13", "5", "3", "9", "6", "1", "11",
"15", "4", "11", "7", "13", "4", "10", "14", "13", "11",
"8", "11", "7", "9", "8", "6", "15", "14", "5", "10", "8",
"5", "13", "5", "12", "14", "7", "6", "9", "11", "2", "9",
"14", "7", "3", "7", "8", "11", "14", "10", "9", "14", "9",
"14", "7", "5", "10", "5", "8", "12", "13", "10", "9", "4",
"4", "7", "9", "12", "9", "8", "12", "7", "7", "12", "18",
"11", "4", "10", "1", "15", "15", "14", "5", "8", "8", "6",
"13", "8", "14", "9", "12", "13", "11", "5", "8", "4", "6",
"8", "15", "8", "7", "13", "11", "5", "11", "13", "15", "13",
"11", "12"), BLUE = c(1390, 667, 1847, 1243, 1269, 1137,
2156, 780, 1336, 978, 1022, 2423, 891, 793, 1317, 1129, 1097,
763, 1754, 1045, 1875, 1432, 1030, 1383, 2074, 850, 2230,
694, 1685, 820, 2040, 2249, 878, 950, 955, 455, 1712, 1303,
2164, 2170, 2141, 1298, 1204, 1017, 1274, 592, 2198, 2539,
2553, 946, 1090, 1441, 2137, 887, 1125, 1298, 2007, 786,
1028, 1997, 940, 640, 894, 1318, 829, 1509, 1549, 1148, 1996,
802, 1220, 2483, 797, 345, 940, 1713, 2347, 251, 1131, 1140,
2936, 543, 2023, 1385, 915, 2135, 1564, 1215, 1530, 1736,
899, 1520, 897, 2399, 1208, 1840, 686, 2043, 1339, 1088,
977, 2022, 1002, 1963, 815, 2375, 1256, 977, 1992, 2216,
968, 932, 1633, 932, 1094, 748, 730, 800, 826, 1940, 2367,
2898, 1917, 1226, 2092, 892, 1394, 860, 1171, 1620, 741,
747, 1568, 1127, 1158, 2267, 1682, 1795, 1083, 1526, 1645,
963, 795, 1335, 2036, 2211, 918, 1099, 2168, 1493, 1840,
1149, 1085, 1602, 2021, 855, 925, 805, 1845, 1781, 1235,
940, 922, 1168, 1177, 1456, 1420, 1336, 2473, 804, 1076,
777, 1880, 1175, 408, 1258, 996, 1315, 820, 1489, 2103, 1222,
2129, 1632, 2123, 2107, 1254, 443, 1985, 1031, 2091, 1976,
2097, 817, 835, 950, 842, 2747, 1057, 643, 1771, 2007, 2711,
801, 1951, 2060, 1267, 1135, 841, 1232, 521, 1685, 2103,
1250, 2007, 742, 655, 735, 1218, 1674, 1065, 1235, 1997,
753, 906, 2057, 868, 1191, 1284, 2151, 394, 2616, 2236, 1818,
2118, 2225, 1929, 2470, 1585, 1285, 1245, 1034, 1916, 2157,
1084, 955, 1701, 1207, 2355, 1205), GREEN = c(1477, 769,
2181, 1562, 1556, 1390, 2549, 1167, 1552, 1082, 1151, 2767,
1330, 1244, 1438, 1241, 1465, 1052, 1921, 1288, 2269, 1706,
1242, 1699, 2595, 1325, 2682, 793, 1840, 992, 2171, 2381,
1080, 1273, 1433, 789, 2192, 1498, 2576, 2434, 2210, 1536,
1589, 1040, 1510, 861, 2221, 2604, 2755, 1136, 1387, 1603,
2420, 1183, 1473, 1441, 2463, 1196, 1290, 2354, 1106, 1018,
1010, 1579, 1099, 1753, 2013, 1419, 2089, 1193, 1528, 2667,
1181, 798, 1266, 2204, 2473, 624, 1391, 1472, 3183, 942,
2152, 1532, 1298, 2048, 1778, 1394, 1519, 1699, 1266, 1769,
1357, 2717, 1409, 1962, 1056, 2561, 1617, 1235, 1087, 2609,
1361, 2519, 1133, 2728, 1574, 1311, 2477, 2473, 1196, 1289,
1786, 1212, 1217, 1113, 920, 1089, 946, 2062, 2742, 3177,
2132, 1453, 2186, 1057, 1592, 1093, 1349, 1749, 1040, 1080,
1776, 1418, 1403, 2689, 1976, 2280, 1249, 1770, 1885, 1268,
1184, 1564, 2510, 2711, 1143, 1295, 2506, 1705, 1933, 1176,
1298, 1718, 2120, 1200, 1260, 1274, 2243, 2137, 1450, 1219,
1247, 1342, 1374, 1462, 1843, 1584, 2636, 1128, 1225, 1207,
2162, 1381, 827, 1779, 1147, 1466, 1108, 1746, 2622, 1534,
2486, 2002, 2408, 2649, 1529, 810, 2145, 1415, 2580, 2118,
2450, 1109, 1184, 1123, 1247, 3077, 1407, 981, 2001, 2351,
3034, 970, 2509, 2543, 1544, 1328, 1245, 1484, 1045, 1817,
2644, 1562, 2596, 1110, 761, 1079, 1337, 1884, 1302, 1467,
2296, 1142, 1256, 2596, 1231, 1458, 1474, 2704, 800, 3005,
2749, 2196, 2553, 2491, 2169, 2755, 1769, 1481, 1593, 1449,
2375, 2672, 1403, 1205, 1886, 1570, 2774, 1367), RED = c(1177,
596, 1864, 1357, 1240, 1111, 1954, 1024, 1187, 748, 1290,
2023, 963, 1025, 1120, 1116, 1401, 923, 1366, 1014, 2423,
1494, 730, 1594, 2899, 975, 2585, 851, 1317, 685, 1304, 1551,
638, 825, 1154, 509, 2637, 1166, 2474, 2222, 1688, 1203,
1450, 572, 1388, 616, 1376, 1617, 1877, 920, 1153, 1387,
2198, 832, 1273, 1009, 2418, 863, 794, 2215, 713, 876, 966,
1182, 1013, 1645, 2416, 1100, 1333, 1082, 1449, 1842, 934,
380, 1134, 2639, 1647, 395, 986, 1400, 2210, 733, 1617, 1319,
1198, 1164, 1060, 1068, 1294, 1022, 983, 1306, 1035, 2377,
924, 1600, 936, 2900, 1449, 965, 843, 2958, 1087, 2882, 984,
2142, 1581, 1422, 2835, 1965, 815, 1170, 1084, 806, 937,
988, 940, 972, 1042, 1680, 2075, 2235, 1914, 1372, 1590,
1011, 1088, 727, 1332, 1449, 920, 909, 1114, 1267, 1232,
2132, 1934, 2769, 848, 1232, 1142, 1088, 1067, 1061, 2841,
2795, 754, 987, 2033, 1442, 1088, 601, 1001, 1836, 1663,
867, 888, 961, 2100, 1894, 1385, 771, 892, 1185, 984, 979,
2184, 1393, 1875, 1036, 1102, 881, 2103, 932, 405, 1687,
1130, 1231, 701, 1218, 2961, 1308, 2548, 1873, 2140, 3024,
1377, 522, 1590, 1254, 2933, 1372, 2406, 775, 912, 786, 1040,
2207, 1045, 823, 1775, 1714, 2206, 775, 2868, 2587, 1548,
1136, 871, 1194, 626, 1438, 3057, 1325, 2891, 746, 617, 868,
1213, 1465, 1068, 1446, 2185, 757, 1190, 2398, 1115, 1092,
1347, 2529, 492, 2250, 2760, 1944, 1947, 2247, 1909, 1985,
1021, 1322, 1618, 1400, 2041, 2435, 1192, 730, 1329, 1408,
2277, 1066)), row.names = c(NA, -250L), class = "data.frame")
CodePudding user response:
You may inner_join
the filtered data with the original data.
library(dplyr)
oridf %>%
count(Site_ID, Spot_Nr, Row_Nr, Point_Nr) %>%
group_by(Site_ID, Spot_Nr, Row_Nr) %>%
mutate(rank = rank(-n, ties.method = 'random')) %>%
filter(rank %in% 1:3) %>%
arrange(Site_ID, Spot_Nr, Row_Nr, rank) %>%
ungroup %>%
inner_join(oridf, by = c("Site_ID", "Spot_Nr", "Row_Nr", "Point_Nr"))