Home > Mobile >  dplyr: Filtering a dataframe by a second dataframe/tibble with having a combination of 4 variables a
dplyr: Filtering a dataframe by a second dataframe/tibble with having a combination of 4 variables a

Time:09-25

I have an origin dataframe (oridf) with remote sensing sampling data as below. The reflection-values of blue, green, red were measured at about 1700 points on several days. (In the 'real' experimental setup the points were arranged in multiple rectangles - called 'spots' - each spot consisting of 6 rows with 18 points per row. On the whole there were 16 spots distributed among four sites.) Each single sampling point can be identified by a combination of the following variables:
Site_ID ranging from A ot D
Spot_Nr ranging from 1 to 4
Row_Nr ranging from 1 to 6 and
Point_Nr ranging from 1 to 18

> str(oridf)
'data.frame':   16530 obs. of  8 variables:
 $ Date    : int  20190124 20190124 20190124 20190124 20190124 20190124 20190124 20190124 20190124 20190124 ...
 $ Site_ID : chr  "A" "A" "A" "A" ...
 $ Spot_Nr : chr  "1" "1" "1" "1" ...
 $ Row_Nr  : chr  "1" "1" "1" "2" ...
 $ Point_Nr: chr  "3" "4" "12" "1" ...
 $ BLUE    : num  244 538 1145 68 523 ...
 $ GREEN   : num  355 702 1298 218 665 ...
 $ RED     : num  483 641 1184 360 658 ...

The number of observations differs from point to point (e.g. because of cloud state). So, some points have more Date-entries than others. This led me to figured out
the three most frequently sampled points for each row at each spot at each site
by creating a ranking with dplyr

rankedNumbofObs_df <- oridf %>%
  group_by(Site_ID, Spot_Nr, Row_Nr, Point_Nr) %>%
  summarize(n = n()) %>%
  group_by(Site_ID, Spot_Nr, Row_Nr) %>%
  mutate(rank = rank(-n, ties.method = 'random'))%>%
  filter(rank %in% 1:3) %>%
  arrange(Site_ID, Spot_Nr, Row_Nr, rank)

with this result:

> rankedNumbofObs_df
# A tibble: 288 x 6
# Groups:   Site_ID, Spot_Nr, Row_Nr [96]
   Site_ID Spot_Nr Row_Nr Point_Nr     n  rank
   <chr>   <chr>   <chr>  <chr>    <int> <int>
 1 A       1       1      7           45     1
 2 A       1       1      6           45     2
 3 A       1       1      5           34     3
 4 A       1       2      7           46     1
 5 A       1       2      8           46     2
 6 A       1       2      6           45     3
 7 A       1       3      8           46     1
 8 A       1       3      7           45     2
 9 A       1       3      9           44     3
10 A       1       4      12          45     1
# ... with 278 more rows

Now I would like to filter oridf by rankedNumbofObs_df in order to get a subset containing all observations for only the top-3-points of each row according to the ranking result.
If there was just a single 'identifier' variable something like oridf %>% filter(idvar %in% rankedNumbofObs_df$idvar) should be the solution. But in my case it would need a combination of four variables (Site_ID Spot_Nr Row_Nr Point_Nr) for addressing the points right.

And that's the point I'm struggling now. My research for any according posts had no results. May anybody help here? (Preferabely I would like just using dplyr / (tidyverse))

Edit: dput of oridf according to @deschen's comment:

> dput(oridf %>% slice_sample(n=250))
structure(list(Date = c(20190208L, 20190215L, 20180424L, 20190419L, 
20180526L, 20190325L, 20180424L, 20180926L, 20190419L, 20181117L, 
20191210L, 20180419L, 20190305L, 20180718L, 20181117L, 20190213L, 
20180921L, 20180822L, 20180506L, 20190903L, 20180414L, 20180519L, 
20190218L, 20180519L, 20180401L, 20190401L, 20180414L, 20181031L, 
20181115L, 20190322L, 20180506L, 20190501L, 20190903L, 20180526L, 
20190302L, 20181013L, 20181031L, 20190625L, 20180414L, 20191110L, 
20190506L, 20190330L, 20190730L, 20190208L, 20190421L, 20190124L, 
20190205L, 20190524L, 20180421L, 20180521L, 20180610L, 20181115L, 
20191110L, 20190223L, 20190421L, 20190419L, 20180419L, 20190330L, 
20190320L, 20191110L, 20190302L, 20180725L, 20181021L, 20180827L, 
20181013L, 20190916L, 20181031L, 20190320L, 20190524L, 20180715L, 
20180620L, 20191130L, 20180824L, 20180725L, 20180615L, 20181031L, 
20190506L, 20180926L, 20180610L, 20180620L, 20191130L, 20180926L, 
20190501L, 20190213L, 20190705L, 20191210L, 20180506L, 20190710L, 
20191210L, 20191210L, 20190401L, 20180506L, 20190330L, 20191110L, 
20180325L, 20190501L, 20180928L, 20180401L, 20180610L, 20191023L, 
20190215L, 20180414L, 20190401L, 20180414L, 20180918L, 20180419L, 
20180526L, 20181031L, 20180401L, 20180419L, 20190325L, 20180615L, 
20180506L, 20180526L, 20190416L, 20181003L, 20190705L, 20180819L, 
20181031L, 20180406L, 20180421L, 20191130L, 20180419L, 20190921L, 
20190501L, 20190705L, 20180521L, 20180322L, 20190305L, 20181115L, 
20191026L, 20180730L, 20180506L, 20190325L, 20190903L, 20180424L, 
20180419L, 20181031L, 20190320L, 20180519L, 20181115L, 20190215L, 
20180725L, 20180521L, 20180401L, 20180414L, 20180615L, 20190625L, 
20180421L, 20190218L, 20180521L, 20190208L, 20190401L, 20181117L, 
20190506L, 20190218L, 20190228L, 20180623L, 20180421L, 20190827L, 
20180521L, 20180526L, 20190228L, 20191023L, 20190421L, 20191210L, 
20181031L, 20180827L, 20191130L, 20191026L, 20180322L, 20190330L, 
20180419L, 20190322L, 20180715L, 20190723L, 20191130L, 20181117L, 
20180521L, 20180506L, 20180401L, 20190730L, 20180414L, 20180421L, 
20180424L, 20180401L, 20180526L, 20180804L, 20190205L, 20180715L, 
20180401L, 20180506L, 20180414L, 20180322L, 20180526L, 20190325L, 
20180713L, 20190531L, 20190322L, 20180819L, 20180421L, 20180426L, 
20190531L, 20190320L, 20180414L, 20180414L, 20180526L, 20191023L, 
20190305L, 20180531L, 20180715L, 20180506L, 20180401L, 20180509L, 
20180414L, 20180630L, 20190305L, 20180817L, 20190208L, 20181115L, 
20190809L, 20180908L, 20180424L, 20190302L, 20180630L, 20180419L, 
20180730L, 20190330L, 20181115L, 20180419L, 20180804L, 20180419L, 
20180414L, 20180421L, 20180424L, 20191110L, 20180421L, 20180421L, 
20180406L, 20190921L, 20180526L, 20190630L, 20180421L, 20180419L, 
20180605L, 20190223L, 20181115L, 20190620L, 20180424L, 20190223L
), Site_ID = c("A", "A", "D", "A", "C", "A", "D", "B", "A", "B", 
"A", "B", "A", "D", "D", "A", "B", "D", "C", "A", "C", "D", "A", 
"D", "D", "A", "B", "C", "B", "A", "B", "A", "A", "D", "A", "D", 
"B", "A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "B", 
"D", "D", "D", "A", "A", "A", "A", "D", "A", "A", "A", "A", "D", 
"D", "B", "B", "A", "B", "A", "A", "D", "C", "A", "B", "D", "D", 
"B", "A", "D", "D", "B", "A", "B", "A", "A", "A", "A", "D", "A", 
"A", "A", "A", "C", "A", "A", "D", "A", "B", "D", "D", "A", "A", 
"D", "A", "D", "B", "C", "C", "B", "D", "B", "A", "B", "B", "D", 
"A", "D", "A", "D", "C", "B", "C", "A", "B", "A", "A", "A", "B", 
"D", "A", "B", "A", "B", "D", "A", "A", "C", "B", "B", "A", "D", 
"D", "A", "D", "B", "D", "C", "B", "A", "C", "A", "B", "A", "A", 
"D", "A", "A", "A", "D", "D", "A", "C", "D", "A", "A", "A", "A", 
"B", "B", "A", "A", "B", "A", "C", "A", "D", "A", "A", "D", "D", 
"D", "D", "A", "B", "D", "C", "D", "C", "D", "A", "D", "D", "B", 
"B", "D", "C", "A", "D", "A", "A", "D", "C", "D", "A", "A", "D", 
"C", "C", "A", "A", "B", "D", "C", "D", "D", "D", "D", "A", "B", 
"A", "D", "A", "B", "C", "A", "D", "D", "B", "A", "B", "D", "D", 
"B", "C", "D", "D", "A", "C", "B", "D", "A", "C", "A", "D", "D", 
"C", "A", "B", "A", "C", "A"), Spot_Nr = c("1", "1", "2", "4", 
"4", "1", "2", "2", "2", "4", "3", "1", "4", "4", "3", "3", "2", 
"2", "4", "1", "3", "2", "3", "3", "2", "4", "4", "4", "2", "1", 
"2", "3", "3", "4", "4", "4", "4", "1", "1", "4", "4", "3", "4", 
"3", "1", "4", "2", "1", "1", "3", "4", "1", "4", "1", "4", "2", 
"1", "4", "2", "3", "2", "4", "3", "2", "1", "2", "4", "1", "3", 
"4", "2", "4", "2", "2", "4", "1", "3", "4", "2", "2", "2", "2", 
"1", "3", "4", "1", "2", "1", "1", "1", "1", "1", "4", "2", "2", 
"4", "2", "3", "3", "2", "1", "3", "1", "3", "2", "1", "3", "4", 
"2", "4", "3", "1", "1", "4", "1", "3", "1", "2", "2", "3", "2", 
"1", "1", "3", "1", "1", "1", "2", "3", "3", "4", "2", "1", "1", 
"1", "1", "4", "2", "1", "4", "2", "4", "2", "1", "3", "2", "2", 
"3", "3", "3", "3", "2", "1", "1", "3", "4", "4", "2", "1", "1", 
"2", "4", "4", "1", "2", "4", "4", "1", "4", "4", "3", "4", "1", 
"2", "2", "4", "3", "1", "1", "1", "1", "1", "1", "1", "3", "1", 
"2", "2", "3", "3", "2", "3", "4", "2", "1", "1", "4", "2", "1", 
"4", "1", "1", "1", "1", "3", "1", "2", "1", "4", "4", "2", "3", 
"1", "2", "2", "2", "1", "2", "4", "4", "1", "2", "3", "4", "4", 
"4", "1", "2", "1", "3", "2", "3", "1", "3", "3", "4", "3", "2", 
"2", "2", "3", "4", "4", "3", "1", "3", "2", "4", "4", "1"), 
    Row_Nr = c("4", "4", "5", "5", "5", "3", "6", "1", "2", "2", 
    "3", "3", "4", "5", "4", "1", "5", "1", "6", "3", "2", "1", 
    "6", "5", "1", "2", "6", "2", "5", "6", "5", "5", "2", "4", 
    "1", "5", "2", "2", "2", "1", "6", "3", "3", "6", "4", "3", 
    "4", "4", "6", "6", "6", "3", "5", "5", "3", "1", "5", "6", 
    "5", "4", "1", "4", "4", "6", "3", "6", "6", "5", "4", "2", 
    "6", "6", "2", "6", "3", "1", "2", "4", "6", "6", "4", "6", 
    "5", "6", "4", "5", "5", "6", "6", "5", "6", "1", "1", "5", 
    "1", "2", "5", "6", "1", "4", "3", "2", "3", "2", "6", "3", 
    "2", "3", "2", "6", "3", "5", "4", "5", "6", "3", "6", "1", 
    "4", "2", "5", "2", "5", "1", "4", "3", "5", "4", "1", "1", 
    "2", "4", "3", "1", "6", "4", "4", "1", "2", "2", "3", "1", 
    "3", "5", "3", "4", "2", "4", "1", "5", "2", "3", "1", "4", 
    "1", "5", "6", "5", "5", "3", "2", "5", "3", "4", "6", "6", 
    "5", "4", "6", "3", "3", "5", "6", "1", "6", "6", "4", "1", 
    "4", "3", "2", "4", "3", "5", "4", "1", "5", "6", "1", "3", 
    "2", "2", "6", "5", "4", "4", "4", "1", "5", "5", "2", "5", 
    "3", "4", "5", "4", "1", "5", "3", "1", "3", "3", "3", "1", 
    "1", "6", "1", "4", "4", "6", "6", "6", "1", "3", "3", "2", 
    "4", "4", "2", "2", "4", "6", "6", "4", "4", "4", "2", "1", 
    "4", "5", "1", "6", "5", "3", "1", "4", "3", "3", "3", "4"
    ), Point_Nr = c("14", "5", "18", "4", "11", "10", "10", "10", 
    "2", "5", "8", "7", "11", "10", "7", "17", "13", "7", "11", 
    "8", "18", "10", "12", "7", "15", "10", "8", "15", "14", 
    "5", "11", "7", "8", "11", "10", "10", "9", "8", "12", "2", 
    "6", "12", "12", "12", "9", "8", "3", "14", "13", "9", "14", 
    "4", "5", "7", "3", "3", "13", "6", "5", "6", "4", "9", "4", 
    "12", "13", "8", "15", "11", "5", "6", "10", "7", "10", "13", 
    "8", "12", "10", "10", "11", "14", "1", "10", "10", "16", 
    "13", "12", "13", "15", "7", "6", "8", "12", "1", "6", "11", 
    "12", "12", "9", "10", "7", "13", "6", "7", "12", "12", "10", 
    "6", "14", "7", "8", "13", "12", "6", "10", "5", "10", "10", 
    "7", "15", "15", "9", "7", "9", "10", "12", "15", "14", "11", 
    "17", "7", "10", "11", "9", "8", "16", "9", "10", "3", "5", 
    "5", "7", "10", "8", "13", "5", "3", "9", "6", "1", "11", 
    "15", "4", "11", "7", "13", "4", "10", "14", "13", "11", 
    "8", "11", "7", "9", "8", "6", "15", "14", "5", "10", "8", 
    "5", "13", "5", "12", "14", "7", "6", "9", "11", "2", "9", 
    "14", "7", "3", "7", "8", "11", "14", "10", "9", "14", "9", 
    "14", "7", "5", "10", "5", "8", "12", "13", "10", "9", "4", 
    "4", "7", "9", "12", "9", "8", "12", "7", "7", "12", "18", 
    "11", "4", "10", "1", "15", "15", "14", "5", "8", "8", "6", 
    "13", "8", "14", "9", "12", "13", "11", "5", "8", "4", "6", 
    "8", "15", "8", "7", "13", "11", "5", "11", "13", "15", "13", 
    "11", "12"), BLUE = c(1390, 667, 1847, 1243, 1269, 1137, 
    2156, 780, 1336, 978, 1022, 2423, 891, 793, 1317, 1129, 1097, 
    763, 1754, 1045, 1875, 1432, 1030, 1383, 2074, 850, 2230, 
    694, 1685, 820, 2040, 2249, 878, 950, 955, 455, 1712, 1303, 
    2164, 2170, 2141, 1298, 1204, 1017, 1274, 592, 2198, 2539, 
    2553, 946, 1090, 1441, 2137, 887, 1125, 1298, 2007, 786, 
    1028, 1997, 940, 640, 894, 1318, 829, 1509, 1549, 1148, 1996, 
    802, 1220, 2483, 797, 345, 940, 1713, 2347, 251, 1131, 1140, 
    2936, 543, 2023, 1385, 915, 2135, 1564, 1215, 1530, 1736, 
    899, 1520, 897, 2399, 1208, 1840, 686, 2043, 1339, 1088, 
    977, 2022, 1002, 1963, 815, 2375, 1256, 977, 1992, 2216, 
    968, 932, 1633, 932, 1094, 748, 730, 800, 826, 1940, 2367, 
    2898, 1917, 1226, 2092, 892, 1394, 860, 1171, 1620, 741, 
    747, 1568, 1127, 1158, 2267, 1682, 1795, 1083, 1526, 1645, 
    963, 795, 1335, 2036, 2211, 918, 1099, 2168, 1493, 1840, 
    1149, 1085, 1602, 2021, 855, 925, 805, 1845, 1781, 1235, 
    940, 922, 1168, 1177, 1456, 1420, 1336, 2473, 804, 1076, 
    777, 1880, 1175, 408, 1258, 996, 1315, 820, 1489, 2103, 1222, 
    2129, 1632, 2123, 2107, 1254, 443, 1985, 1031, 2091, 1976, 
    2097, 817, 835, 950, 842, 2747, 1057, 643, 1771, 2007, 2711, 
    801, 1951, 2060, 1267, 1135, 841, 1232, 521, 1685, 2103, 
    1250, 2007, 742, 655, 735, 1218, 1674, 1065, 1235, 1997, 
    753, 906, 2057, 868, 1191, 1284, 2151, 394, 2616, 2236, 1818, 
    2118, 2225, 1929, 2470, 1585, 1285, 1245, 1034, 1916, 2157, 
    1084, 955, 1701, 1207, 2355, 1205), GREEN = c(1477, 769, 
    2181, 1562, 1556, 1390, 2549, 1167, 1552, 1082, 1151, 2767, 
    1330, 1244, 1438, 1241, 1465, 1052, 1921, 1288, 2269, 1706, 
    1242, 1699, 2595, 1325, 2682, 793, 1840, 992, 2171, 2381, 
    1080, 1273, 1433, 789, 2192, 1498, 2576, 2434, 2210, 1536, 
    1589, 1040, 1510, 861, 2221, 2604, 2755, 1136, 1387, 1603, 
    2420, 1183, 1473, 1441, 2463, 1196, 1290, 2354, 1106, 1018, 
    1010, 1579, 1099, 1753, 2013, 1419, 2089, 1193, 1528, 2667, 
    1181, 798, 1266, 2204, 2473, 624, 1391, 1472, 3183, 942, 
    2152, 1532, 1298, 2048, 1778, 1394, 1519, 1699, 1266, 1769, 
    1357, 2717, 1409, 1962, 1056, 2561, 1617, 1235, 1087, 2609, 
    1361, 2519, 1133, 2728, 1574, 1311, 2477, 2473, 1196, 1289, 
    1786, 1212, 1217, 1113, 920, 1089, 946, 2062, 2742, 3177, 
    2132, 1453, 2186, 1057, 1592, 1093, 1349, 1749, 1040, 1080, 
    1776, 1418, 1403, 2689, 1976, 2280, 1249, 1770, 1885, 1268, 
    1184, 1564, 2510, 2711, 1143, 1295, 2506, 1705, 1933, 1176, 
    1298, 1718, 2120, 1200, 1260, 1274, 2243, 2137, 1450, 1219, 
    1247, 1342, 1374, 1462, 1843, 1584, 2636, 1128, 1225, 1207, 
    2162, 1381, 827, 1779, 1147, 1466, 1108, 1746, 2622, 1534, 
    2486, 2002, 2408, 2649, 1529, 810, 2145, 1415, 2580, 2118, 
    2450, 1109, 1184, 1123, 1247, 3077, 1407, 981, 2001, 2351, 
    3034, 970, 2509, 2543, 1544, 1328, 1245, 1484, 1045, 1817, 
    2644, 1562, 2596, 1110, 761, 1079, 1337, 1884, 1302, 1467, 
    2296, 1142, 1256, 2596, 1231, 1458, 1474, 2704, 800, 3005, 
    2749, 2196, 2553, 2491, 2169, 2755, 1769, 1481, 1593, 1449, 
    2375, 2672, 1403, 1205, 1886, 1570, 2774, 1367), RED = c(1177, 
    596, 1864, 1357, 1240, 1111, 1954, 1024, 1187, 748, 1290, 
    2023, 963, 1025, 1120, 1116, 1401, 923, 1366, 1014, 2423, 
    1494, 730, 1594, 2899, 975, 2585, 851, 1317, 685, 1304, 1551, 
    638, 825, 1154, 509, 2637, 1166, 2474, 2222, 1688, 1203, 
    1450, 572, 1388, 616, 1376, 1617, 1877, 920, 1153, 1387, 
    2198, 832, 1273, 1009, 2418, 863, 794, 2215, 713, 876, 966, 
    1182, 1013, 1645, 2416, 1100, 1333, 1082, 1449, 1842, 934, 
    380, 1134, 2639, 1647, 395, 986, 1400, 2210, 733, 1617, 1319, 
    1198, 1164, 1060, 1068, 1294, 1022, 983, 1306, 1035, 2377, 
    924, 1600, 936, 2900, 1449, 965, 843, 2958, 1087, 2882, 984, 
    2142, 1581, 1422, 2835, 1965, 815, 1170, 1084, 806, 937, 
    988, 940, 972, 1042, 1680, 2075, 2235, 1914, 1372, 1590, 
    1011, 1088, 727, 1332, 1449, 920, 909, 1114, 1267, 1232, 
    2132, 1934, 2769, 848, 1232, 1142, 1088, 1067, 1061, 2841, 
    2795, 754, 987, 2033, 1442, 1088, 601, 1001, 1836, 1663, 
    867, 888, 961, 2100, 1894, 1385, 771, 892, 1185, 984, 979, 
    2184, 1393, 1875, 1036, 1102, 881, 2103, 932, 405, 1687, 
    1130, 1231, 701, 1218, 2961, 1308, 2548, 1873, 2140, 3024, 
    1377, 522, 1590, 1254, 2933, 1372, 2406, 775, 912, 786, 1040, 
    2207, 1045, 823, 1775, 1714, 2206, 775, 2868, 2587, 1548, 
    1136, 871, 1194, 626, 1438, 3057, 1325, 2891, 746, 617, 868, 
    1213, 1465, 1068, 1446, 2185, 757, 1190, 2398, 1115, 1092, 
    1347, 2529, 492, 2250, 2760, 1944, 1947, 2247, 1909, 1985, 
    1021, 1322, 1618, 1400, 2041, 2435, 1192, 730, 1329, 1408, 
    2277, 1066)), row.names = c(NA, -250L), class = "data.frame")

CodePudding user response:

You may inner_join the filtered data with the original data.

library(dplyr)

oridf %>%
  count(Site_ID, Spot_Nr, Row_Nr, Point_Nr) %>%
  group_by(Site_ID, Spot_Nr, Row_Nr) %>%
  mutate(rank = rank(-n, ties.method = 'random')) %>%
  filter(rank %in% 1:3) %>%
  arrange(Site_ID, Spot_Nr, Row_Nr, rank) %>%
  ungroup %>%
  inner_join(oridf, by = c("Site_ID", "Spot_Nr", "Row_Nr", "Point_Nr"))
  • Related