Home > other >  Loop through columns combining two data frames in R
Loop through columns combining two data frames in R

Time:11-04

I have the 2 following separate data frames:

itens sp1 sp2
item1 20 10
item2 30 15
item3 30 15
item4 30 15

and

itens sp5 sp6
item7 20 10
item8 30 15
item9 30 15

How could I create, in R, a for function that produce the following result?

sps sum N
sp1 110 4
sp2 55 4
sp5 80 3
sp6 40 3

Where each N and sum columns , in this new data frame, are number and sum of itens of each species (sps).

Thank you

CodePudding user response:

df1 <- structure(list(itens = c("item1", "item2", "item3", "item4"), 
                      sp1 = c(20L, 30L, 30L, 30L), 
                      sp2 = c(10L, 15L, 15L, 15L)), 
                 class = "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(itens = c("item7", "item8", "item9"),
                      sp5 = c(20L,  30L, 30L), 
                      sp6 = c(10L, 15L, 15L)), 
                 class = "data.frame", row.names = c(NA,  -3L))

Solution not using a loop but tidyr::pivot_longer and dplyr::full_join

library(dplyr)
library(tidyr)

df1 %>% 
  pivot_longer(-itens) %>% 
  full_join(df2 %>%  pivot_longer(-itens)) %>% 
  group_by(sps = name) %>% 
  summarise(N = n(),
            sum = sum(value))

Returns:

  sps      N   sum
  <chr> <int> <int>
1 sp1       4   110
2 sp2       4    55
3 sp5       3    80
4 sp6       3    40

CodePudding user response:

Another solution, based on and dplyr::bind_rows and tidyr::pivot_longer:

library(tidyverse)

df1 <- data.frame(
  stringsAsFactors = FALSE,
             itens = c("item1", "item2", "item3", "item4"),
               sp1 = c(20L, 30L, 30L, 30L),
               sp2 = c(10L, 15L, 15L, 15L)
       )
  
df2 <- data.frame(
  stringsAsFactors = FALSE,
             itens = c("item7", "item8", "item9"),
               sp5 = c(20L, 30L, 30L),
               sp6 = c(10L, 15L, 15L)
       )

df1 %>% 
  bind_rows(df2) %>% 
  pivot_longer(-itens, names_to = "sps") %>% 
  group_by(sps) %>% 
  summarise(sum = sum(value,na.rm = T), N = sum(!is.na(value)))

#> # A tibble: 4 × 3
#>   sps     sum     N
#>   <chr> <int> <int>
#> 1 sp1     110     4
#> 2 sp2      55     4
#> 3 sp5      80     3
#> 4 sp6      40     3
  • Related