Home > Blockchain >  Making a binary wider df from a longer df
Making a binary wider df from a longer df

Time:08-27

I have a data.frame such as

data.frame(x = c(51, 51, 13, 51, 51, 61),
          y = c(24, 24, 52, 24, 24, 13),
          plot=c("ua1", "ua1", "ua2", "ua1", "ua1", "ua3"), 
          spp=c("sp1", "sp1", "sp2", "sp3", "sp4", "sp1"))

What I'm trying to do is:

  1. keep x, y, and plot as columns and make each unique string inside ``spp```a new column.

  2. For each new column (e.g., "sp1"), check if it occurs in each plot and add the value (1 for occurrence, 0 for absence) to the respective cell, making it a plot x spp matrix that maintains x and y coordinates.

The closer I got was via

tmp %>% pivot_wider(names_from = "spp", values_from = "plot")

But it doesn't generate a proper plot x spp matrix, it only identifies the plot where each spp occurs and returns NA where it doesn't...

A desirable result would look like

data.frame(x = c(51, 13, 61),
           y = c(24, 52, 13),
           plot=c("ua1", "ua2", "ua3"),
           sp1=c(1, 0, 1),
           sp2=c(0, 1, 0),
           sp3=c(1, 0, 0),
           sp4=c(1, 0, 0))

Any ideas?

CodePudding user response:

You can use the values_fn to handle the multiple values. For example

tmp %>% 
  pivot_wider(
    names_from = spp, values_from = spp, 
    values_fn = function(x) pmin(1, length(x)), 
    values_fill = 0
  )
# A tibble: 3 × 7
#       x     y plot    sp1   sp2   sp3   sp4
#   <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
# 1    51    24 ua1       1     0     1     1
# 2    13    52 ua2       0     1     0     0
# 3    61    13 ua3       1     0     0     0

CodePudding user response:

library(dplyr)
library(tidyr)
df %>%
  distinct() %>%  ## need to get rid of the duplicate row
  mutate(val = 1) %>%
  pivot_wider(names_from = spp, values_from = val, values_fill = list(val = 0))
# # A tibble: 3 × 7
#       x     y plot    sp1   sp2   sp3   sp4
#   <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
# 1    51    24 ua1       1     0     1     1
# 2    13    52 ua2       0     1     0     0
# 3    61    13 ua3       1     0     0     0

CodePudding user response:

Here is a dplyr only approach:

library(dplyr)
df  %>% 
  group_by(x,y,plot) %>% 
  cbind((model.matrix(~ spp   0, .) == 1)*1) %>% 
  distinct() %>% 
  summarise(across(starts_with("sppsp"), ~sum(.)), .groups = "drop") %>% 
  arrange(plot)
      x     y plot  sppsp1 sppsp2 sppsp3 sppsp4
  <dbl> <dbl> <chr>  <dbl>  <dbl>  <dbl>  <dbl>
1    51    24 ua1        1      0      1      1
2    13    52 ua2        0      1      0      0
3    61    13 ua3        1      0      0      0
  • Related