How to consolidate duplicates

Time:09-28

I have several duplicate entries in a data set. I want to combine each set of duplicates into a single row and add the values in the last column together. See below.

What I have:

Main | FLOW22016| FLOW2| Forest Lakes| 2016| 2016-10-03| 0| Creek chub| Semotilus atromaculatus| 1|

Main | FLOW22016| FLOW2| Forest Lakes| 2016| 2016-10-03| 0| Creek chub| Semotilus atromaculatus| 1|

What I want:

Main | FLOW22016| FLOW2| Forest Lakes| 2016| 2016-10-03| 0| Creek chub| Semotilus atromaculatus | 2|

Is this possible? I have many data points like this throughout the data set and I would like to combine them.

CodePudding user response:

Try,

library(dplyr)

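df %>%
  group_by(across(-lastcolumn)) %>%   # group by every column except the last
  summarise(lastcolumn = sum(lastcolumn), .groups = "drop")   # add up the values in each group

where lastcolumn is the name of the final column, the one that held 1 in each duplicate row and should become 2. Using sum() adds the existing values; n() would only count the rows, which gives the same result only when every value is 1.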
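
As a minimal sketch of the grouped approach on made-up data, the following shows sum() and n() side by side. The data frame and the column names site and species are assumptions, not the asker's real columns, and dplyr must be installed. The two results agree only when every value in lastcolumn is 1.

```r
library(dplyr)

# Hypothetical cut-down of the question's data; column names are assumptions.
df <- tibble(
  site       = c("FLOW2", "FLOW2", "FLOW3"),
  species    = c("Creek chub", "Creek chub", "Creek chub"),
  lastcolumn = c(1, 2, 1)
)

out <- df %>%
  group_by(across(-lastcolumn)) %>%       # group by every column except lastcolumn
  summarise(
    total = sum(lastcolumn),              # adds the existing values
    rows  = n(),                          # only counts the rows in the group
    .groups = "drop"
  )

print(out)
```

For the FLOW2 group the total is 3 while the row count is 2, so n() alone would under-report here.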

CodePudding user response:

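First, give each run of duplicate rows a bin number, then sum the last column within each bin and keep only the unique rows. Note that cumsum(!duplicated(...)) only bins correctly when the duplicate rows are adjacent, as they are in the data below.

In R >= 4.1 (which introduced the native pipe |>) you could do: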

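# Number each run of rows that match on every column except the last
dat <- transform(dat, bin=cumsum(!duplicated(dat[-(ncol(dat))]))) |>
  # Replace V13 with its total within each bin
  transform(V13=ave(V13, bin, FUN=sum)) |>
  # Drop the rows that are now identical
  unique()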
dat
#     V1        V2    V3     V4    V5   V6         V7 V8    V9  V10          V11           V12 V13 bin
# 1 Main FLOW22016 FLOW2 Forest Lakes 2016 2016-10-03  0 Creek chub    Semotilus atromaculatus   3   1
# 4 Main FLOW22016 FLOW2   Nile river 2016 2016-10-03  0 Creek chub Chelaethiops         bibie   2   2

Data

dat <- read.table(header=TRUE, text='
    V1        V2    V3     V4    V5   V6         V7 V8    V9  V10       V11           V12 V13
 Main FLOW22016 FLOW2 Forest Lakes 2016 2016-10-03  0 Creek chub Semotilus atromaculatus   1           
 Main FLOW22016 FLOW2 Forest Lakes 2016 2016-10-03  0 Creek chub Semotilus atromaculatus   1           
 Main FLOW22016 FLOW2 Forest Lakes 2016 2016-10-03  0 Creek chub Semotilus atromaculatus   1           
 Main FLOW22016 FLOW2 Nile river 2016 2016-10-03  0 Creek chub Chelaethiops bibie   1           
 Main FLOW22016 FLOW2 Nile river 2016 2016-10-03  0 Creek chub Chelaethiops bibie   1           
           ')
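
If duplicate rows might not sit next to each other, one possible alternative (not part of the answer above) is base R's aggregate(), which groups on the remaining columns regardless of row order. A sketch on a hypothetical cut-down data frame with assumed column names:

```r
# Hypothetical cut-down of the question's data; column names are assumptions.
# The FLOW2 rows are deliberately not adjacent.
dat2 <- data.frame(
  site    = c("FLOW2", "FLOW3", "FLOW2", "FLOW2"),
  species = c("Creek chub", "Creek chub", "Creek chub", "Creek chub"),
  count   = c(1, 1, 1, 2)
)

# `count ~ .` means: sum `count` within every unique combination
# of all the other columns.
res <- aggregate(count ~ ., data = dat2, FUN = sum)
print(res)
```

Be aware that the formula method of aggregate() drops rows containing NA by default (na.action = na.omit), so rows with missing values in any column would be left out of the totals.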

CodePudding user response:

You mean:

library(dplyr)
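df %>%
  group_by(across(-lastcolumn)) %>%   # group by every column except the one being summed
  mutate(lastcolumn = sum(lastcolumn)) %>%
  ungroup() %>%
  distinct()   # keep one row per group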