I have several duplicate rows in a data set. I want to combine each set of duplicates into one row and sum the last column. See below.
What I have:
Main | FLOW22016| FLOW2| Forest Lakes| 2016| 2016-10-03| 0| Creek chub| Semotilus atromaculatus| 1|
Main | FLOW22016| FLOW2| Forest Lakes| 2016| 2016-10-03| 0| Creek chub| Semotilus atromaculatus| 1|
What I want:
Main | FLOW22016| FLOW2| Forest Lakes| 2016| 2016-10-03| 0| Creek chub| Semotilus atromaculatus| 2|
Is this possible? I have many duplicated rows like this throughout the data set and would like to combine them all.
CodePudding user response:
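Try:
library(dplyr)
df %>%
  group_by(across(-lastcolumn)) %>%                          # group by every column except the last
  summarise(lastcolumn = sum(lastcolumn), .groups = "drop")  # add up the counts within each group
where lastcolumn is the name of the count column (the one that was 1 and should become 2). If every row has a count of exactly 1, n() gives the same result as sum(lastcolumn).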
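As a self-contained illustration of the grouping approach, here is a minimal sketch on toy data. The data frame and its column names (site, species, count) are made up for the example and are not from the question; it uses sum() rather than n() so that counts other than 1 are handled as well.

```r
library(dplyr)

# Hypothetical toy data: the column names are assumptions for illustration
df <- data.frame(
  site    = c("FLOW2", "FLOW2", "FLOW2"),
  species = c("Creek chub", "Creek chub", "Bluegill"),
  count   = c(1, 1, 4)
)

result <- df %>%
  group_by(across(-count)) %>%                     # group by every column except count
  summarise(count = sum(count), .groups = "drop")  # one row per group, counts added up

result
```

The two Creek chub rows collapse into a single row with a count of 2, and the Bluegill row is left as it was.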
CodePudding user response:
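First, assign each run of duplicates a bin number, sum the last column within each bin, and then keep only the unique rows of the data frame. This relies on duplicate rows being adjacent, so sort the data first if they are not.
In R >= 4.1 (which introduced the native pipe |>) you could do: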
dat <- transform(dat, bin=cumsum(!duplicated(dat[-ncol(dat)]))) |>  # new bin at each first occurrence
  transform(V13=ave(V13, bin, FUN=sum)) |>                          # sum V13 within each bin
  unique()                                                          # drop the now-identical rows
dat
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 bin
# 1 Main FLOW22016 FLOW2 Forest Lakes 2016 2016-10-03 0 Creek chub Semotilus atromaculatus 3 1
# 4 Main FLOW22016 FLOW2 Nile river 2016 2016-10-03 0 Creek chub Chelaethiops bibie 2 2
Data
dat <- read.table(header=TRUE, text='
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
Main FLOW22016 FLOW2 Forest Lakes 2016 2016-10-03 0 Creek chub Semotilus atromaculatus 1
Main FLOW22016 FLOW2 Forest Lakes 2016 2016-10-03 0 Creek chub Semotilus atromaculatus 1
Main FLOW22016 FLOW2 Forest Lakes 2016 2016-10-03 0 Creek chub Semotilus atromaculatus 1
Main FLOW22016 FLOW2 Nile river 2016 2016-10-03 0 Creek chub Chelaethiops bibie 1
Main FLOW22016 FLOW2 Nile river 2016 2016-10-03 0 Creek chub Chelaethiops bibie 1
')
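If you would rather not depend on row order, base R's aggregate() is another option. This is a different technique from the bin approach, shown here as a rough sketch on made-up toy data (the names dat2, site, species and count are assumptions for illustration). Note that the formula interface drops rows containing NA by default.

```r
# Hypothetical toy data; the duplicate rows are deliberately not adjacent
dat2 <- data.frame(
  site    = c("FLOW2", "FLOW2", "FLOW2"),
  species = c("Creek chub", "Bluegill", "Creek chub"),
  count   = c(1, 4, 1)
)

# Sum count within every combination of the remaining columns
res <- aggregate(count ~ ., data = dat2, FUN = sum)
res
```

The `.` on the right-hand side of the formula stands for all columns other than count, so the call does not have to be changed when the data has more identifying columns.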
CodePudding user response:
You mean:
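library(dplyr)
df %>%
  group_by(across(-lastcolumn)) %>%         # group by every column except the count column
  mutate(lastcolumn = sum(lastcolumn)) %>%  # total per group, repeated on each row
  ungroup() %>%
  distinct()                                # collapse the now-identical rows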