So I have the following data frame with columns "Longitude", "Latitude" and "Prob", for probability. Prob is always either 0, 1 or 2 (note you can't see a row with Prob=2 in the example below).
Longitude Latitude Prob
1 28.00403 -26.50431 1
2 28.01236 -26.50431 1
3 28.02069 -26.50431 1
4 28.02903 -26.50431 0
5 28.03736 -26.50431 0
6 28.04569 -26.50431 0
7 28.05403 -26.50431 0
8 28.06236 -26.50431 1
9 28.07069 -26.50431 1
What I want to do is identify rows that have the same Longitude and Latitude values. If, for example, rows 5 and 7 show the same Longitude and Latitude values, and row 5 has Prob 1 and row 7 has probability 2, I want to remove one of those rows and make Prob equal 3 for the row that remains.
I.e., I want to select duplicated rows based on two columns, Longitude and Latitude. If two rows have the same longitude and latitude values, and if they show Prob 1 and 2, I want to "merge" these rows and make Prob 3.
I hope this makes sense. Thanks.
CodePudding user response:
If I understood you correctly, you just want to group by Long, Lat and take the sum of Prob.
df %>%
group_by(Longitude, Latitude) %>%
summarise(Prob = sum(Prob))
CodePudding user response:
You should try to solve the problem and post your challenges here, rather than just the starting question. This page might also help.
Anyway, something like this might work:
#fake data
df = data.frame(
x = c(1,2,3,3,5),
y = c(2,3,3,3,4),
prob = rep('a',5)
)
# find duplicates
index_both = duplicated(df) | duplicated(df, fromLast=TRUE)
index_drop = duplicated(df)
# assign new value
df[index_both,'prob'] = 'b'
# drop duplicate
df = df[!index_drop,]