Good morning community, I wanted to ask about any proposals you have to solve the following problem I have with a dataset. It turns out, that I want that in the column "Municipio" of the image on the left, every time I change the name of the municipality, the numerical value of the column increases, in order to be able to group all the data and classify them according to the "codigo municipio" that you see in the image on the right. I do not do it manually because there are more than 1000 municipalities and it would take me more than a whole day to do this task, so I would like to hear if anyone has a proposal, thank you very much.
CodePudding user response:
I have used the package dplyr
in R, but you could also just do this in Excel if you wanted to.
library(dplyr)
# Mockup approximating your data
df <- data.frame(
EM = c("ABEJORRAL", "AEXSAT S.A.", "AZTECA COM", "ABREGO", "AXESAT S.A.", "ABRIAQUI"),
Numero = c(890,2,0,259,4,64)
)
municipios <- data.frame(
Municipios = c("ABEJORRAL", "ABREGO", "ABRIAQUI"),
Validacion = c("Municipio")
)
# create a new column with the Municipios ID by just counting up from 1.
df <- df %>% mutate(
Municipio = cumsum(EM %in% municipios$Municipios)
)
This solution assumes the municipios are in the same order in both tables, and none are missing from the main data set, as it's just creating a grouping variable.
output:
EM Numero Municipio
1 ABEJORRAL 890 1
2 AEXSAT S.A. 2 1
3 AZTECA COM 0 1
4 ABREGO 259 2
5 AXESAT S.A. 4 2
6 ABRIAQUI 64 3