I have a data frame with different areas. The naming of the areas has changed in the last year, so I would like to change to the old names. I have a list of the names as in the DataSet and how I want them to be, witch looks like:
> A <- c("Agri & Forestry", "Econs & Econometrics")
> B <- c("Agriculture", "Economics & Econometrics")
> cbind(A, B)
A B
[1,] "Agri & Forestry" "Agriculture"
[2,] "Econs & Econometrics" "Economics & Econometrics"
For instance, I don't want "Agriculture" I want "Agri & Forestry".
My list has around 30 different changes. It's possible to open this list on R and make this change? My DF looks like that:
> year <- c(2018, 2018, 2019,2020,2021,2021)
> area <- c("Agri & Forestry", "Agri & Forestry", "Econs & Econometrics",
"Econs & Econometrics", "Agriculture", "Economics & Econometrics")
> cbind(year, area)
year area
[1,] "2018" "Agri & Forestry"
[2,] "2018" "Agri & Forestry"
[3,] "2019" "Econs & Econometrics"
[4,] "2020" "Econs & Econometrics"
[5,] "2021" "Agriculture"
[6,] "2021" "Economics & Econometrics"
And I want something like:
> area2 <- c("Agri & Forestry", "Agri & Forestry", "Econs & Econometrics",
"Econs & Econometrics", "Agri & Forestry", "Econs & Econometrics")
> cbind(year, area2)
year area2
[1,] "2018" "Agri & Forestry"
[2,] "2018" "Agri & Forestry"
[3,] "2019" "Econs & Econometrics"
[4,] "2020" "Econs & Econometrics"
[5,] "2021" "Agri & Forestry"
[6,] "2021" "Econs & Econometrics"
CodePudding user response:
You can get row indices where there is a match
with the undesired character values, and then substitute values based on those indices.
Here, we assume df1
is the first data.frame with columns A
and B
, and df2
is the second data.frame, with columns year
and area
.
idx <- match(df2$area, df1$B)
df2$area[!is.na(idx)] <- df1$A[na.omit(idx)]
Output
year area
1 2018 Agri & Forestry
2 2018 Agri & Forestry
3 2019 Econs & Econometrics
4 2020 Econs & Econometrics
5 2021 Agri & Forestry
6 2021 Econs & Econometrics
Data
df1 <- structure(list(A = c("Agri & Forestry", "Econs & Econometrics"
), B = c("Agriculture", "Economics & Econometrics")), class = "data.frame", row.names = c(NA,
-2L))
df2 <- structure(list(year = c(2018, 2018, 2019, 2020, 2021, 2021),
area = c("Agri & Forestry", "Agri & Forestry", "Econs & Econometrics",
"Econs & Econometrics", "Agri & Forestry", "Econs & Econometrics"
)), row.names = c(NA, -6L), class = "data.frame")
CodePudding user response:
I use merge to add a column containing the new value of "area" for each row of the data frame with an outdated value. Then I use ifelse to define a new column containing only the new values. Using merge shuffled the rows, so I reorder it at the end, and I also remove the redundant column added at the beginning
A <- c("Agri & Forestry", "Econs & Econometrics")
B <- c("Agriculture", "Economics & Econometrics")
dict <- cbind(A, B)
year <- c(2018, 2018, 2019,2020,2021,2021)
area <- c("Agri & Forestry", "Agri & Forestry", "Econs & Econometrics",
"Econs & Econometrics", "Agriculture", "Economics & Econometrics")
df <- cbind(year, area)
df <- merge(df, dict, by.x="area", by.y="B", all.x=TRUE)
df$area2 <- ifelse(is.na(df$A), df$area, df$A)
df[order(df$year), c("year", "area", "area2")]