Home > OS >  How can I find and replace the list of variables on R
How can I find and replace the list of variables on R

Time:04-11

I have a data frame with different areas. The naming of the areas has changed in the last year, so I would like to change to the old names. I have a list of the names as in the DataSet and how I want them to be, witch looks like:

> A <- c("Agri & Forestry", "Econs & Econometrics")
> B <- c("Agriculture", "Economics & Econometrics")
> cbind(A, B)


     A                      B                         
[1,] "Agri & Forestry"      "Agriculture"             
[2,] "Econs & Econometrics" "Economics & Econometrics"

For instance, I don't want "Agriculture" I want "Agri & Forestry".

My list has around 30 different changes. It's possible to open this list on R and make this change? My DF looks like that:

> year <- c(2018, 2018, 2019,2020,2021,2021)
> area <- c("Agri & Forestry", "Agri & Forestry", "Econs & Econometrics",
            "Econs & Econometrics", "Agriculture", "Economics & Econometrics")
> cbind(year, area)

     year   area                      
[1,] "2018" "Agri & Forestry"         
[2,] "2018" "Agri & Forestry"         
[3,] "2019" "Econs & Econometrics"    
[4,] "2020" "Econs & Econometrics"    
[5,] "2021" "Agriculture"             
[6,] "2021" "Economics & Econometrics"

And I want something like:

> area2 <- c("Agri & Forestry", "Agri & Forestry", "Econs & Econometrics",
            "Econs & Econometrics", "Agri & Forestry", "Econs & Econometrics")
> cbind(year, area2)
     year   area2                 
[1,] "2018" "Agri & Forestry"     
[2,] "2018" "Agri & Forestry"     
[3,] "2019" "Econs & Econometrics"
[4,] "2020" "Econs & Econometrics"
[5,] "2021" "Agri & Forestry"     
[6,] "2021" "Econs & Econometrics"

CodePudding user response:

You can get row indices where there is a match with the undesired character values, and then substitute values based on those indices.

Here, we assume df1 is the first data.frame with columns A and B, and df2 is the second data.frame, with columns year and area.

idx <- match(df2$area, df1$B)
df2$area[!is.na(idx)] <- df1$A[na.omit(idx)]

Output

  year                 area
1 2018      Agri & Forestry
2 2018      Agri & Forestry
3 2019 Econs & Econometrics
4 2020 Econs & Econometrics
5 2021      Agri & Forestry
6 2021 Econs & Econometrics

Data

df1 <- structure(list(A = c("Agri & Forestry", "Econs & Econometrics"
), B = c("Agriculture", "Economics & Econometrics")), class = "data.frame", row.names = c(NA, 
-2L))

df2 <- structure(list(year = c(2018, 2018, 2019, 2020, 2021, 2021), 
    area = c("Agri & Forestry", "Agri & Forestry", "Econs & Econometrics", 
    "Econs & Econometrics", "Agri & Forestry", "Econs & Econometrics"
    )), row.names = c(NA, -6L), class = "data.frame")

CodePudding user response:

I use merge to add a column containing the new value of "area" for each row of the data frame with an outdated value. Then I use ifelse to define a new column containing only the new values. Using merge shuffled the rows, so I reorder it at the end, and I also remove the redundant column added at the beginning

A <- c("Agri & Forestry", "Econs & Econometrics")
B <- c("Agriculture", "Economics & Econometrics")
dict <- cbind(A, B)

year <- c(2018, 2018, 2019,2020,2021,2021)
area <- c("Agri & Forestry", "Agri & Forestry", "Econs & Econometrics",
          "Econs & Econometrics", "Agriculture", "Economics & Econometrics")
df <- cbind(year, area)

df <- merge(df, dict, by.x="area", by.y="B", all.x=TRUE)
df$area2 <- ifelse(is.na(df$A), df$area, df$A)

df[order(df$year), c("year", "area", "area2")]
  • Related