Transfer Many Stata Replaces to R

Time:11-16

I have a couple thousand lines of Stata code, inherited from a peer, that mostly replace negative values (which encode missingness) with a proper missing value (.), and I need to port this code to R. To do so, I have saved the code as a single column of character strings. The replacements essentially look like the following, ad nauseam:

replace R04_ADULTTYPE = . if (R04_ADULTTYPE <= -1 )

The R04_ names are variables in a data set, so I am hoping to translate these lines of Stata to R efficiently.

I have tried separating and replacing parts of the strings so that I can iterate over a list of the variables that need replacing, but I am running low on ideas. How can I apply these replaces en masse in R, given that I have them as a data set of character strings? The expected result is simply the effect of running all of these Stata replaces, done in R; a sample of the strings is shown below.

Dput of the head of the data (rawMissing). Thanks!

# Data (many Stata replaces)
dput(head(rawMissing))
structure(list(replacements = c("replace R04_ADULTTYPE = . if (R04_ADULTTYPE <= -1 )", 
"replace R04R_A_AT0047 = . if (R04R_A_AT0047 <= -1 )", "replace R04R_A_AM0069 = . if (R04R_A_AM0069 <= -1 )", 
"replace R04R_A_AM0065_V2 = . if (R04R_A_AM0065_V2 <= -1 )", 
"replace R04_AM0066 = . if (R04_AM0066 <= -1 )", "replace R04_AM0070 = . if (R04_AM0070 <= -1 )"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

# Expected output would be efficiently conducting these many replaces in R

CodePudding user response:

We can extract the column name, the operator, and the value to be replaced from each line into separate columns:

library(dplyr)
library(tidyr)
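keydat <- rawMissing %>%
     # capture groups: column name, comparison operator, threshold value
     extract(replacements, into = c('colnm', 'operator', 'value'), 
         '^[^(]+\\((\\w+)\\s+([[:punct:]]+)\\s+(-?[0-9]+)', 
         convert = TRUE)  # convert = TRUE makes 'value' numeric rather than character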

Then, using 'keydat', loop across the columns of the original dataset (say, 'df1') that are named in keydat$colnm and do the replacements:

df2 <- df1 %>%
   mutate(across(all_of(keydat$colnm), ~ 
         {
         # look up the operator and threshold recorded for the current column
         i <- match(cur_column(), keydat$colnm)
         op <- keydat$operator[i]
         val <- as.numeric(keydat$value[i])
         # set to NA wherever the Stata condition holds
         replace(., match.fun(op)(., val), NA)
        }))
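
As a quick check of the same idea without any packages, here is a minimal base R sketch. The toy data frame, its values, and the UNTOUCHED column are made up for illustration, and the pattern assumes every line has the shape `replace VAR = . if (VAR OP NUMBER )` as in the question; lines of any other shape are skipped rather than translated.

```r
# Toy data standing in for the real data set (hypothetical values)
df1 <- data.frame(
  R04_ADULTTYPE = c(1, -9, 2, -1),
  R04_AM0066    = c(-7, 3, 4, NA),
  UNTOUCHED     = c(-1, -2, 5, 6)
)

stata_lines <- c(
  "replace R04_ADULTTYPE = . if (R04_ADULTTYPE <= -1 )",
  "replace R04_AM0066 = . if (R04_AM0066 <= -1 )"
)

# Capture groups: column name, comparison operator, numeric threshold
pattern <- paste0(
  "^replace\\s+(\\w+)\\s*=\\s*\\.\\s*if\\s*",
  "\\(\\s*\\w+\\s*([<>=!]+)\\s*(-?[0-9.]+)\\s*\\)\\s*$"
)
parts <- regmatches(stata_lines, regexec(pattern, stata_lines))

df2 <- df1
for (p in parts) {
  if (length(p) != 4) next            # skip lines that do not fit the pattern
  col <- p[2]
  hit <- match.fun(p[3])(df2[[col]], as.numeric(p[4]))
  df2[[col]][which(hit)] <- NA        # which() ignores NA comparisons
}

print(df2)
```

Only the columns named in the Stata lines are modified; UNTOUCHED keeps its negative values because no replace line mentions it.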

CodePudding user response:

An alternative to @akrun's answer would be to write a new R script and then source that script. This can be helpful if, for example, you want to read through the translated code or document exactly what has been done (e.g., for replicable data analysis). I think the following would generally work, where statareplace.do is the filename of the original Stata file to be read and statareplace.R is the filename of the resulting R script:

fin <- "statareplace.do"
fout <- "statareplace.R"

f <- readLines(fin)
# Turn each "replace VAR = VALUE if (COND)" line into "VAR = ifelse(COND, VALUE, VAR),"
g <- gsub(
    "^\\w+\\s+(\\w+)(\\s+)?=(\\s+)?(.+) if(\\s+)?\\((.+)\\)$", 
    "\\1 = ifelse(\\6, \\4, \\1),", f
)
# Stata's missing value "." becomes NA (note: this replaces every literal period in the line)
g <- gsub("\\.", "NA", g)
g

writeLines(c("library(dplyr)", "df <- df %>%", "mutate(", g, ")"), fout)
source(fout)
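
To preview what this kind of translation produces before writing and sourcing a whole script, here is a minimal sketch that runs it on a single line taken from the question. The pattern is a slightly simplified variant with three capture groups, so treat it as an illustration rather than the exact expression above; the final parse() call only checks that the generated text is syntactically valid R.

```r
line <- "replace R04_ADULTTYPE = . if (R04_ADULTTYPE <= -1 )"

# Groups: 1 = variable name, 2 = replacement value, 3 = condition
out <- gsub(
  "^\\w+\\s+(\\w+)\\s*=\\s*(.+) if\\s*\\((.+)\\)$",
  "\\1 = ifelse(\\3, \\2, \\1),", line
)
out <- gsub("\\.", "NA", out)   # Stata's "." becomes NA

print(out)
# "R04_ADULTTYPE = ifelse(R04_ADULTTYPE <= -1 , NA, R04_ADULTTYPE),"

# Drop the trailing comma and confirm the result parses as R code
parsed <- parse(text = sub(",$", "", out))
```

Any line that does not match the pattern passes through gsub unchanged, so it is worth scanning the output for leftover lines that still start with "replace" before sourcing the generated script.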