Basic Function to remove 0 values from a df based on a specific column

I'm trying to write a very basic function that removes any rows with values of 0 or less from a df, based on a specific column in the df. When I run these lines outside of the function they work, but when I run them within the function I get this error: "Error in `$<-.data.frame`(`*tmp*`, name, value = numeric(0)) : replacement has 0 rows". Does anyone know what the problem is?

Remove_Missing=function(x,name){
  x$name=as.numeric(x$name)
  x=x[x$name>0,]
}

EDIT:
Example Code:

#First two lines work but those same two lines won't work if function is called
merged_data$name=as.numeric(merged_data$HETENURE)
merged_data=merged_data[merged_data$HETENURE>0,]
Remove_Missing(merged_data, HETENURE) #Call function

Data

structure(list(HRHHID = c("008906910993941", "008906910993941", 
"648061954059610", "160916068405549", "160916068405549", "168069009100998"
), HRYEAR4 = c("2010", "2010", "2010", "2010", "2010", "2010"
), HETENURE = c(" 1", " 1", " 3", " 1", " 1", " 1"), HEFAMINC = c("11", 
"11", "10", "13", "13", "14"), HRNUMHOU = c(" 2", " 2", " 1", 
" 2", " 2", " 3"), GESTFIPS = c("01", "01", "01", "01", "01", 
"01"), GTMETSTA = c("2", "2", "1", "1", "1", "1"), PEMARITL = c(" 1", 
" 1", " 4", " 1", " 1", " 1"), PESEX = c(" 2", " 1", " 1", " 2", 
" 1", " 2"), PEEDUCA = c("40", "45", "40", "42", "41", "39"), 
    PTDTRACE = c(" 1", " 1", " 1", " 1", " 1", " 1"), PEHSPNON = c(" 2", 
    " 2", " 2", " 2", " 2", " 2"), PEMLR = c(" 5", " 5", " 5", 
    " 1", " 1", " 7"), PRFTLF = c("-1", "-1", "-1", " 1", " 1", 
    "-1"), PRHRUSL = c("-1", "-1", "-1", " 4", " 4", "-1"), HESP1 = c("-1", 
    "-1", "-1", "-1", "-1", "-1"), HESP6 = c("-1", "-1", "-1", 
    "-1", "-1", "-1"), HESP7A = c("-1", "-1", "-1", "-1", "-1", 
    "-1"), HESP8 = c("-1", "-1", "-1", "-1", "-1", "-1"), HRFS12M1 = c(" 1", " 1", " 1", " 1", " 1", " 1")), row.names = c(9L, 10L, 11L, 
12L, 13L, 15L), class = "data.frame")

CodePudding user response：

There are two problems and an enabling mistake here:

1. You define your function with function(x, name) but then reference the column as x$name. The $ operator does not evaluate name; it always looks for a column literally called "name", so this should fail whenever no such column exists (x$name is then NULL, as.numeric(NULL) is numeric(0), and the assignment stops with "replacement has 0 rows"). If name is supposed to identify a column via standard evaluation, it should be a string, and you should be using x[[name]] instead (see "The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe").

   This does not always surface as a problem (though it should), however, because of the next two points.

2. You are calling your function as
```
Remove_Missing(merged_data, HETENURE)
```
   but since you are not attempting non-standard evaluation (NSE), passing a bare HETENURE is wrong. If the function ever evaluated name, R would look for an object named HETENURE, not find it, and stop with Error: object 'HETENURE' not found. What I think you should be doing is
```
Remove_Missing(merged_data, "HETENURE")
```

3. Not a bug so much as a weakness that allowed the other bugs to remain undiscovered: you assigned merged_data$name <- as.numeric(...), so inside your function x$name, which should have been referencing x$HETENURE and should have failed, instead found a column named name in your data (and therefore the function's name argument was never referenced or used).
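
To make the difference between $ and [[ concrete, here is a minimal sketch on a made-up data frame. The names df and col are illustrative only and do not come from the question:

```r
# Toy data frame; `df` and `col` are hypothetical names for illustration
df <- data.frame(a = c("1", "0", "2"), stringsAsFactors = FALSE)
col <- "a"

df$col      # NULL: `$` looks for a column literally named "col"
df[[col]]   # "1" "0" "2": `[[` evaluates `col` and uses its value, "a"

# Converting the NULL gives a zero-length vector, which cannot be
# assigned back as a column of a 3-row data frame
length(as.numeric(df$col))   # 0
```

Assigning that zero-length result with df$col <- as.numeric(df$col) reproduces the "replacement has 0 rows" error from the question.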

First, let's remove the tempting hidden-bug of the column named name:

merged_data$name <- NULL

Second, the fixed function:

Remove_Missing = function(x, name) {
  x[[name]] = as.numeric(x[[name]])
  x[x[[name]] > 0,]
}
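
As an aside, the function above still fails obscurely if the column name is misspelled, and logical indexing with NA returns all-NA rows. A slightly stricter sketch is shown below; the name Remove_Missing_safe and the toy data are hypothetical and not part of the original code:

```r
# Hypothetical stricter variant (an assumption, not the original function)
Remove_Missing_safe <- function(x, name) {
  # Fail early if `name` is not a single string naming an existing column
  stopifnot(is.character(name), length(name) == 1, name %in% names(x))
  x[[name]] <- as.numeric(x[[name]])
  # which() skips NA comparisons, so rows with a missing value are dropped
  # instead of being returned as all-NA rows
  x[which(x[[name]] > 0), , drop = FALSE]
}

# Made-up data for illustration
toy <- data.frame(HETENURE = c(" 1", "-1", NA, " 3"), id = 1:4,
                  stringsAsFactors = FALSE)
Remove_Missing_safe(toy, "HETENURE")   # keeps the rows with id 1 and 4
```

Whether NA rows should be dropped or kept depends on what "missing" means in your data, so treat the which() step as one possible choice.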

Third, fixing the invocation and getting return data:

Remove_Missing(merged_data, "HETENURE")
#             HRHHID HRYEAR4 HETENURE HEFAMINC HRNUMHOU GESTFIPS GTMETSTA PEMARITL PESEX PEEDUCA PTDTRACE PEHSPNON PEMLR PRFTLF PRHRUSL HESP1 HESP6 HESP7A HESP8 HRFS12M1
# 9  008906910993941    2010        1       11        2       01        2        1     2      40        1        2     5     -1      -1    -1    -1     -1    -1        1
# 10 008906910993941    2010        1       11        2       01        2        1     1      45        1        2     5     -1      -1    -1    -1     -1    -1        1
# 11 648061954059610    2010        3       10        1       01        1        4     1      40        1        2     5     -1      -1    -1    -1     -1    -1        1
# 12 160916068405549    2010        1       13        2       01        1        1     2      42        1        2     1      1       4    -1    -1     -1    -1        1
# 13 160916068405549    2010        1       13        2       01        1        1     1      41        1        2     1      1       4    -1    -1     -1    -1        1
# 15 168069009100998    2010        1       14        3       01        1        1     2      39        1        2     7     -1      -1    -1    -1     -1    -1        1

Granted, in this case nothing was filtered out (since all of your data passed the condition), so if I temporarily revise the function to condition on > 1 instead, we'll see the change:

Remove_Missing = function(x, name) {
  x[[name]] = as.numeric(x[[name]])
  x[x[[name]] > 1,]
}
Remove_Missing(merged_data, "HETENURE")
#             HRHHID HRYEAR4 HETENURE HEFAMINC HRNUMHOU GESTFIPS GTMETSTA PEMARITL PESEX PEEDUCA PTDTRACE PEHSPNON PEMLR PRFTLF PRHRUSL HESP1 HESP6 HESP7A HESP8 HRFS12M1
# 11 648061954059610    2010        3       10        1       01        1        4     1      40        1        2     5     -1      -1    -1    -1     -1    -1        1
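
If you would rather keep calling the function with a bare column name, as in the original Remove_Missing(merged_data, HETENURE), one possible NSE approach is to capture the argument with substitute(). This is only a sketch; Remove_Missing_nse and the toy data are hypothetical names introduced for illustration:

```r
# Hypothetical NSE variant: accepts a bare (unquoted) column name
Remove_Missing_nse <- function(x, name) {
  # Capture the unevaluated argument and turn it into a string
  col <- deparse(substitute(name))
  x[[col]] <- as.numeric(x[[col]])
  x[x[[col]] > 0, ]
}

# Made-up data for illustration
toy <- data.frame(HETENURE = c(" 1", " 0", " 3"), id = 1:3,
                  stringsAsFactors = FALSE)
Remove_Missing_nse(toy, HETENURE)   # keeps the rows with id 1 and 3
```

The trade-off is that this variant no longer works with a quoted name or with a variable that holds the column name, which is why the string-based version is usually easier to reuse inside other functions.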