I'm trying to write a very basic function that removes the rows of a df whose value in a specific column is 0 or less. When I run these lines outside of the function they work, but when I run them inside the function I get this error: "Error in $<-.data.frame(*tmp*, name, value = numeric(0)) : replacement has 0 rows". Does anyone know what the problem is?
Remove_Missing=function(x,name){
x$name=as.numeric(x$name)
x=x[x$name>0,]
}
EDIT:
Example Code:
#First two lines work but those same two lines won't work if function is called
merged_data$name=as.numeric(merged_data$HETENURE)
merged_data=merged_data[merged_data$HETENURE>0,]
Remove_Missing(merged_data, HETENURE) #Call function
Data
structure(list(HRHHID = c("008906910993941", "008906910993941",
"648061954059610", "160916068405549", "160916068405549", "168069009100998"
), HRYEAR4 = c("2010", "2010", "2010", "2010", "2010", "2010"
), HETENURE = c(" 1", " 1", " 3", " 1", " 1", " 1"), HEFAMINC = c("11",
"11", "10", "13", "13", "14"), HRNUMHOU = c(" 2", " 2", " 1",
" 2", " 2", " 3"), GESTFIPS = c("01", "01", "01", "01", "01",
"01"), GTMETSTA = c("2", "2", "1", "1", "1", "1"), PEMARITL = c(" 1",
" 1", " 4", " 1", " 1", " 1"), PESEX = c(" 2", " 1", " 1", " 2",
" 1", " 2"), PEEDUCA = c("40", "45", "40", "42", "41", "39"),
PTDTRACE = c(" 1", " 1", " 1", " 1", " 1", " 1"), PEHSPNON = c(" 2",
" 2", " 2", " 2", " 2", " 2"), PEMLR = c(" 5", " 5", " 5",
" 1", " 1", " 7"), PRFTLF = c("-1", "-1", "-1", " 1", " 1",
"-1"), PRHRUSL = c("-1", "-1", "-1", " 4", " 4", "-1"), HESP1 = c("-1",
"-1", "-1", "-1", "-1", "-1"), HESP6 = c("-1", "-1", "-1",
"-1", "-1", "-1"), HESP7A = c("-1", "-1", "-1", "-1", "-1",
"-1"), HESP8 = c("-1", "-1", "-1", "-1", "-1", "-1"), HRFS12M1 = c(" 1", " 1", " 1", " 1", " 1", " 1")), row.names = c(9L, 10L, 11L,
12L, 13L, 15L), class = "data.frame")
CodePudding user response:
There are two problems and an enabling mistake here:
You define your function with function(x, name) but then try to reference the particular column as x$name, which should fail. That is, if name is supposed to identify (via standard evaluation) a column, then it should really be a string, and $ does not work that way. You should instead be using x[[name]] (see The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe). This is not reported as a problem (though it should be), however, because of the next two bugs.
You are calling your function as Remove_Missing(merged_data, HETENURE) but since you are not attempting to do non-standard evaluation (NSE), the use of HETENURE is wrong. What should be happening is that in your function, when its name argument is referenced, it should look for an object named HETENURE and not find it; it should err with Error: object 'HETENURE' not found. What I think you should be doing is Remove_Missing(merged_data, "HETENURE").
Not a bug so much as a weakness that allowed other bugs to remain undiscovered: you assigned merged_data$name <- as.numeric(...), so in your function, when x$name should have been referencing x$HETENURE and should have failed, it instead found a column named name in your data (and therefore the function's passed argument of name was never referenced/used).
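To make the difference between $ and [[ concrete, here is a minimal sketch on a small made-up data frame (toy_df is hypothetical, not your merged_data). It assumes the column name is passed around as a string and shows where the zero-length value in your error message can come from.

```r
# Hypothetical toy data; the column mimics the padded strings in the question.
toy_df <- data.frame(HETENURE = c(" 1", " 3", "-1"), stringsAsFactors = FALSE)
name <- "HETENURE"

# `$` uses its right-hand side literally, so this looks for a column
# that is actually called "name" and finds none.
by_dollar <- toy_df$name

# `[[` evaluates its argument, so this looks up the column whose
# name is stored in the variable `name`.
by_bracket <- toy_df[[name]]

# Converting the missing column gives a zero-length numeric vector ...
zero_len <- as.numeric(by_dollar)

# ... and assigning that back with `$<-` is what fails.
assign_attempt <- try(toy_df$name <- zero_len, silent = TRUE)
```

In this sketch by_dollar is NULL, by_bracket holds the three strings, and the assignment attempt is an error mentioning "replacement has 0 rows".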
First, let's remove the tempting hidden bug of the column named name:
merged_data$name <- NULL
Second, the fixed function:
Remove_Missing = function(x, name) {
x[[name]] = as.numeric(x[[name]])
x[x[[name]] > 0,]
}
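One thing to watch for with this version: if a value in the column cannot be converted, as.numeric gives NA, and a logical index containing NA yields a row of NAs rather than dropping the row. Below is a hedged sketch of a variant that drops such rows explicitly. The name Remove_Missing_NA_safe and the toy data are my own, purely for illustration, and treating unparseable values as "missing" is an assumption about what you want.

```r
# Hypothetical variant: rows whose value is NA after conversion are dropped too.
Remove_Missing_NA_safe <- function(x, name) {
  # Convert once; suppress the "NAs introduced by coercion" warning.
  values <- suppressWarnings(as.numeric(x[[name]]))
  # Keep only rows that converted cleanly and are strictly positive.
  keep <- !is.na(values) & values > 0
  x[[name]] <- values
  x[keep, , drop = FALSE]
}

# Made-up data with one negative and one unparseable value.
toy_df <- data.frame(
  id = c("a", "b", "c", "d"),
  HETENURE = c(" 1", "-1", "n/a", " 3"),
  stringsAsFactors = FALSE
)
safe <- Remove_Missing_NA_safe(toy_df, "HETENURE")
```

Here safe keeps only the rows with id "a" and "d".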
Third, fixing the invocation and getting return data:
Remove_Missing(merged_data, "HETENURE")
# HRHHID HRYEAR4 HETENURE HEFAMINC HRNUMHOU GESTFIPS GTMETSTA PEMARITL PESEX PEEDUCA PTDTRACE PEHSPNON PEMLR PRFTLF PRHRUSL HESP1 HESP6 HESP7A HESP8 HRFS12M1 name
# 9 008906910993941 2010 1 11 2 01 2 1 2 40 1 2 5 -1 -1 -1 -1 -1 -1 1 1
# 10 008906910993941 2010 1 11 2 01 2 1 1 45 1 2 5 -1 -1 -1 -1 -1 -1 1 1
# 11 648061954059610 2010 3 10 1 01 1 4 1 40 1 2 5 -1 -1 -1 -1 -1 -1 1 3
# 12 160916068405549 2010 1 13 2 01 1 1 2 42 1 2 1 1 4 -1 -1 -1 -1 1 1
# 13 160916068405549 2010 1 13 2 01 1 1 1 41 1 2 1 1 4 -1 -1 -1 -1 1 1
# 15 168069009100998 2010 1 14 3 01 1 1 2 39 1 2 7 -1 -1 -1 -1 -1 -1 1 1
Granted, in this case nothing was filtered out (since all of your data passed the condition), so if I temporarily revise the function to condition on > 1 instead, we'll see the change:
Remove_Missing = function(x, name) {
x[[name]] = as.numeric(x[[name]])
x[x[[name]] > 1,]
}
Remove_Missing(merged_data, "HETENURE")
# HRHHID HRYEAR4 HETENURE HEFAMINC HRNUMHOU GESTFIPS GTMETSTA PEMARITL PESEX PEEDUCA PTDTRACE PEHSPNON PEMLR PRFTLF PRHRUSL HESP1 HESP6 HESP7A HESP8 HRFS12M1 name
# 11 648061954059610 2010 3 10 1 01 1 4 1 40 1 2 5 -1 -1 -1 -1 -1 -1 1 3
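Keep in mind that the function returns a filtered copy and does not change the data frame you pass in, so you need to assign the result if you want to keep it. A small sketch, using a made-up toy_df rather than your merged_data:

```r
# Same fixed function as above, with the original > 0 condition.
Remove_Missing <- function(x, name) {
  x[[name]] <- as.numeric(x[[name]])
  x[x[[name]] > 0, ]
}

# Hypothetical toy data with one non-positive value.
toy_df <- data.frame(
  id = c("a", "b", "c"),
  HETENURE = c(" 1", "-1", " 3"),
  stringsAsFactors = FALSE
)

# Capture the return value; toy_df itself is left untouched.
cleaned <- Remove_Missing(toy_df, "HETENURE")

nrow(toy_df)   # 3, the input still has every row
nrow(cleaned)  # 2, the row with -1 is gone
```

For your data that would mean merged_data <- Remove_Missing(merged_data, "HETENURE").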