In R dataframe, how to add margin column using COUNT function just like EXCEL ?
For instance, there is dataframe as red area, i want to add column and row (yellow area) for counting the cells which have numeric content by rows/columns.
mdata <- data.frame(
"CATEGORY"=c("A","B","C","D","E","F"),
"SALES"=c(1,20,2,2,0,3),
"QTY"=c(0,4,0,0,0,2),
"RETURN"=c(0,3,1,0,9,1)
)
CodePudding user response:
There are a couple of things that make this not straightforward for your example:
You're using
0
for both the values that are zero in the Excel spreadsheet, and the empty cells - I've replaced the empty cells withNA
in the exampleFor the columns you're storing labels in the first column, so you don't want to include this in your count, whereas the first row is part of the data
Taking this into account, one solution would be:
mdata <- data.frame(
"CATEGORY"=c("A","B","C","D","E","F"),
"SALES"=c(1,20,2,2,0,3),
"QTY"=c(NA,4,NA,NA,NA,2),
"RETURN"=c(NA,3,1,NA,9,1)
)
mdata <- rbind(mdata, c("rowCount", colSums(!is.na(mdata))[-1]))
mdata <- cbind(mdata, columnCount = c(head(rowSums(!is.na(mdata[,-1])),-1),NA))
mdata
# CATEGORY SALES QTY RETURN columnCount
# 1 A 1 <NA> <NA> 1
# 2 B 20 4 3 3
# 3 C 2 <NA> 1 2
# 4 D 2 <NA> <NA> 1
# 5 E 0 <NA> 9 2
# 6 F 3 2 1 3
# 7 rowCount 6 2 4 NA
The main trick is to use colSums(!is.na())
/rowSums(!is.na())
to count the number of non-NA
values in each row/column.
CodePudding user response:
Base R solution, note unless this is an output this is not an advisable data-structure:
# Function to resolve numeric column vectors:
# resolve_num_vecs => function()
resolve_num_vecs <- function(df){
# Resolve the numeric vectors: num_cols => logical vector
num_cols <- vapply(
df,
is.numeric,
logical(1)
)
# Explicitly define the returned object:
# logical vector => env
return(num_cols)
}
# Apply the function: num_cols => logical vector
num_cols <- resolve_num_vecs(mdata)
# Create data.frame as requried, note this is not an
# advisable data structure: res => data.frame
res <- rbind(
transform(
mdata,
colCOUNT = rowSums(
Negate(is.na)(mdata[,num_cols]) && mdata[,num_cols] != 0
)
),
c(
"rowCOUNT",
colSums(
Negate(is.na)(mdata[,num_cols]) && mdata[,num_cols] != 0
),
NA_integer_
)
)