Suppose I have the following DataFrame:
using DataFrames
df = DataFrame(
g=["a","b","a","c",missing,missing,missing,missing],
a=[1,2,3,4,missing,missing,missing,missing],
Column1=[missing,missing,missing,missing,false,false,false,true],
Column2=[missing,missing,missing,missing,false,true,true,true],
Column3=[missing,missing,missing,missing,true,true,false,false],
)
# 8×5 DataFrame
#  Row │ g        a        Column1  Column2  Column3
#      │ String?  Int64?   Bool?    Bool?    Bool?
# ─────┼─────────────────────────────────────────────
#    1 │ a              1  missing  missing  missing
#    2 │ b              2  missing  missing  missing
#    3 │ a              3  missing  missing  missing
#    4 │ c              4  missing  missing  missing
#    5 │ missing  missing    false    false     true
#    6 │ missing  missing    false     true     true
#    7 │ missing  missing    false     true    false
#    8 │ missing  missing     true     true    false
I want to convert it to this:
# 4×5 DataFrame
#  Row │ g        a       Column1  Column2  Column3
#      │ String?  Int64?  Bool?    Bool?    Bool?
# ─────┼────────────────────────────────────────────
#    1 │ a             1    false    false     true
#    2 │ b             2    false     true     true
#    3 │ a             3    false     true    false
#    4 │ c             4     true     true    false
I tried:
DataFrame(collect.(skipmissing.(eachcol(df))), names(df))
But I don't think this is optimal, since it relies on the collect
function. Is there a better way to do it?
Answer:
For me a natural way to do it would be:
julia> mapcols(x -> filter(!ismissing, x), df)
4×5 DataFrame
 Row │ g        a       Column1  Column2  Column3
     │ String?  Int64?  Bool?    Bool?    Bool?
─────┼────────────────────────────────────────────
   1 │ a             1    false    false     true
   2 │ b             2    false     true     true
   3 │ a             3    false     true    false
   4 │ c             4     true     true    false
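However, this assumes that the number of missing values is the same in every column; otherwise the filtered columns would have different lengths and could not be combined into a data frame (but I guess this is what you have in this exercise, right?).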
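If you are not sure that the equal-count assumption holds, a small guard can make the failure explicit. The sketch below is only an illustration: drop_missing_columnwise is a hypothetical helper name, not part of DataFrames.jl, and the tiny example frame is made up.

```julia
using DataFrames

# Hypothetical helper (not a DataFrames.jl function): drop missing values
# column by column, but only when every column keeps the same number of values.
function drop_missing_columnwise(df::AbstractDataFrame)
    counts = [count(!ismissing, col) for col in eachcol(df)]
    if !isempty(counts) && !all(==(counts[1]), counts)
        throw(ArgumentError("columns keep different numbers of values: $counts"))
    end
    return mapcols(col -> filter(!ismissing, col), df)
end

# Made-up example: each column has exactly one missing value.
small = DataFrame(x=[1, missing, 2], y=[missing, "a", "b"])
res = drop_missing_columnwise(small)
```

Here res should have two rows, because both columns keep two values; a frame whose columns keep different numbers of values would hit the ArgumentError instead.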
skipmissing is designed for cases when the user wants a non-copying iterable that skips missing values (which is not the case here).
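To make the distinction concrete, here is a minimal sketch on a single made-up vector (the variable names are just for illustration). It also shows one side effect worth knowing: collecting a skipmissing iterator narrows the element type, while filter keeps it.

```julia
col = [1, missing, 3, missing]

# skipmissing wraps the vector lazily; no new array is allocated here.
itr = skipmissing(col)
total = sum(itr)                 # reductions can consume the iterator directly

# Materializing a vector is a separate, copying step.
vals = collect(itr)              # element type no longer allows missing
kept = filter(!ismissing, col)   # element type still allows missing
```

So skipmissing is the better fit when you only need to reduce or iterate over the values, and filter (or collect) is the one to use when you need a new column.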