I have the following looking dataframe:
using DataFrames
df = DataFrame(
group = ["A", "A", "A", "B", "B", "B"],
V1 = [1, NaN, NaN, 3, NaN, NaN],
V2 = [NaN, 4, NaN, 2, NaN, NaN],
V3 = [NaN, NaN, 4, NaN, 1, NaN])
Output:
6×4 DataFrame
Row │ group V1 V2 V3
│ String Float64 Float64 Float64
─────┼───────────────────────────────────
1 │ A 1.0 NaN NaN
2 │ A NaN 4.0 NaN
3 │ A NaN NaN 4.0
4 │ B 3.0 2.0 NaN
5 │ B NaN NaN 1.0
6 │ B NaN NaN NaN
As you can see per group there are some values. I would like to merge the rows per group so we actually remove the NaN values. Here is the desired output:
2×4 DataFrame
Row │ group V1 V2 V3
│ String Int64 Int64 Int64
─────┼─────────────────────────────
1 │ A 1 4 4
2 │ B 3 2 1
As you can see the rows are now merged into one row per group without any NaN. So I was wondering if anyone knows how to merge multiple rows with some values to remove the NaNs like above in a dataframe Julia?
CodePudding user response:
julia> combine(groupby(df, :group), Not(:group) .=> x -> filter(!isequal(NaN), x), renamecols=false)
2×4 DataFrame
Row │ group V1 V2 V3
│ String Float64 Float64 Float64
─────┼───────────────────────────────────
1 │ A 1.0 4.0 4.0
2 │ B 3.0 2.0 1.0
Note that the output is not what you asked for as your source columns contain floats, while you asked for integers. You could get it like this:
julia> combine(groupby(df, :group), Not(:group) .=> x -> Int.(filter(!isequal(NaN), x)), renamecols=false)
2×4 DataFrame
Row │ group V1 V2 V3
│ String Int64 Int64 Int64
─────┼─────────────────────────────
1 │ A 1 4 4
2 │ B 3 2 1
Finally note that you should not use NaN
to indicate missing data in your analyses. missing
is an intended way to do it.