I'm working with a CSV in which one column of numbers uses commas as thousands separators (e.g. 1,000,000 means 1000000). Is there a way to strip the commas from the entire column? When I try:
replace(df2.Volume, "," => "")
it gives me back the entire column as if nothing has changed. And when I tried:
julia> parse(Int, replace("df2.Volume",","=>"") )
ERROR: ArgumentError: invalid base 10 digit 'd' in "df2.Volume"
Stacktrace:
[1] tryparse_internal(#unused#::Type{Int64}, s::String, startpos::Int64, endpos::Int64, base_::Int64, raise::Bool)
@ Base .\parse.jl:137
[2] parse(::Type{Int64}, s::String; base::Nothing)
@ Base .\parse.jl:241
[3] parse(::Type{Int64}, s::String)
@ Base .\parse.jl:241
[4] top-level scope
@ REPL[263]:1
The data is all numbers in the millions, so how can I remove these commas? I appreciate your help! Source: https://testdataframesjl.readthedocs.io/en/readthedocs/subsets/
CodePudding user response:
You can do something like:
df.Volume = [parse(Int, replace(v, ","=>"")) for v in df.Volume]
CodePudding user response:
A column of a DataFrame in Julia is a Vector. Hence, if you want to do something with the entire column, you usually need to vectorize the operation using the dot (.) operator.
julia> df = DataFrame(Volume=["1,000","1,000,000","1,000,000,0000"]);
julia> df.VolumeOK = replace.(df.Volume, "," => "");
julia> df
3×2 DataFrame
 Row │ Volume          VolumeOK
     │ String          String
─────┼─────────────────────────────
   1 │ 1,000           1000
   2 │ 1,000,000       1000000
   3 │ 1,000,000,0000  10000000000
Note the dot (.) after replace.
You can of course further parse it to Int using the vectorized parse function, such as parse.(Int, df.VolumeOK).
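To see why the undotted call in the question appears to do nothing, here is a minimal sketch on a plain Vector of strings. The sample values and the names raw, unchanged and volumes are made up for illustration; the same calls should apply to a DataFrame column such as df.Volume.

```julia
# Hypothetical sample values standing in for df.Volume
raw = ["1,000", "1,000,000", "2,500,000"]

# Without the dot, replace treats raw as a collection and looks for
# whole elements equal to ",". No element matches, so a copy with the
# same contents comes back.
unchanged = replace(raw, "," => "")

# With the dot, replace is applied to each string, removing the commas;
# parse is then broadcast over the cleaned strings to get integers.
volumes = parse.(Int, replace.(raw, "," => ""))
```

Both steps can be written in one expression, as above, so there is no need to keep an intermediate String column unless you want to inspect it.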
Finally, note that you could handle such formatting issues directly when reading the data with CSV.jl, for example:
CSV.read("df.csv", DataFrame; delim=';', decimal=',')
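If the commas are thousands separators rather than decimal marks, recent versions of CSV.jl document a groupmark keyword for this case. The sketch below assumes your installed CSV.jl version supports groupmark; the in-memory data, the column names and the ';' delimiter are invented for illustration.

```julia
using CSV, DataFrames

# Hypothetical file contents: ';' separates columns and ',' groups digits.
data = """
Date;Volume
2021-01-04;1,000,000
2021-01-05;2,500,000
"""

# groupmark=',' asks the parser to ignore ',' inside numeric values
# (assumption: available in your CSV.jl version).
df = CSV.read(IOBuffer(data), DataFrame; delim=';', groupmark=',')
```

If this works on your version, Volume should be read as a numeric column directly, so no replace or parse step is needed afterwards. If the file uses ',' as the column delimiter as well, the numbers would have to be quoted in the file.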