If I have a vector that was loaded from a CSV and that number has commas separating


I'm working with a CSV in which one column contains numbers written with commas as thousands separators (e.g. 1,000,000 should be 1000000). Is there a way I can strip the commas from the entire column? When I try:

replace(df2.Volume, "," => "")

it gives me back the entire column as if nothing has changed. And when I tried:

julia> parse(Int, replace("df2.Volume",","=>"") )
ERROR: ArgumentError: invalid base 10 digit 'd' in "df2.Volume"
 [1] tryparse_internal(#unused#::Type{Int64}, s::String, startpos::Int64, endpos::Int64, base_::Int64, raise::Bool)
   @ Base .\parse.jl:137
 [2] parse(::Type{Int64}, s::String; base::Nothing)
   @ Base .\parse.jl:241
 [3] parse(::Type{Int64}, s::String)
   @ Base .\parse.jl:241
 [4] top-level scope
   @ REPL[263]:1

The data is all numbers in the millions, so how can I remove these commas? I appreciate your help! Source: https://testdataframesjl.readthedocs.io/en/readthedocs/subsets/

CodePudding user response:

You can do something like:

df.Volume = [parse(Int, replace(v, ","=>"")) for v in df.Volume]
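Columns read from a CSV often contain missing entries, which the comprehension above would not accept. A minimal sketch of one way to handle that, assuming a hypothetical helper named parse_grouped (not part of any library) and a made-up sample vector in place of df.Volume:

```julia
# Hypothetical helper: strip thousands separators and parse to Int.
parse_grouped(s::AbstractString) = parse(Int, replace(s, "," => ""))
# Pass missing values through untouched instead of raising an error.
parse_grouped(::Missing) = missing

# Made-up sample standing in for df.Volume
volume = ["1,000", missing, "2,500,000"]

# The dot applies the helper to every element of the vector
parsed = parse_grouped.(volume)
```

With a real data frame, the same call would be df.Volume = parse_grouped.(df.Volume).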

CodePudding user response:

A column of a DataFrame in Julia is a Vector. Hence if you want to do something with the entire column you usually need to vectorize the operation using the dot (.) operator.

julia> df = DataFrame(Volume=["1,000","1,000,000","1,000,000,0000"]);

julia> df.VolumeOK = replace.(df.Volume, "," => ""); 

julia> df
3×2 DataFrame
 Row │ Volume          VolumeOK
     │ String          String
─────┼────────────────────────────
   1 │ 1,000           1000
   2 │ 1,000,000       1000000
   3 │ 1,000,000,0000  10000000000

Note the dot (.) after replace. You can then convert the cleaned strings to Int with the vectorized form of parse, for example parse.(Int, df.VolumeOK).
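The difference between the two calls can be seen without DataFrames at all. The sketch below uses a made-up vector in place of df2.Volume: without the dot, replace treats the vector as a collection and only swaps elements that are equal to ",", so nothing changes; with the dot, it is applied to each string.

```julia
# Made-up sample standing in for df2.Volume
volume = ["1,000", "1,000,000", "12,345,678"]

# No dot: replace looks for *elements* equal to "," and finds none,
# so the result is an unchanged copy of the vector.
unchanged = replace(volume, "," => "")

# With the dot: replace runs on each string and removes the commas inside it.
cleaned = replace.(volume, "," => "")

# Vectorized parse turns the cleaned strings into integers.
numbers = parse.(Int, cleaned)
```

This is also why the question's second attempt failed: it passed the literal string "df2.Volume" rather than the column itself.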

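Finally, note that you can often handle number formatting directly when reading the data with CSV.jl. CSV.read needs a sink type such as DataFrame, and decimal expects a character; here it tells the parser that the comma is the decimal mark, as in files written with European number formatting:

CSV.read("df.csv", DataFrame; delim=';', decimal=',')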
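For thousands separators specifically, as in the question, a possible alternative is the groupmark keyword. This is a sketch under assumptions: it presumes a CSV.jl version that supports groupmark, and the column names and inline data are invented for illustration, with ';' as the delimiter so that it does not clash with the grouping commas.

```julia
using CSV, DataFrames

# Invented file contents: ';' separates columns, ',' groups thousands
data = """
Ticker;Volume
AAA;1,000,000
BBB;2,345,678
"""

# groupmark=',' asks the parser to ignore grouping commas inside numbers
# (assumption: the installed CSV.jl version accepts this keyword)
df = CSV.read(IOBuffer(data), DataFrame; delim=';', groupmark=',')

# Inspect the element type to confirm the column was parsed as numbers
eltype(df.Volume)
```

If the file uses ',' as its delimiter as well, the grouped numbers would need to be quoted in the file, and cleaning the column after loading with replace. and parse. may be the simpler route.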