Problems using # (hashtag) in string columns importing CSV in R

Time:09-24

I have hashtags (#) in some of the string fields of a CSV file, and R seems to have problems with them.

csv = "A;B;C
n;# 9;0
n;1;0"

read.table(text=csv, header=TRUE, sep=";", encoding="UTF-8")

Results in

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 1 did not have 3 elements

The CSV file is generated by Python using the csv.QUOTE_MINIMAL style. This means that string fields are only enclosed in quotes when necessary (e.g. when the string itself contains a quote character). There is no way to change that, so I have to deal with the # on the R side.

CodePudding user response:

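read.table treats # as the start of a comment by default (comment.char = "#"), so everything after it on the line is dropped and the row ends up with too few fields. Set comment.char to a character that does not occur in your data, or to "" to turn off comment handling altogether.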

read.table(text=csv, header=TRUE, sep=";", encoding="UTF-8", comment.char = '@')

#  A   B C
#1 n # 9 0
#2 n   1 0
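
Picking '@' only moves the problem to any field that happens to contain an @. A minimal sketch of the alternative, assuming the same sample data as in the question (the variable name df is just for illustration): passing an empty string disables comment interpretation entirely.

```r
csv <- "A;B;C
n;# 9;0
n;1;0"

# comment.char = "" turns off comment handling, so neither '#'
# nor any other character in a field starts a comment.
df <- read.table(text = csv, header = TRUE, sep = ";", comment.char = "")
print(df)
```

This should keep the "# 9" value intact in column B without reserving another character as a comment marker.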

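This is also why read.csv() is usually a better choice than read.table() for CSV files. read.csv() is a wrapper around read.table() with defaults that suit CSV data, including header = TRUE and comment.char = "". Its default separator is a comma, so for this file you still need to pass sep = ";".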
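
As a rough sketch of that suggestion, again assuming the sample data from the question (df is an illustrative name), read.csv() reads the semicolon-separated text once sep is given explicitly:

```r
csv <- "A;B;C
n;# 9;0
n;1;0"

# read.csv() already defaults to header = TRUE and comment.char = "",
# so only the separator has to be overridden for this file.
df <- read.csv(text = csv, sep = ";")
print(df)
```

read.csv2() also defaults to sep = ";", but it additionally sets dec = ",", so it only fits if the file uses a decimal comma.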
