I have a function that reads *.csv files and loads them into several DataFrames.
However, not all of the CSV files use the same separator. How can Python detect which separator a CSV file uses, and then pass it to the read_csv() function to read the file with pandas?
df = pd.read_csv(path, sep='xxx', header=None, index_col=0)
Answer:
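Update
In fact, pass sep=None together with engine='python' as parameters of read_csv. The Python parsing engine will then try to detect the right delimiter automatically. engine='python' on its own is not enough: the detection only happens when sep is None. From the read_csv documentation: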
sep : str, default ','
Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.
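A minimal sketch of the sep=None approach, assuming small made-up samples: io.StringIO stands in for real files, and read_any_csv is a hypothetical helper name. As far as I can tell from the pandas source, the Python engine hands only the first valid (non-comment, non-skipped) line to csv.Sniffer, so that line must already contain the separator.

```python
import io

import pandas as pd


def read_any_csv(path_or_buffer):
    # Hypothetical helper. sep=None asks pandas to sniff the delimiter,
    # which only the Python parsing engine supports, so request it
    # explicitly instead of letting pandas fall back with a warning.
    return pd.read_csv(path_or_buffer, sep=None, engine="python",
                       header=None, index_col=0)


# Made-up sample data: the same table with two different separators.
semicolon_data = io.StringIO("a;1;2\nb;3;4\n")
tab_data = io.StringIO("a\t1\t2\nb\t3\t4\n")

df1 = read_any_csv(semicolon_data)
df2 = read_any_csv(tab_data)
print(df1)
print(df2)
```

The same call works with a file path in place of the StringIO buffer, e.g. read_any_csv(path).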
Alternatively, use csv.Sniffer directly:
import csv

def find_delimiter(filename):
    sniffer = csv.Sniffer()
    with open(filename) as fp:
        # Guess the dialect from the first 5000 characters only
        delimiter = sniffer.sniff(fp.read(5000)).delimiter
    return delimiter
Demo:
>>> find_delimiter('data.csv')
','
>>> find_delimiter('data.txt')
' '
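A sketch of feeding the sniffed delimiter into read_csv, under a few stated assumptions: the candidate separators (comma, semicolon, tab, pipe) are a guess you should adapt (add ' ' for space-separated files like data.txt above), falling back to ',' when csv.Sniffer raises csv.Error is an arbitrary choice, and read_with_sniffed_sep plus the temporary sample files are hypothetical. Passing delimiters= to sniff() keeps the sniffer from picking a letter or digit as the separator on ambiguous samples.

```python
import csv
import os
import tempfile

import pandas as pd


def find_delimiter(filename, candidates=",;\t|", sample_size=5000):
    # candidates is an assumption: list the separators your files may use.
    with open(filename) as fp:
        sample = fp.read(sample_size)
    try:
        # delimiters= restricts which characters the sniffer may choose
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Raised when the sniffer cannot determine a delimiter;
        # falling back to ',' is an arbitrary choice.
        return ","


def read_with_sniffed_sep(filename):
    # Hypothetical helper: sniff first, then hand the result to pandas.
    sep = find_delimiter(filename)
    return pd.read_csv(filename, sep=sep, header=None, index_col=0)


# Made-up sample files written to a temporary directory for the demo.
samples = {
    "comma.csv": "a,1,2\nb,3,4\n",
    "semicolon.csv": "a;1;2\nb;3;4\n",
    "pipe.csv": "a|1|2\nb|3|4\n",
}

with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    for name, text in samples.items():
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "w") as fp:
            fp.write(text)

    # Read everything before the temporary directory is deleted.
    seps = {name: find_delimiter(p) for name, p in paths.items()}
    frames = {name: read_with_sniffed_sep(p) for name, p in paths.items()}

print(seps)
```

Compared with sep=None, sniffing the sample yourself lets you choose how much of the file is inspected (sample_size) and what happens when detection fails.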