Python - How can I check if a CSV file has a comma or a semicolon as a separator?

Time:02-12

I have a bunch of CSV files that I would like to read with Python Pandas. Some of them have a comma as a delimiter, hence I use the following command to read them:

import pandas as pd
df = pd.read_csv('file_with_commas.csv')

However, I have other CSVs that use a semicolon as a delimiter. Since the default separator is the comma, I have to specify the separator explicitly and use the following command instead:

import pandas as pd
df = pd.read_csv('file_with_semicolons.csv', sep=';')

I would like to write a piece of code that recognises if the CSV file has a comma or a semicolon as a delimiter (before I read it) so that I do not have to change the code every time. How can this be done?

Note: I have checked this similar question on Stack Overflow, but it does not help because it applies to R rather than Python.

CodePudding user response:

Say that you would like to read an arbitrary CSV, named input.csv, and you do not know whether the separator is a comma or a semicolon.

You could open your file using the csv module. The Sniffer class is then used to deduce its format, like in the following code:

import csv
# The file name must be passed as a string
with open('input.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read())

In the csv module, Dialect is a container class whose attributes describe how to handle delimiters (among other things such as double quotes and whitespace). You can check the delimiter attribute with the following code:

print(dialect.delimiter)
# This will be either a comma or a semicolon, depending on what the input is
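To make the attributes mentioned above more concrete, here is a small sketch that sniffs an in-memory sample instead of a file. The sample text is invented for illustration; only csv.Sniffer and the dialect attributes come from the standard library.

```python
import csv

# Hypothetical sample: semicolon-separated, with quoted fields that contain a semicolon
sample = 'id;name;comment\n1;"Smith; John";ok\n2;"Doe; Jane";late\n'

dialect = csv.Sniffer().sniff(sample)

# A few of the attributes the detected dialect exposes
print(repr(dialect.delimiter))         # field separator that was detected
print(repr(dialect.quotechar))         # character used to quote fields
print(dialect.skipinitialspace)        # whether a space after the delimiter is ignored
```

The same dialect object can be handed to csv.reader(csvfile, dialect) if you want to stay within the csv module.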

Therefore, in order to do a smart CSV reading, you could use something like the following:

import pandas as pd
if dialect.delimiter == ',':
    df = pd.read_csv('input.csv')            # Import the csv with a comma as the separator
elif dialect.delimiter == ';':
    df = pd.read_csv('input.csv', sep=';')   # Import the csv with a semicolon as the separator

More information can be found here.
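If you have many files, the detection step can be wrapped in a small helper. This is only a sketch: the function name detect_delimiter and its parameters (candidates, sample_size, encoding) are my own choices, not part of the csv module. It reads just the beginning of the file rather than the whole thing, and restricts the sniffer to the separators you expect.

```python
import csv


def detect_delimiter(path, candidates=",;", sample_size=4096, encoding="utf-8"):
    """Guess the delimiter of a CSV file from its first few kilobytes.

    Hypothetical helper: the name and parameters are illustrative.
    csv.Sniffer raises csv.Error if it cannot decide on a delimiter.
    """
    with open(path, newline="", encoding=encoding) as csvfile:
        sample = csvfile.read(sample_size)
    # Only the characters in `candidates` are considered as possible delimiters
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter
```

The result can then be passed straight to pandas, for example pd.read_csv(path, sep=detect_delimiter(path)), which avoids the if/elif branches. You may want to catch csv.Error and fall back to a default separator for files the sniffer cannot classify.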

CodePudding user response:

Use sep=None

df = pd.read_csv('some_file.csv', sep=None)

From the docs

sep str, default ‘,’

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator by Python’s builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'
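As the quoted passage says, only the Python engine can detect the separator. In my experience pandas warns when it has to fall back from the C engine, so it seems cleaner to request engine="python" explicitly. A minimal sketch, using made-up in-memory data in place of real files:

```python
import io

import pandas as pd

# In-memory stand-ins for two files on disk (illustrative data only)
comma_csv = io.StringIO("name,score\nalice,1\nbob,2\n")
semicolon_csv = io.StringIO("name;score\nalice;1\nbob;2\n")

# sep=None asks pandas to sniff the separator; engine="python" is the
# engine that supports this, so it is named explicitly here
df_comma = pd.read_csv(comma_csv, sep=None, engine="python")
df_semicolon = pd.read_csv(semicolon_csv, sep=None, engine="python")

print(df_comma.columns.tolist())
print(df_semicolon.columns.tolist())
```

Both calls use exactly the same arguments, so the code does not need to change from file to file. The Python engine is generally slower than the C engine, which may matter for very large files.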

