Python - importing a .txt file separated by commas and whitespace

Time:07-09

I need to import a .txt file with some weather statistics. The values, however, are separated by a comma followed by three spaces. When I try to handle this by adding sep=" " or sep=", " I get an error.

import pandas as pd

# Import dataset
df = pd.read_csv("etmgeg_235.txt")

# Drop any null values
df.isnull().sum()
df.dropna  # not called or assigned (would need df = df.dropna()), so no rows are dropped

# Show correlations
cr = df.corr()
print(cr)

The program "works" when importing the .txt file, but then I get one correlation with NaN and one with a value of 1.0.

The dataset looks like this: "235,19060101, 113, 67, 67, 87, 12, 51, 1, , , -28, etc...." with a few more spaces between the values. How do I import this dataset correctly?

CodePudding user response:

Use pd.read_csv with engine='python' to set a regex separator. Something like:

df = pd.read_csv('data.csv', sep=r',\s*', engine='python')
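To see what this separator does, here is a minimal sketch on an in-memory sample. The two rows are made up to mimic the layout described in the question, and header=None is an assumption (the real file may have a header row). Fields that contain only spaces come out as NaN.

```python
import io
import pandas as pd

# Hypothetical sample imitating the layout from the question
# (comma followed by a varying number of spaces, one empty field).
sample = (
    "235,19060101,  113,   67,     ,  -28\n"
    "235,19060102,  120,   70,   12,  -30\n"
)

# A comma plus any trailing whitespace counts as one delimiter.
# Regex separators are only supported by the python engine.
df = pd.read_csv(io.StringIO(sample), sep=r",\s*", engine="python", header=None)

print(df)
print(df.dtypes)
```

Because every comma still marks exactly one field boundary, the empty field in the first row keeps its own column and is read as a missing value.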

CodePudding user response:

pandas read_csv allows you to use regex, so something like

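df = pd.read_csv("etmgeg_235.txt", sep=r"[,\s]+", engine="python")

should work. Note that you will have to use the python engine to use regex. Be aware that this pattern treats any run of commas and whitespace as a single delimiter, so consecutive empty fields such as ", , ," are collapsed instead of being read as missing values.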
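The two regex patterns behave differently when a field is empty. The sketch below uses plain re.split on one made-up line (not taken from the actual file) to show how each pattern cuts it up. A character class with + swallows the empty field, while a pattern built around exactly one comma keeps it.

```python
import re

# Hypothetical line in the style of the question's data, with one empty field.
line = "235,19060101,  113,   67,     ,  -28"

# Any run of commas and/or whitespace is one delimiter: the empty field vanishes.
collapsed = re.split(r"[,\s]+", line)

# Exactly one comma per delimiter, with optional spaces around it: the empty field survives.
preserved = re.split(r"\s*,\s*", line)

print(len(collapsed), collapsed)
print(len(preserved), preserved)
```

If the file contains empty fields, a separator such as r"\s*,\s*" or r",\s*" is therefore probably the safer choice, since the column count stays the same on every row.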
