I need to import a .txt file with some weather statistics. The values, however, are separated by a comma followed by three spaces. When I try to handle this by adding sep=" " or sep=", " I get an error.
import pandas as pd
# Import dataset
df = pd.read_csv("etmgeg_235.txt")
# Drop eventual null values
df.isnull().sum()
df.dropna
#Show correlations
cr = df.corr()
print(cr)
The program "works" when importing the .txt file, but then I get one correlation with NaN and one with a value of 1.0.
The dataset looks like this: "235,19060101, 113, 67, 67, 87, 12, 51, 1, , , -28, ..." with a few more spaces between the values. How do I import this dataset correctly?
CodePudding user response:
Use pd.read_csv with engine='python' to set a regex separator. Something like:
df = pd.read_csv('data.csv', sep=r',\s*', engine='python')
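To check the regex separator without the real file, here is a minimal sketch that parses an in-memory sample. The two rows are hypothetical, modeled on the row quoted in the question, and header=None is an assumption: the actual file may have header or comment lines that need header= or skiprows= instead.

```python
import io
import pandas as pd

# Hypothetical two-row sample modeled on the row quoted in the question;
# the real file may also contain header lines that need skiprows= or header=.
sample = (
    "235,19060101,  113,   67,   67,   87,   12,   51,    1,     ,     ,  -28\n"
    "235,19060102,  120,   70,   65,   90,   15,   48,    2,     ,     ,  -30\n"
)

# A regex separator is only supported by the Python parsing engine.
# ",\s*" consumes the comma plus any spaces that follow it, so blank
# fields stay in place and are read as NaN.
df = pd.read_csv(io.StringIO(sample), sep=r",\s*", engine="python", header=None)

print(df.shape)
print(df.dtypes)
```

For the real data, the StringIO object would be replaced by the file path, e.g. "etmgeg_235.txt".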
CodePudding user response:
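pandas read_csv allows you to use a regex as the separator, so something like
df = pd.read_csv("etmgeg_235.txt", sep=r"[,\s]+", engine="python")
should work as long as no fields are empty. This pattern treats any run of commas and whitespace as a single separator, so blank fields such as the ", , ," in the sample row are collapsed and the following columns shift; use sep=r",\s*" if the empty columns must be kept. Note that you have to use the python engine to use a regex.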
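An alternative that neither answer mentions is the skipinitialspace parameter of read_csv, which skips the spaces after each comma and works with the default engine. The sketch below also calls dropna() as a method and keeps the result, since the bare df.dropna in the question does nothing. The sample rows are hypothetical and shortened, and header=None is an assumption about the file layout.

```python
import io
import pandas as pd

# Hypothetical, shortened sample modeled on the row quoted in the question.
sample = (
    "235,19060101,  113,   67,     ,  -28\n"
    "235,19060102,  120,   70,     ,  -30\n"
    "235,19060103,  131,   64,     ,  -25\n"
)

# skipinitialspace=True skips the spaces after each comma, so the default
# (C) engine can be used and blank fields are read as NaN.
df = pd.read_csv(io.StringIO(sample), header=None, skipinitialspace=True)

# dropna is a method and returns a new frame: call it and keep the result.
# Columns that are entirely empty are dropped first; otherwise dropping
# rows that contain any NaN would remove every row.
df = df.dropna(axis=1, how="all").dropna()

print(df.corr())
```

A column that holds a single constant value, such as the station number 235 in this sample, has zero variance, so its correlations come out as NaN.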