I am trying to read in a text file. Among other lines, the file contains the following:
DE 01945 Ruhland Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.4576 13.8664 4
DE 01945 Tettau Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.4333 13.7333 4
DE 01945 Grünewald Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.4 14 4
DE 01945 Guteborn Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.4167 13.9333 4
DE 01945 Kroppen Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.3833 13.8 4
DE 01945 Schwarzbach Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.45 13.9333 4
DE 01945 Hohenbocka Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.431 14.0098 4
DE 01945 Lindenau Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.4 13.7333 4
DE 01945 Hermsdorf Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.4055 13.8937 4
DE 01968 Senftenberg Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.5252 14.0016 4
DE 01968 Schipkau Hörlitz Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.5299 13.9508
DE 01968 Schipkau Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.5456 13.9121 4
DE 01979 Lauchhammer Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 51.4881 13.7662 4
My code looks like this.
import pandas as pd
data = pd.read_csv('DE.txt', sep=" ", header=None)
Currently I am getting the following error that I can't get past:
ParserError: Error tokenizing data. C error: Expected 2 fields in line 11, saw 3
I think this is due to the two-part city name ("Schipkau Hörlitz"). How can I read the text file correctly?
CodePudding user response:
You can read the file line by line, parse each line into a dictionary, and then create the DataFrame from that dictionary.
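import pandas as pd

records = {}  # named "records" so the built-in dict is not shadowed
# encoding="utf-8" is an assumption; adjust it if the umlauts come out garbled
with open("DE.txt", "r", encoding="utf-8") as file:
    for line_number, line in enumerate(file):
        # Build each entry however suits your data; splitting on tabs is
        # only an assumption about how the fields are separated.
        records[line_number] = line.rstrip("\n").split("\t")

# One row per line of the file; shorter rows are padded with missing values
df = pd.DataFrame.from_dict(records, orient="index")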
Alternatively, you can build a two-dimensional list instead of a dictionary, if that is easier for you, and create the DataFrame in the same way.
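The two-dimensional list variant could look roughly like the sketch below. The helper name rows_from_lines and the tab delimiter are assumptions, not something stated in the question. The error message (2 fields expected, 3 seen on the line with the two-part city name) hints that spaces only occur inside names, which would fit a tab-delimited file, but check your file first. The sample lines are shortened versions of the data above.

```python
import pandas as pd


def rows_from_lines(lines, delimiter="\t"):
    """Split each non-empty line into a list of fields (one inner list per row).

    The default tab delimiter is an assumption about the file layout.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line:  # skip blank lines
            rows.append(line.split(delimiter))
    return rows


# Two shortened sample lines in the assumed tab-delimited layout
sample = [
    "DE\t01968\tSchipkau Hörlitz\tBrandenburg\tBB\n",
    "DE\t01968\tSchipkau\tBrandenburg\tBB\n",
]
rows = rows_from_lines(sample)

# Each inner list becomes one row of the DataFrame
df = pd.DataFrame(rows)
```

For the real file, you would pass the open file object instead of the sample list, for example rows_from_lines(open_file) inside a with open("DE.txt", encoding="utf-8") block. Because the split happens only on the delimiter, a name such as "Schipkau Hörlitz" stays in a single column.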