Read .txt as dataframe with pandas-CodePudding

I am trying to read in a text file. The file contains among others the following input:

DE  01945   Ruhland Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4576 13.8664 4
DE  01945   Tettau  Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4333 13.7333 4
DE  01945   Grünewald   Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4    14  4
DE  01945   Guteborn    Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4167 13.9333 4
DE  01945   Kroppen Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.3833 13.8    4
DE  01945   Schwarzbach Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.45   13.9333 4
DE  01945   Hohenbocka  Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.431  14.0098 4
DE  01945   Lindenau    Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4    13.7333 4
DE  01945   Hermsdorf   Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4055 13.8937 4
DE  01968   Senftenberg Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.5252 14.0016 4
DE  01968   Schipkau Hörlitz    Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.5299 13.9508 
DE  01968   Schipkau    Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.5456 13.9121 4
DE  01979   Lauchhammer Brandenburg BB      00  Landkreis Oberspreewald-Lausitz 12066   51.4881 13.7662 4

My code looks like this.

import pandas as pd

data = pd.read_csv('DE.txt', sep=" ", header=None)

Currently I am getting the following error that I can't get past:

ParserError: Error tokenizing data. C error: Expected 2 fields in line 11, saw 3

I think this is due to the two-part city name, how can I read the text file correctly?

CodePudding user response：

You have to read the file normally and parse everything to a dictionary and then create the dataframe.

import pandas as pd

file = open("DE.txt", "r")
lines = file.readlines()
dict = {}
for line in lines:
    //Create your own dictionary as you want to be created using the value in each line and store it in dict
df = pd.DataFrame(data=dict)

Or you can create a 2 dimensional list instead of a dictionary, if this is easier for you, and create the dataframe in the same way.