I'm trying to read in a text file formatted like the following:
student
first name: John
last name: Doe
grade: 9
gpa: 4.0
school
name: Richard High School
city: Kansas City
####
student
first name: Jane
last name: Doe
grade: 10
gpa: 3.0
school
name: Richard High School
city: Kansas City
into a Python dictionary. I'd like the end result to look like:
{0: {'student': {'first name': 'John',
                 'last name': 'Doe',
                 'grade': '9',
                 'gpa': '4.0'},
     'school': {'name': 'Richard High School',
                'city': 'Kansas City'}},
 1: {'student': {'first name': 'Jane',
                 'last name': 'Doe',
                 'grade': '10',
                 'gpa': '3.0'},
     'school': {'name': 'Richard High School',
                'city': 'Kansas City'}}
}
So far, I know how to handle the inner keys with:
with open('<filename>') as f:
    dict = {}
    for line in f:
        x, y = line.split(": ")
        dict[x] = y
    print(dict)
But beyond that I'm stuck.
CodePudding user response:
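import re

temp = 0              # index of the current record
data = {temp: {}}
with open('txt.txt') as f:
    for line in f:
        # Skip blank lines.
        if len(line.strip()) == 0:
            continue
        if re.match(r"^[^:]*:.*$", line):
            # "key: value" line: store it under the most recent main key.
            key, value = line.split(':', 1)
            data[temp][main_key][key.strip()] = value.strip()
        elif re.match(r"^[^#]*$", line):
            # Header line such as "student" or "school" (separator lines contain '#').
            main_key = line.strip()
            if main_key in data[temp]:
                # The header repeats, so a new record starts here.
                temp += 1
                data[temp] = {}
            data[temp][main_key] = {}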
If I understood your goal correctly, this is the answer. Be aware that it relies on regular expressions; you can learn more about them at regex101.com.
The first if skips blank lines. The second checks whether the line has the form "key: value". If it does, I add the pair to the most recent inner dictionary. If it does not, the line is a main key, and I add it to the main dictionary.
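The same line-by-line classification can also be written without regular expressions. This is only a minimal sketch, not the code above: the function name parse_records and the shortened sample text are made up for illustration, and it assumes records are separated by a "####" line as in the question.

```python
def parse_records(lines, separator="####"):
    """Group 'key: value' lines under header lines, one dict per record."""
    records = {}
    current = {}      # the record currently being filled
    section = None    # inner dict of the latest header (e.g. 'student')
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                          # skip blank lines
        if line == separator:
            if current:                       # close the finished record
                records[len(records)] = current
            current, section = {}, None
        elif ":" in line:
            if section is None:
                raise ValueError(f"value line before any header: {line!r}")
            key, value = line.split(":", 1)
            section[key.strip()] = value.strip()
        else:
            # Header line: start (or reuse) an inner dictionary.
            section = current.setdefault(line, {})
    if current:                               # the last record has no separator
        records[len(records)] = current
    return records


# Shortened, hypothetical sample in the question's format.
sample = """\
student
first name: John
grade: 9
school
name: Richard High School
####
student
first name: Jane
grade: 10
school
name: Richard High School
"""
result = parse_records(sample.splitlines())
print(result)
```

Because a record ends at the separator rather than at a repeated header, this variant does not depend on every record starting with the same key.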
CodePudding user response:
If your data are patterned exactly as you have written, and you don't mind having flat dictionaries, one per student:
import re

# Newlines inside the pattern are literal, so each record must match line for line.
pattern = re.compile(r"""
student
first name: (?P<first_name>.*)
last name: (?P<last_name>.*)
grade: (?P<grade>\d*)
gpa: (?P<gpa>\d+\.?\d*)
school
name: (?P<school>.*)
city: (?P<city>.*)""".strip())

with open("<filename>", "r") as f:
    data = f.read()

# One flat dictionary per matched record.
students = [match.groupdict() for match in pattern.finditer(data)]
Output:
[{'first_name': 'John',
'last_name': 'Doe',
'grade': '9',
'gpa': '4.0',
'school': 'Richard High School',
'city': 'Kansas City'},
{'first_name': 'Jane',
'last_name': 'Doe',
'grade': '10',
'gpa': '3.0',
'school': 'Richard High School',
'city': 'Kansas City'}]
I don't see the benefit of your desired data structure, hence my suggestion for something more conducive to tabular data analysis.
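One thing to keep in mind is that groupdict() returns every field as a string, so numeric columns usually need converting before any analysis. A minimal sketch using only the standard library, assuming a list shaped like the output above (the helper name to_row is hypothetical):

```python
from collections import Counter

# Same shape as the output shown above.
students = [
    {"first_name": "John", "last_name": "Doe", "grade": "9", "gpa": "4.0",
     "school": "Richard High School", "city": "Kansas City"},
    {"first_name": "Jane", "last_name": "Doe", "grade": "10", "gpa": "3.0",
     "school": "Richard High School", "city": "Kansas City"},
]


def to_row(record):
    """Return a copy of the record with grade as int and gpa as float."""
    row = dict(record)
    row["grade"] = int(row["grade"])
    row["gpa"] = float(row["gpa"])
    return row


rows = [to_row(s) for s in students]
per_grade = Counter(row["grade"] for row in rows)      # students per grade
mean_gpa = sum(row["gpa"] for row in rows) / len(rows)
print(per_grade, mean_gpa)
```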
EDIT: now that we're talking about Pandas,
In [4]: df = pd.DataFrame(students)

In [5]: df
Out[5]:
  first_name last_name grade  gpa               school         city
0       John       Doe     9  4.0  Richard High School  Kansas City
1       Jane       Doe    10  3.0  Richard High School  Kansas City
Getting the count of students in each grade:
In [6]: df.groupby("grade").size()
Out[6]:
grade
10    1
9     1
dtype: int64
You can also group by any number of columns, for instance by grade and school:
In [7]: df.groupby(["grade", "school"]).size()
Out[7]:
grade  school
10     Richard High School    1
9      Richard High School    1
dtype: int64
CodePudding user response:
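You could do it like this (the match statement requires Python 3.10 or later), but bear in mind that this method is very specific to the input and output as defined in the original question:
d = dict()
k = 0
with open('foo.txt') as infile:
    for line in map(str.strip, infile):
        if len(line) > 0:
            match line:
                case 'student':
                    # Start a new record; td points at its 'student' dictionary.
                    td = dict()
                    d[k] = {line: td}
                    k += 1
                case 'school':
                    # Nest 'school' inside 'student' and fill it from here on.
                    td[line] = dict()
                    td = td[line]
                case _:
                    # "key: value" line; separator lines have no ':' and are skipped.
                    k_, *v = line.split(':')
                    if v:
                        td[k_] = v[0].strip()
print(d)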
Output:
{0: {'student': {'first name': 'John', 'last name': 'Doe', 'grade': '9', 'gpa': '4.0', 'school': {'name': 'Richard High School', 'city': 'Kansas City'}}}, 1: {'student': {'first name': 'Jane', 'last name': 'Doe', 'grade': '10', 'gpa': '3.0', 'school': {'name': 'Richard High School', 'city': 'Kansas City'}}}}
CodePudding user response:
That's a possible solution:
import re

with open("a.txt") as file:
    text = file.read()

dictionaryMain = {}
elements = text.split("####")
i = 0
for element in elements:
    # Search the current record, not the whole file.
    firstName = re.search('first name: (.+)', element).group(1)
    lastName = re.search('last name: (.+)', element).group(1)
    grade = re.search('grade: (.+)', element).group(1)
    gpa = re.search('gpa: (.+)', element).group(1)
    # Anchor at the start of a line so this does not match 'first name:'.
    name = re.search(r'^[ \t]*name: (.+)', element, re.MULTILINE).group(1)
    city = re.search('city: (.+)', element).group(1)
    # Create new dictionaries for each record; reusing the same objects
    # would make every entry in dictionaryMain identical.
    dictionaryStudent = {}
    dictionarySchool = {}
    dictionaryElement = {}
    dictionaryStudent['first name'] = firstName
    dictionaryStudent['last name'] = lastName
    dictionaryStudent['grade'] = grade
    dictionaryStudent['gpa'] = gpa
    dictionarySchool['name'] = name
    dictionarySchool['city'] = city
    dictionaryElement['student'] = dictionaryStudent
    dictionaryElement['school'] = dictionarySchool
    i = i + 1
    dictionaryMain[i] = dictionaryElement
print(dictionaryMain)
Input file:
student
first name: John
last name: Doe
grade: 9
gpa: 4.0
school
name: Richard High School
city: Kansas City
####
student
first name: Jane
last name: Doe
grade: 10
gpa: 3.0
school
name: Richard High School
city: Kansas City
####
student
first name: Jane
last name: Doe
grade: 10
gpa: 3.0
school
name: Richard High School
city: Kansas City
Output:
{
    1: {
        'student': {
            'first name': 'John',
            'last name': 'Doe',
            'grade': '9',
            'gpa': '4.0'
        },
        'school': {
            'name': 'Richard High School',
            'city': 'Kansas City'
        }
    },
    2: {
        'student': {
            'first name': 'Jane',
            'last name': 'Doe',
            'grade': '10',
            'gpa': '3.0'
        },
        'school': {
            'name': 'Richard High School',
            'city': 'Kansas City'
        }
    },
    3: {
        'student': {
            'first name': 'Jane',
            'last name': 'Doe',
            'grade': '10',
            'gpa': '3.0'
        },
        'school': {
            'name': 'Richard High School',
            'city': 'Kansas City'
        }
    }
}
I don't know exactly what your use case is, but you should really think about using dataclasses if you have such a strict format.
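As a rough illustration of that suggestion, here is a minimal sketch using the standard library's dataclasses module. The class names School and Student, the helper student_from_dicts, and the choice to convert grade and gpa to numbers are all assumptions made for this example, not something taken from the question.

```python
from dataclasses import dataclass, asdict


@dataclass
class School:
    name: str
    city: str


@dataclass
class Student:
    first_name: str
    last_name: str
    grade: int
    gpa: float
    school: School


def student_from_dicts(student, school):
    """Build a Student from the two inner dictionaries parsed per record."""
    return Student(
        first_name=student["first name"],
        last_name=student["last name"],
        grade=int(student["grade"]),      # stored as text in the file
        gpa=float(student["gpa"]),
        school=School(name=school["name"], city=school["city"]),
    )


john = student_from_dicts(
    {"first name": "John", "last name": "Doe", "grade": "9", "gpa": "4.0"},
    {"name": "Richard High School", "city": "Kansas City"},
)
print(john)
print(asdict(john))   # back to nested dictionaries if needed
```

A missing field then fails immediately with a KeyError while the record is being built, instead of surfacing later as an incomplete dictionary.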