Convert text file into dictionary with multiple keys

I'm trying to read in a text file formatted like the following:

student
    first name: John
    last name: Doe
    grade: 9
    gpa: 4.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City

into a Python dictionary. Trying to have the end result look like:

{0: {'student': {'first name': 'John',
                 'last name': 'Doe',
                 'grade': '9',
                 'gpa': '4.0'},
     'school': {'name': 'Richard High School',
                'city': 'Kansas City'}},
 1: {'student': {'first name': 'Jane',
                 'last name': 'Doe',
                 'grade': '10',
                 'gpa': '3.0'},
     'school': {'name': 'Richard High School',
                'city': 'Kansas City'}}}

So far, I know how to handle the inner keys with:

with open('<filename>') as f:
    dict = {}
    for line in f:
        x, y = line.split(": ")
        dict[x] = y
    print(dict)

But beyond that I'm stuck.

CodePudding user response：

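import re

temp = 0  # index of the record currently being filled
data = {temp: {}}
with open('txt.txt') as f:
    for line in f:
        if len(line.strip()) == 0:
            continue  # skip blank lines
        if re.match(r"^[^:]*:.*$", line):
            # "key: value" line: store it under the current main key
            key, value = line.split(':', 1)
            data[temp][main_key][key.strip()] = value.strip()
        elif re.match(r"^[^#]*$", line):
            # any other line without '#' is a main key ("student" or "school")
            main_key = line.strip()
            if main_key in data[temp]:
                # a repeated main key means a new record has started
                temp += 1
                data[temp] = {}
            data[temp][main_key] = {}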

If I understood your goal correctly, this is the answer. Be careful, though: it is based on regular expressions, which you can learn more about at regex101.com.

The first if skips lines that are empty or contain only whitespace. The second checks whether the line has the form "key: value"; if it does, I add the pair to the most recent inner dictionary. Otherwise, as long as the line is not the "####" separator, it is a main key and I add it to the main dictionary.
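To make the classification easier to follow, here is a small sketch that applies the same two patterns to a few sample lines. The function name classify and the labels it returns are made up for illustration; they are not part of the answer's code.

```python
import re

KEY_VALUE = re.compile(r"^[^:]*:.*$")  # any line that contains a colon
HEADER = re.compile(r"^[^#]*$")        # any line that contains no '#'


def classify(line):
    """Label one line as 'blank', 'pair', 'header', or 'separator' (hypothetical labels)."""
    if not line.strip():
        return "blank"
    if KEY_VALUE.match(line):
        return "pair"
    if HEADER.match(line):
        return "header"
    return "separator"


samples = ["student\n", "    first name: John\n", "\n", "####\n"]
kinds = [classify(s) for s in samples]
print(kinds)
```

The order of the checks matters: a "key: value" line also contains no '#', so it has to be tested before the header pattern.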

CodePudding user response：

If your data are patterned exactly as you have written, and you don't mind having flat dictionaries, one per student:

import re

# One match per record; the named groups become the dictionary keys
pattern = re.compile(r"""
student
    first name: (?P<first_name>.*)
    last name: (?P<last_name>.*)
    grade: (?P<grade>\d*)
    gpa: (?P<gpa>\d+\.?\d*)
school
    name: (?P<school>.*)
    city: (?P<city>.*)""".strip())

with open('<filename>', "r") as f:
    data = f.read()

students = [match.groupdict() for match in pattern.finditer(data)]

Output:

[{'first_name': 'John',
  'last_name': 'Doe',
  'grade': '9',
  'gpa': '4.0',
  'school': 'Richard High School',
  'city': 'Kansas City'},
 {'first_name': 'Jane',
  'last_name': 'Doe',
  'grade': '10',
  'gpa': '3.0',
  'school': 'Richard High School',
  'city': 'Kansas City'}]

I don't see the benefit of your desired data structure, hence my suggestion for something more conducive to tabular data analysis.
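If the nested layout from the question is still needed, the flat records can be regrouped afterwards. The following is only a sketch: to_nested is a hypothetical helper, and the students list is copied from the output above.

```python
def to_nested(flat):
    """Regroup one flat record into 'student' and 'school' parts (hypothetical helper)."""
    return {
        "student": {
            "first name": flat["first_name"],
            "last name": flat["last_name"],
            "grade": flat["grade"],
            "gpa": flat["gpa"],
        },
        "school": {"name": flat["school"], "city": flat["city"]},
    }


# Sample data copied from the output above
students = [
    {"first_name": "John", "last_name": "Doe", "grade": "9", "gpa": "4.0",
     "school": "Richard High School", "city": "Kansas City"},
    {"first_name": "Jane", "last_name": "Doe", "grade": "10", "gpa": "3.0",
     "school": "Richard High School", "city": "Kansas City"},
]

# Key each nested record by its position, as in the question
nested = {i: to_nested(s) for i, s in enumerate(students)}
print(nested[1]["student"]["first name"])
```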

EDIT: now that we're talking about Pandas,

In [4]: df = pd.DataFrame(students)

In [5]: df
Out[5]:
  first_name last_name grade  gpa               school         city
0       John       Doe     9  4.0  Richard High School  Kansas City
1       Jane       Doe    10  3.0  Richard High School  Kansas City

Getting the count of students in each grade:

In [6]: df.groupby("grade").size()
Out[6]:
grade
10    1
9     1
dtype: int64

You can also group by any number of columns, for instance by grade and school:

In [7]: df.groupby(["grade", "school"]).size()
Out[7]:
grade  school
10     Richard High School    1
9      Richard High School    1
dtype: int64
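
For comparison, the same counts can be obtained from the flat dictionaries without Pandas. This is a minimal sketch using collections.Counter from the standard library; the students list below is a shortened copy of the records shown earlier.

```python
from collections import Counter

# Shortened sample records (only the fields needed for counting)
students = [
    {"first_name": "John", "grade": "9", "school": "Richard High School"},
    {"first_name": "Jane", "grade": "10", "school": "Richard High School"},
]

# Count students per grade
by_grade = Counter(s["grade"] for s in students)

# Count students per (grade, school) pair
by_grade_and_school = Counter((s["grade"], s["school"]) for s in students)

print(by_grade)
print(by_grade_and_school)
```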

CodePudding user response：

You could do it like this but bear in mind that this method is very specific to the input and output as defined in the original question:

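d = dict()
k = 0
with open('foo.txt') as infile:
    for line in map(str.strip, infile):
        if len(line) > 0:
            match line:  # structural pattern matching needs Python 3.10+
                case 'student':
                    # start a new record keyed by the running index k
                    td = dict()
                    d[k] = {line: td}
                    k += 1
                case 'school':
                    # the school dictionary is nested inside the student dictionary
                    td[line] = dict()
                    td = td[line]
                case _:
                    # "key: value" line; '####' has no colon, so v is empty and it is skipped
                    k_, *v = line.split(':')
                    if v:
                        td[k_] = v[0].strip()

print(d)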

Output:

{0: {'student': {'first name': 'John', 'last name': 'Doe', 'grade': '9', 'gpa': '4.0', 'school': {'name': 'Richard High School', 'city': 'Kansas City'}}}, 1: {'student': {'first name': 'Jane', 'last name': 'Doe', 'grade': '10', 'gpa': '3.0', 'school': {'name': 'Richard High School', 'city': 'Kansas City'}}}}
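
Note that this output nests 'school' inside 'student', whereas the question shows the two as siblings. A possible variant that keeps them side by side is sketched below. It avoids the match statement, so it should also run on Python versions before 3.10. The name parse_records is made up for illustration, and the sketch assumes every record starts with a 'student' line.

```python
def parse_records(lines):
    """Parse the blocks into {index: {'student': {...}, 'school': {...}}}."""
    records = {}
    index = -1      # becomes 0 at the first 'student' header
    section = None  # dictionary currently being filled
    for line in map(str.strip, lines):
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '####' separator
        if ":" in line:
            key, value = line.split(":", 1)
            section[key.strip()] = value.strip()
            continue
        if line == "student":
            # assumption: 'student' always opens a new record
            index += 1
            records[index] = {}
        # any other colon-free line is a section header
        section = records[index].setdefault(line, {})
    return records


sample = """\
student
    first name: John
    last name: Doe
    grade: 9
    gpa: 4.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City
"""

records = parse_records(sample.splitlines())
print(records)
```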

CodePudding user response：

That's a possible solution:

import re

# Read the whole file, then split it into one chunk per record
with open("a.txt") as file:
    text = file.read()
elements = text.split("####")

dictionaryMain = {}

i = 0
for element in elements:
    # Search the current element, not the whole text
    firstName = re.search(r'first name: (.+)', element).group(1)
    lastName = re.search(r'last name: (.+)', element).group(1)
    grade = re.search(r'grade: (.+)', element).group(1)
    gpa = re.search(r'gpa: (.+)', element).group(1)
    # Anchor the school name at the start of a line so that
    # "first name" and "last name" do not match
    name = re.search(r'^[ \t]*name: (.+)', element, re.MULTILINE).group(1)
    city = re.search(r'city: (.+)', element).group(1)
    # Create fresh dictionaries for each record so that records do not share state
    dictionaryStudent = {}
    dictionarySchool = {}
    dictionaryElement = {}
    dictionaryStudent['first name'] = firstName
    dictionaryStudent['last name'] = lastName
    dictionaryStudent['grade'] = grade
    dictionaryStudent['gpa'] = gpa
    dictionarySchool['name'] = name
    dictionarySchool['city'] = city
    dictionaryElement['student'] = dictionaryStudent
    dictionaryElement['school'] = dictionarySchool
    i = i + 1
    dictionaryMain[i] = dictionaryElement

print(dictionaryMain)

Input file:

student
    first name: John
    last name: Doe
    grade: 9
    gpa: 4.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City

Output:

{
  1: {
    'student': {
      'first name': 'John',
      'last name': 'Doe',
      'grade': '9',
      'gpa': '4.0'
    },
    'school': {
      'name': 'Richard High School',
      'city': 'Kansas City'
    }
  },
  2: {
    'student': {
      'first name': 'Jane',
      'last name': 'Doe',
      'grade': '10',
      'gpa': '3.0'
    },
    'school': {
      'name': 'Richard High School',
      'city': 'Kansas City'
    }
  },
  3: {
    'student': {
      'first name': 'Jane',
      'last name': 'Doe',
      'grade': '10',
      'gpa': '3.0'
    },
    'school': {
      'name': 'Richard High School',
      'city': 'Kansas City'
    }
  }
}

I don't know exactly what your use case is, but you should really consider using dataclasses if you have such a strict format.
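
As a rough illustration of that suggestion, here is a minimal sketch built on the standard dataclasses module. The class names, field names, and the helper student_from_record are assumptions made for this example. It also assumes that grade is always an integer and gpa always a decimal number.

```python
from dataclasses import asdict, dataclass


@dataclass
class School:
    name: str
    city: str


@dataclass
class Student:
    first_name: str
    last_name: str
    grade: int
    gpa: float
    school: School


def student_from_record(record):
    """Build a Student from one nested record (hypothetical helper)."""
    s, sch = record["student"], record["school"]
    return Student(
        first_name=s["first name"],
        last_name=s["last name"],
        grade=int(s["grade"]),  # assumption: grade is a whole number
        gpa=float(s["gpa"]),    # assumption: gpa parses as a float
        school=School(name=sch["name"], city=sch["city"]),
    )


# One record in the nested layout from the question
record = {
    "student": {"first name": "John", "last name": "Doe", "grade": "9", "gpa": "4.0"},
    "school": {"name": "Richard High School", "city": "Kansas City"},
}

john = student_from_record(record)
print(john)
print(asdict(john))  # back to plain dictionaries if needed
```

Compared with plain dictionaries, the fields get fixed names and converted types, so a misspelled key or a non-numeric grade fails when the record is loaded instead of later during analysis.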