Convert text file into dictionary with multiple keys

Time:04-04

I'm trying to read in a text file formatted like the following:

student
    first name: John
    last name: Doe
    grade: 9
    gpa: 4.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City

into a Python dictionary. I'd like the end result to look like:

{0: {'student': {'first name': 'John',
                 'last name': 'Doe',
                 'grade': '9',
                 'gpa': '4.0'},
     'school': {'name': 'Richard High School',
                'city': 'Kansas City'}},
 1: {'student': {'first name': 'Jane',
                 'last name': 'Doe',
                 'grade': '10',
                 'gpa': '3.0'},
     'school': {'name': 'Richard High School',
                'city': 'Kansas City'}}}

So far, I know how to handle the inner keys with:

with open('<filename>') as f:
    dict = {}
    for line in f:
        x, y = line.split(": ")
        dict[x] = y
    print(dict)

But beyond that I'm stuck.

CodePudding user response:

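import re

temp = 0  # index of the current record
data = {temp: {}}
with open('txt.txt') as f:
    for line in f:
        # Skip blank lines.
        if len(line.strip()) == 0:
            continue
        if re.match(r"^[^:]*:.*$", line):
            # "key: value" line: store it under the current main key.
            key, value = line.split(':', 1)
            data[temp][main_key][key.strip()] = value.strip()
        elif re.match(r"^[^#]*$", line):
            # Any other line without '#' is a main key such as "student".
            main_key = line.strip()
            # Seeing the same main key again means a new record has started.
            if main_key in data[temp]:
                temp += 1
                data[temp] = {}
            data[temp][main_key] = {}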

If I've understood your goal correctly, this should do it. Be aware that it relies on regular expressions; you can learn more about them and test patterns at regex101.com.

The first if skips lines that are empty or contain only whitespace. The second checks whether the line looks like "key: value"; if it does, the pair is added to the most recent inner dictionary. Otherwise the line is a main key, and a new inner dictionary is created for it in the main dictionary.
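To see how the two patterns sort lines into categories, here is a small sketch that classifies a few sample lines. The patterns are the ones from the answer above; the helper name `classify` and the category labels are made up for illustration.

```python
import re

KEY_VALUE = re.compile(r"^[^:]*:.*$")   # any line that contains a colon
NOT_SEPARATOR = re.compile(r"^[^#]*$")  # any line that contains no '#'

def classify(line):
    """Return which branch of the parsing loop a line would take (hypothetical helper)."""
    if not line.strip():
        return "blank"
    if KEY_VALUE.match(line):
        return "key-value"
    if NOT_SEPARATOR.match(line):
        return "main key"
    return "separator"

for sample in ["student\n", "    gpa: 4.0\n", "\n", "####\n"]:
    print(repr(sample), "->", classify(sample))
```

The "####" line falls through both tests, which is why the loop ignores it without any explicit check.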

CodePudding user response:

If your data are patterned exactly as you have written, and you don't mind having flat dictionaries, one per student:

import re

# No re.VERBOSE flag here, so the newlines and indentation are matched literally.
pattern = re.compile(r"""
student
    first name: (?P<first_name>.*)
    last name: (?P<last_name>.*)
    grade: (?P<grade>\d*)
    gpa: (?P<gpa>\d+\.?\d*)
school
    name: (?P<school>.*)
    city: (?P<city>.*)""".strip())

with open("<filename>", "r") as f:
    data = f.read()

# One flat dictionary per matched record.
students = [match.groupdict() for match in pattern.finditer(data)]

Output:

[{'first_name': 'John',
  'last_name': 'Doe',
  'grade': '9',
  'gpa': '4.0',
  'school': 'Richard High School',
  'city': 'Kansas City'},
 {'first_name': 'Jane',
  'last_name': 'Doe',
  'grade': '10',
  'gpa': '3.0',
  'school': 'Richard High School',
  'city': 'Kansas City'}]

I don't see the benefit of your desired data structure, hence my suggestion for something more conducive to tabular data analysis.
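Note that every captured value is a string, including the grade and GPA. A minimal sketch of converting those two fields to numbers, assuming each record has `grade` and `gpa` values that parse cleanly; `convert_types` is a hypothetical helper name, not part of the answer above.

```python
def convert_types(record):
    """Return a copy of a flat student record with numeric fields converted."""
    converted = dict(record)  # leave the original record untouched
    converted["grade"] = int(converted["grade"])
    converted["gpa"] = float(converted["gpa"])
    return converted

# Sample records in the flat shape shown in the output above.
students = [
    {"first_name": "John", "last_name": "Doe", "grade": "9", "gpa": "4.0",
     "school": "Richard High School", "city": "Kansas City"},
    {"first_name": "Jane", "last_name": "Doe", "grade": "10", "gpa": "3.0",
     "school": "Richard High School", "city": "Kansas City"},
]

typed = [convert_types(s) for s in students]
print(sorted(s["grade"] for s in typed))  # numeric order: [9, 10]
```

With strings, "10" sorts before "9"; after conversion the grades sort numerically.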

EDIT: now that we're talking about Pandas,

In [4]: df = pd.DataFrame(students)

In [5]: df
Out[5]:
  first_name last_name grade  gpa               school         city
0       John       Doe     9  4.0  Richard High School  Kansas City
1       Jane       Doe    10  3.0  Richard High School  Kansas City

Getting the count of students in each grade:

In [6]: df.groupby("grade").size()
Out[6]:
grade
10    1
9     1
dtype: int64

You can also group by any number of columns, for instance by grade and school:

In [7]: df.groupby(["grade", "school"]).size()
Out[7]:
grade  school
10     Richard High School    1
9      Richard High School    1
dtype: int64

CodePudding user response:

You could do it like this but bear in mind that this method is very specific to the input and output as defined in the original question:

d = dict()
k = 0
with open('foo.txt') as infile:
    for line in map(str.strip, infile):
        if len(line) > 0:
            match line:
                case 'student':
                    td = dict()
                    d[k] = {line: td}
                    k += 1
                case 'school':
                    td[line] = dict()
                    td = td[line]
                case _:
                    k_, *v = line.split(':')
                    if v:
                        td[k_] = v[0].strip()

print(d)

Output:

{0: {'student': {'first name': 'John', 'last name': 'Doe', 'grade': '9', 'gpa': '4.0', 'school': {'name': 'Richard High School', 'city': 'Kansas City'}}}, 1: {'student': {'first name': 'Jane', 'last name': 'Doe', 'grade': '10', 'gpa': '3.0', 'school': {'name': 'Richard High School', 'city': 'Kansas City'}}}}
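The output above nests 'school' inside 'student', whereas the target in the question has the two as siblings. A minimal variation that keeps them side by side is sketched below, assuming a 'student' header always starts a new record. The function name `parse_records` is hypothetical, and plain if/elif is used instead of match so it also runs on Python versions before 3.10.

```python
def parse_records(lines):
    """Parse lines into {index: {'student': {...}, 'school': {...}}}."""
    records = {}
    record = None   # the record currently being filled
    section = None  # the inner dict currently receiving "key: value" pairs
    for line in map(str.strip, lines):
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '####' separator
        if line in ("student", "school"):
            # A 'student' header (or the very first header) starts a new record.
            if line == "student" or record is None:
                record = {}
                records[len(records)] = record
            # Attach the section to the record itself, not to the previous section.
            section = {}
            record[line] = section
        elif section is not None:
            key, _, value = line.partition(":")
            section[key.strip()] = value.strip()
    return records

sample = """student
    first name: John
    last name: Doe
    grade: 9
    gpa: 4.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City
"""

print(parse_records(sample.splitlines()))
```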

CodePudding user response:

That's a possible solution:

import re

# Read the whole file, then split it into one chunk per record.
with open("a.txt") as file:
    text = file.read()
elements = text.split("####")

dictionaryMain = {}

i = 0
for element in elements:
    # Search the current chunk only, not the whole text.
    firstName = re.search('first name: (.+)', element).group(1)
    lastName = re.search('last name: (.+)', element).group(1)
    grade = re.search('grade: (.+)', element).group(1)
    gpa = re.search('gpa: (.+)', element).group(1)
    # Anchor to the start of a line so "first name" and "last name" don't match.
    name = re.search(r'^\s*name: (.+)', element, re.MULTILINE).group(1)
    city = re.search('city: (.+)', element).group(1)
    # Create fresh dictionaries per record so entries don't share state.
    dictionaryStudent = {}
    dictionarySchool = {}
    dictionaryElement = {}
    dictionaryStudent['first name'] = firstName
    dictionaryStudent['last name'] = lastName
    dictionaryStudent['grade'] = grade
    dictionaryStudent['gpa'] = gpa
    dictionarySchool['name'] = name
    dictionarySchool['city'] = city
    dictionaryElement['student'] = dictionaryStudent
    dictionaryElement['school'] = dictionarySchool
    i = i + 1
    dictionaryMain[i] = dictionaryElement

print(dictionaryMain)

Input file:

student
    first name: John
    last name: Doe
    grade: 9
    gpa: 4.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City

Output (reformatted for readability):

{
  1: {
    'student': {
      'first name': 'John',
      'last name': 'Doe',
      'grade': '9',
      'gpa': '4.0'
    },
    'school': {
      'name': 'Richard High School',
      'city': 'Kansas City'
    }
  },
  2: {
    'student': {
      'first name': 'Jane',
      'last name': 'Doe',
      'grade': '10',
      'gpa': '3.0'
    },
    'school': {
      'name': 'Richard High School',
      'city': 'Kansas City'
    }
  },
  3: {
    'student': {
      'first name': 'Jane',
      'last name': 'Doe',
      'grade': '10',
      'gpa': '3.0'
    },
    'school': {
      'name': 'Richard High School',
      'city': 'Kansas City'
    }
  }
}

I don't know exactly what your use case is, but if the format really is this strict, you should consider using dataclasses.
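As a rough illustration of that suggestion, the sketch below maps one parsed entry onto dataclasses from the standard library. The class names `Student`, `School` and `Enrollment`, the helper `from_nested`, and the choice to convert grade and GPA to numbers are all assumptions made for this example.

```python
from dataclasses import dataclass, asdict

@dataclass
class Student:
    first_name: str
    last_name: str
    grade: int
    gpa: float

@dataclass
class School:
    name: str
    city: str

@dataclass
class Enrollment:
    """One record from the file: a student plus their school (hypothetical name)."""
    student: Student
    school: School

def from_nested(entry):
    """Build an Enrollment from one {'student': {...}, 'school': {...}} entry."""
    s = entry["student"]
    student = Student(s["first name"], s["last name"], int(s["grade"]), float(s["gpa"]))
    school = School(**entry["school"])  # keys 'name' and 'city' match the field names
    return Enrollment(student, school)

entry = {
    "student": {"first name": "John", "last name": "Doe", "grade": "9", "gpa": "4.0"},
    "school": {"name": "Richard High School", "city": "Kansas City"},
}

record = from_nested(entry)
print(record.student.first_name, record.student.grade, record.school.city)
print(asdict(record))  # back to plain nested dictionaries
```

Attribute access such as `record.student.gpa` replaces string keys, and a missing or misspelled field fails at construction time instead of surfacing later.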
