I get an error about wrong dictionary update sequence length when trying to read lines from txt file

Time:10-26

I'm trying to loop through multiple lines and add that into a dictionary, then a dataframe.

I've had many attempts but no solution yet.

I have a txt file with multiple lines like the example below. I'm trying to iterate through each line, add it to a dictionary, and then append the dictionary to a dataframe.

For example, the text file looks like this:

ABC=123, DEF="456", 
ABC="789", DEF="101112"

I would like this to be added to a dictionary like this (on the first loop, for the first line):

{ABC: 123, DEF: 456}

and then appended to a df like this

   ABC   DEF
 0 123   456
 1 789   101112

So far I have tried the code below. It works when the text file has a single line, but as soon as I add a second line I get this error:

dictionary update sequence element #6 has length 3; 2 is required

with open("file.txt", "r") as f:
    s = f.read().strip()
    dictionary = dict(subString.split("=") for subString in s.split(","))
    dataframe = dataframe.append(dictionary, ignore_index=True)
dataframe

CodePudding user response:

One suggestion is to parse each line with a regex and insert the matches (if any) into the dictionary. You can change the pattern as needed; this one matches a word on the left side of = and a number on the right, where the number may optionally be preceded by ' or ".

import re
import pandas as pd

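# (\w+) captures the key, [\'\"]? skips an optional opening quote, (\d+) captures the digits
pattern = r'(\w+)=[\'\"]?(\d+)'

str_dict = {}
with open('file.txt') as f:
    for line in f:
        # Collect every key=value pair on the line into a per-key list (one list per column)
        for key, val in re.findall(pattern, line):
            str_dict.setdefault(key, []).append(int(val))

df = pd.DataFrame(str_dict)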
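
As for the error itself: reading the whole file at once and splitting on "," can leave a piece that spans a line break and so contains two = signs, which is probably why split("=") returns three items instead of two (assuming not every line ends with a comma). Parsing line by line avoids that. Here is a minimal sketch of the same regex idea that builds one dict per line instead of one list per key, so a line that is missing a key gives NaN rather than columns of unequal length. The `lines` list is a stand-in for file.txt, and `rows` is just an illustrative name.

```python
import re
import pandas as pd

pattern = r'(\w+)=[\'"]?(\d+)'

# Sample lines from the question, used here in place of reading file.txt
lines = ['ABC=123, DEF="456", ', 'ABC="789", DEF="101112"']

# One dict per line, so each line becomes exactly one DataFrame row
rows = [{key: int(val) for key, val in re.findall(pattern, line)} for line in lines]

df = pd.DataFrame(rows)
print(df)
```

With a real file, `lines` can be replaced by the open file object, since iterating over it yields one line at a time.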


CodePudding user response:

This also works in the scenario of a huge text file with many different strings:

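
    import re
    import pandas as pd

    rows = []
    with open('event.txt', 'r') as f:
        for line in f:
            text = line.strip().replace('Event time', 'Event_time')
            if not text:
                continue  # skip blank lines
            # Replace spaces inside quoted values so splitting on spaces keeps each value whole
            for word in re.findall(r'".*?"', text):
                text = text.replace(word, word.replace(' ', '_'))
            data_dict = {}
            for section in text.split():
                key, val = section.split('=', 1)  # split on the first '=' only
                data_dict[key.strip()] = val.strip()
            rows.append(data_dict)
    # DataFrame.append was removed in pandas 2.0, so build the frame once from the list of rows
    df = pd.DataFrame(rows)
    df
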
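
To see what this approach produces, here is a small sketch run on in-memory lines. The two sample lines and the `Name` key are made up for illustration (the real event.txt is not shown), and it assumes pairs are separated by single spaces. Note that the values keep their quotes and the underscores that replaced the spaces.

```python
import re
import pandas as pd

# Hypothetical sample lines standing in for event.txt
lines = [
    'Event time="2021-10-26 10:00" Name="John Smith"',
    'Event time="2021-10-26 11:30" Name="Jane Doe"',
]

rows = []
for line in lines:
    text = line.strip().replace('Event time', 'Event_time')
    # Protect spaces inside quoted values before splitting on spaces
    for word in re.findall(r'".*?"', text):
        text = text.replace(word, word.replace(' ', '_'))
    # Each remaining space-separated section is one key=value pair
    rows.append(dict(section.split('=', 1) for section in text.split(' ')))

df = pd.DataFrame(rows)
print(df)
```

If the quotes or underscores are unwanted, they can be removed afterwards, for example with `val.strip('"').replace('_', ' ')` on each value.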