Processing a .log file of dictionaries and lists into a pandas DataFrame?

Time:04-16

Can I convert this file to a pandas DataFrame? The file has a .log extension and contains many records like this one (ignore the values):

{
    "asdasd":"1831a-12123",
    "id1":"23x.abc212.4566",
    "id2":"456a.2412.16348x5_def",
    "id3":"sdaw-p-2323",
    "abcd":"xyz",
    "asdsadas":"\"sdasdsad\": sadasd",
    "xasda":0.8,
    "id4":"409cc2e",
    "dictionary":{"sadasd":"xdasd","zxczxc":"asdsa","xczczxczx":"sdsdsadas.xyz"},
    "zxczxcz":["xczczxc"],
    "xczczxcz":"dqwdqwd",
    "dadsdsd":["sdsd"],
    "asdasdasdaxcz":true,
    "xczxczxczxc":"sdadsa.xcxcxc.ab",
    "bgfbgb":["dsvsdvsdv"],
    "cascasas":["asxsaasx"],
    "xsxasxas":[],
    "xasxasxas":"wewewe",
    "sdasdasd":"xzczxc",
    "id5":"VB 9",
    "id6":"5134132451",
    "id7":"8989898",
    "sdasdasdsadsa":[],
    "xcascxassaxa":1234,
    "sadasdadasdsad":4567
}

This is what I tried, but the result is wrong:

import pandas as pd
data = open('/Users/sadsad.log')
df = pd.DataFrame([data])
df
1 rows × 259800 columns
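The single wide row happens because pd.DataFrame([data]) receives a list holding one file object. pandas treats that object as one row and iterates over it, and iterating a file yields its lines, so every line of the file becomes its own column. A minimal reproduction, using io.StringIO as a stand-in for the log file (the three-line content is made up):

```python
import io

import pandas as pd

# Stand-in for open('/Users/sadsad.log'): three lines of made-up text
fake_file = io.StringIO('{\n"a": 1\n}\n')

# A list holding one iterable is treated as a single row; iterating the
# file-like object yields its lines, so each line becomes a separate column
df = pd.DataFrame([fake_file])
print(df.shape)  # (1, 3)
```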

Answer:

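import json
import pandas as pd

logs = []
# Assumes each line of the file is one complete JSON object (JSON Lines)
with open('/Users/sadsad.log', 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():  # skip blank lines
            logs.append(json.loads(line))

df = pd.DataFrame(logs)

df.sample(n=50)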

This creates the DataFrame. For the dictionary column, I am looking at: Split / Explode a column of dictionaries into separate columns with pandas
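One way to split the dictionary column is to flatten just that column with pd.json_normalize and join the result back on the row index. A minimal sketch, assuming every cell of the dictionary column is a dict (the two records below are made up and trimmed to a few keys; the "dictionary." prefix is an arbitrary naming choice):

```python
import pandas as pd

# Two made-up records shaped like the log entries
logs = [
    {"id4": "409cc2e", "dictionary": {"sadasd": "xdasd", "zxczxc": "asdsa"}},
    {"id4": "51ab3f0", "dictionary": {"sadasd": "qwert", "zxczxc": "yuiop"}},
]
df = pd.DataFrame(logs)

# Remove the dict column and expand each dict into its own columns
nested = pd.json_normalize(df.pop("dictionary").tolist())
# Align the expanded rows with the original rows before joining
nested.index = df.index
# Prefix the new column names to avoid clashes with existing columns
df = df.join(nested.add_prefix("dictionary."))
print(df.columns.tolist())
# ['id4', 'dictionary.sadasd', 'dictionary.zxczxc']
```

If some rows have no dictionary (NaN instead of a dict), replace those cells with empty dicts before calling json_normalize.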

Answer:

You can open the file, convert its contents to a Python dictionary with json.load, and then pass that dictionary to pd.DataFrame. This assumes the file contains a single JSON object:

import json
import pandas as pd

with open('/Users/sadsad.log', 'r', encoding='utf-8') as f:
    data = json.load(f)  # parse the whole file as one JSON object

df = pd.DataFrame([data])
print(df)

        asdasd              id1                    id2          id3 abcd  \
0  1831a-12123  23x.abc212.4566  456a.2412.16348x5_def  sdaw-p-2323  xyz

             asdsadas  xasda      id4  \
0  "sdasdsad": sadasd    0.8  409cc2e

                                                             dictionary  \
0  {'sadasd': 'xdasd', 'zxczxc': 'asdsa', 'xczczxczx': 'sdsdsadas.xyz'}

     zxczxcz xczczxcz dadsdsd  asdasdasdaxcz       xczxczxczxc       bgfbgb  \
0  [xczczxc]  dqwdqwd  [sdsd]           True  sdadsa.xcxcxc.ab  [dsvsdvsdv]

     cascasas xsxasxas xasxasxas sdasdasd   id5         id6      id7  \
0  [asxsaasx]       []    wewewe   xzczxc  VB 9  5134132451  8989898

  sdasdasdsadsa  xcascxassaxa  sadasdadasdsad
0            []          1234            4567
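When the file holds many records instead of one, json.loads on the whole text raises json.JSONDecodeError ("Extra data"), and if the records are pretty-printed over several lines, as in the sample in the question, parsing line by line fails too. A sketch for that case, assuming the file is just JSON objects written one after another: json.JSONDecoder.raw_decode parses one value and returns where it stopped, so the text can be walked object by object (iter_json_objects is a hypothetical helper name, and the sample text is made up):

```python
import json

import pandas as pd


def iter_json_objects(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while True:
        # Skip newlines and indentation between documents
        while pos < length and text[pos].isspace():
            pos += 1
        if pos == length:
            return
        # Parse one value starting at pos; raw_decode returns (value, end index)
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Made-up stand-in for the file contents; with the real file use:
# with open('/Users/sadsad.log', encoding='utf-8') as f:
#     text = f.read()
text = (
    '{\n  "id4": "409cc2e",\n  "xasda": 0.8\n}\n'
    '{\n  "id4": "51ab3f0",\n  "xasda": 0.5\n}\n'
)

df = pd.DataFrame(list(iter_json_objects(text)))
print(df)
```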

Because your data contains a nested dictionary, you can also try pd.json_normalize, which expands it into separate dictionary.* columns:

df = pd.json_normalize(data)
print(df)

        asdasd              id1                    id2          id3 abcd  \
0  1831a-12123  23x.abc212.4566  456a.2412.16348x5_def  sdaw-p-2323  xyz

             asdsadas  xasda      id4    zxczxcz xczczxcz dadsdsd  \
0  "sdasdsad": sadasd    0.8  409cc2e  [xczczxc]  dqwdqwd  [sdsd]

   asdasdasdaxcz       xczxczxczxc       bgfbgb    cascasas xsxasxas  \
0           True  sdadsa.xcxcxc.ab  [dsvsdvsdv]  [asxsaasx]       []

  xasxasxas sdasdasd   id5         id6      id7 sdasdasdsadsa  xcascxassaxa  \
0    wewewe   xzczxc  VB 9  5134132451  8989898            []          1234

   sadasdadasdsad dictionary.sadasd dictionary.zxczxc dictionary.xczczxczx
0            4567             xdasd             asdsa        sdsdsadas.xyz
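pd.json_normalize also accepts a list of dictionaries, so for a file with one JSON object per line (the layout the first answer assumes) you can parse every line and flatten all records in a single call. A sketch, with io.StringIO standing in for the opened log file and made-up values:

```python
import io
import json

import pandas as pd

# Stand-in for open('/Users/sadsad.log', encoding='utf-8'): one JSON object per line
log_file = io.StringIO(
    '{"id4": "409cc2e", "dictionary": {"sadasd": "xdasd"}}\n'
    '{"id4": "51ab3f0", "dictionary": {"sadasd": "qwert"}}\n'
)

# Parse each non-blank line into a dict
records = [json.loads(line) for line in log_file if line.strip()]

# Nested keys become dotted column names such as "dictionary.sadasd"
df = pd.json_normalize(records)
print(df.columns.tolist())  # ['id4', 'dictionary.sadasd']
```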