Processing a .log file of dictionaries and lists into a pandas DataFrame? - CodePudding

Can I convert this file to a pandas DataFrame? The file has a .log extension and contains many lines like this one (ignore the values):

{
    "asdasd":"1831a-12123",
    "id1":"23x.abc212.4566",
    "id2":"456a.2412.16348x5_def",
    "id3":"sdaw-p-2323",
    "abcd":"xyz",
    "asdsadas":"\"sdasdsad\": sadasd",
    "xasda":0.8,
    "id4":"409cc2e",
    "dictionary":{"sadasd":"xdasd","zxczxc":"asdsa","xczczxczx":"sdsdsadas.xyz"},
    "zxczxcz":["xczczxc"],
    "xczczxcz":"dqwdqwd",
    "dadsdsd":["sdsd"],
    "asdasdasdaxcz":true,
    "xczxczxczxc":"sdadsa.xcxcxc.ab",
    "bgfbgb":["dsvsdvsdv"],
    "cascasas":["asxsaasx"],
    "xsxasxas":[],
    "xasxasxas":"wewewe",
    "sdasdasd":"xzczxc",
    "id5":"VB 9",
    "id6":"5134132451",
    "id7":"8989898",
    "sdasdasdsadsa":[],
    "xcascxassaxa":1234,
    "sadasdadasdsad":4567
}

This is what I tried, but the result is wrong:

import pandas as pd
data = open('/Users/sadsad.log')
df = pd.DataFrame([data])
df
1 rows × 259800 columns

CodePudding user response:

import json
import pandas as pd

logs = []
with open('/Users/sadsad.log', 'r') as f:
    for line in f:
        # each line of the file holds one JSON object
        logs.append(json.loads(line))

df = pd.DataFrame(logs)

df.sample(n=50)

I created a DataFrame this way. For the dictionary column, I am looking at: Split / Explode a column of dictionaries into separate columns with pandas
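One possible way to split the dictionary column, shown only as a minimal sketch: expand it with pd.json_normalize and join the result back onto the remaining columns. The two records below are made-up stand-ins for parsed log lines, and the "dictionary." prefix is just an illustrative naming choice. The sketch assumes every row actually holds a dict in that column and that df has the default 0..n-1 index.

```python
import pandas as pd

# Hypothetical records standing in for parsed log lines.
logs = [
    {"id1": "a", "dictionary": {"sadasd": "x1", "zxczxc": "y1"}},
    {"id1": "b", "dictionary": {"sadasd": "x2", "zxczxc": "y2"}},
]
df = pd.DataFrame(logs)

# Turn each dict into its own row of columns, prefixed to show where they came from.
expanded = pd.json_normalize(df["dictionary"].tolist()).add_prefix("dictionary.")

# Drop the original dict column and attach the expanded columns by row index.
flat = df.drop(columns="dictionary").join(expanded)

print(flat.columns.tolist())
```

If some rows have missing or non-dict values in that column, they would need to be filled or filtered before expanding.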

CodePudding user response:

You can open the file, convert its contents to a Python dictionary with json.loads, and then pass it to pd.DataFrame. This works when the file contains a single JSON object:

import json
import pandas as pd

with open('/Users/sadsad.log', 'r', encoding='utf-8') as f:
    data = json.loads(f.read())

df = pd.DataFrame([data])

print(df)

        asdasd              id1                    id2          id3 abcd  \
0  1831a-12123  23x.abc212.4566  456a.2412.16348x5_def  sdaw-p-2323  xyz

             asdsadas  xasda      id4  \
0  "sdasdsad": sadasd    0.8  409cc2e

                                                             dictionary  \
0  {'sadasd': 'xdasd', 'zxczxc': 'asdsa', 'xczczxczx': 'sdsdsadas.xyz'}

     zxczxcz xczczxcz dadsdsd  asdasdasdaxcz       xczxczxczxc       bgfbgb  \
0  [xczczxc]  dqwdqwd  [sdsd]           True  sdadsa.xcxcxc.ab  [dsvsdvsdv]

     cascasas xsxasxas xasxasxas sdasdasd   id5         id6      id7  \
0  [asxsaasx]       []    wewewe   xzczxc  VB 9  5134132451  8989898

  sdasdasdsadsa  xcascxassaxa  sadasdadasdsad
0            []          1234            4567

Your data contains nested dictionaries, so you can also try pd.json_normalize, which flattens them into separate columns:

df = pd.json_normalize(data)

print(df)

        asdasd              id1                    id2          id3 abcd  \
0  1831a-12123  23x.abc212.4566  456a.2412.16348x5_def  sdaw-p-2323  xyz

             asdsadas  xasda      id4    zxczxcz xczczxcz dadsdsd  \
0  "sdasdsad": sadasd    0.8  409cc2e  [xczczxc]  dqwdqwd  [sdsd]

   asdasdasdaxcz       xczxczxczxc       bgfbgb    cascasas xsxasxas  \
0           True  sdadsa.xcxcxc.ab  [dsvsdvsdv]  [asxsaasx]       []

  xasxasxas sdasdasd   id5         id6      id7 sdasdasdsadsa  xcascxassaxa  \
0    wewewe   xzczxc  VB 9  5134132451  8989898            []          1234

   sadasdadasdsad dictionary.sadasd dictionary.zxczxc dictionary.xczczxczx
0            4567             xdasd             asdsa        sdsdsadas.xyz
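Since the question mentions many such lines, here is a rough sketch of how pd.json_normalize might be applied to a whole file, assuming each line is one complete JSON object. An io.StringIO with made-up values stands in for the real .log file so the example runs on its own; with the real file you would iterate over open(path) instead.

```python
import io
import json
import pandas as pd

# Stand-in for the .log file: one JSON object per line (sample values are made up).
log_file = io.StringIO(
    '{"id1": "a", "xasda": 0.8, "dictionary": {"sadasd": "x1"}, "zxczxcz": ["p"]}\n'
    '\n'
    '{"id1": "b", "xasda": 0.5, "dictionary": {"sadasd": "x2"}, "zxczxcz": []}\n'
)

# Parse each non-empty line separately; json.loads on the whole text would fail
# because it contains more than one JSON document.
records = [json.loads(line) for line in log_file if line.strip()]

# Nested dicts become dotted column names; lists stay as list values in one column.
df = pd.json_normalize(records)

print(df)
```

Each log line becomes one row, so the frame grows by rows instead of ending up as a single row with thousands of columns.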