Can I convert this file to a pandas DataFrame? The file has a .log extension and contains many records like this one (don't mind the values):
{
"asdasd":"1831a-12123",
"id1":"23x.abc212.4566",
"id2":"456a.2412.16348x5_def",
"id3":"sdaw-p-2323",
"abcd":"xyz",
"asdsadas":"\"sdasdsad\": sadasd",
"xasda":0.8,
"id4":"409cc2e",
"dictionary":{"sadasd":"xdasd","zxczxc":"asdsa","xczczxczx":"sdsdsadas.xyz"},
"zxczxcz":["xczczxc"],
"xczczxcz":"dqwdqwd",
"dadsdsd":["sdsd"],
"asdasdasdaxcz":true,
"xczxczxczxc":"sdadsa.xcxcxc.ab",
"bgfbgb":["dsvsdvsdv"],
"cascasas":["asxsaasx"],
"xsxasxas":[],
"xasxasxas":"wewewe",
"sdasdasd":"xzczxc",
"id5":"VB 9",
"id6":"5134132451",
"id7":"8989898",
"sdasdasdsadsa":[],
"xcascxassaxa":1234,
"sadasdadasdsad":4567
}
This is what I tried, but the result is wrong:
import pandas as pd
data = open('/Users/sadsad.log')
df = pd.DataFrame([data])
df
1 rows × 259800 columns
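Why the attempt above produces a single, very wide row: open() returns a file object, and iterating over a file object yields its lines. Wrapping it in a list gives pandas one row-like iterable, so each line becomes its own column. Here is a minimal sketch that reproduces this with an in-memory io.StringIO standing in for the .log file. The two-line content is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical two-line contents standing in for /Users/sadsad.log
fake_file = io.StringIO('{"id4": "409cc2e"}\n{"id4": "abc1234"}\n')

# Same pattern as the question: a list containing one file object.
# pandas iterates the file object, so its lines become the columns of one row.
df = pd.DataFrame([fake_file])

print(df.shape)
print(repr(df.iloc[0, 0]))  # raw text of the first line, not parsed JSON
```

Each cell holds the unparsed text of one line, including its trailing newline. That is why the real file gave 1 row and one column per line.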
Answer:
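import json
import pandas as pd

logs = []
with open('/Users/sadsad.log', 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():  # skip blank lines
            logs.append(json.loads(line))  # each line holds one JSON object
df = pd.DataFrame(logs)
df.sample(n=50)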
This builds a DataFrame with one row per log line (it assumes each JSON object sits on a single line). For the nested "dictionary" column, I am looking at: Split / Explode a column of dictionaries into separate columns with pandas
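A common way to split a column of dictionaries into separate columns is to turn the column's list of dicts into its own DataFrame and join it back on the index. This is in the spirit of the linked question, though it is not necessarily the exact answer given there. Here is a minimal sketch. The two records are made up, and the "dictionary." prefix is an illustrative choice.

```python
import pandas as pd

# Hypothetical records shaped like the log lines (values made up)
records = [
    {"id4": "409cc2e", "dictionary": {"sadasd": "xdasd", "zxczxc": "asdsa"}},
    {"id4": "abc1234", "dictionary": {"sadasd": "qwe", "zxczxc": "rty"}},
]
df = pd.DataFrame(records)

# Remove the dict column, expand its list of dicts into a DataFrame that
# shares df's index, prefix the new column names, and join them back
expanded = pd.DataFrame(df.pop("dictionary").tolist(), index=df.index)
df = df.join(expanded.add_prefix("dictionary."))

print(df)
```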
Answer:
You can read the file, parse it into a Python dictionary with json.loads, and then pass it to pd.DataFrame. This assumes the file holds a single JSON object, like the sample:
import json
import pandas as pd

with open('/Users/sadsad.log', 'r', encoding='utf-8') as f:
    data = json.loads(f.read())  # parse the whole file as one JSON object

df = pd.DataFrame([data])  # wrap the dict in a list: one dict -> one row
print(df)
asdasd id1 id2 id3 abcd \
0 1831a-12123 23x.abc212.4566 456a.2412.16348x5_def sdaw-p-2323 xyz
asdsadas xasda id4 \
0 "sdasdsad": sadasd 0.8 409cc2e
dictionary \
0 {'sadasd': 'xdasd', 'zxczxc': 'asdsa', 'xczczxczx': 'sdsdsadas.xyz'}
zxczxcz xczczxcz dadsdsd asdasdasdaxcz xczxczxczxc bgfbgb \
0 [xczczxc] dqwdqwd [sdsd] True sdadsa.xcxcxc.ab [dsvsdvsdv]
cascasas xsxasxas xasxasxas sdasdasd id5 id6 id7 \
0 [asxsaasx] [] wewewe xzczxc VB 9 5134132451 8989898
sdasdasdsadsa xcascxassaxa sadasdadasdsad
0 [] 1234 4567
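The json.loads(f.read()) call parses the file as one JSON document. If the .log file holds many objects, one per line, it raises json.JSONDecodeError ("Extra data"). Here is a minimal sketch that tries the single-document parse first and falls back to one object per line. The helper name load_log_records and the sample text are hypothetical.

```python
import json

import pandas as pd


def load_log_records(text):
    """Hypothetical helper: return a list of dicts from a .log file's text.

    Tries to parse the whole text as one JSON document first; if that fails,
    treats it as JSON Lines (one object per non-blank line).
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Several top-level objects: parse each non-blank line separately
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # One object becomes a one-element list; a top-level array is kept as-is
    return parsed if isinstance(parsed, list) else [parsed]


# Made-up contents: two objects, one per line
sample = '{"id4": "409cc2e", "xasda": 0.8}\n{"id4": "abc1234", "xasda": 0.5}\n'
df = pd.DataFrame(load_log_records(sample))
print(df)
```

For a real file, pass f.read() from the open() call to load_log_records. If the file is strictly one object per line, pandas can also read it directly with pd.read_json('/Users/sadsad.log', lines=True).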
Because your data contains a nested dictionary, you can also try pd.json_normalize, which flattens it into separate dictionary.* columns:
df = pd.json_normalize(data)
print(df)
asdasd id1 id2 id3 abcd \
0 1831a-12123 23x.abc212.4566 456a.2412.16348x5_def sdaw-p-2323 xyz
asdsadas xasda id4 zxczxcz xczczxcz dadsdsd \
0 "sdasdsad": sadasd 0.8 409cc2e [xczczxc] dqwdqwd [sdsd]
asdasdasdaxcz xczxczxczxc bgfbgb cascasas xsxasxas \
0 True sdadsa.xcxcxc.ab [dsvsdvsdv] [asxsaasx] []
xasxasxas sdasdasd id5 id6 id7 sdasdasdsadsa xcascxassaxa \
0 wewewe xzczxc VB 9 5134132451 8989898 [] 1234
sadasdadasdsad dictionary.sadasd dictionary.zxczxc dictionary.xczczxczx
0 4567 xdasd asdsa sdsdsadas.xyz
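pd.json_normalize also accepts a list of dicts, so the same flattening works when the log holds many records. Each record becomes a row, and the nested dictionary becomes one column per key. Here is a minimal sketch with made-up records. sep is json_normalize's separator parameter (default "."), and the underscore here is only an illustrative choice.

```python
import pandas as pd

# Made-up records shaped like the log lines
records = [
    {"id4": "409cc2e", "dictionary": {"sadasd": "xdasd", "zxczxc": "asdsa"}},
    {"id4": "abc1234", "dictionary": {"sadasd": "qwe", "zxczxc": "rty"}},
]

# One row per record; nested keys are joined to their parent key with sep
df = pd.json_normalize(records, sep="_")
print(df)
```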