I am trying to access JSON data but getting the above error. My code is:
with open(filepath + "decompressed_twitter_lot1file1.txt", 'rb') as fh:
    for line in fh:
        object = json.loads(line)
        urls_in_tweet = object['entities']['urls']
        domains_in_tweet = []
        print(urls_in_tweet)
        for url in urls_in_tweet:
            for key, value in url.items():
                print(key,value)
                domain = tldextract.extract(value).registered_domain
                print("domain")
My output:
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[{
'display_url': 'msnbc.com/rachel-maddow/…',
'indices': [67, 90],
'expanded_url': '//www.msnbc.com/rachel-maddow/watch/trump-admin-coverage-maxim-watch-what-they-do-not-what-they-say-66934341943',
'url': '//t/zHmMchTCIf'
}]
display_url msnbc.com/rachel-maddow/…
domain
indices [67, 90]
After this, I get the error below. I don't understand why nothing is printed after the indices key.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-90-cb728568a4f1> in <module>
8 for key, value in url.items():
9 print(key,value)
---> 10 domain = tldextract.extract(value).registered_domain
11 print("domain")
~/.local/lib/python3.8/site-packages/tldextract/tldextract.py in extract(url, include_psl_private_domains)
294 url, include_psl_private_domains=False
295 ): # pylint: disable=missing-function-docstring
--> 296 return TLD_EXTRACTOR(url, include_psl_private_domains=include_psl_private_domains)
297
298
~/.local/lib/python3.8/site-packages/tldextract/tldextract.py in __call__(self, url, include_psl_private_domains)
214
215 netloc = (
--> 216 SCHEME_RE.sub("", url)
217 .partition("/")[0]
218 .partition("?")[0]
TypeError: expected string or bytes-like object
This data is a small part of Twitter API data. How can I access every key-value pair of this JSON data and load the values into the domain list?
CodePudding user response:
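tldextract.extract expects a str, hence the error: the traceback points at tldextract.extract(value), and the value stored under 'indices' is a list ([67, 90]), not a string.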
If you want to get the key-value pairs you can do this:
fs = [{'display_url': 'eonli.ne/33XF5V1', 'indices': [90, 113], 'expanded_url': 'eonli.ne/33XF5V1', 'url': 't.co/flhUdZcUzB'}]
for k,v in fs[0].items():
    print(f"{k}, {v}")
fs[0] is a dictionary; get its key-value pairs with items().
There is no need for json.loads in this example, since fs is already a Python list of dictionaries.
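
To connect this back to the question: the traceback shows the failure inside tldextract.extract(value), reached right after the 'indices' key was printed, so one way around it is to pass only string values (for example 'expanded_url') to the extractor. The sketch below is a minimal illustration under assumptions, not the asker's actual pipeline: the helper names hostname and domains_in_tweet and the sample tweet are hypothetical, and urllib.parse.urlparse is swapped in for tldextract so the example runs with the standard library alone.

```python
import json
from urllib.parse import urlparse


def hostname(url):
    """Stand-in for tldextract (assumption): return the host part of a URL, or None."""
    return urlparse(url).hostname


def domains_in_tweet(line, extract_domain=hostname):
    """Collect one domain per URL entity in a single JSON-encoded tweet."""
    tweet = json.loads(line)  # accepts str or bytes
    domains = []
    for url in tweet.get("entities", {}).get("urls", []):
        expanded = url.get("expanded_url")
        # Only string values reach the extractor; the list stored
        # under 'indices' is never passed to it.
        if isinstance(expanded, str):
            domain = extract_domain(expanded)
            if domain:
                domains.append(domain)
    return domains


# Hypothetical sample line, shaped like the output shown in the question.
sample = json.dumps({
    "entities": {
        "urls": [{
            "display_url": "msnbc.com/rachel-maddow/…",
            "indices": [67, 90],
            "expanded_url": "https://www.msnbc.com/rachel-maddow/watch",
            "url": "https://t.co/example",
        }]
    }
})

print(domains_in_tweet(sample))  # ['www.msnbc.com']
```

To get the registered domain as in the question, the same function could be called with extract_domain=lambda u: tldextract.extract(u).registered_domain, assuming tldextract is installed. Note that the list of domains has to be created once before the loop over lines if it should accumulate across tweets; in the question it is reset to [] for every line.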