AttributeError when cleaning emoticons from crawled data: 'str' object has no attribute 'xpath'

Time:01-09

This code was written by an expert, and it fails with an error at run time:
# Attribute predicates that did not survive in the post are written as [@...].
# noinspection PyRedeclaration
def parse_tweet_item(self, items):
    for item in items:
        try:
            tweet = Tweet()
            tweet['usernameTweet'] = item.xpath('.//span[@...]/b/text()').extract()[0]

            ID = item.xpath('.//@data-tweet-id').extract()
            if not ID:
                continue
            tweet['ID'] = ID[0]

            # get text content
            tweet['text'] = ' '.join(
                item.xpath('.//div[@...]/p//text()').extract()
            ).replace(' # ', '#').replace(' @ ', '@')

            # clean the data [20200416]
            # tweet['text'] = re.sub(r"...", "", tweet['text'])  # regex pattern not recoverable from the post

            # filter emoticons [20200417]
            tweet['text'] = filter_emoji(tweet['text'], '')

            if tweet['text'] == '':
                # If there is no text, we ignore the tweet
                continue

            # get meta data
            tweet['url'] = item.xpath('.//@data-permalink-path').extract()[0]

            nbr_retweet = item.css('span.ProfileTweet-action--retweet > span.ProfileTweet-actionCount').xpath(
                '@data-tweet-stat-count').extract()
            if nbr_retweet:
                tweet['nbr_retweet'] = int(nbr_retweet[0])
            else:
                tweet['nbr_retweet'] = 0

            nbr_favorite = item.css('span.ProfileTweet-action--favorite > span.ProfileTweet-actionCount').xpath(
                '@data-tweet-stat-count').extract()
            if nbr_favorite:
                tweet['nbr_favorite'] = int(nbr_favorite[0])
            else:
                tweet['nbr_favorite'] = 0

            nbr_reply = item.css('span.ProfileTweet-action--reply > span.ProfileTweet-actionCount').xpath(
                '@data-tweet-stat-count').extract()
            if nbr_reply:
                tweet['nbr_reply'] = int(nbr_reply[0])
            else:
                tweet['nbr_reply'] = 0

            tweet['datetime'] = datetime.fromtimestamp(int(
                item.xpath('.//div[@...]/small[@...]/a/span/@data-time').extract()[0]
            )).strftime('%Y-%m-%d %H:%M:%S')

            # get photo
            has_cards = item.xpath('.//@data-card-type').extract()
            if has_cards and has_cards[0] == 'photo':
                tweet['has_image'] = True
                tweet['images'] = item.xpath('.//*/div/@data-image-url').extract()
            elif has_cards:
                logger.debug('Not handle "data-card-type":\n%s' % item.xpath('.').extract()[0])

            # get animated_gif
            has_cards = item.xpath('.//@data-card2-type').extract()
            if has_cards:
                if has_cards[0] == 'animated_gif':
                    tweet['has_video'] = True
                    tweet['videos'] = item.xpath('.//*/source/@video-src').extract()
                elif has_cards[0] == 'player':
                    tweet['has_media'] = True
                    tweet['medias'] = item.xpath('.//*/div/@data-card-url').extract()
                elif has_cards[0] == 'summary_large_image':
                    tweet['has_media'] = True
                    tweet['medias'] = item.xpath('.//*/div/@data-card-url').extract()
                elif has_cards[0] == 'amplify':
                    tweet['has_media'] = True
                    tweet['medias'] = item.xpath('.//*/div/@data-card-url').extract()
                elif has_cards[0] == 'summary':
                    tweet['has_media'] = True
                    tweet['medias'] = item.xpath('.//*/div/@data-card-url').extract()
                elif has_cards[0] == '__entity_video':
                    pass  # TODO
                    # tweet['has_media'] = True
                    # tweet['medias'] = item.xpath('.//*/div/@data-src').extract()
                else:  # there are many other types of card2
                    logger.debug('Not handle "data-card2-type":\n%s' % item.xpath('.').extract()[0])

            is_reply = item.xpath('.//div[@...]').extract()
            tweet['is_reply'] = is_reply != []

            is_retweet = item.xpath('.//span[@...]').extract()
            tweet['is_retweet'] = is_retweet != []

            tweet['user_id'] = item.xpath('.//@data-user-id').extract()[0]
            yield tweet

            if self.crawl_user:
                # get user info
                user = User()
                user['ID'] = tweet['user_id']
                user['name'] = item.xpath('.//@data-name').extract()[0]
                user['screen_name'] = item.xpath('.//@data-screen-name').extract()[0]
                user['avatar'] = \
                    item.xpath('.//div[@...]/div[@...]/a/img/@src').extract()[0]
                yield user
        except Exception:
            # Note: this line also calls item.xpath(), so it fails too if item is a str
            logger.error("Error tweet:\n%s" % item.xpath('.').extract()[0])
            # raise
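
The code calls filter_emoji, but the post does not show its definition. As a rough sketch only, assuming it takes the text plus a replacement string and returns a str, it might look like the following. The parameter names and the choice of character range are assumptions, not the poster's actual code.

```python
import re

# Assumption: treat every character outside the Basic Multilingual Plane as an emoji.
# This also removes other rare non-BMP characters and leaves BMP symbols untouched.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def filter_emoji(text, replacement=""):
    """Return text with each non-BMP character replaced by replacement."""
    return _NON_BMP.sub(replacement, text)


cleaned = filter_emoji("good morning \U0001F600!", "")
print(cleaned)
```

A helper like this works on plain strings, so it has to be given the joined text (as in tweet['text']) and not a selector.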



I searched the Internet and tried many of the suggested solutions, but none of them worked.
I am a complete beginner, so any help would be appreciated.

CodePudding user response:

The error message says the problem is a str. Judging by the signature parse_tweet_item(self, items), items should be a collection of selector elements, but the collection currently contains str items.
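
One way to confirm this diagnosis is to check what items actually contains before the loop runs. The sketch below is only an illustration: split_selectors and FakeSelector are hypothetical names, and FakeSelector merely stands in for a Scrapy selector so that the example runs without Scrapy installed.

```python
def split_selectors(items):
    """Separate selector-like objects from plain strings (hypothetical helper)."""
    selectors, strings = [], []
    for item in items:
        if isinstance(item, str):
            strings.append(item)
        elif callable(getattr(item, "xpath", None)):
            selectors.append(item)
        else:
            raise TypeError("unexpected item type: %s" % type(item).__name__)
    return selectors, strings


class FakeSelector:
    """Stand-in for a Scrapy selector, used only to keep this sketch self-contained."""

    def xpath(self, query):
        return []


# Reproduce the error from the title: a plain string has no .xpath() method.
try:
    "<div>raw html</div>".xpath(".")
except AttributeError as exc:
    message = str(exc)

print(message)

# Check a mixed collection the way parse_tweet_item could before looping over it.
mixed = [FakeSelector(), "<div>raw html</div>"]
selectors, strings = split_selectors(mixed)
print(len(selectors), "selector(s),", len(strings), "string(s)")
```

In Scrapy, .xpath() and .css() return selectors, while .extract() returns a list of strings. A common way to end up with str items is therefore to call .extract() on the result before passing it to parse_tweet_item. Whether that is what happened here cannot be told from the post.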