error processing tweet json in R function: missing value where TRUE/FALSE needed

Time:06-02

I am using a function which takes a raw tweet json file as input and outputs the retweet cascades. Here is a part of the function:

if (api_version == 2) {
    parse_tweet <- function(tweet, keep_text = F) {
      tryCatch({
        json_tweet <- jsonlite::fromJSON(tweet)
        id <- json_tweet$data$id
        magnitude <-zero_if_null(json_tweet$includes$users$public_metrics$followers_count)
        user_id <- json_tweet$data$author_id
        retweet_id <- NA

        if (keep_text) text <- json_tweet$data$text

        #if this tweet is a retweet, get original tweet's information
        if (!is.null(json_tweet$data$referenced_tweets) && json_tweet$data$referenced_tweets$type == 'retweeted') {
          retweet_id <- json_tweet$data$referenced_tweets$id  
          cat("retweet_id: ", retweet_id, "\n")
          if (keep_text) text <- NA 
       }
      },
      .... # warning for error processing json 
        )
   }
}

Here is the error:

Error processing json: Error in if (!is.null(json_tweet$data$referenced_tweets) && json_tweet$data$referenced_tweets$type == : missing value where TRUE/FALSE needed

I checked my JSON file for the path to the tweet type (e.g., "retweeted", "quoted", or "replied_to"), which is json_tweet -> data -> referenced_tweets -> type, but I don't know why the function returns missing values and null retweet ids.

Here is a small part of the data (I couldn't upload even the first line of my json file because it exceeded the StackOverflow character limits):

{"data": [{"referenced_tweets": [{"type": "retweeted", "id": "1253739069273710594"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}, {"start": 18, "end": 24, "username": "AC360", "id": "227837742"}], "annotations": [{"start": 25, "end": 39, "probability": 0.7096, "type": "Person", "normalized_text": "President Trump"}], "urls": [{"start": 98, "end": 121, "url": "", "images": [{"url": "", "width": 144, "height": 144}, {"url": "", "width": 144, "height": 144}], "status": 200, "title": "Ultraviolet Irradiation of Blood: \u201cThe Cure That Time Forgot\u201d?", "description": "Ultraviolet blood irradiation (UBI) was extensively used in the 1940s and 1950s to treat many diseases including septicemia, pneumonia, tuberculosis, arthritis, asthma and even poliomyelitis. The early studies were carried out by several physicians in ...", "unwound_url": ""}]}, "public_metrics": {"retweet_count": 3, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253834847258370048", "context_annotations": [{"domain": {"id": "3", "name": "TV Shows", "description": "Television shows from around the world"}, "entity": {"id": "10000271509", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations."}}, {"domain": {"id": "4", "name": "TV Episodes", "description": "Television show episodes"}, "entity": {"id": "1249271407508242432", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. 
Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable."}}, {"domain": {"id": "4", "name": "TV Episodes", "description": "Television show episodes"}, "entity": {"id": "1249277031881138178", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable"}}, {"domain": {"id": "4", "name": "TV Episodes", "description": "Television show episodes"}, "entity": {"id": "1250891078401552385", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable."}}, {"domain": {"id": "10", "name": "Person", "description": "Named people in the world like Nelson Mandela"}, "entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}, {"domain": {"id": "29", "name": "Events [Entity Service]", "description": "Entity Service related Events domain"}, "entity": {"id": "1249271407508242432", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. 
Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "29", "name": "Events [Entity Service]", "description": "Entity Service related Events domain"}, "entity": {"id": "1249277031881138178", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "29", "name": "Events [Entity Service]", "description": "Entity Service related Events domain"}, "entity": {"id": "1250891078401552385", "name": "Anderson Cooper 360", "description": "Anderson Cooper goes beyond the headlines with in-depth reporting and investigations. Through nightly \"Keeping Them Honest\" reports, Anderson keeps his commitment to holding those in power accountable. And, of course, there's the RidicuList, a tongue-in-cheek commentary on the day's news that may leave viewers (and Anderson) laughing. 
Joining him are guests that frequently include political and legal analysts."}}, {"domain": {"id": "35", "name": "Politician", "description": "Politicians in the world, like Joe Biden"}, "entity": {"id": "799022225751871488", "name": "Donald Trump", "description": "45th US President, Donald Trump"}}], "created_at": "2020-04-24T23:54:57.000Z", "author_id": "1890848160", "text": "RT @warriors_mom: @AC360 President Trump was referring to this well-documented medical treatment: ", "source": "Twitter for iPhone", "conversation_id": "1253834847258370048"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253452455540666371"}], "entities": {"mentions": [{"start": 3, "end": 16, "username": "warriors_mom", "id": "75184478"}], "annotations": [{"start": 24, "end": 27, "probability": 0.691, "type": "Place", "normalized_text": "U.S."}]}, "public_metrics": {"retweet_count": 5, "reply_count": 0, "like_count": 0, "quote_count": 0}, "possibly_sensitive": false, "reply_settings": "everyone", "lang": "en", "id": "1253828982413410307", "context_annotations": [{"domain": {"id": "123", "name": "Ongoing News Story", "description": "Ongoing News Stories like 'Brexit'"}, "entity": {"id": "1220701888179359745", "name": "COVID-19"}}], "created_at": "2020-04-24T23:31:39.000Z", "author_id": "863857568", "text": "RT @warriors_mom: Major U.S. credit-card issuers begin lowering customer spending limits as coronavirus pandemic shutdowns leave millions j\u2026", "source": "Twitter for iPhone", "conversation_id": "1253828982413410307"}, {"referenced_tweets": [{"type": "retweeted", "id": "1253815956662620163"}],"entities":.... }}

I found some similar questions, but none of the answers helped me. Can someone please help me with this?

Thanks!

CodePudding user response:

Looking at the JSON, referenced_tweets is an array (it has square brackets around its value: "referenced_tweets":[{"type": "retweeted", "id": "1253739069273710594"}]).
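To check the structure yourself, you can parse a trimmed-down version of the sample and inspect it. This is only a sketch: the JSON below keeps just the id and referenced_tweets fields from the question's data, and it assumes jsonlite's default simplification, which turns an array of objects into a data frame.

```r
library(jsonlite)

# Trimmed copy of the question's sample: two tweets, two fields each
sample_json <- '{"data": [
  {"referenced_tweets": [{"type": "retweeted", "id": "1253739069273710594"}],
   "id": "1253834847258370048"},
  {"referenced_tweets": [{"type": "retweeted", "id": "1253452455540666371"}],
   "id": "1253828982413410307"}
]}'

json_tweet <- fromJSON(sample_json)

# data holds one row per tweet; referenced_tweets is a list with one entry per tweet
str(json_tweet$data, max.level = 1)

refs <- json_tweet$data$referenced_tweets
print(is.null(refs$type))  # the list itself has no "type" element
print(refs[[1]]$type)      # the first tweet's entry does
```

Running str() on your real file in the same way should show whether your data has the same shape.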

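So the cause of the error is that json_tweet$data$referenced_tweets$type doesn't exist: type is a property of each element of the array, not of the array itself. The lookup returns NULL, comparing NULL with 'retweeted' gives a zero-length logical, and && turns that into NA, which if rejects with "missing value where TRUE/FALSE needed".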
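The chain from a missing field to this error message can be reproduced in isolation with base R. The list of data frames below is a stand-in for the parsed field; its shape is an assumption based on the sample JSON, not something taken from your full file.

```r
# Stand-in for the parsed field: a plain list with one data frame per tweet
referenced_tweets <- list(
  data.frame(type = "retweeted", id = "1253739069273710594"),
  data.frame(type = "retweeted", id = "1253452455540666371")
)

type_field <- referenced_tweets$type      # NULL: the list has no "type" element
comparison <- type_field == "retweeted"   # zero-length logical
condition  <- !is.null(referenced_tweets) && comparison  # NA

# if () cannot branch on NA; capture the resulting error message
result <- tryCatch(
  if (condition) "retweet" else "not a retweet",
  error = function(e) conditionMessage(e)
)
print(result)
```

The same thing happens inside parse_tweet, where the surrounding tryCatch turns it into the "Error processing json" message.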

So you'll need to loop over the array. Something like this, based on your original code:

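#if this tweet is a retweet, get original tweet's information
if (!is.null(json_tweet$data$referenced_tweets)) {
  for (i in seq_along(json_tweet$data$referenced_tweets)) {
    referenced_tweet <- json_tweet$data$referenced_tweets[[i]]
    # any() copes with tweets that have no references, or more than one
    is_retweet <- referenced_tweet$type == 'retweeted'
    if (any(is_retweet)) {
      # note: retweet_id is overwritten on every pass through the loop
      retweet_id <- referenced_tweet$id[is_retweet][1]
      cat("retweet_id: ", retweet_id, "\n")
      if (keep_text) text <- NA
    }
  }
}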

You probably don't need the if (!is.null...) check: seq_along(NULL) returns an empty vector, so the loop body never runs in the null case. You might still want to keep it for readability.
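If you want one retweet id per tweet rather than printed output, the loop can be wrapped in a small helper. This is a sketch under assumptions, not part of your original function: extract_retweet_ids is a hypothetical name, it expects the list that json_tweet$data$referenced_tweets appears to be in your sample, and the ids "111" and "222" in the usage lines are made up for illustration.

```r
# Hypothetical helper: returns the retweeted tweet's id for each tweet,
# or NA when the tweet has no references or none of type "retweeted"
extract_retweet_ids <- function(referenced_tweets) {
  vapply(referenced_tweets, function(refs) {
    if (is.null(refs)) return(NA_character_)
    is_retweet <- refs$type == "retweeted"
    if (any(is_retweet)) {
      as.character(refs$id[is_retweet][1])
    } else {
      NA_character_
    }
  }, character(1))
}

# Usage: a retweet, a tweet without references, and a quote that is also a reply
refs <- list(
  data.frame(type = "retweeted", id = "1253739069273710594"),
  NULL,
  data.frame(type = c("quoted", "replied_to"), id = c("111", "222"))
)
retweet_ids <- extract_retweet_ids(refs)
print(retweet_ids)
```

Returning a vector keeps the ids aligned with json_tweet$data$id, which may be easier to feed into the cascade step than a single retweet_id value.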
