There is a "sourcetweet_author_id" column in my dataset (around 30,000 tweets) that contains the Twitter ids of quoted and retweeted users. I want to convert these Twitter ids to Twitter user names.
I managed to get the user names for "sourcetweet_author_id" with the rtweet package's lookup_users() function:
```r
library(rtweet)
data.with.usernames <- lookup_users(as_userid(mydata$sourcetweet_author_id))
```
Sample output (`data.with.usernames`):
"user_id" | "status_id" | "created_at" | "screen_name" |
---|---|---|---|
"99564663" | "1521494990890876929" | 2022-05-03 14:20:48 | "LeventUzumcu" |
"4274638635" | "1521110034515701760" | 2022-05-02 12:51:07 | "SalihaSnmezate1" |
"1266093027254325250" | "1300887103874707457" | 2020-09-01 20:03:49 | "arjin3426" |
"1494034783" | "1521523729599107073" | 2022-05-03 16:15:00 | "DikenComTr" |
However, this function returns only one row per unique user. That is expected, because my dataset contains many retweets of the same tweet, so the same author id appears many times.
Now I need a way to match each sourcetweet_author_id with the screen_name whose user_id equals it, and use that to add the user names as a new column in my original dataset, one name per row.
Sample rows from my original dataset:
"sourcetweet_author_id" | "created_at" | "retweet_count" | "like_count" |
---|---|---|---|
"99564663" | "2020-07-23T14:00:39.000Z" | 8031 | 0 |
"99564663" | "2020-07-23T14:00:35.000Z" | 7153 | 0 |
"1266093027254325250" | "2020-07-23T14:00:29.000Z" | 7153 | 0 |
"1266093027254325250" | "2020-07-23T14:00:29.000Z" | 6596 | 0 |
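The matching step described above can be sketched in base R with `match()`, which returns one position per input id, so repeated ids simply get repeated names. This is a minimal sketch, not rtweet functionality: `ids_to_names()` is a hypothetical helper, and it assumes the lookup table has one row per user with `user_id` and `screen_name` columns, as in the sample output. The toy data is copied from the two sample tables.

```r
# Hypothetical helper (not part of rtweet): returns one screen name per id,
# so repeated ids get repeated names and the result lines up with `ids`.
# Assumes `lookup` has one row per user with columns `user_id` and
# `screen_name`, as in the lookup_users() sample output.
ids_to_names <- function(ids, lookup) {
  # match() gives the position of each id in lookup$user_id (NA if absent);
  # indexing screen_name with those positions returns NA for unknown ids
  lookup$screen_name[match(as.character(ids), as.character(lookup$user_id))]
}

# Toy data copied from the two sample tables
lookup <- data.frame(
  user_id     = c("99564663", "4274638635", "1266093027254325250", "1494034783"),
  screen_name = c("LeventUzumcu", "SalihaSnmezate1", "arjin3426", "DikenComTr"),
  stringsAsFactors = FALSE
)
mydata <- data.frame(
  sourcetweet_author_id = c("99564663", "99564663",
                            "1266093027254325250", "1266093027254325250"),
  retweet_count = c(8031, 7153, 7153, 6596),
  stringsAsFactors = FALSE
)

mydata$screen_name <- ids_to_names(mydata$sourcetweet_author_id, lookup)
print(mydata)
```

With the real data, `lookup` would be `data.with.usernames` and `mydata` the full 30,000-row dataset; row order and row count of `mydata` are unchanged.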
Answer:
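This should add the `screen_name` column to `original_dataset`. Renaming `user_id` to `sourcetweet_author_id` gives both tables the same key column, and `left_join()` keeps every row of `original_dataset`, so each repeated id gets its screen name: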
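```r
library(dplyr)

# Rename user_id so the key matches the column in original_dataset
original_dataset <- original_dataset %>%
  left_join(
    select(data.with.usernames, sourcetweet_author_id = user_id, screen_name),
    by = "sourcetweet_author_id"  # repeated ids each receive the same name
  )
```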
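To check that the join keeps repeated ids, here is a small self-contained run on toy copies of the sample tables (it assumes dplyr is installed). The ids are stored as character strings on both sides: `left_join()` needs compatible key types, and 19-digit ids such as "1266093027254325250" can lose precision if read in as doubles, so it is worth confirming how your real columns were imported.

```r
library(dplyr)

# Toy versions of the two tables from the question, ids kept as character
data.with.usernames <- data.frame(
  user_id     = c("99564663", "4274638635", "1266093027254325250", "1494034783"),
  screen_name = c("LeventUzumcu", "SalihaSnmezate1", "arjin3426", "DikenComTr"),
  stringsAsFactors = FALSE
)
original_dataset <- data.frame(
  sourcetweet_author_id = c("99564663", "99564663",
                            "1266093027254325250", "1266093027254325250"),
  retweet_count = c(8031, 7153, 7153, 6596),
  like_count    = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Same join as in the answer: one lookup row can match many tweet rows
result <- original_dataset %>%
  left_join(
    select(data.with.usernames, sourcetweet_author_id = user_id, screen_name),
    by = "sourcetweet_author_id"
  )

print(result)
```

The four tweet rows stay four rows, with `screen_name` appended as the last column. An id that `lookup_users()` could not resolve would simply get `NA` in `screen_name` rather than dropping the row.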