Is there an R function to transform list of tweets with mentions into adjacency matrix

Time:03-07

I am currently working on a data project on mentions within tweets. The data is in CSV format, with each row being an individual tweet and the variables being "User", "mention_1", "mention_2", ..., "mention_12". For example, user AAA mentioned users BBB and CCC in one tweet, mentioned no one in another, and then mentioned CCC and BBB in separate tweets.

User   Mention 1   Mention 2   ...   Mention 12
AAA    BBB         CCC         ...   NA
AAA    (blank)
AAA    CCC
AAA    BBB
BBB    AAA
BBB
etc.

Mention 1-12 is the name of the user being mentioned in the original tweet. Some tweets have up to twelve mentions, and some don't have any.

I am attempting to transform this data into an adjacency matrix of the form:

         User 1          User 2          ...   User N
User 1   # of mentions   # of mentions   ...   # of mentions
User 2   # of mentions   # of mentions   ...   # of mentions
User N   # of mentions   # of mentions   ...   # of mentions

The Y and X axes are the names of the users mentioning each other; the values of the matrix are the number of mentions between the two users, with the diagonal being the number of times a user mentioned themselves.

I am attempting to create this matrix for network analysis with ERGMs, but I can't figure out how to transform the data without manually counting and filling in the matrix. Given that I have over 6000 rows, manual entry is not viable.

Does anyone know how to transform this table into an adjacency matrix in either R or Excel?

Thank you in advance.

EDIT #1 Output of dput(head(df,20))

structure(list(ï..Ref = 1:20, user = c("Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif","Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif","Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif","Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif","Warren Steinley", "Warren Steinley", "Warren Steinley","Warren Steinley", "Warren Steinley"), Mention.0 =c("Candice Bergen", "Candice Bergen", "Dan Albas", "Erin O'Toole","Kerry-Lynne Findlay", "pierrepoilievre", "pierrepoilievre", "pierrepoilievre", "", "", "", "", "", "", "", "Melissa Lantsman","Melissa Lantsman", "", "", ""), Mention.1 = c("", "", "","", "", "", "", "", "Ziad Aboultaif", "", "", "", "GarnettGenuis", "Ziad Aboultaif", "Ziad Aboultaif", "", "", "Candice Bergen","Candice Bergen", ""), Mention.2 = c("", "", "", "", "", "", "", "", "", "","", "", "Dr. Stephen Ellis", "", "", "", "", "", "", ""),Mention.3= c("", "", "", "", "", "", "", "", "", "", "", "","Ziad Aboultaif", "", "", "", "", "", "", ""), Mention.4 = c("", "", "","","", "", "", "", "", "", "", "", "", "", "", "", "", "", "",""),Mention.5 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,NA, NA, NA, NA, NA,NA, NA, NA), Mention.6 = c("", "", "", "", "","", "", "", "", "", "", "", "", "", "", "", "", "", "", ""),Mention.7 = c("","", "", "", "", "", "", "", "", "", "", "", "","", "", "", "", "", "", ""), Mention.8 = c("", "", "", "", "", "","", "", "", "", "", "", "", "", "", "", "", "", "", ""), Mention.9= c("","", "", "", "", "", "", "", "", "", "", "", "", "","", "", "", "", "", ""), Mention.10 = c("", "", "", "", "", "","","", "", "", "", "", "", "", "", "", "", "", "", ""), Mention.11= c("", "", "", "", "", "", "", "", "", "", "", "", "", "","", "", "", "", "", ""), Mention.12 = c("", "", "", "", "", "","","", "", "", "", "", "", "", "", "", "", "", "", "")), row.names= c(NA, 20L), class = "data.frame")

Edit 2: Current script

library(readr)
library(tidyverse)
library(igraph)
library(ergm)
df <- read.csv("CPC Retweets.csv", stringsAsFactors = FALSE)
str(df)

refs <- (dput(df))
             
                
#Count number of tweets per user
df2<-xtabs(~user,data = df)

#Create list of users in data frame
users <- unique(unlist(df$user))
#Create mention variable 
men <- df$mentions
umen <- unique(unlist(men))
#head(umen)
#Create Adjacency Matrix 
mat <- matrix(0,length(users), length(users))
rownames(mat) <- users
colnames(mat) <- users
mat[1:6,1:6]

# fill in matrix by looping through each tweet 
for(t in 1:length(users)){
  #select mentions
  mention <- men[[t]]
  #skip if 0 mentions
  #if(length(mention) == 0) next()
  #add plus one to the current value in adj  matrix
  mat[users,users] <- mat[users,users] + 1
  
}
rm(t)

CodePudding user response:

Perhaps this:

xtabs(~ User + value, data = reshape2::melt(refs, "User"))
#      value
# User    AAA BBB CCC
#   AAA 4   0   2   2
#   BBB 3   1   0   0

Data

refs <- structure(list(User = c("AAA", "AAA", "AAA", "AAA", "BBB", "BBB"), Mention.1 = c("BBB", "", "CCC", "", "AAA", ""), Mention.2 = c("CCC", "", "", "BBB", "", "")), class = "data.frame", row.names = c(NA, -6L))
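The xtabs result above still has a blank column for empty mentions and is not square (CCC never tweets, so it has no row). A minimal sketch of one way to pad it into a square matrix follows. The names `tab`, `people`, `keep` and `adj` are illustrative and not part of the original answer, and `tab` is rebuilt with base R `table()` only so the sketch runs without reshape2.

```r
# Toy data from the answer above
refs <- structure(list(User = c("AAA", "AAA", "AAA", "AAA", "BBB", "BBB"),
                       Mention.1 = c("BBB", "", "CCC", "", "AAA", ""),
                       Mention.2 = c("CCC", "", "", "BBB", "", "")),
                  class = "data.frame", row.names = c(NA, -6L))

# Same counts as the xtabs call above, built with base R
tab <- table(User  = rep(refs$User, times = 2),
             value = c(refs$Mention.1, refs$Mention.2))

# Everyone who tweets or is mentioned, ignoring the blank "no mention" label
keep   <- setdiff(colnames(tab), "")
people <- sort(unique(c(rownames(tab), keep)))

# Square matrix of zeros, then copy the observed counts into place
adj <- matrix(0L, nrow = length(people), ncol = length(people),
              dimnames = list(people, people))
adj[rownames(tab), keep] <- tab[, keep]

adj["AAA", "BBB"]  # number of times AAA mentioned BBB
```

Rows are the users who tweet and columns are the users being mentioned, so the diagonal holds self-mentions, as described in the question.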

CodePudding user response:

I found the solution after some digging through forums. Here it is for anyone with a similar problem:

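library(reshape2)
# count the rows for each user/mention pair
adj_mat <- dcast(
  data = df,
  formula = user ~ mentions,
  fun.aggregate = length,
  drop = FALSE
)

# create adjacency matrix: keep only the count columns and
# move the user names into the row names so the matrix stays numeric
tweets <- as.matrix(adj_mat[, -1, drop = FALSE])
rownames(tweets) <- adj_mat$user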

I first had to bring all the mentions into a single column called "mentions". I could not find a solution unless the data was dyadic (one user/mention pair per row), so it took some work in Excel to get the data into that shape.
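The reshaping into a single "mentions" column can probably also be done in R rather than Excel. The following is a rough sketch using base R only. `df_wide` is a hypothetical miniature of the wide data, with column names following the dput output in the question, and `df_long` and `mention_cols` are illustrative names.

```r
# Hypothetical miniature of the wide data: one row per tweet
df_wide <- data.frame(
  user      = c("AAA", "AAA", "BBB", "BBB"),
  Mention.0 = c("BBB", "", "AAA", ""),
  Mention.1 = c("CCC", "", "", NA),
  stringsAsFactors = FALSE
)

# All columns holding mentions, whatever their number
mention_cols <- grep("^Mention", names(df_wide), value = TRUE)

# Stack every mention column into one "mentions" column, repeating the
# user names once per mention column so the rows stay aligned
df_long <- data.frame(
  user     = rep(df_wide$user, times = length(mention_cols)),
  mentions = unlist(df_wide[mention_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop empty and missing mentions, leaving one user/mention pair per row
df_long <- df_long[!is.na(df_long$mentions) & df_long$mentions != "", ]
rownames(df_long) <- NULL
```

A data frame shaped like `df_long` is what the dcast call above expects. Note that users who never mention anyone drop out at the filtering step, so they would need to be added back if they should appear in the matrix.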
