I am currently working on a data project on mentions within tweets. The data is in CSV format: each row is an individual tweet, and the variables are "User", "mention_1", "mention_2", ..., "mention_12". For example, user AAA mentioned users BBB and CCC in one tweet, mentioned no one in another, and then mentioned CCC and BBB in separate tweets.
User | Mention 1 | Mention 2 | ... | Mention 12 |
---|---|---|---|---|
AAA | BBB | CCC | ... | NA |
AAA | Blank | | | |
AAA | CCC | | | |
AAA | BBB | | | |
BBB | AAA | | | |
BBB | | | | |
etc | | | | |
Mention 1 through Mention 12 hold the names of the users mentioned in the original tweet. Some tweets have up to twelve mentions, and some have none.
I am attempting to transform this data into an adjacency matrix of the form:
 | User 1 | User 2 | ... | User N |
---|---|---|---|---|
User 1 | # of mentions | # of mentions | ... | # of mentions |
User 2 | # of mentions | # of mentions | ... | # of mentions |
User N | # of mentions | # of mentions | ... | # of mentions |
The row and column labels are the names of the users tweeting each other, and each cell holds the number of mentions between the two users; the diagonal holds the number of times a user mentioned themselves.
I am attempting to create this matrix for network analysis with ERGMs, but I can't figure out how to transform the data without manually counting and filling in the matrix. Given that I have over 6000 rows, manual entry is not viable.
Does anyone know how to transform this table into an adjacency matrix in either R or Excel?
Thank you in advance.
Edit 1: Output of dput(head(df, 20))
structure(list(ï..Ref = 1:20, user = c("Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif","Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif","Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif","Ziad Aboultaif", "Ziad Aboultaif", "Ziad Aboultaif","Warren Steinley", "Warren Steinley", "Warren Steinley","Warren Steinley", "Warren Steinley"), Mention.0 =c("Candice Bergen", "Candice Bergen", "Dan Albas", "Erin O'Toole","Kerry-Lynne Findlay", "pierrepoilievre", "pierrepoilievre", "pierrepoilievre", "", "", "", "", "", "", "", "Melissa Lantsman","Melissa Lantsman", "", "", ""), Mention.1 = c("", "", "","", "", "", "", "", "Ziad Aboultaif", "", "", "", "GarnettGenuis", "Ziad Aboultaif", "Ziad Aboultaif", "", "", "Candice Bergen","Candice Bergen", ""), Mention.2 = c("", "", "", "", "", "", "", "", "", "","", "", "Dr. Stephen Ellis", "", "", "", "", "", "", ""),Mention.3= c("", "", "", "", "", "", "", "", "", "", "", "","Ziad Aboultaif", "", "", "", "", "", "", ""), Mention.4 = c("", "", "","","", "", "", "", "", "", "", "", "", "", "", "", "", "", "",""),Mention.5 = c(NA,NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,NA, NA, NA, NA, NA,NA, NA, NA), Mention.6 = c("", "", "", "", "","", "", "", "", "", "", "", "", "", "", "", "", "", "", ""),Mention.7 = c("","", "", "", "", "", "", "", "", "", "", "", "","", "", "", "", "", "", ""), Mention.8 = c("", "", "", "", "", "","", "", "", "", "", "", "", "", "", "", "", "", "", ""), Mention.9= c("","", "", "", "", "", "", "", "", "", "", "", "", "","", "", "", "", "", ""), Mention.10 = c("", "", "", "", "", "","","", "", "", "", "", "", "", "", "", "", "", "", ""), Mention.11= c("", "", "", "", "", "", "", "", "", "", "", "", "", "","", "", "", "", "", ""), Mention.12 = c("", "", "", "", "", "","","", "", "", "", "", "", "", "", "", "", "", "", "")), row.names= c(NA, 20L), class = "data.frame")
Edit 2: Current script
library(readr)
library(tidyverse)
library(igraph)
library(ergm)
df <- read.csv("CPC Retweets.csv", stringsAsFactors = FALSE)
str(df)
refs <- (dput(df))
#Count number of tweets per user
df2<-xtabs(~user,data = df)
#Create list of users in data frame
users <- unique(unlist(df$user))
#Create mention variable
men <- df$mentions
umen <- unique(unlist(men))
#head(umen)
#Create Adjacency Matrix
mat <- matrix(0,length(users), length(users))
rownames(mat) <- users
colnames(mat) <- users
mat[1:6,1:6]
# fill in matrix by looping through each tweet
for(t in 1:length(users)){
#select mentions
mention <- men[[t]]
#skip if 0 mentions
#if(length(mention) == 0) next()
#add plus one to the current value in adj matrix
mat[users,users] <- mat[users,users] + 1
}
rm(t)
CodePudding user response:
Perhaps this:
xtabs(~ User + value, data = reshape2::melt(refs, "User"))
#      value
# User    AAA BBB CCC
#   AAA 4   0   2   2
#   BBB 3   1   0   0
Data
refs <- structure(list(User = c("AAA", "AAA", "AAA", "AAA", "BBB", "BBB"), Mention.1 = c("BBB", "", "CCC", "", "AAA", ""), Mention.2 = c("CCC", "", "", "BBB", "", "")), class = "data.frame", row.names = c(NA, -6L))
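The table above still has an unnamed column counting blank mentions, and its rows and columns list different sets of users. A minimal base R sketch of one way to drop the blanks and force a square matrix, assuming the same toy `refs` data (the names `mention_cols`, `long`, `all_users`, and `adj` are only illustrative):

```r
# Toy data copied from the answer above
refs <- data.frame(
  User      = c("AAA", "AAA", "AAA", "AAA", "BBB", "BBB"),
  Mention.1 = c("BBB", "", "CCC", "", "AAA", ""),
  Mention.2 = c("CCC", "", "", "BBB", "", ""),
  stringsAsFactors = FALSE
)

# Stack every Mention.* column into one long column, repeating User to match
mention_cols <- grep("^Mention", names(refs), value = TRUE)
long <- data.frame(
  User  = rep(refs$User, times = length(mention_cols)),
  value = unlist(refs[mention_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop empty and missing mentions so they do not become a column of their own
long <- long[!is.na(long$value) & long$value != "", ]

# Use one shared set of factor levels so rows and columns cover the same users
all_users  <- sort(unique(c(refs$User, long$value)))
long$User  <- factor(long$User,  levels = all_users)
long$value <- factor(long$value, levels = all_users)

# Square table: rows are the tweeting user, columns are the mentioned user
adj <- xtabs(~ User + value, data = long)
adj
```

Because both factors share the same levels, users who are mentioned but never tweet (CCC here) still get a row of zeros, and the diagonal holds self-mentions.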
CodePudding user response:
I found the solution after some digging through forums. Here it is for anyone with a similar problem:
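library(reshape2)
# Count mentions for each (user, mentioned user) pair; df must already be in
# long form, with one "user" column and one "mentions" column
adj_mat <- dcast(
data = df,
formula = user ~ mentions,
fun.aggregate = length,
drop = FALSE
)
# Create the adjacency matrix: move the user names into the row names and drop
# that column, otherwise as.matrix() returns a character matrix
tweets <- as.matrix(adj_mat[, -1, drop = FALSE])
rownames(tweets) <- adj_mat$user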
I first had to bring all the mentions into a single column called "mentions". I could not find a solution unless the data was dyadic (one user-mention pair per row), so some reshaping in Excel was needed before this would work.
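The move to a single "mentions" column can probably be done in R rather than Excel. A rough base R sketch, assuming columns named `user` and `Mention.*` as in the question's dput output; the four rows are a small subset of that output, and `mention_cols` and `edges` are illustrative names:

```r
# Small subset of the question's data (rows 1, 9, 16 and 18 of the dput output)
df <- data.frame(
  user      = c("Ziad Aboultaif", "Ziad Aboultaif",
                "Warren Steinley", "Warren Steinley"),
  Mention.0 = c("Candice Bergen", "", "Melissa Lantsman", ""),
  Mention.1 = c("", "Ziad Aboultaif", "", "Candice Bergen"),
  Mention.5 = NA_character_,
  stringsAsFactors = FALSE
)

# Build one (user, mentions) pair per row by stacking each Mention.* column
mention_cols <- grep("^Mention\\.", names(df), value = TRUE)
edges <- do.call(rbind, lapply(mention_cols, function(col) {
  data.frame(user = df$user, mentions = df[[col]], stringsAsFactors = FALSE)
}))

# Keep only real mentions: drop NA and empty strings
edges <- edges[!is.na(edges$mentions) & nzchar(edges$mentions), ]
rownames(edges) <- NULL
edges
```

A data frame shaped like `edges` has the dyadic form that the dcast() call above expects, with one tweeting user and one mentioned user per row.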