Efficient way to create a weighted graph with networkx where weights are intersection of appearances

Time:05-22

I am analyzing Amazon's reviews dataset. I have customer IDs, their reviews of different products, and the products' identifiers.
The data looks like this:

Customer Product Review ...
1 A ....
1 B ....
2 A ....
2 C ....

I want to create a weighted undirected graph using networkx, where each node is a product and the weight of the edge between two products is the number of distinct customers who reviewed both of them.
The data is huge, so I was wondering whether there is a feasible way to update the weights of the network iteratively while going through the data product by product.

Another desirable representation of this graph would be, for the example above,

  A B C
A 2 1 1
B 1 1 0
C 1 0 1

EDIT: I mistakenly wrote (A,C)=2 and have replaced it with 1.

Answer:

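Try this:

import pandas as pd
df = pd.read_csv('file.csv')
# cross-tabulate: one row per product, one column per customer;
# clip to 1 so a customer who reviewed a product more than once is counted once
v = pd.crosstab(df['Product'], df['Customer']).clip(upper=1)
# dot product: entry (i, j) is the number of customers who reviewed both products i and j
v.dot(v.T)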
Product  A  B  C
Product         
A        2  1  1
B        1  1  0
C        1  0  1
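
The matrix above is already a weighted adjacency matrix, so one way to get the networkx graph the question asks for is to pass it to nx.from_pandas_adjacency. The following is only a minimal sketch: the inline toy DataFrame stands in for file.csv, and the names co and n_customers are made up for illustration. The diagonal (customers per product) would otherwise become self-loops, so the sketch stores it as a node attribute and then drops those loops.

```python
import networkx as nx
import pandas as pd

# Toy data mirroring the example in the question (assumption: the real
# data would come from pd.read_csv('file.csv') with the same columns).
df = pd.DataFrame({
    "Customer": [1, 1, 2, 2],
    "Product": ["A", "B", "A", "C"],
})

# Product-by-customer indicator matrix; clip so repeat reviews count once.
v = pd.crosstab(df["Product"], df["Customer"]).clip(upper=1)

# Co-occurrence matrix: entry (i, j) = customers who reviewed both i and j.
co = v.dot(v.T)

# Build an undirected graph; nonzero entries become edges with a
# 'weight' attribute, zero entries produce no edge.
G = nx.from_pandas_adjacency(co)

# The diagonal holds the number of customers per product. Keep it as a
# node attribute (hypothetical name 'n_customers') and remove self-loops.
nx.set_node_attributes(
    G, {p: int(co.loc[p, p]) for p in co.index}, "n_customers"
)
G.remove_edges_from(list(nx.selfloop_edges(G)))

for a, b, data in G.edges(data=True):
    print(a, b, data["weight"])
```

For the example data this leaves two edges, A-B and A-C, each with weight 1, and no edge between B and C. Note that the dense matrix grows with the square of the number of products, so for a very large catalogue this sketch may not fit in memory as written.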