Calculate degree centrality of a node for every day in a NetworkX graph

I have a networkx graph with events spanning several months, and I want to see how a node's centrality score changes over time.

I plan to use several different centrality measures, so I wrote a function that selects a specific sender (I don't have many unique senders) and a specific date, builds a networkx graph for that day, and calculates the degree centrality. The results then go into a dataframe.

But my code seems a bit convoluted, and I'm not sure it's working correctly, since my output:

  feature  degree        date
0       A     1.0  2017-01-02
1      35     1.0  2017-01-02
0       A     1.0  2017-01-20
1      18     1.0  2017-01-20

contains nodes 35 and 18, but I only want A. Is there a better way of doing this?

import numpy as np
import pandas as pd
from datetime import datetime
import networkx as nx

df = pd.DataFrame({'feature':['A','B','A','B','A','B','A','B','A','B'],
                   'feature2':['18','78','35','14','57','68','57','17','18','78'],
                   'timestamp':['2017-01-20T11','2017-01-01T13',
                           '2017-01-02T12','2017-02-01T13',
                           '2017-03-01T14','2017-05-01T15',
                           '2017-04-01T16','2017-04-01T17',
                          '2017-12-01T17','2017-12-01T19']})
df['timestamp'] = pd.to_datetime(pd.Series(df['timestamp']))
df['date'], df['time']= df.timestamp.dt.date,  df.timestamp.dt.time

def test(feature,date,name,col_name,nx_measure):
    feature = df[df['feature']== feature]
    feature['date_str'] = feature['date'].astype(str)
    one_day = feature[feature['date_str']==date]
    oneDay_graph =nx.from_pandas_edgelist(one_day, source = 'feature', target = 'feature2', create_using=nx.DiGraph)
    name = pd.DataFrame()
    name['feature']= nx_measure(oneDay_graph).keys()  
    name[col_name]= nx_measure(oneDay_graph).values()
    name['date'] = date
    return name

a =test('A','2017-01-02','degree','degree',nx.degree_centrality)
b = test('A','2017-01-20','degree','degree',nx.degree_centrality)

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([a, b])

Desired output:

  feature  degree        date
0       A     1.0  2017-01-02
0       A     1.0  2017-01-20

CodePudding user response:

When you set name['feature'] = nx_measure(oneDay_graph).keys(), you get one row for every node in the graph, which in this case means both 'A' and the target node ('35' or '18'). Instead, look up only the node you care about, with something like

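d = nx_measure(oneDay_graph)
# `sender` is a placeholder for the label passed in (e.g. 'A'); test() rebinds
# `feature` to a DataFrame, so the label needs its own name
name = pd.DataFrame({'feature': [sender], col_name: [d[sender]], 'date': [date]})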
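
To see why the extra rows appear, here is a minimal sketch using a hand-built one-edge graph that mirrors the single A -> 35 event on 2017-01-02 (the variable names are mine, not from the question). nx.degree_centrality returns a dict with an entry for every node, so the target node shows up unless you index the dict by the sender:

```python
import networkx as nx

# One-edge directed graph, like the single A -> 35 event on 2017-01-02.
g = nx.DiGraph()
g.add_edge('A', '35')

centrality = nx.degree_centrality(g)   # dict with one entry per node
all_nodes = sorted(centrality)         # both 'A' and the target '35' are keys
score_for_a = centrality['A']          # look up only the node of interest

print(all_nodes, score_for_a)
```

Indexing the dict this way is what the refactoring relies on.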

Here's a more thorough refactoring of your approach:

import numpy as np
import pandas as pd
from datetime import datetime
import networkx as nx

df = pd.DataFrame({'feature':['A','B','A','B','A','B','A','B','A','B'],
                   'feature2':['18','78','35','14','57','68','57','17','18','78'],
                   'timestamp':['2017-01-20T11','2017-01-01T13',
                           '2017-01-02T12','2017-02-01T13',
                           '2017-03-01T14','2017-05-01T15',
                           '2017-04-01T16','2017-04-01T17',
                          '2017-12-01T17','2017-12-01T19']})
df['timestamp'] = pd.to_datetime(pd.Series(df['timestamp']))
df['date'], df['time']= df.timestamp.dt.date,  df.timestamp.dt.time

feature = 'A'
dates = ['2017-01-02','2017-01-20']
# dates = df['date'].unique().astype(str)
name = col_name = 'degree'
nx_measure = nx.degree_centrality

df['date_str'] = df['date'].astype(str)

def get_centralities(feature,dates,name,col_name,nx_measure):
    rows = []
    for date in dates:
        one_day = df[(df['feature']==feature) & (df['date_str']==date)]
        oneDay_graph = nx.from_pandas_edgelist(one_day, source = 'feature', target = 'feature2', create_using=nx.DiGraph)
        d = nx_measure(oneDay_graph)
        rows.append([feature,d[feature],date])
    return pd.DataFrame(rows, columns = ['feature',col_name,'date'])

print(get_centralities(feature,dates,name,col_name,nx_measure))

Result:

  feature  degree        date
0       A     1.0  2017-01-02
1       A     1.0  2017-01-20
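
One caveat if you switch to the commented-out line that uses every date: on a date where the chosen sender has no events, the filtered frame is empty, the graph has no nodes, and d[feature] raises a KeyError. The sketch below reproduces that with an empty graph built by hand (an assumption standing in for such a date) and shows dict.get as one possible fallback:

```python
import networkx as nx

empty_day = nx.DiGraph()                 # stands in for a date with no events for the sender
d = nx.degree_centrality(empty_day)      # empty dict: no nodes, so no scores

try:
    d['A']
    raised = False
except KeyError:
    raised = True

safe_score = d.get('A', 0.0)             # fall back to 0.0 when the node is absent
print(raised, safe_score)
```

Whether 0.0 or a missing value is the right fallback depends on how you want to treat days without activity.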

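However, I suspect that filtering by sender before building the graph produces misleading values: the centrality of 'A' is then computed on a subgraph that contains only A's edges and ignores everything 'B' did that day. I suspect that the following is a better approach, which builds one graph per day from all senders and then looks up each sender's score: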

#<build dataframe in same way>

features = df['feature'].unique()
dates = df['date'].unique().astype(str)
name = col_name = 'degree'
nx_measure = nx.degree_centrality

df['date_str'] = df['date'].astype(str)

def get_centralities(features,dates,name,col_name,nx_measure):
    df_out = pd.DataFrame([[feat,date] for feat in features for date in dates], columns = ['feature','date'])
    for date in dates:
        one_day = df[df['date_str']==date]
        oneDay_graph = nx.from_pandas_edgelist(one_day, source = 'feature', target = 'feature2', create_using=nx.DiGraph)
        d = nx_measure(oneDay_graph)
        def meas_func(c): return d.get(c,0)
        where = (df_out['date'] == date)
        df_out.loc[where,col_name] = df_out.loc[where,'feature'].transform(meas_func)
    return df_out

print(get_centralities(features,dates,name,col_name,nx_measure))

Result:

   feature        date    degree
0        A  2017-01-20  1.000000
1        A  2017-01-01  0.000000
2        A  2017-01-02  1.000000
3        A  2017-02-01  0.000000
4        A  2017-03-01  1.000000
5        A  2017-05-01  0.000000
6        A  2017-04-01  0.333333
7        A  2017-12-01  0.333333
8        B  2017-01-20  0.000000
9        B  2017-01-01  1.000000
10       B  2017-01-02  0.000000
11       B  2017-02-01  1.000000
12       B  2017-03-01  0.000000
13       B  2017-05-01  1.000000
14       B  2017-04-01  0.333333
15       B  2017-12-01  0.333333
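
The 0.333333 values are the visible difference between the two approaches. As a small check, the sketch below rebuilds the 2017-04-01 edges from the sample data by hand (A -> 57 and B -> 17) and compares the score of 'A' in the full day graph with its score in the graph filtered to sender 'A':

```python
import networkx as nx

# Edges recorded on 2017-04-01 in the sample data: A -> 57 and B -> 17.
full_day = nx.DiGraph([('A', '57'), ('B', '17')])
# The same day after filtering to sender 'A' only.
only_a = nx.DiGraph([('A', '57')])

# Degree centrality divides a node's degree by (number of nodes - 1).
full_score = nx.degree_centrality(full_day)['A']     # 1 / (4 - 1)
filtered_score = nx.degree_centrality(only_a)['A']   # 1 / (2 - 1)

print(round(full_score, 6), filtered_score)
```

In the filtered graph 'A' always scores 1.0 on any day it has a single target, so the score cannot change over time in a meaningful way.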