ValueError due to a missing element in color map

I need to build a network where the nodes (from df1) get specific colors based on labels from a different dataset (df2). Not all the nodes in df1 have a label assigned in df2 (for example, because they have not been labelled yet, so they currently have a nan value). The code below should give a good example of what I mean:

import networkx as nx
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt, colors as mcolor

# Sample DataFrames
df1 = pd.DataFrame({
    'Node': ['A', 'A', 'B', 'B', 'B', 'Z'],
    'Edge': ['B', 'D', 'N', 'A', 'X', 'C']
})
df2 = pd.DataFrame({
    'Nodes': ['A', 'B', 'C', 'D', 'N', 'S', 'X'],
    'Attribute': [-1, 0, -1.5, 1, 1, 9, 0]
})

# Simplified construction of `colour_map`
uni_val = df2['Attribute'].unique()
colors = plt.cm.jet(np.linspace(0, 1, len(uni_val)))
# Convert colours to hex, then zip with the unique attribute values
mapper = dict(zip(uni_val, map(mcolor.to_hex, colors)))
color_map = df2.set_index('Nodes')['Attribute'].map(mapper).fillna('black')


G = nx.from_pandas_edgelist(df1, source='Node', target='Edge')
# Add Attribute to each node
nx.set_node_attributes(G, color_map, name="colour")

# Then draw with colours based on attribute values:
nx.draw(G,
        node_color=nx.get_node_attributes(G, 'colour').values(),
        with_labels=True)

plt.show()

Z is not in df2 because df2 was created considering only non-NA values. I would like to assign the color black to unlabelled nodes, i.e., to those nodes that are not in df2. When I run the code above, I get this error:

ValueError: 'c' argument has 7 elements, which is inconsistent with 'x' and 'y' with size 8.

The error is clearly related to the black color I am trying to add for missing nodes, which are not included in color_map. What is not clear to me is how to fix it. I would appreciate any help figuring it out.

CodePudding user response：

Since Z is not in df2 but is one of the nodes in the graph, instead of creating properties exclusively from df2 we should reindex the colour mapping against the graph's nodes, filling any missing node with black.

color_map without reindex:

df2.set_index('Nodes')['Attribute'].map(mapper)

Nodes
A    #000080
B    #0080ff
C    #7dff7a
D    #ff9400
N    #ff9400
S    #800000
X    #0080ff
Name: Attribute, dtype: object

nodes (G.nodes() is used here because it contains every node in the graph, not just those in df2):

G.nodes()

['A', 'B', 'D', 'N', 'X', 'Z', 'C']
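
To see which nodes are affected, the graph's nodes can be compared with the labels in df2. The snippet below is a minimal sketch in plain Python; the names edges and labelled are illustrative copies of the sample data, not part of the original code.

```python
# Edge list and labelled nodes copied by hand from the sample df1 and df2
edges = [('A', 'B'), ('A', 'D'), ('B', 'N'), ('B', 'A'), ('B', 'X'), ('Z', 'C')]
labelled = ['A', 'B', 'C', 'D', 'N', 'S', 'X']

# Every node that appears at either end of an edge
graph_nodes = {node for edge in edges for node in edge}

# Nodes in the graph with no label, and labels with no node in the graph
missing_colour = sorted(graph_nodes - set(labelled))
unused_labels = sorted(set(labelled) - graph_nodes)

print(missing_colour)  # ['Z']
print(unused_labels)   # ['S']
```

Z needs a fallback colour, while S can be ignored because it never appears in the graph.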

Reindex to ensure all nodes are present in the mapping:

df2.set_index('Nodes')['Attribute'].map(mapper).reindex(G.nodes(), fill_value='black')

Nodes
A    #000080
B    #0080ff
D    #ff9400
N    #ff9400
X    #0080ff
Z      black  # <- Missing Nodes are added with specified value
C    #7dff7a
Name: Attribute, dtype: object
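
One caveat, assuming df2 could also contain nodes whose Attribute is NaN (the question mentions unlabelled nodes): fill_value only applies to labels that reindex introduces, so a node that is already present with a NaN colour stays NaN. The following is a hedged sketch with toy data (attr, toy_mapper and all_nodes are made up for illustration) that chains fillna to cover both cases:

```python
import numpy as np
import pandas as pd

# Toy data: B is present but unlabelled (NaN), Z is absent, S is not in the graph
attr = pd.Series({'A': -1.0, 'B': np.nan, 'S': 9.0})
toy_mapper = {-1.0: '#000080', 9.0: '#800000'}
all_nodes = ['A', 'B', 'Z']  # stands in for G.nodes()

# reindex adds Z as 'black' and drops S, but B keeps its NaN
colours = attr.map(toy_mapper).reindex(all_nodes, fill_value='black')

# fillna then covers nodes that were present but had no label
colours_filled = colours.fillna('black')
print(colours_filled.to_dict())
```

With the sample data in this answer there are no NaN attributes, so reindex alone is enough; the extra fillna only matters if unlabelled rows are kept in df2.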

Complete Code:

import networkx as nx
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt, colors as mcolor

# Sample DataFrames
df1 = pd.DataFrame({
    'Node': ['A', 'A', 'B', 'B', 'B', 'Z'],
    'Edge': ['B', 'D', 'N', 'A', 'X', 'C']
})
df2 = pd.DataFrame({
    'Nodes': ['A', 'B', 'C', 'D', 'N', 'S', 'X'],
    'Attribute': [-1, 0, -1.5, 1, 1, 9, 0]
})

# Simplified construction of `colour_map`
uni_val = df2['Attribute'].unique()
colors = plt.cm.jet(np.linspace(0, 1, len(uni_val)))
# Convert colours to hex, then zip with the unique attribute values
mapper = dict(zip(uni_val, map(mcolor.to_hex, colors)))

G = nx.from_pandas_edgelist(df1, source='Node', target='Edge')
# Create Colour map. Ensure all nodes have a value via reindex
color_map = (
    df2.set_index('Nodes')['Attribute'].map(mapper)
        .reindex(G.nodes(), fill_value='black')
)
# Add Attribute to each node
nx.set_node_attributes(G, color_map, name="colour")

# Then draw with colours based on attribute values.
# list() turns the dict view into a plain sequence for matplotlib
nx.draw(G,
        node_color=list(nx.get_node_attributes(G, 'colour').values()),
        with_labels=True)

plt.show()
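
If the pandas reindex step is not wanted, roughly the same lookup could be written with dict.get. This is a sketch of an alternative, not the method used above; colour_lookup and nodes are hypothetical stand-ins for the colour mapping before reindexing and for list(G.nodes()).

```python
# Hypothetical stand-in for the node -> colour mapping built from df2
colour_lookup = {
    'A': '#000080', 'B': '#0080ff', 'C': '#7dff7a', 'D': '#ff9400',
    'N': '#ff9400', 'S': '#800000', 'X': '#0080ff',
}
# Hypothetical stand-in for list(G.nodes())
nodes = ['A', 'B', 'D', 'N', 'X', 'Z', 'C']

# One colour per graph node, in node order; unlabelled nodes fall back to black
node_colours = [colour_lookup.get(node, 'black') for node in nodes]
print(node_colours)
```

The resulting list has exactly one entry per node, in node order, so it could be passed as node_color=node_colours to nx.draw.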