I have pandas dataframe that has the following structure
BID | EID |
---|---|
B1 | 1001,1002 |
B2 | 1001,1003,1006 |
B3 | 1004,1006,1008,1005 |
B4 | 1001,1002,1003,10004,1005,1008 |
I want to report the COUNT of how many common EIDs are there amongst the BIDs. I want visualization in the following format
B1 | B2 | B3 | B4 | |
---|---|---|---|---|
B1 | n/a | 1 | 0 | 2 |
B2 | 1 | n/a | 2 | 3 |
B3 | 0 | 2 | n/a | 4 |
B4 | 2 | 3 | 4 | n/a |
How can i achieve this ? Also higher the number in the cell, i want to highlight dark as it appears in heat map. Appreciate your help.
My logic is .. Create a pandas dataframe with BID as index Loop through each BID and compare it with other BIDs Create a new column Each of new column will be a list (This list will contain the count of EIDs) How to Convert this dataframe to heat map ?
or any easy logic that I can implement ?
CodePudding user response:
OK, I'm new here but here is my solution :)
first of all for each BID create a column with his data.
tmp = df["EID"].apply(pd.Series).set_index(df["BID"].values).T
next build the structure of the correlation with corr()
corr_df=tmp.corr()
Then change the values of the corr
import itertools
for a, b in itertools.combinations_with_replacement(tmp.columns, 2,):
corr_df.loc[[a],[b]]=len(set(tmp[a]) & set(tmp[b])) # len of equal items
corr_df.loc[[b],[a]]=len(set(tmp[a]) & set(tmp[b]))
corr_df.loc[[a],[a]]=np.NAN
print(corr_df)
output:
B1 B2 B3 B4
B1 NaN 1.0 0.0 2.0
B2 1.0 NaN 1.0 2.0
B3 0.0 1.0 NaN 3.0
B4 2.0 2.0 3.0 NaN
heatmap code:
import plotly.graph_objects as go # for visualization
fig = go.Figure(data=go.Heatmap(
z=corr_df,
x=corr_df.columns,
y=corr_df.columns,
hoverongaps = False,
colorscale="Greys",))
fig.update_layout(
title="HeatMap",
width=600
)
fig.show()
output: