I am trying to create a dataframe from the lists below, where the first column, "webpage", is the index number and the second column, "destination_nodes", is the list of dest_nodes.
for col in range(10001):
    print(col)
    dest_nodes = M.index[M[col] == 1.0].tolist()
    print(dest_nodes)
A sample of the output of print(col) and print(dest_nodes) is shown below:
0
[2725, 2763, 3575, 4377, 6221, 7798, 7852, 8014, 8753, 9575]
1
[137, 753, 1434, 2182, 3163, 3646, 3684, 3702, 3966, 4353, 4410, 5029, 5610, 5671, 6149, 6505, 6835, 7027, 7030, 7127, 7724, 7876, 8006, 8676, 8821, 9069, 9226, 9321]
2
[473, 1843, 6748]
3
[67, 433, 537, 1068, 1118, 1191, 1236, 1953, 2285, 2848, 3296, 3816, 4155, 4507, 4704, 4773, 5028, 5333, 5341, 5613, 5656, 5858, 6068, 6169, 6239, 7367, 7897, 7909, 8973, 9113, 9576, 9799, 9909]
4
[]
I tried the following, but it does not give me what I require.
dest_node = pd.DataFrame (col, dest_nodes, columns = ["webpage","destination_nodes"])
The output dataframe I would like is something like this:
Would appreciate any help I can get!
CodePudding user response:
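I would use a dict comprehension to set up the dictionary:
# index must be list-like, so pass ["nodes"] rather than "nodes"
df = pd.DataFrame({col: [M.index[M[col] == 1.0].tolist()] for col in range(10001)}, index=["nodes"])
# transpose first so that each webpage becomes a row, then name the index
df = df.transpose()
df.index.name = "website"
print(df)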
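A variant of the comprehension idea, as a minimal sketch: it builds the two requested columns directly, so no transpose is needed. The small M below is a made-up stand-in (integer column labels, 1.0 marking a link), and the name links is only illustrative.

```python
import pandas as pd

# Toy stand-in for M (assumed: integer column labels, 1.0 marks a link)
M = pd.DataFrame({0: [1.0, 0.0, 1.0],
                  1: [1.0, 1.0, 0.0],
                  2: [0.0, 0.0, 0.0]})

# One entry per column: the row labels where that column equals 1.0
links = {col: M.index[M[col] == 1.0].tolist() for col in M.columns}

# Keys become the "webpage" column, the lists become "destination_nodes"
df = pd.DataFrame({"webpage": list(links.keys()),
                   "destination_nodes": list(links.values())})
print(df)
```

Iterating over M.columns instead of range(10001) avoids hard-coding the number of columns.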
CodePudding user response:
This works
# Make list
colLst = list(range(10001))
# use the same range as colLst, otherwise the two lists differ in length
dest_nodesLst = [M.index[M[col] == 1.0].tolist() for col in range(10001)]
# Make data frame
dic = {"col": colLst, "M": dest_nodesLst}
dest_node = pd.DataFrame(data=dic)
# print head of dataframe
print(dest_node.head())
CodePudding user response:
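You can use zip to achieve that, provided col and dest_nodes are lists holding every column number and its matching list of nodes (in the question's loop they only hold the values from the last iteration). Like this
pd.DataFrame(list(zip(col, dest_nodes)), columns=["webpage","destination_nodes"])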
If you want to remove the brackets and get the exact same representation as shown in the image, run the code below first and then create the DataFrame.
dest_nodes = [str(l1).replace('[', '').replace(']','') for l1 in dest_nodes]
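For the zip approach to work, the values have to be collected inside the loop first. A minimal sketch, assuming a small made-up M and the hypothetical list names cols and dest_lists:

```python
import pandas as pd

# Toy stand-in for M (assumed shape; the real M has 10001 columns)
M = pd.DataFrame({0: [1.0, 0.0, 1.0],
                  1: [0.0, 0.0, 0.0]})

cols = []        # hypothetical name: one entry per column of M
dest_lists = []  # hypothetical name: matching list of row labels
for col in M.columns:
    cols.append(col)
    dest_lists.append(M.index[M[col] == 1.0].tolist())

# Pair each column number with its list, one pair per row
df = pd.DataFrame(list(zip(cols, dest_lists)),
                  columns=["webpage", "destination_nodes"])

# Optional: comma-separated text instead of Python lists (no brackets)
df["destination_nodes"] = [", ".join(map(str, d)) for d in dest_lists]
print(df)
```

Joining with ", " gives the same bracket-free text as the replace calls above, without going through the list's string representation.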
CodePudding user response:
Maybe you can use M directly:
df = pd.DataFrame(
    {'webpage': M.columns,
     'destination_nodes': M.eq(1).apply(lambda x: M[x].index.tolist())}
)
print(df)
# Output
  webpage destination_nodes
0       0            [0, 2]
1       1            [0, 1]
2       2                []
3       3               [1]
4       4            [1, 2]
Setup:
data = {'0': [1, 0, 1],
        '1': [1, 1, 0],
        '2': [0, 0, 0],
        '3': [0, 1, 0],
        '4': [0, 1, 1]}
M = pd.DataFrame(data)
print(M)
# Output:
   0  1  2  3  4
0  1  1  0  0  0
1  0  1  0  1  1
2  1  0  0  0  1