Convert list to data frame in python

Time:10-28

I am trying to create a dataframe from the loop output below, where the first column, "webpage", is the column index and the second column, "destination_nodes", is the corresponding list of dest_nodes.

for col in range(10001):
    print(col)
    dest_nodes = M.index[M[col] == 1.0].tolist()
    print(dest_nodes)

A sample of the output of print(col) and print(dest_nodes) is shown below:

0
[2725, 2763, 3575, 4377, 6221, 7798, 7852, 8014, 8753, 9575]
1
[137, 753, 1434, 2182, 3163, 3646, 3684, 3702, 3966, 4353, 4410, 5029, 5610, 5671, 6149, 6505, 6835, 7027, 7030, 7127, 7724, 7876, 8006, 8676, 8821, 9069, 9226, 9321]
2
[473, 1843, 6748]
3
[67, 433, 537, 1068, 1118, 1191, 1236, 1953, 2285, 2848, 3296, 3816, 4155, 4507, 4704, 4773, 5028, 5333, 5341, 5613, 5656, 5858, 6068, 6169, 6239, 7367, 7897, 7909, 8973, 9113, 9576, 9799, 9909]
4
[]

I tried the following, but it does not give me what I require.

dest_node = pd.DataFrame (col, dest_nodes, columns = ["webpage","destination_nodes"])

The output dataframe I would like has one row per webpage, with that webpage's list of destination nodes in the second column.

Would appreciate any help I can get!

CodePudding user response:

I would use a dict comprehension to set up the dictionary:

# index must be list-like; transpose so that each webpage becomes a row
df = pd.DataFrame({col: [M.index[M[col] == 1.0].tolist()] for col in range(10001)}, index=["nodes"]).transpose()
df.index.name = "website"

print(df)
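
To try this pattern on something small, here is a minimal sketch. The 3x5 matrix M is made-up toy data (an assumption, not the asker's real matrix), and it uses integer column labels so that M[col] works as in the question.

```python
import pandas as pd

# Toy adjacency matrix (hypothetical data): M[col] == 1 marks a link.
M = pd.DataFrame({0: [1, 0, 1],
                  1: [1, 1, 0],
                  2: [0, 0, 0],
                  3: [0, 1, 0],
                  4: [0, 1, 1]})

# One key per column; each value is a one-element list holding that
# column's list of row labels, so the frame has a single row "nodes".
wide = pd.DataFrame(
    {col: [M.index[M[col] == 1].tolist()] for col in M.columns},
    index=["nodes"],
)

# Transpose so each webpage becomes a row, then name the index.
df = wide.transpose()
df.index.name = "website"
print(df)
```

Iterating over M.columns instead of a hard-coded range keeps the sketch independent of the matrix size.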

CodePudding user response:

This works

# Make list
colLst = list(range(10001))
# Both lists must cover the same range, otherwise pandas raises a ValueError
dest_nodesLst = [M.index[M[col] == 1.0].tolist() for col in range(10001)]

# Make data frame
dic = {"webpage": colLst, "destination_nodes": dest_nodesLst}
dest_node = pd.DataFrame(data=dic)

# print head of dataframe
print(dest_node.head())
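
The two lists must have the same length, or pandas raises a ValueError when building the frame. One way to guarantee that is to derive both from M.columns rather than from two separate ranges. A minimal sketch, assuming a small made-up M with integer column labels (the names webpages and destination_nodes are illustrative):

```python
import pandas as pd

# Toy matrix (hypothetical data) with integer column labels.
M = pd.DataFrame({0: [1, 0, 1], 1: [1, 1, 0], 2: [0, 0, 0]})

# Build both lists from the same iterable so their lengths always agree.
webpages = list(M.columns)
destination_nodes = [M.index[M[col] == 1].tolist() for col in M.columns]

dest_node = pd.DataFrame({"webpage": webpages,
                          "destination_nodes": destination_nodes})
print(dest_node)
```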

CodePudding user response:

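You can use zip to achieve that, provided col and dest_nodes are lists of equal length holding every column number and its list of destination nodes (inside the question's loop they only hold the current values). Like this:

pd.DataFrame(list(zip(col, dest_nodes)), columns=["webpage","destination_nodes"])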

If you want to remove the brackets so that each cell shows only the comma-separated node numbers, run the code below first and then create the DataFrame.

dest_nodes = [str(l1).replace('[', '').replace(']','') for l1 in dest_nodes]
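
Put together, the zip approach could look like the sketch below. It assumes a small made-up M, and the names cols, all_dest_nodes and as_text are illustrative. It also uses ", ".join instead of the chained replace calls, which should give the same text for lists of integers.

```python
import pandas as pd

# Toy matrix (hypothetical data) standing in for the real M.
M = pd.DataFrame({0: [1, 0, 1], 1: [1, 1, 0], 2: [0, 0, 0]})

cols = []            # one entry per webpage
all_dest_nodes = []  # the matching list of destination nodes
for col in M.columns:
    cols.append(col)
    all_dest_nodes.append(M.index[M[col] == 1].tolist())

# Optional: turn each list into a comma-separated string (no brackets).
as_text = [", ".join(map(str, nodes)) for nodes in all_dest_nodes]

# zip pairs each webpage with its text; list() materialises the pairs.
df = pd.DataFrame(list(zip(cols, as_text)),
                  columns=["webpage", "destination_nodes"])
print(df)
```

Note that after this conversion the destination_nodes column holds strings, not lists, so it is only suitable for display.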

CodePudding user response:

Maybe you can use M directly:

df = pd.DataFrame(
    {'webpage': M.columns,
     'destination_nodes': M.eq(1).apply(lambda x: M[x].index.tolist())}
)
print(df)

# Output
  webpage destination_nodes
0       0            [0, 2]
1       1            [0, 1]
2       2                []
3       3               [1]
4       4            [1, 2]

Setup:

data = {'0': [1, 0, 1],
        '1': [1, 1, 0],
        '2': [0, 0, 0],
        '3': [0, 1, 0],
        '4': [0, 1, 1]}
M = pd.DataFrame(data)
print(M)

# Output:
   0  1  2  3  4
0  1  1  0  0  0
1  0  1  0  1  1
2  1  0  0  0  1