Most pythonic/stylish/efficient way to create a dataframe from 2-dimensional list of string with var-CodePudding

I guess professional data analysts know an answer to this, but I'm no analyst. And I just barely know Pandas. So I am at a loss.

There are two lists. Their contents are unpredictable (parsed from web counters, web analytics, web statistics, etc).

list1 = ['WordA', 'WordB', ..., 'WordXYZ']

...and...

list2 = [['WordA1', 'WordA2'], ['WordB1'], ['WordC1', 'WordC2', ,'WordC96'], ..., ['WordXYZ1', 'WordXYZ2']]

Length of two lists are always equal (they`re the results of work of parser I already wrote)

What I need is to create a dataframe which will have two rows for each item in list1, each containing the word in first column, and then put corresponding words from list2 into first row of those two (starting from second column, first column to bealready filled from list1).

So I imagine the following steps:

Create a dataframe filled with empty strings ('') with number of columns equal to len(max(list2, key=len)) and number of rows equal to twice length of list1 (aaaand I don't know how, this is actually my very second time using Pandas at all!);
Somehow fill first column of resulting dataframe with contents of list1, filling two rows for each item in list1;
Somehow put contents of list2 into every even row of the dataframe, starting with second column;
Save into .xls file (yes, that's the final goal), enjoy job done.

Now first thing, I already spend half a day trying to find an answer "how to create pandas dataframe filled with empty strings with given number of rows and columns", and found a lot of different articles, which contradict each other.

And second, there's got to be a way to do all this more pythonic, more efficient and more stylish way!

Aaaand, maybe there`s a way to create an excel file without using pandas at all, which I just don't know about (hopefully, yet)

Can anyone help, please?

UPD: (to answer a question) the results should look like:

WordA WordA1 WordA2 
WordA 
WordB WordB1 
WordB 
WordC WordC1 WordC2 (...) WordC96 
WordC 
(...)x2 
WordXYZ WordXYZ1 WordXYZ2 
WordXYZ

CodePudding user response：

The following should give you (almost) what you want:

import pandas as pd
from itertools import chain

list1 = ['WordA', 'WordB']
list2 = [['WordA1', 'WordA2'], ['WordB1']]

# Flatten list 2
list2 = list(chain(*list2))

# Create DataFrames
list1 = pd.DataFrame(data=list1, columns=["word1"])
list2 = pd.DataFrame(data=list2, columns=["word2"])

# Prefix for list2
list2["prefix"] = list2["word2"].str.extract("([^0-9] )")

list1 = list1.merge(list2, left_on="word1", right_on="prefix", how="inner")

# Concatenated words
list1 = list1.groupby("word1")["word2"].agg(lambda x: " ".join(x)).reset_index()
list1["word2"] = list1["word1"].str.cat(list1["word2"], sep=" ")
list1 = pd.melt(list1).sort_values(by="value")

CodePudding user response：

If you just want to write the lists to an Excel file, you don't need pandas. You can use for instance openpyxl:

from openpyxl import Workbook

wb = Workbook()
ws = wb.active

for *word, words in zip(list1, list2):
    ws.append(word   words)
    ws.append(word)

wb.save('output.xlsx')

If you really want to use pandas:

import pandas as pd

df = pd.DataFrame([[None]   x if isinstance(x, list) else [x] for pair in zip(list2, list1) for x in pair])
df[0] = df[0].bfill() 
df.to_excel('output.xlsx', index=False, header=False)