I have several lists of English words. How do I add a column to a DataFrame that tells me which list each word came from, so that as more words are added from new lists I can keep track of each word's source?
list_1 = [['ant', 3], ['bat', 3], ['cat', 3]]
df = pd.DataFrame(list_1, columns = ['word', 'length'], dtype = str)
How would I add the list_2 data to this DataFrame and record which list each row came from in a source column?
list_2 = [['rose', 4], ['tulip', 5], ['lilac', 5], ['daisy', 5]]
Expected output:
   source   word  length
0  list_1    ant       3
1  list_1    bat       3
2  list_1    cat       3
3  list_2   rose       4
4  list_2  tulip       5
5  list_2  lilac       5
6  list_2  daisy       5
Answer:
Here is how I would do it: keep the lists as plain words in a dictionary keyed by source name, then build the rows with a small comprehension passed to the DataFrame constructor, computing each length with len():
import pandas as pd
list_1 = ['ant', 'bat', 'cat']
list_2 = ['rose', 'tulip', 'lilac', 'daisy']
lists = {'list_1': list_1, 'list_2': list_2}
df = pd.DataFrame(
    [(source, word, len(word)) for source, words in lists.items() for word in words],
    columns=['source', 'word', 'length'],
)
Output:
   source   word  length
0  list_1    ant       3
1  list_1    bat       3
2  list_1    cat       3
3  list_2   rose       4
4  list_2  tulip       5
5  list_2  lilac       5
6  list_2  daisy       5
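
If you would rather keep the question's original [word, length] pairs and append each new list to a DataFrame you already have, something along these lines may work. This is only a sketch: tag_source is a hypothetical helper name I made up for illustration, and it assumes every list has the same two fields.

```python
import pandas as pd

# Data in the question's original shape: [word, length] pairs.
list_1 = [['ant', 3], ['bat', 3], ['cat', 3]]
list_2 = [['rose', 4], ['tulip', 5], ['lilac', 5], ['daisy', 5]]


def tag_source(rows, source):
    """Hypothetical helper: build a DataFrame from [word, length] rows
    and label every row with the name of the list it came from."""
    frame = pd.DataFrame(rows, columns=['word', 'length'])
    frame.insert(0, 'source', source)  # scalar is broadcast to every row
    return frame


# Start from the first list, then append later lists as they arrive.
df = tag_source(list_1, 'list_1')
df = pd.concat([df, tag_source(list_2, 'list_2')], ignore_index=True)

print(df)
```

When a list_3 shows up later, the same pd.concat line with tag_source(list_3, 'list_3') should extend df without touching the rows that are already there.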