How can I compare some lists in python and add them into a dataframe?

Time:10-27

I have four lists containing different numbers, as shown below:

list1 = [399826, 399827, 413350, 404450, 399827, 404451]  
list2 = [399825, 399826, 412450, 403650, 391227]  
list3 = [412450, 399827]  
list4 = [399829, 399246, 513350, 404370, 789827, 439931, 404451]  

There are overlaps between the lists. I want to build a dataframe that shows the set of all numbers and the names of the lists each number belongs to, like this:

numbers list1 list2 list3 list4
399826 True True False False
399827 True False True False
413350 True False False False
412450 False True True False
etc ... ... ... ...

To compare the lists I used this function:

def returnNotMatches(a, b):

    a = set(a)
    b = set(b)
    return list(b - a)

But I don't know how to build the dataframe correctly. Thanks in advance for your help.

CodePudding user response:

Data:

>>> list1 = [399826, 399827, 413350, 404450, 399827, 404451]  
>>> list2 = [399825, 399826, 412450, 403650, 391227]  
>>> list3 = [412450, 399827]  
>>> list4 = [399829, 399246, 513350, 404370, 789827, 439931, 404451]  
>>> import pandas as pd
>>> df = pd.DataFrame(list1 + list2 + list3 + list4, columns=['values'])
>>> for i in range(1, 5):
...     v = 'list' + str(i)
...     # eval looks up the list by its variable name (list1 ... list4)
...     df[v] = df['values'].apply(lambda x: x in eval(v))
...
>>> df


    values  list1   list2   list3   list4
0   399826  True    True    False   False
1   399827  True    False   True    False
2   413350  True    False   False   False
3   404450  True    False   False   False
4   399827  True    False   True    False
5   404451  True    False   False   True
6   399825  False   True    False   False
7   399826  True    True    False   False
8   412450  False   True    True    False
9   403650  False   True    False   False
10  391227  False   True    False   False
11  412450  False   True    True    False
12  399827  True    False   True    False
13  399829  False   False   False   True
14  399246  False   False   False   True
15  513350  False   False   False   True
16  404370  False   False   False   True
17  789827  False   False   False   True
18  439931  False   False   False   True
19  404451  True    False   False   True
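The frame above has one row per list element, so numbers that occur more than once (such as 399827) are repeated. A minimal sketch, assuming the goal is one row per distinct number as in the question's example, deduplicates the values first and looks the lists up in a dict instead of calling eval. The names `lists` and `unique_values` are illustrative, not part of the original answer.

```python
import pandas as pd

list1 = [399826, 399827, 413350, 404450, 399827, 404451]
list2 = [399825, 399826, 412450, 403650, 391227]
list3 = [412450, 399827]
list4 = [399829, 399246, 513350, 404370, 789827, 439931, 404451]

# Hypothetical helper dict: maps each column name to its list (replaces eval)
lists = {'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4}

# dict.fromkeys drops duplicates while keeping first-seen order
unique_values = list(dict.fromkeys(v for lst in lists.values() for v in lst))

df = pd.DataFrame({'numbers': unique_values})
for name, lst in lists.items():
    # isin gives a boolean column: True where the number occurs in this list
    df[name] = df['numbers'].isin(lst)

print(df)
```

Keeping the lists in a dict also means adding a fifth list only requires one more dict entry.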

CodePudding user response:

First build a dictionary that maps the new column names to the lists, then turn each list into a dict with True as every value and create the DataFrame from those dicts. Finally, replace the NaNs with False:

list1 = [399826, 399827, 413350, 404450, 399827, 404451]  
list2 = [399825, 399826, 412450, 403650, 391227]  
list3 = [412450, 399827]  
list4 = [399829, 399246, 513350, 404370, 789827, 439931, 404451]  

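import pandas as pd

d = {'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4}

# Each list becomes a {number: True} dict; numbers missing from a list come out as NaN
df = pd.DataFrame({k: dict.fromkeys(v, True) for k, v in d.items()}).fillna(False)
print(df)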
        list1  list2  list3  list4
399826   True   True  False  False
399827   True  False   True  False
413350   True  False  False  False
404450   True  False  False  False
404451   True  False  False   True
399825  False   True  False  False
412450  False   True   True  False
403650  False   True  False  False
391227  False   True  False  False
399829  False  False  False   True
399246  False  False  False   True
513350  False  False  False   True
404370  False  False  False   True
789827  False  False  False   True
439931  False  False  False   True
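Depending on the pandas version, filling NaNs in columns that mix True and NaN may leave them with object dtype rather than bool. This is an assumption worth checking with `df.dtypes`. A minimal sketch of one alternative, not taken from the answer above, uses `notna()`, which returns boolean columns directly:

```python
import pandas as pd

list1 = [399826, 399827, 413350, 404450, 399827, 404451]
list2 = [399825, 399826, 412450, 403650, 391227]
list3 = [412450, 399827]
list4 = [399829, 399246, 513350, 404370, 789827, 439931, 404451]

d = {'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4}

# Build the same True/NaN frame, then mark present entries as True and
# missing entries (NaN) as False; notna() always yields bool columns
df = pd.DataFrame({k: dict.fromkeys(v, True) for k, v in d.items()}).notna()

print(df.dtypes)
print(df)
```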

CodePudding user response:

Create the dict, then use explode and crosstab:

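d = {'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4}
# One row per (list name, number) pair; the index holds the list name
s = pd.Series(d).explode()
# Count each number per list, then convert the counts to booleans
pd.crosstab(s, s.index).astype(bool)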
Out[67]: 
col_0   list1  list2  list3  list4
row_0                             
391227  False   True  False  False
399246  False  False  False   True
399825  False   True  False  False
399826   True   True  False  False
399827   True  False   True  False
399829  False  False  False   True
403650  False   True  False  False
404370  False  False  False   True
404450   True  False  False  False
404451   True  False  False   True
412450  False   True   True  False
413350   True  False  False  False
439931  False  False  False   True
513350  False  False  False   True
789827  False  False  False   True
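The crosstab result carries the placeholder axis names `row_0` and `col_0`. A minimal sketch, assuming the layout requested in the question (a `numbers` column followed by one column per list), renames the index axis and moves it into a regular column. The names `table` and `result` are illustrative.

```python
import pandas as pd

list1 = [399826, 399827, 413350, 404450, 399827, 404451]
list2 = [399825, 399826, 412450, 403650, 391227]
list3 = [412450, 399827]
list4 = [399829, 399246, 513350, 404370, 789827, 439931, 404451]

d = {'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4}

# Same steps as in the answer: explode to (list name, number) pairs, then cross-tabulate
s = pd.Series(d).explode()
table = pd.crosstab(s, s.index).astype(bool)

# Name the index 'numbers', drop the 'col_0' label, and turn the index into a column
result = table.rename_axis(index='numbers', columns=None).reset_index()

print(result)
```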