I have four lists containing different numbers, as shown below:
list1 = [399826, 399827, 413350, 404450, 399827, 404451]
list2 = [399825, 399826, 412450, 403650, 391227]
list3 = [412450, 399827]
list4 = [399829, 399246, 513350, 404370, 789827, 439931, 404451]
There are overlaps between the lists. I want to build a DataFrame that shows the set of all numbers and the names of the lists each number belongs to, like this:
numbers | list1 | list2 | list3 | list4 |
---|---|---|---|---|
399826 | True | True | False | False |
399827 | True | False | True | False |
413350 | True | False | False | False |
412450 | False | True | True | False |
etc | ... | ... | ... | ... |
To compare the lists I used this function:
def returnNotMatches(a, b):
    a = set(a)
    b = set(b)
    return list(b - a)
But I don't know how to build the DataFrame correctly. Thanks in advance for your help.
CodePudding user response:
Data:
>>> list1 = [399826, 399827, 413350, 404450, 399827, 404451]
>>> list2 = [399825, 399826, 412450, 403650, 391227]
>>> list3 = [412450, 399827]
>>> list4 = [399829, 399246, 513350, 404370, 789827, 439931, 404451]
>>> import pandas as pd
>>> df = pd.DataFrame(list1 + list2 + list3 + list4, columns=['values'])
>>> for i in range(1, 5):
...     v = 'list' + str(i)
...     df[v] = df['values'].apply(lambda x: x in eval(v))
...
>>> df
values list1 list2 list3 list4
0 399826 True True False False
1 399827 True False True False
2 413350 True False False False
3 404450 True False False False
4 399827 True False True False
5 404451 True False False True
6 399825 False True False False
7 399826 True True False False
8 412450 False True True False
9 403650 False True False False
10 391227 False True False False
11 412450 False True True False
12 399827 True False True False
13 399829 False False False True
14 399246 False False False True
15 513350 False False False True
16 404370 False False False True
17 789827 False False False True
18 439931 False False False True
19 404451 True False False True
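The output above repeats numbers that occur more than once across the lists, and the loop relies on eval to look up each list by name. A minimal sketch of a variant that avoids both, assuming the lists are collected in a dict (the name `lists` is hypothetical, not from the original answer):

```python
import pandas as pd

list1 = [399826, 399827, 413350, 404450, 399827, 404451]
list2 = [399825, 399826, 412450, 403650, 391227]
list3 = [412450, 399827]
list4 = [399829, 399246, 513350, 404370, 789827, 439931, 404451]

# Hypothetical helper dict: maps each column name to its list (replaces eval).
lists = {"list1": list1, "list2": list2, "list3": list3, "list4": list4}

# dict.fromkeys keeps the first occurrence of each number, in first-seen order.
all_values = list(dict.fromkeys(v for lst in lists.values() for v in lst))

df = pd.DataFrame({"values": all_values})
for name, lst in lists.items():
    # isin gives a boolean column: True where the number is in that list.
    df[name] = df["values"].isin(lst)

print(df)
```

This should give one row per distinct number, which is closer to the table in the question.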
CodePudding user response:
First create a dictionary that maps the new column names to the lists, then build a dict with True values for each list and pass them to DataFrame; finally, replace the NaNs with False:
import pandas as pd

list1 = [399826, 399827, 413350, 404450, 399827, 404451]
list2 = [399825, 399826, 412450, 403650, 391227]
list3 = [412450, 399827]
list4 = [399829, 399246, 513350, 404370, 789827, 439931, 404451]
d = {'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4}
df = pd.DataFrame({k: dict.fromkeys(v, True) for k, v in d.items()}).fillna(False)
print(df)
list1 list2 list3 list4
399826 True True False False
399827 True False True False
413350 True False False False
404450 True False False False
404451 True False False True
399825 False True False False
412450 False True True False
403650 False True False False
391227 False True False False
399829 False False False True
399246 False False False True
513350 False False False True
404370 False False False True
789827 False False False True
439931 False False False True
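In the result above the numbers end up in the index, while the question asked for a "numbers" column. A minimal sketch of one way to get that layout, under two assumptions of mine: notna() is swapped in for fillna(False) so the columns come out as plain booleans regardless of how the installed pandas version handles fillna on object columns, and the column name "numbers" is taken from the question's table:

```python
import pandas as pd

list1 = [399826, 399827, 413350, 404450, 399827, 404451]
list2 = [399825, 399826, 412450, 403650, 391227]
list3 = [412450, 399827]
list4 = [399829, 399246, 513350, 404370, 789827, 439931, 404451]
d = {'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4}

# Same dict-of-dicts construction: True where a number is in a list, NaN elsewhere.
df = pd.DataFrame({k: dict.fromkeys(v, True) for k, v in d.items()})

# notna() turns True/NaN into True/False with bool dtype.
df = df.notna()

# Name the index and move it into a regular column.
df = df.rename_axis("numbers").reset_index()
print(df)
```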
CodePudding user response:
Create the dict, then use explode and crosstab:
d = {'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4}
s = pd.Series(d).explode()
s = pd.crosstab(s, s.index).astype(bool)
print(s)
col_0 list1 list2 list3 list4
row_0
391227 False True False False
399246 False False False True
399825 False True False False
399826 True True False False
399827 True False True False
399829 False False False True
403650 False True False False
404370 False False False True
404450 True False False False
404451 True False False True
412450 False True True False
413350 True False False False
439931 False False False True
513350 False False False True
789827 False False False True
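Since the question is about overlaps, the boolean table can also be filtered for numbers that appear in more than one list. A small sketch, assuming a slightly different construction from the answer above: the exploded Series is first turned into a two-column frame so the crosstab axes get readable names (`long`, `table`, `shared`, "number" and "list" are hypothetical names chosen for illustration):

```python
import pandas as pd

list1 = [399826, 399827, 413350, 404450, 399827, 404451]
list2 = [399825, 399826, 412450, 403650, 391227]
list3 = [412450, 399827]
list4 = [399829, 399246, 513350, 404370, 789827, 439931, 404451]
d = {'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4}

# One row per (list name, number) pair.
long = pd.Series(d).explode().rename_axis("list").reset_index(name="number")

# Count occurrences per number and list, then reduce the counts to booleans.
table = pd.crosstab(long["number"], long["list"]).astype(bool)

# Keep only the numbers that are present in at least two lists.
shared = table[table.sum(axis=1) > 1]
print(shared)
```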