I am trying to write a script that takes lists of different sizes as input and outputs only the longest lists, i.e. the lists that include the characters of the shorter ones. A shorter list whose characters all appear in a longer list should be dropped.
I put the lists in a DataFrame and wrote a script that loops through all the rows of the DataFrame, checks whether all the characters of one list are present in another list, and keeps the longer list when there is a match.
```python
import pandas as pd

lists = [['a','b','g'], ['a','c','d','e','g'], ['a','b'], ['b', 'd', 'f'], ['a', 'c']]
df = pd.DataFrame(lists)  # shorter lists are padded with None
```
Define the number of rows:
```python
nber_rows = len(df.index)
```
Loop through the DataFrame to find matches between the lists:
```python
listnorep = []
for f in range(nber_rows):
    # characters of row f, without the None padding
    row1 = df.iloc[f].dropna().tolist()
    list_intersection = []
    for g in range(nber_rows):
        row2 = df.iloc[g].dropna().tolist()
        # True if every character of row1 also appears in row2
        check = all(elem in row2 for elem in row1)
        if check:
            list_intersection.append(row2)
    # note: when g == f, row2 equals row1, so list_intersection is never empty
    if list_intersection:
        listnorep.append(list_intersection)
    else:
        listnorep.append(row1)
listnorep
```
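For comparison, here is a minimal plain-Python sketch of what the loop seems to aim for, assuming the goal is to drop every list whose characters form a strict subset of another list's characters. The helper name `keep_maximal` is hypothetical. Converting each list to a set sidesteps the self-match problem, because a set is never a strict subset of itself.

```python
# Hypothetical helper: keep each list unless its characters are a strict
# subset of the characters of some other list.
lists = [['a', 'b', 'g'], ['a', 'c', 'd', 'e', 'g'], ['a', 'b'],
         ['b', 'd', 'f'], ['a', 'c']]


def keep_maximal(lists):
    sets = [set(lst) for lst in lists]
    result = []
    for lst, s in zip(lists, sets):
        # `s < other` is False when s == other, so comparing a list with
        # itself never causes it to be dropped.
        if not any(s < other for other in sets):
            result.append(lst)
    return result


print(keep_maximal(lists))
```

Note that two identical lists are both kept, since neither is a strict subset of the other.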
The desired output in this example is:
```
a b g None None
a c d e g
b d f
```
Answer:
You can use set operations: if a row's set is a strict subset (`<`) of another row's set, don't select that row:
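As a quick illustration of the operator this relies on: for Python sets, `a < b` means "strict subset", i.e. every element of `a` is in `b` and the two sets are not equal. The variable names below are only for illustration.

```python
# `<` on Python sets tests for a strict (proper) subset.
small = {'a', 'b'}
big = {'a', 'b', 'g'}

print(small < big)             # True: every element of small is in big
print(big < big)               # False: a set is not a strict subset of itself
print(small <= small)          # True: <= also allows equality
print({'b', 'd', 'f'} < big)   # False: 'd' and 'f' are not in big
```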
```python
# aggregate each row as a set (after stacking to drop the NaNs/None)
s = df.stack().groupby(level=0).agg(set)
# keep rows whose set is not a strict subset of any other row's set
df[[not any(a < b for b in s) for a in s]]
```
Output:
```
   0  1  2     3     4
0  a  b  g  None  None
1  a  c  d     e     g
3  b  d  f  None  None
```
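A self-contained sketch of the same idea, assuming the `lists` from the question. Instead of `stack()`, it builds each row's set with `dropna()` row by row, so it does not depend on `stack()` dropping missing values (the default behaviour of `stack()` has been changing across pandas versions). The names `sets`, `mask` and `result` are only for illustration.

```python
import pandas as pd

lists = [['a', 'b', 'g'], ['a', 'c', 'd', 'e', 'g'], ['a', 'b'],
         ['b', 'd', 'f'], ['a', 'c']]
df = pd.DataFrame(lists)  # shorter lists are padded with None

# One set per row; dropna() removes the None padding before building the set.
sets = [set(row.dropna()) for _, row in df.iterrows()]

# Keep a row only if its set is not a strict subset of any other row's set.
mask = [not any(a < b for b in sets) for a in sets]
result = df[mask]
print(result)
```

This keeps rows 0, 1 and 3, matching the output above. Like the `stack()` version, it compares every pair of rows, so the cost grows quadratically with the number of lists.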