I want to create a new column based on an existing column. Column C
holds a list of words in each row, and I want to build the list of all pairs from each list. For example, consider these 2 rows of column C:
Existing column "C"
['a', 'b', 'c']
['g', 'h', 'j']
I want the new column to be like this:
New column
['a', 'b'], ['a', 'c'], ['b', 'c']
['g', 'h'], ['g', 'j'], ['h', 'j']
I know how to do this with plain Python lists, for example:
list_words = df['C'].tolist()
pairs_of_skills = []
for l in list_words:
    # collect all pairs for the current row
    each = []
    for i in range(len(l) - 1):
        # j starts at i + 1, so j != i always holds
        for j in range(i + 1, len(l)):
            each.append(sorted([l[j], l[i]]))
    pairs_of_skills.append(each)
And then pairs_of_skills can be used as the new column. But I am looking for a more efficient method. I have a huge dataset with long lists. Is there a faster way?
CodePudding user response:
Use itertools.combinations:
>>> import itertools
>>> import pandas as pd
>>> df['C'].apply(lambda x: list(itertools.combinations(x, 2)))
0 [(a, b), (a, c), (b, c)]
1 [(g, h), (g, j), (h, j)]
Name: C, dtype: object
>>>
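For a version you can run end to end, here is a minimal sketch that builds the two-row sample frame from the question and stores the pairs in a new column. The column name `pairs` and the list comprehension (used in place of `apply`) are my own choices, not something from the question. A comprehension may avoid some `apply` overhead, but I have not benchmarked it on your data.

```python
import itertools

import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({"C": [["a", "b", "c"], ["g", "h", "j"]]})

# One list of 2-element tuples per row; "pairs" is an illustrative column name.
df["pairs"] = [list(itertools.combinations(words, 2)) for words in df["C"]]

print(df["pairs"].tolist())
```

Assigning the list directly keeps column C untouched and relies on the rows being in the same order as the frame.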
As a new DataFrame:
>>> pd.DataFrame(df['C'].apply(lambda x: list(itertools.combinations(x, 2))).to_numpy(), columns=['New Column'])
New Column
0 [(a, b), (a, c), (b, c)]
1 [(g, h), (g, j), (h, j)]
>>>
Or with assign and pop (note that pop removes column C from df in place):
>>> df.assign(**{'New Column': df.pop('C').apply(lambda x: list(itertools.combinations(x, 2)))})
New Column
0 [(a, b), (a, c), (b, c)]
1 [(g, h), (g, j), (h, j)]
>>>
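If you need exactly the format from the question (each pair as a sorted list rather than a tuple) and want to keep column C, something along these lines should work. `sorted_pairs` is a hypothetical helper name, and the first sample row is deliberately unsorted here to show the effect of sorting within each pair.

```python
from itertools import combinations

import pandas as pd


def sorted_pairs(words):
    """Return every 2-element combination of `words`, each as a sorted list."""
    return [sorted(pair) for pair in combinations(words, 2)]


# Illustrative data; the first row is out of order on purpose.
df = pd.DataFrame({"C": [["c", "a", "b"], ["g", "h", "j"]]})

# Series.map applies the helper to each row's list and leaves column C in place.
df["New Column"] = df["C"].map(sorted_pairs)

print(df)
```

The pairs come out in the same order as the nested loops in the question, because `combinations` walks the positions `i < j` in the same way.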