How to make a new column by lists of a column in pandas?

Time:10-19

I want to make a new column based on another column. Each value in column C is a list of words, and I want the new column to hold every pair of words from that list. For example, consider these two rows of column C:

Existing column "C"
['a', 'b', 'c']
['g', 'h', 'j']

I want the new column to look like this:

New column
[['a', 'b'], ['a', 'c'], ['b', 'c']]
[['g', 'h'], ['g', 'j'], ['h', 'j']]

I know how to do this with plain Python lists and nested loops, for example:

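list_words = df['C'].tolist()
pairs_of_skills = []
for l in list_words:
    each = []
    # Only j > i is visited, so every unordered pair appears exactly once
    for i in range(len(l) - 1):
        for j in range(i + 1, len(l)):
            each.append(sorted([l[i], l[j]]))
    # Append this row's pairs (not the outer list itself)
    pairs_of_skills.append(each)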

The resulting pairs_of_skills list can then be assigned as the new column. But I am looking for a more efficient method: my dataset is huge and the lists are long. Is there a faster way?
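Whatever method is used, the output grows quadratically with list length: a list of n words has n(n-1)/2 unordered pairs, so a single 1,000-word list yields 499,500 pairs. A quick way to estimate the total before building anything is sketched below on made-up sample data (it assumes Python 3.8+ for `math.comb`):

```python
import math

import pandas as pd

# Made-up sample: the question's two rows plus one 10-word row.
df = pd.DataFrame({'C': [['a', 'b', 'c'], ['g', 'h', 'j'], list('abcdefghij')]})

# A list of n words has math.comb(n, 2) == n * (n - 1) // 2 unordered pairs.
pair_counts = df['C'].map(lambda words: math.comb(len(words), 2))
print(pair_counts.tolist())    # [3, 3, 45]
print(int(pair_counts.sum()))  # 51
```

If that total is very large, memory for the resulting pairs may become the limiting factor rather than loop speed.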

Answer:

Use itertools.combinations:

>>> import itertools
>>> df['C'].apply(lambda x: list(itertools.combinations(x, 2)))
0    [(a, b), (a, c), (b, c)]
1    [(g, h), (g, j), (h, j)]
Name: C, dtype: object
>>> 
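`itertools.combinations` yields tuples, while the question asks for lists with each pair sorted. A small variation covers both; the sample `df` below is an assumption that mirrors the question's two rows:

```python
import itertools

import pandas as pd

df = pd.DataFrame({'C': [['a', 'b', 'c'], ['g', 'h', 'j']]})

# combinations() keeps the positional order of its input, so sorting each
# list first makes every emitted pair sorted, like sorted() in the loop.
df['New column'] = df['C'].apply(
    lambda words: [list(pair) for pair in itertools.combinations(sorted(words), 2)]
)
print(df['New column'].tolist())
# [[['a', 'b'], ['a', 'c'], ['b', 'c']], [['g', 'h'], ['g', 'j'], ['h', 'j']]]
```

If a list is not already sorted, the pairs themselves may come out in a different order than the loop produces, but the set of pairs is the same.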

As a new DataFrame:

>>> pd.DataFrame(df['C'].apply(lambda x: list(itertools.combinations(x, 2))).to_numpy(), columns=['New Column'])
                 New Column
0  [(a, b), (a, c), (b, c)]
1  [(g, h), (g, j), (h, j)]
>>> 
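A plain list comprehension over the column is another option. Like `Series.apply` it is still a Python-level loop, but it skips some pandas overhead, so it can be slightly faster; measure on your own data before relying on that. It also adds the result to the existing DataFrame instead of building a new one. The sample `df` is again an assumption:

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'C': [['a', 'b', 'c'], ['g', 'h', 'j']]})

# One list of pair-tuples per row; the outer list has one entry per row of df.
df['New Column'] = [list(combinations(words, 2)) for words in df['C']]
print(df)
```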

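Or with assign and pop, which replaces column C with the new column (note that pop removes C from the original df in place):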

>>> df.assign(**{'New Column': df.pop('C').apply(lambda x: list(itertools.combinations(x, 2)))})
                 New Column
0  [(a, b), (a, c), (b, c)]
1  [(g, h), (g, j), (h, j)]
>>> 
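Since the question is about speed, a rough way to compare the approaches on your own data is `timeit`. The synthetic data (2,000 rows of 20 words) and the function names below are hypothetical, chosen only for illustration; no particular timings are implied:

```python
import itertools
import timeit

import pandas as pd

# Synthetic data (assumption): 2,000 rows, each a list of 20 distinct words.
# Zero-padded names keep alphabetical order equal to positional order.
words = [f'w{i:02d}' for i in range(20)]
df = pd.DataFrame({'C': [list(words) for _ in range(2_000)]})


def nested_loop(col):
    """The question's approach: explicit i < j loops, each pair sorted."""
    out = []
    for lst in col:
        each = []
        for i in range(len(lst) - 1):
            for j in range(i + 1, len(lst)):
                each.append(sorted([lst[i], lst[j]]))
        out.append(each)
    return out


def with_apply(col):
    """The answer's approach: Series.apply with itertools.combinations."""
    return col.apply(lambda x: list(itertools.combinations(x, 2))).tolist()


def with_comprehension(col):
    """Same combinations, iterating the column in a list comprehension."""
    return [list(itertools.combinations(x, 2)) for x in col]


for func in (nested_loop, with_apply, with_comprehension):
    total = timeit.timeit(lambda: func(df['C']), number=3)
    print(f'{func.__name__}: {total / 3:.4f} s per run')
```

Substitute a sample of the real column for `df['C']` to get numbers that reflect your actual list lengths.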