Home > Enterprise >  Is there a way to concatenate columns by groups of 2 same values in pandas
Is there a way to concatenate columns by groups of 2 same values in pandas

Time:03-09

Say i have something like this in a pandas dataframe :

Entity Type Doc Proj
Daniel PER 1 1
Daniel PER 4 2
Daniel PER 5 3
Daniel PER 9 6
Daniel LOC 7 4
905-888-8988 ID 3 1
905-888-8988 ID 4 2
905-888-8988 ID 14 8

For each combo of entity and type that is recurring, i'd like to add two new columns for the doc and proj corresponding to the match. I'd like to do this to all possible match between combo of entity.

Edit 1 More detailed explanation to get to the expected outcome

Step 1 - Identify if an "Entity" and "Type" combo has more than 1 occurence in the dataframe.

Step 2 - For each combo that have more than 1 occurence, i would need to represent all possible combination of "Doc" and "Proj" of the combos.

Step 3 - All these possible combination should be represented in pairs of "doc" and "proj"

So the result would look like this in the pandas dataframe :

Entity Type Doc1 Proj1 Doc2 Proj2
Daniel PER 1 1 4 2
Daniel PER 1 1 5 3
Daniel PER 1 1 9 6
Daniel PER 4 2 5 3
Daniel PER 4 2 9 6
Daniel PER 5 3 9 6
905-888-8988 ID 3 1 4 2
905-888-8988 ID 3 1 14 8
905-888-8988 ID 4 2 14 8

Thanks all for the help

CodePudding user response:

here is one way:

  1. reset_index to copy the index as a column
  2. use merge to join df with itself on entity and type columns
  3. remove the duplicate pairs by only keeping the smaller index of original right side of merge.
df = df.reset_index()
res = pd.merge(df, df, on=['Entity', 'Type'],suffixes=['1', '2'])
res = res.loc[res.index1 < res.index2].drop(columns=['index1', 'index2']).reset_index(drop=True)

output:

>>
         Entity Type  Doc1  Proj1  Doc2  Proj2
0        Daniel  PER     1      1     4      2
1        Daniel  PER     1      1     5      3
2        Daniel  PER     1      1     9      6
3        Daniel  PER     4      2     5      3
4        Daniel  PER     4      2     9      6
5        Daniel  PER     5      3     9      6
6  905-888-8988   ID     3      1     4      2
7  905-888-8988   ID     3      1    14      8
8  905-888-8988   ID     4      2    14      8
  • Related