How to create new columns with duplicated records and copy some of their column values?-CodePudding

I wonder if there is an easy way to achieve what I want. Basically, I want to remove duplicated data from a DataFrame and store two records in one, creating a new column new_value2 where I will store one value from other record that comply certain condition.

I have a DataFrame where the column key2 can have a duplicated value, two duplicates as much. If this is the case, and the column value2 in the same row is equal to copy, then what I want is to move the value in column value to the column new_value2 of the record where the key2 column has the same value (but the value2 colum value is NaN).

I think you will understand better looking at the image:

I forgot to add the key1 and key3 to the subset as you can check in this source code:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'key1': ['a', 'b', 'c', 'c', 'd', 'e', 'f', 'f', 'g'],
    'key2': [23423, 4565, 7777, 7777, 9789, 567567, 3333, 3333, 1246],
    'key3': [1, 2, 3, 3, 4, 5, 6, 6, 7],
    'value': [23453, 345, 234 ,556, 4, 5678, 56767, 2348, 932],
    'value2': [np.nan, np.nan, np.nan, 'copy', np.nan, np.nan, np.nan, 'copy', np.nan]
})

df['duplicated_map'] = df.duplicated(subset=['key1', 'key2', 'key3'], keep=False)

df['new_value2'] = np.nan  # ??
df

Finally I would like to remove the original row where I took the value.

CodePudding user response：

If there are at most two duplicate values, you can do groupby.apply

def set_value(g):
    if len(g) > 1:
        g.loc[g['value2'].ne('copy'), 'new_value2'] = g.loc[g['value2'].eq('copy'), 'value'].values
        return g[g['value2'].ne('copy')]
    else:
        return g

out = df.groupby('key2').apply(set_value).reset_index(drop=True)

print(out)

  key1    key2  key3  value value2  new_value2
0    g    1246     7    932    NaN         NaN
1    f    3333     6  56767    NaN      2348.0
2    b    4565     2    345    NaN         NaN
3    c    7777     3    234    NaN       556.0
4    d    9789     4      4    NaN         NaN
5    a   23423     1  23453    NaN         NaN
6    e  567567     5   5678    NaN         NaN