Home > Software design >  How to create new columns with duplicated records and copy some of their column values?
How to create new columns with duplicated records and copy some of their column values?

Time:10-28

I wonder if there is an easy way to achieve what I want. Basically, I want to remove duplicated data from a DataFrame and store two records in one, creating a new column new_value2 where I will store one value from other record that comply certain condition.

I have a DataFrame where the column key2 can have a duplicated value, two duplicates as much. If this is the case, and the column value2 in the same row is equal to copy, then what I want is to move the value in column value to the column new_value2 of the record where the key2 column has the same value (but the value2 colum value is NaN).

I think you will understand better looking at the image:

img

I forgot to add the key1 and key3 to the subset as you can check in this source code:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'key1': ['a', 'b', 'c', 'c', 'd', 'e', 'f', 'f', 'g'],
    'key2': [23423, 4565, 7777, 7777, 9789, 567567, 3333, 3333, 1246],
    'key3': [1, 2, 3, 3, 4, 5, 6, 6, 7],
    'value': [23453, 345, 234 ,556, 4, 5678, 56767, 2348, 932],
    'value2': [np.nan, np.nan, np.nan, 'copy', np.nan, np.nan, np.nan, 'copy', np.nan]
})

df['duplicated_map'] = df.duplicated(subset=['key1', 'key2', 'key3'], keep=False)

df['new_value2'] = np.nan  # ??
df

Finally I would like to remove the original row where I took the value.

CodePudding user response:

If there are at most two duplicate values, you can do groupby.apply

def set_value(g):
    if len(g) > 1:
        g.loc[g['value2'].ne('copy'), 'new_value2'] = g.loc[g['value2'].eq('copy'), 'value'].values
        return g[g['value2'].ne('copy')]
    else:
        return g

out = df.groupby('key2').apply(set_value).reset_index(drop=True)
print(out)

  key1    key2  key3  value value2  new_value2
0    g    1246     7    932    NaN         NaN
1    f    3333     6  56767    NaN      2348.0
2    b    4565     2    345    NaN         NaN
3    c    7777     3    234    NaN       556.0
4    d    9789     4      4    NaN         NaN
5    a   23423     1  23453    NaN         NaN
6    e  567567     5   5678    NaN         NaN
  • Related