Home > other >  Combining one column values in another
Combining one column values in another

Time:07-08

I have the following dataframe:

import pandas as pd
array = {'test_ID': [10, 13, 10, 13, 16],
         'test_date': ['2010-09-05', '2010-10-23', '2011-09-12', '2010-05-05', '2010-06-01'],
         'Value1': [40, 56, 23, 78, 67],
         'Value2': [25, 0, 68, 0, 0]}
df = pd.DataFrame(array)
df

   test_ID   test_date  Value1  Value2
0       10  2010-09-05      40      25
1       13  2010-10-23      56       0
2       10  2011-09-12      23      68
3       13  2010-05-05      78       0
4       16  2010-06-01      67       0

I would like to delete column 'Value2' and combine it in column 'Value1' - but only when Value2 != Zero.

The expected output is:

   test_ID   test_date  Value1
0       10  2010-09-05      40
1       99  2010-09-05      25
2       13  2010-10-23      56
3       10  2011-09-12      23
4       99  2011-09-12      68
5       13  2010-05-05      78
6       16  2010-06-01      67

CodePudding user response:

Use DataFrame.set_index with DataFrame.stack for reshape, remove values with 0, remove last level of MultiIndex by DataFrame.droplevel and last create 3 columns DataFrame:

s = df.set_index(['test_ID','test_date']).stack()
df = s[s.ne(0)].reset_index(name='Value1')
df['test_ID'] = df['test_ID'].mask(df.pop('level_2').eq('Value2'), 99)
print (df)
   test_ID   test_date  Value1
0       10  2010-09-05      40
1       99  2010-09-05      25
2       13  2010-10-23      56
3       10  2011-09-12      23
4       99  2011-09-12      68
5       13  2010-05-05      78
6       16  2010-06-01      67

Another solution with DataFrame.melt and remove 0 rows by DataFrame.loc:

df = (df.melt(['test_ID','test_date'], value_name='Value1', ignore_index=False)
        .assign(test_ID =  lambda x: x['test_ID'].mask(x.pop('variable').eq('Value2'), 99))
        .sort_index()
        .loc[lambda x: x['Value1'].ne(0)]
        .reset_index(drop=True))
print (df)
   test_ID   test_date  Value1
0       10  2010-09-05      40
1       99  2010-09-05      25
2       13  2010-10-23      56
3       10  2011-09-12      23
4       99  2011-09-12      68
5       13  2010-05-05      78
6       16  2010-06-01      67

CodePudding user response:

Here is a simple solution by filtering on non zero values.

df = pd.DataFrame(array)
filtered_rows = df.loc[df["Value2"] != 0]
filtered_rows.loc[:,'Value1'] = filtered_rows.loc[:,'Value2']
filtered_rows.loc[:, 'test_ID'] = 99
df = pd.concat([df, filtered_rows]).sort_index().drop(['Value2'], axis=1)

This gives us the expected data :

   test_ID   test_date  Value1
0       10  2010-09-05      40
0       99  2010-09-05      25
1       13  2010-10-23      56
2       10  2011-09-12      23
2       99  2011-09-12      68
3       13  2010-05-05      78
4       16  2010-06-01      67
  • Related