I have the following dataframe:
import pandas as pd
array = {'test_ID': [10, 13, 10, 13, 16],
'test_date': ['2010-09-05', '2010-10-23', '2011-09-12', '2010-05-05', '2010-06-01'],
'Value1': [40, 56, 23, 78, 67],
'Value2': [25, 0, 68, 0, 0]}
df = pd.DataFrame(array)
df
test_ID test_date Value1 Value2
0 10 2010-09-05 40 25
1 13 2010-10-23 56 0
2 10 2011-09-12 23 68
3 13 2010-05-05 78 0
4 16 2010-06-01 67 0
I would like to delete column 'Value2' and combine it in column 'Value1' - but only when Value2 != Zero.
The expected output is:
test_ID test_date Value1
0 10 2010-09-05 40
1 99 2010-09-05 25
2 13 2010-10-23 56
3 10 2011-09-12 23
4 99 2011-09-12 68
5 13 2010-05-05 78
6 16 2010-06-01 67
CodePudding user response:
Use DataFrame.set_index
with DataFrame.stack
for reshape, remove values with 0
, remove last level of MultiIndex by DataFrame.droplevel
and last create 3 columns DataFrame
:
s = df.set_index(['test_ID','test_date']).stack()
df = s[s.ne(0)].reset_index(name='Value1')
df['test_ID'] = df['test_ID'].mask(df.pop('level_2').eq('Value2'), 99)
print (df)
test_ID test_date Value1
0 10 2010-09-05 40
1 99 2010-09-05 25
2 13 2010-10-23 56
3 10 2011-09-12 23
4 99 2011-09-12 68
5 13 2010-05-05 78
6 16 2010-06-01 67
Another solution with DataFrame.melt
and remove 0
rows by DataFrame.loc
:
df = (df.melt(['test_ID','test_date'], value_name='Value1', ignore_index=False)
.assign(test_ID = lambda x: x['test_ID'].mask(x.pop('variable').eq('Value2'), 99))
.sort_index()
.loc[lambda x: x['Value1'].ne(0)]
.reset_index(drop=True))
print (df)
test_ID test_date Value1
0 10 2010-09-05 40
1 99 2010-09-05 25
2 13 2010-10-23 56
3 10 2011-09-12 23
4 99 2011-09-12 68
5 13 2010-05-05 78
6 16 2010-06-01 67
CodePudding user response:
Here is a simple solution by filtering on non zero values.
df = pd.DataFrame(array)
filtered_rows = df.loc[df["Value2"] != 0]
filtered_rows.loc[:,'Value1'] = filtered_rows.loc[:,'Value2']
filtered_rows.loc[:, 'test_ID'] = 99
df = pd.concat([df, filtered_rows]).sort_index().drop(['Value2'], axis=1)
This gives us the expected data :
test_ID test_date Value1
0 10 2010-09-05 40
0 99 2010-09-05 25
1 13 2010-10-23 56
2 10 2011-09-12 23
2 99 2011-09-12 68
3 13 2010-05-05 78
4 16 2010-06-01 67