Home > other >  Python - Add rows based on information in columns
Python - Add rows based on information in columns

Time:06-07

I want to add rows in Python based on the information in some of the columns. For example let's say this is my data

df = pd.DataFrame({
    'ID':[1,2,3],
    'E Test':['Y','Y','N'],
    'M Test':['Y','Y','Y'],
})

enter image description here

For the row with ID equal to 1, I'd like to add a column that says if the column labeled "E Test" equals "Y" then new column "Test Date" equals "April 1". I'd like to do the same for the "M Test" but with a different date and add a completely new row for the ID equal to 1. Therefore there would be 2 rows that have the ID equal to 1 and with different "Test Date" numbers.

Here is what it would look like ideally:

Ideal

CodePudding user response:

I use two different dates to show how they can be individually changed.

df2 = df.melt('ID', var_name='Test', value_name='Test Date')
df2['Test'] = df2['Test'].str[0]
df2.replace({'Y': True, 'N': np.nan}, inplace=True)
df2.dropna(inplace=True)
df2.loc[df2['Test'].eq('E'), 'Test Date'] = '1-Apr'
df2.loc[df2['Test'].eq('M'), 'Test Date'] = '2-Apr'
df2 = df2.sort_values('ID').reset_index(drop=True)
print(df2)

Output:

   ID Test Test Date
0   1    E     1-Apr
1   1    M     2-Apr
2   2    E     1-Apr
3   2    M     2-Apr
4   3    M     2-Apr

Filtered down to just ID == 1:

print(df2[df2['ID'].eq(1)])

...

   ID Test Test Date
0   1    E     1-Apr
1   1    M     2-Apr

CodePudding user response:

Generated test data :

   ID E TEST M TEST
0   1      Y      Y
1   3      Y      N
2   4      Y      N
3   5      N      Y
4   6      N      Y
5   7      Y      N
6   8      Y      Y
7   9      Y      Y
8  10      N      Y

Then:

test_dates={'E':'1-Apr','M':'1-May'}

df = df.melt(id_vars='ID').sort_values(['ID','variable']) \
    .assign(Test=lambda x: x['variable'].str.slice(0,1)) \
    .assign(**{'Test Date': lambda x: x['Test'].map(test_dates)}) \
    .loc[lambda x: x['value']=='Y',['ID','Test','Test Date']].reset_index(drop=True)

print(df)

Which results in:

    ID Test Test Date
0    1    E     1-Apr
1    1    M     1-May
2    3    E     1-Apr
3    4    E     1-Apr
4    5    M     1-May
5    6    M     1-May
6    7    E     1-Apr
7    8    E     1-Apr
8    8    M     1-May
9    9    E     1-Apr
10   9    M     1-May
11  10    M     1-May
  • Related