How can I divide explicit columns of a Dataframe with a single column and add a new header?

Time:04-22

I would like to divide all columns, except the first, by a specific column of a DataFrame and add the results as new columns with a new header, but I'm stuck. Here is my approach, but please be gentle; I only started programming a month ago:

I have this example DataFrame:

import numpy as np
import pandas as pd

np.random.seed(0)
data = pd.DataFrame(np.random.randint(1, 10, size=(100, 10)),
                    columns=list('ABCDEFGHIJ'))

Now I create a list of the columns and drop 'A' and 'J':

cols = list(data.drop(columns=['A', 'J']).columns)

Then I would like to divide columns B-I by column J. In this example that would be easy, since the names are single letters, but in reality the column names are much longer (for example "Donaudampfschifffahrtkapitän"; there are really funny and long words in German). That's why I want to do it with the cols list.

data[[cols]] = data[[cols]].div(data['J'].values,axis=0)

However, I get this error:

KeyError: "None of [Index([('B', 'C', 'D', 'E', 'F', 'G', 'H', 'I')], dtype='object')] are in the [columns]"

What is wrong? Or does someone know an even better approach?

And how can I add the results with their specific names ('B/J', 'C/J', ..., 'I/J') to the dataframe?

Thx in advance!

Answer:

You need to remove the extra [], because cols is already a list:

data[cols] = data[cols].div(data['J'], axis=0)

NB. Using .values is also unnecessary, since pandas aligns on the index (and in any case you don't change the row order here).
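Both points in this answer can be checked on the question's data. This is a sketch assuming a reasonably recent pandas; the shuffled copy of J is only there to illustrate that `.div(..., axis=0)` matches rows by index label rather than by position.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
data = pd.DataFrame(np.random.randint(1, 10, size=(100, 10)),
                    columns=list('ABCDEFGHIJ'))
cols = list(data.drop(columns=['A', 'J']).columns)

# data[[cols]] passes a list that contains a list; pandas turns the inner
# list into a single tuple label ('B', ..., 'I'), which is not a column.
try:
    data[[cols]]
    bracket_error = False
except KeyError:
    bracket_error = True
print('data[[cols]] raised KeyError:', bracket_error)

# data[cols] selects the eight columns B..I as intended.
print(list(data[cols].columns))

# .div(Series, axis=0) aligns on the row index: dividing by a shuffled copy
# of J gives the same values per row label as dividing by J itself.
expected = data[cols].div(data['J'], axis=0)
shuffled_j = data['J'].sample(frac=1, random_state=0)
aligned = data[cols].div(shuffled_j, axis=0)
print(aligned.sort_index().equals(expected))
```

Passing `shuffled_j.values` instead would divide row by row in array order, so the labels would no longer line up.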

output:

   A         B         C         D         E         F         G         H         I  J
0  6  0.125000  0.500000  0.500000  1.000000  0.500000  0.750000  0.375000  0.625000  8
1  7  1.500000  1.500000  0.333333  1.166667  1.333333  1.333333  1.500000  0.333333  6
2  9  0.555556  0.444444  0.111111  0.444444  0.666667  0.111111  0.333333  0.444444  9
3  2  0.500000  0.500000  0.500000  1.000000  0.125000  0.250000  0.125000  0.625000  8
4  4  0.428571  1.142857  0.428571  0.142857  0.142857  0.714286  0.857143  0.857143  7
...
To add the results as new columns:
data2 = pd.concat([data, data[cols].div(data['J'], axis=0).add_suffix('/J')],
                  axis=1)

output:

   A  B  C  D  E  F  G  H  I  J       B/J       C/J       D/J       E/J  \
0  6  1  4  4  8  4  6  3  5  8  0.125000  0.500000  0.500000  1.000000   
1  7  9  9  2  7  8  8  9  2  6  1.500000  1.500000  0.333333  1.166667   
2  9  5  4  1  4  6  1  3  4  9  0.555556  0.444444  0.111111  0.444444   
3  2  4  4  4  8  1  2  1  5  8  0.500000  0.500000  0.500000  1.000000   
4  4  3  8  3  1  1  5  6  6  7  0.428571  1.142857  0.428571  0.142857   

        F/J       G/J       H/J       I/J  
0  0.500000  0.750000  0.375000  0.625000  
1  1.333333  1.333333  1.500000  0.333333  
2  0.666667  0.111111  0.333333  0.444444  
3  0.125000  0.250000  0.125000  0.625000  
4  0.142857  0.714286  0.857143  0.857143 
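As an alternative to `concat` with `add_suffix`, the new columns can be built with a dict comprehension and passed to `DataFrame.assign`. Keys such as `'B/J'` are not valid Python identifiers, but unpacking them with `**` still works because `assign` accepts arbitrary keyword names. A sketch on the question's data:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
data = pd.DataFrame(np.random.randint(1, 10, size=(100, 10)),
                    columns=list('ABCDEFGHIJ'))
cols = list(data.drop(columns=['A', 'J']).columns)

# One new column per entry in cols, named '<col>/J', in the order of cols.
ratios = {f'{c}/J': data[c] / data['J'] for c in cols}
data2 = data.assign(**ratios)  # returns a new DataFrame; data is unchanged

print(data2.columns.tolist())
```

Both approaches give the same frame; `concat` is convenient when you already have the divided columns as a DataFrame, while `assign` reads well when building columns one by one.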

Answer:

Because cols is already a list, remove the nested []:

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.randint(1, 10, size=(100, 10)), columns=list('ABCDEFGHIJ'))

# you can drop directly from the column names; converting to a list is not necessary
cols = data.columns.drop(['A', 'J'])
# alternative solution (sort=False keeps the original column order)
cols = data.columns.difference(['A', 'J'], sort=False)
data[cols] = data[cols].div(data['J'], axis=0)

print(data)
    A         B         C         D         E         F         G         H  \
0   2  1.000000  0.200000  0.200000  0.400000  1.600000  1.200000  0.800000   
1   2  0.428571  0.285714  0.857143  1.142857  0.142857  0.714286  0.142857   
2   2  0.222222  0.444444  1.000000  0.111111  0.222222  0.222222  0.333333   
3   2  1.500000  3.000000  0.500000  0.500000  3.500000  2.000000  3.000000   
4   1  0.666667  1.333333  0.833333  0.166667  1.166667  0.500000  1.500000   
.. ..       ...       ...       ...       ...       ...       ...       ...   
95  8  0.857143  1.142857  0.142857  1.000000  0.571429  0.142857  1.000000   
96  1  5.000000  4.000000  8.000000  8.000000  2.000000  7.000000  3.000000   
97  2  0.888889  0.222222  0.222222  0.666667  1.000000  0.333333  0.444444   
98  7  2.333333  0.666667  3.000000  2.000000  0.666667  2.000000  1.333333   
99  2  2.000000  6.000000  8.000000  5.000000  9.000000  5.000000  3.000000   

           I  J  
0   0.800000  5  
1   1.000000  7  
2   1.000000  9  
3   1.000000  2  
4   0.833333  6  
..       ... ..  
95  0.857143  7  
96  3.000000  1  
97  1.000000  9  
98  1.000000  3  
99  8.000000  1  

[100 rows x 10 columns]
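The two ways of building cols in this answer give the same labels in the same order. `sort=False` matters for `difference`: with the default `sort=None`, pandas sorts the result when it can, which would reorder long, non-alphabetical column names. A small sketch with made-up German column names (hypothetical, for illustration only):

```python
import pandas as pd

# Hypothetical column names that are not in alphabetical order.
columns = pd.Index(['Zeit', 'Donaudampfschifffahrtkapitän', 'Apfel', 'Nenner'])

by_drop = columns.drop(['Zeit', 'Nenner'])
by_difference = columns.difference(['Zeit', 'Nenner'], sort=False)
sorted_difference = columns.difference(['Zeit', 'Nenner'])  # default sort=None

print(list(by_drop))            # ['Donaudampfschifffahrtkapitän', 'Apfel']
print(list(by_difference))      # ['Donaudampfschifffahrtkapitän', 'Apfel']
print(list(sorted_difference))  # ['Apfel', 'Donaudampfschifffahrtkapitän']
```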

If you need to add the results as new columns instead, use concat:

df = pd.concat([data, data[cols].div(data['J'], axis=0).add_suffix('/J')], axis=1)
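Since the real column names are long, it may help to wrap the concat step in a small function that only needs the divisor's name and the columns to skip. A minimal sketch; the function name `add_ratio_columns`, its parameters, and the example column names are hypothetical, not part of pandas or the original question:

```python
import numpy as np
import pandas as pd


def add_ratio_columns(df, divisor, exclude=()):
    """Return df with extra columns '<col>/<divisor>' for every column
    except the divisor and those listed in exclude (hypothetical helper)."""
    cols = df.columns.drop([divisor, *exclude])
    ratios = df[cols].div(df[divisor], axis=0).add_suffix(f'/{divisor}')
    return pd.concat([df, ratios], axis=1)


np.random.seed(0)
data = pd.DataFrame(np.random.randint(1, 10, size=(100, 3)),
                    columns=['Jahr', 'Donaudampfschifffahrtkapitän', 'Nenner'])

result = add_ratio_columns(data, 'Nenner', exclude=['Jahr'])
print(result.columns.tolist())
# ['Jahr', 'Donaudampfschifffahrtkapitän', 'Nenner', 'Donaudampfschifffahrtkapitän/Nenner']
```

The suffix is derived from the divisor's name, so the header never has to be typed out by hand.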