Home > Mobile >  DataFrame pairs of columns division
DataFrame pairs of columns division

Time:12-21

I have a DataFrame and want to get divisions of pairs of columns like below:

df = pd.DataFrame({
    'a1': np.random.randint(1, 1000, 1000),
    'a2': np.random.randint(1, 1000, 1000),
    'b1': np.random.randint(1, 1000, 1000),
    'b2': np.random.randint(1, 1000, 1000),
    'c1': np.random.randint(1, 1000, 1000),
    'c2': np.random.randint(1, 1000, 1000),
})
df['a'] = df['a2'] / df['a1']
df['b'] = df['b2'] / df['b1']
df['c'] = df['c2'] / df['c1']

I want to combine the last three lines into one like:

df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']]

but I only get an error of ValueError: Columns must be same length as key. If I just simply print(df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']]), I will only get a DataFrame with NaNs of shape (1000, 6).

CodePudding user response:

You can use:

df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']].values / df[['a1', 'b1', 'c1']].values

OUTPUT

      a1   a2   b1   b2   c1   c2          a         b         c
0    864  214  551  761  174  111   0.247685  1.381125  0.637931
1    820  971  379   79  190  587   1.184146  0.208443  3.089474
2    305  154  519  378  567  186   0.504918  0.728324  0.328042
3     51  505  303  417  959  326   9.901961  1.376238  0.339937
4     84  531  625  899  248  905   6.321429  1.438400  3.649194
..   ...  ...  ...  ...  ...  ...        ...       ...       ...
995  302  695  790  777  896  975   2.301325  0.983544  1.088170
996   24  308  462  316  388  784  12.833333  0.683983  2.020619
997  135  286  359  752  282  283   2.118519  2.094708  1.003546
998  695   45  832  936  811  404   0.064748  1.125000  0.498150
999  809  454  971  335  366  884   0.561187  0.345005  2.415301

CodePudding user response:

You could do the following as simple workaround:

df['a'], df['b'], df['c'] = (df['a2'] / df['a1'], df['b2'] / df['b1'], df['c2'] / df['c1'])

Although I think that using the assign method would make your code much more readable:

df.assign(a=lambda x: x['a2'] / x['a1'], 
          b=lambda x: x['b2'] / x['b1'], 
          c=lambda x: x['c2'] / x['c1'])

CodePudding user response:

MultiIndex comes in handy here, since Pandas is always aligned first on the index(columns is an index as well) before any computations.

Using sample data below:

df = pd.DataFrame({'a_1':range(2,10,2),
            'a_2': range(4, 20, 4),
           'b_1': range(3, 15,3),
           'b_2': range(6,30,6),
           'c_1': range(5, 25, 5),
           'c_2': range(10, 50, 10)})

df
   a_1  a_2  b_1  b_2  c_1  c_2
0    2    4    3    6    5   10
1    4    8    6   12   10   20
2    6   12    9   18   15   30
3    8   16   12   24   20   40

split the columns into a MultiIndex:

temp = df.copy()
temp.columns = temp.columns.str.split('_', expand = True).swaplevel()

temp

 1   2   1   2   1   2
   a   a   b   b   c   c
0  2   4   3   6   5  10
1  4   8   6  12  10  20
2  6  12   9  18  15  30
3  8  16  12  24  20  40

In this form, you can simply select 2 divided by 1:

df['a','b','c']] = temp['2'] / temp['1']

This gives the same values as :

df[['a_2', 'b_2', 'c_2']].values / df[['a_1', 'b_1', 'c_1']].values

Imagine however, that you have lots of columns, you do not need to worry about the pairing, as the MultiIndex form takes care of that, with Pandas doing the alignment before computation.

Numpy is going to be faster - @MuhammadHassan's answer fits nicely; this is just to show how MultiIndex has its place and its uses in the right circumstances.

  • Related