I have a DataFrame and want to get divisions of pairs of columns like below:
df = pd.DataFrame({
'a1': np.random.randint(1, 1000, 1000),
'a2': np.random.randint(1, 1000, 1000),
'b1': np.random.randint(1, 1000, 1000),
'b2': np.random.randint(1, 1000, 1000),
'c1': np.random.randint(1, 1000, 1000),
'c2': np.random.randint(1, 1000, 1000),
})
df['a'] = df['a2'] / df['a1']
df['b'] = df['b2'] / df['b1']
df['c'] = df['c2'] / df['c1']
I want to combine the last three lines into one like:
df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']]
but I only get an error of ValueError: Columns must be same length as key
. If I just simply print(df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']])
, I will only get a DataFrame with NaN
s of shape (1000, 6).
CodePudding user response:
You can use:
df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']].values / df[['a1', 'b1', 'c1']].values
OUTPUT
a1 a2 b1 b2 c1 c2 a b c
0 864 214 551 761 174 111 0.247685 1.381125 0.637931
1 820 971 379 79 190 587 1.184146 0.208443 3.089474
2 305 154 519 378 567 186 0.504918 0.728324 0.328042
3 51 505 303 417 959 326 9.901961 1.376238 0.339937
4 84 531 625 899 248 905 6.321429 1.438400 3.649194
.. ... ... ... ... ... ... ... ... ...
995 302 695 790 777 896 975 2.301325 0.983544 1.088170
996 24 308 462 316 388 784 12.833333 0.683983 2.020619
997 135 286 359 752 282 283 2.118519 2.094708 1.003546
998 695 45 832 936 811 404 0.064748 1.125000 0.498150
999 809 454 971 335 366 884 0.561187 0.345005 2.415301
CodePudding user response:
You could do the following as simple workaround:
df['a'], df['b'], df['c'] = (df['a2'] / df['a1'], df['b2'] / df['b1'], df['c2'] / df['c1'])
Although I think that using the assign
method would make your code much more readable:
df.assign(a=lambda x: x['a2'] / x['a1'],
b=lambda x: x['b2'] / x['b1'],
c=lambda x: x['c2'] / x['c1'])
CodePudding user response:
MultiIndex comes in handy here, since Pandas is always aligned first on the index(columns is an index as well) before any computations.
Using sample data below:
df = pd.DataFrame({'a_1':range(2,10,2),
'a_2': range(4, 20, 4),
'b_1': range(3, 15,3),
'b_2': range(6,30,6),
'c_1': range(5, 25, 5),
'c_2': range(10, 50, 10)})
df
a_1 a_2 b_1 b_2 c_1 c_2
0 2 4 3 6 5 10
1 4 8 6 12 10 20
2 6 12 9 18 15 30
3 8 16 12 24 20 40
split the columns into a MultiIndex:
temp = df.copy()
temp.columns = temp.columns.str.split('_', expand = True).swaplevel()
temp
1 2 1 2 1 2
a a b b c c
0 2 4 3 6 5 10
1 4 8 6 12 10 20
2 6 12 9 18 15 30
3 8 16 12 24 20 40
In this form, you can simply select 2
divided by 1
:
df['a','b','c']] = temp['2'] / temp['1']
This gives the same values as :
df[['a_2', 'b_2', 'c_2']].values / df[['a_1', 'b_1', 'c_1']].values
Imagine however, that you have lots of columns, you do not need to worry about the pairing, as the MultiIndex form takes care of that, with Pandas doing the alignment before computation.
Numpy is going to be faster - @MuhammadHassan's answer fits nicely; this is just to show how MultiIndex has its place and its uses in the right circumstances.