Assuming I have the following toy dataframe:
Firm num1 num2 num3
A 0.1 0.2 0.3
B 0.4 1.5 9.7
C 2.1 3.7 1.5
D 6.2 2.3 5.5
I want to combine columns num1, num2, num3 into arrays (lists), so that A = [0.1, 0.2, 0.3] and create a new column that is this new array. My method so far involves converting them to strings first.
Firm arrone
A [0.1, 0.2, 0.3]
B [0.4, 1.5,9.7 ]
C [2.1, 3.7, 1.5]
D [6.2, 2.3, 5.5]
Next, I want to create combinations using itertools
. This has been achieved here
out = df.merge(df,how='cross',suffixes = ('_1', '_2')).query('base1_1<base1_2')
Firm_1 Firm_2 arrone arrtwo
A B [0.1, 0.2, 0.3] [0.4, 1.5, 9.7]
A C [0.1, 0.2, 0.3] [2.1, 3.7, 1.5]
A D [0.1, 0.2, 0.3] [6.2, 2.3, 5.5]
B C [0.4, 1.5,9.7 ] [2.1, 3.7, 1.5]
B D [0.4, 1.5,9.7 ] [6.2, 2.3, 5.5]
C D [2.1, 3.7, 1.5] [6.2, 2.3, 5.5]
Finally, I want to create df['corrcoef'] = correlation between arrone and arrtwo. This link has been helpful. However, the manually inputted toy model has the column in list form. My working dataset is loaded from a csv file. For that, I get the following error when I try the above:
ValueError: x and y must have the same length.
I wish to have a final dataset that looks like this:
Firm_1 Firm_2 arrone arrtwo corrcoef
A B [0.1, 0.2, 0.3] [0.4, 1.5, 9.7] 0.8
A C [0.1, 0.2, 0.3] [2.1, 3.7, 1.5] 0.3
A D [0.1, 0.2, 0.3] [6.2, 2.3, 5.5] 0.2
B C [0.4, 1.5,9.7 ] [2.1, 3.7, 1.5] 0.5
B D [0.4, 1.5,9.7 ] [6.2, 2.3, 5.5] 0.7
C D [2.1, 3.7, 1.5] [6.2, 2.3, 5.5] 0.9
CodePudding user response:
If you want to convert your initial DataFrame of columns into one column with lists, you could use:
df[['Firm']].join(pd.Series(df.filter(like='num').to_numpy().tolist(),
index=df.index, name='arrone'))
output:
Firm arrone
0 A [0.1, 0.2, 0.3]
1 B [0.4, 1.5, 9.7]
2 C [2.1, 3.7, 1.5]
3 D [6.2, 2.3, 5.5]