I have the following reference table:
import pandas as pd
corr = pd.DataFrame({'i': ['a', 'b'], 'a': [.1, .2], 'b': [.2, .1]}).set_index('i')
I also have a vector of values. Its length varies, so I'm only using 5 values here to show what I am trying to achieve.
vectors = pd.DataFrame({'val':['a','b','a','a','b']})
I would like to use these values to generate a 5x5 matrix 'X' (5x5 because len(vectors) = 5) such that X[i][j] is the value in corr at row label vectors['val'][i] and column label vectors['val'][j].
The closest I can think of is a map function, but that generates only one column.
CodePudding user response:
Use DataFrame.reindex twice, once for rows and once for columns:
out = corr.reindex(vectors['val']).reindex(vectors['val'], axis=1)
Note that duplicate column names, while supported, are not recommended. For example, out['a'] would return a DataFrame, whereas selecting a single column normally returns a Series.
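To make the caveat above concrete, here is a minimal sketch that rebuilds the question's data and checks what out['a'] returns. The names dup_cols and X are only illustrative, and converting with to_numpy() is one possible way to sidestep the repeated labels when only the numbers are needed, not the only one.

```python
import pandas as pd

corr = pd.DataFrame({'i': ['a', 'b'], 'a': [.1, .2], 'b': [.2, .1]}).set_index('i')
vectors = pd.DataFrame({'val': ['a', 'b', 'a', 'a', 'b']})

# Rows first (default axis), then columns.
out = corr.reindex(vectors['val']).reindex(vectors['val'], axis=1)

# With repeated labels, selecting one label returns every matching column,
# so the result is a DataFrame rather than a Series.
dup_cols = out['a']
print(type(dup_cols).__name__, dup_cols.shape)

# If only the values matter, a plain NumPy array avoids label ambiguity.
X = out.to_numpy()
print(X.shape)
```

Positional access such as out.iloc[:, 0] still returns a single Series if the labels need to be kept.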
CodePudding user response:
We can use DataFrame.reindex and pass vectors["val"] to both the index and columns arguments:
idx = vectors["val"] # vectors["val"].tolist() to avoid naming axes `val`.
corr.reindex(index=idx, columns=idx)
val    a    b    a    a    b
val
a    0.1  0.2  0.1  0.1  0.2
b    0.2  0.1  0.2  0.2  0.1
a    0.1  0.2  0.1  0.1  0.2
a    0.1  0.2  0.1  0.1  0.2
b    0.2  0.1  0.2  0.2  0.1
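If a bare matrix without labels is enough, the same lookup can be sketched with positional indexing in NumPy. This is an alternative technique, not what the answer above does. It assumes every value in vectors["val"] exists in corr, because get_indexer returns -1 for a missing label and that would silently select the last row or column. The names rows, cols and X are illustrative.

```python
import numpy as np
import pandas as pd

corr = pd.DataFrame({'i': ['a', 'b'], 'a': [.1, .2], 'b': [.2, .1]}).set_index('i')
vectors = pd.DataFrame({'val': ['a', 'b', 'a', 'a', 'b']})

# Translate each label into its integer position in corr.
rows = corr.index.get_indexer(vectors['val'])
cols = corr.columns.get_indexer(vectors['val'])

# Guard against labels that are missing from corr (reported as -1).
if (rows < 0).any() or (cols < 0).any():
    raise KeyError("some values in vectors['val'] are not present in corr")

# np.ix_ builds the row/column cross product, giving an n x n array.
X = corr.to_numpy()[np.ix_(rows, cols)]
print(X)
```

The result holds the same numbers as the reindex version, just without the duplicated row and column labels.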