My goal is to build a DataFrame (df2) whose rows correspond to communes and whose columns correspond to the cantons of Switzerland. I need to fill it with 0/1 values, where entry (i, j) is 1 if the commune in row i is in the canton in column j, and 0 otherwise.
[df1] (parameters) : https://i.stack.imgur.com/0u7w8.png
[df2] (dataframe to fill) : https://i.stack.imgur.com/AR0mM.png
What I tried is a loop like this:
for row in df2:
    for column in df2.columns:
        if column == df1.loc[row]:
            df2[column] = 1
        else:
            df2[column] = 0
But it doesn't run. The error is about "df1.loc[row]"
- KeyError: 'ZH'
Would appreciate any educational help. Thanks in advance :)
CodePudding user response:
You can build a crosstab or a pivot table from the first dataframe, then reindex the result so that it has the same rows and columns as df2:
out = pd.crosstab(df1.index, df1['Canton']).reindex_like(df2)
or:
out = (df1
       .reset_index()
       .pivot_table(index='Commune', columns='Canton', aggfunc=lambda x: 1, fill_value=0)
       .reindex_like(df2)
      )
example output:
Canton    BS  GE  VD  ZH
Commune
Basel      1   0   0   0
Genève     0   1   0   0
Lausanne   0   0   1   0
Zürich     0   0   0   1
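
As for why the original loop fails: iterating over a DataFrame with `for row in df2` yields its column labels, not its rows, so `df1.loc[row]` ends up looking for a canton code such as 'ZH' among the commune names. Below is a minimal, self-contained sketch of both the crosstab approach and a corrected loop. The sample data is hypothetical, since the real frames are only shown as screenshots, and it assumes df1 is indexed by 'Commune' with a single 'Canton' column:

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the screenshots (assumed layout).
df1 = pd.DataFrame(
    {"Canton": ["ZH", "GE", "BS", "VD"]},
    index=pd.Index(["Zürich", "Genève", "Basel", "Lausanne"], name="Commune"),
)
df2 = pd.DataFrame(
    0,
    index=pd.Index(["Basel", "Genève", "Lausanne", "Zürich"], name="Commune"),
    columns=["BS", "GE", "VD", "ZH"],
)

# Iterating a DataFrame gives its column labels, which explains the KeyError.
labels = list(df2)

# Vectorized version: count commune/canton pairs, then align to df2's layout.
out = pd.crosstab(df1.index, df1["Canton"]).reindex_like(df2)

# Corrected loop, for comparison: iterate over the index and set one cell at a time.
looped = df2.copy()
for commune in looped.index:
    for canton in looped.columns:
        looped.loc[commune, canton] = int(df1.loc[commune, "Canton"] == canton)

print(out)
```

The loop works once it iterates over `df2.index` and assigns with `.loc[row, column]`, but the crosstab version is shorter and avoids cell-by-cell assignment.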