Make a matrix from a CSV-CodePudding

I have a csv file, which has two columns, including the School ID and the region they belong to:

School ID      Area Code
  123              1
  124              1
  125              2
  134              2
  197              3
  212              4

I want to make a region matrix out of it, which gives the value "1" if the belong to same region, and "0" if they don't , kind of like this:

      123  124  125  134  197  212
123    1    1    0    0    0    0 
124    1    1    0    0    0    0 
125    0    0    1    1    0    0
134    0    0    1    1    0    0
197    0    0    0    0    1    1
212    0    0    0    0    1    1

I tried with iterating in the entire csv file:

for i in range(num_schools):
   for j in range(num_schools):
        if df['Area Code'] != df['Area Code']:
            c[i][j] = 0
        else:
            c[i][j] = 1

The problem is, I dont know how iterate in the csv file, I know I have to change the "if" command, but I am not sure how? Kindly guide me!

CodePudding user response：

Since the answer has a Pandas tag: With a dataframe df like

   School ID  Area Code
0        123          1
1        124          1
2        125          2
3        134          2
4        197          3
5        212          4

this

df = (
    df
    .groupby("Area Code").agg(list)
    .assign(columns=lambda df: df["School ID"]).rename(columns={"School ID": "index"})
    .explode("index").explode("columns")
    .assign(values=1)
    .pivot_table(index="index", columns="columns", values="values", fill_value=0)
)

gives you

columns  123  124  125  134  197  212
index                                
123        1    1    0    0    0    0
124        1    1    0    0    0    0
125        0    0    1    1    0    0
134        0    0    1    1    0    0
197        0    0    0    0    1    0
212        0    0    0    0    0    1

CodePudding user response：

First open and read the csv
Create hashmap of area code and the schools in that area
create a matrix and store the values by iterating through each of area code key in hashmap

CodePudding user response：

import io  

import pandas as pd

data = """School ID;Area Code
  123;1
  124;1
  125;2
  134;2
  197;3
  212;4"""

df = pd.read_csv(io.StringIO(data), sep=";", index_col=0, parse_dates=True)

df['School ID'] = df.index

map = df.groupby('Area Code')['School ID'].apply(list)
peers = df['Area Code'].map(map).explode()

ct = pd.crosstab(peers.index, peers)

Output:

123 124 125 134 197 212
School ID                       
123 1   1   0   0   0   0
124 1   1   0   0   0   0
125 0   0   1   1   0   0
134 0   0   1   1   0   0
197 0   0   0   0   1   0
212 0   0   0   0   0   1