I have a csv file, which has two columns, including the School ID and the region they belong to:
School ID Area Code
123 1
124 1
125 2
134 2
197 3
212 4
I want to make a region matrix out of it, which gives the value "1" if the belong to same region, and "0" if they don't , kind of like this:
123 124 125 134 197 212
123 1 1 0 0 0 0
124 1 1 0 0 0 0
125 0 0 1 1 0 0
134 0 0 1 1 0 0
197 0 0 0 0 1 1
212 0 0 0 0 1 1
I tried with iterating in the entire csv file:
for i in range(num_schools):
for j in range(num_schools):
if df['Area Code'] != df['Area Code']:
c[i][j] = 0
else:
c[i][j] = 1
The problem is, I dont know how iterate in the csv file, I know I have to change the "if" command, but I am not sure how? Kindly guide me!
CodePudding user response:
Since the answer has a Pandas tag: With a dataframe df
like
School ID Area Code
0 123 1
1 124 1
2 125 2
3 134 2
4 197 3
5 212 4
this
df = (
df
.groupby("Area Code").agg(list)
.assign(columns=lambda df: df["School ID"]).rename(columns={"School ID": "index"})
.explode("index").explode("columns")
.assign(values=1)
.pivot_table(index="index", columns="columns", values="values", fill_value=0)
)
gives you
columns 123 124 125 134 197 212
index
123 1 1 0 0 0 0
124 1 1 0 0 0 0
125 0 0 1 1 0 0
134 0 0 1 1 0 0
197 0 0 0 0 1 0
212 0 0 0 0 0 1
CodePudding user response:
- First open and read the csv
- Create hashmap of area code and the schools in that area
- create a matrix and store the values by iterating through each of area code key in hashmap
CodePudding user response:
import io
import pandas as pd
data = """School ID;Area Code
123;1
124;1
125;2
134;2
197;3
212;4"""
df = pd.read_csv(io.StringIO(data), sep=";", index_col=0, parse_dates=True)
df['School ID'] = df.index
map = df.groupby('Area Code')['School ID'].apply(list)
peers = df['Area Code'].map(map).explode()
ct = pd.crosstab(peers.index, peers)
Output:
123 124 125 134 197 212
School ID
123 1 1 0 0 0 0
124 1 1 0 0 0 0
125 0 0 1 1 0 0
134 0 0 1 1 0 0
197 0 0 0 0 1 0
212 0 0 0 0 0 1