Python Pandas - Matrix Generation From CSV-CodePudding

Beginner to Pandas, but i think it is the right tool for this.

I have a CSV file that looks like this, showing the relationship between Source/Destination/Value:

Source	Destination	Value
Item1	Item1	true
Item1	Item2	true
Item2	Item1	false
Item2	Item2	true

(there are thousands of "items")

The end goal is to achieve a CSV such as:

	Item1	Item2	Item3	Item4
Item1	true	true	false	false
Item2	true	false	true	true

I have generated a new CSV with the Row Headers and Column Headers, however, i am not sure how to fill in the value element.
How can i read in the CSV to maintain the relationship between Source/Dest/Value and output to a CSV?
Current code that generates a new CSV:

df = pd.read_csv('Input.csv')
unique = df['Source'].unique()
unique.sort()
headers = [x for x in unique]
outfile = open("newcsv.csv",'w ', newline='')
writer = csv.writer(outfile,delimiter = ',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow([" "]   headers)
for item in headers:
    writer.writerow([item])

CodePudding user response：

There are few options for you. You can go with pivot (an example is already in the comments) or pivot_table. The difference of them is that pivot_table can handle duplicate values of a pivoted column, so if you do have rows where the same relationship is shown again, go with that.

Code can look like that:

df = pd.read_csv('Input.csv')

res = df.pivot_table(values='Value', index='Source', columns='Destination',aggfunc=max)
# without aggfunc, you get 0 and 1 as result, you can also go with "any" for example. 
# if duplicate rows with different value in "Value" then you can decide about how these values gets aggregated.

res.to_csv('Output.csv')

Other possibility would be pd.crosstab (you'd also have to choose an aggfunc here) or

CodePudding user response：

I think pandas 'pivot' would help you here.

df = pd.read_csv("./source-def.csv", sep="\t")
df_op = df.pivot(index='Source', columns='Destination', values='Value').fillna(False)

df_op.head()

Destination	Item1	Item2	Item3	Item4
Source
Item1	True	True	False	False
Item2	False	True	True	True

Finally, you can use 'to_csv' like this to write the file into another CSV file again.

df_op.to_csv('pivot.csv')