Home > Software engineering >  Python Pandas - Matrix Generation From CSV
Python Pandas - Matrix Generation From CSV

Time:05-10

Beginner to Pandas, but i think it is the right tool for this.

I have a CSV file that looks like this, showing the relationship between Source/Destination/Value:

Source Destination Value
Item1 Item1 true
Item1 Item2 true
Item2 Item1 false
Item2 Item2 true

(there are thousands of "items")

The end goal is to achieve a CSV such as:

Item1 Item2 Item3 Item4
Item1 true true false false
Item2 true false true true

I have generated a new CSV with the Row Headers and Column Headers, however, i am not sure how to fill in the value element.
How can i read in the CSV to maintain the relationship between Source/Dest/Value and output to a CSV?
Current code that generates a new CSV:

df = pd.read_csv('Input.csv')
unique = df['Source'].unique()
unique.sort()
headers = [x for x in unique]
outfile = open("newcsv.csv",'w ', newline='')
writer = csv.writer(outfile,delimiter = ',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow([" "]   headers)
for item in headers:
    writer.writerow([item])

CodePudding user response:

There are few options for you. You can go with pivot (an example is already in the comments) or pivot_table. The difference of them is that pivot_table can handle duplicate values of a pivoted column, so if you do have rows where the same relationship is shown again, go with that.

Code can look like that:

df = pd.read_csv('Input.csv')

res = df.pivot_table(values='Value', index='Source', columns='Destination',aggfunc=max)
# without aggfunc, you get 0 and 1 as result, you can also go with "any" for example. 
# if duplicate rows with different value in "Value" then you can decide about how these values gets aggregated.

res.to_csv('Output.csv')

Other possibility would be pd.crosstab (you'd also have to choose an aggfunc here) or

CodePudding user response:

I think pandas 'pivot' would help you here.

df = pd.read_csv("./source-def.csv", sep="\t")
df_op = df.pivot(index='Source', columns='Destination', values='Value').fillna(False)

df_op.head()
Destination Item1 Item2 Item3 Item4
Source
Item1 True True False False
Item2 False True True True

Finally, you can use 'to_csv' like this to write the file into another CSV file again.

df_op.to_csv('pivot.csv')
  • Related