Beginner to Pandas, but i think it is the right tool for this.
I have a CSV file that looks like this, showing the relationship between Source/Destination/Value:
Source | Destination | Value |
---|---|---|
Item1 | Item1 | true |
Item1 | Item2 | true |
Item2 | Item1 | false |
Item2 | Item2 | true |
(there are thousands of "items")
The end goal is to achieve a CSV such as:
Item1 | Item2 | Item3 | Item4 | |
---|---|---|---|---|
Item1 | true | true | false | false |
Item2 | true | false | true | true |
I have generated a new CSV with the Row Headers and Column Headers, however, i am not sure how to fill in the value element.
How can i read in the CSV to maintain the relationship between Source/Dest/Value and output to a CSV?
Current code that generates a new CSV:
df = pd.read_csv('Input.csv')
unique = df['Source'].unique()
unique.sort()
headers = [x for x in unique]
outfile = open("newcsv.csv",'w ', newline='')
writer = csv.writer(outfile,delimiter = ',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow([" "] headers)
for item in headers:
writer.writerow([item])
CodePudding user response:
There are few options for you. You can go with pivot
(an example is already in the comments) or pivot_table
. The difference of them is that pivot_table
can handle duplicate values of a pivoted column, so if you do have rows where the same relationship is shown again, go with that.
Code can look like that:
df = pd.read_csv('Input.csv')
res = df.pivot_table(values='Value', index='Source', columns='Destination',aggfunc=max)
# without aggfunc, you get 0 and 1 as result, you can also go with "any" for example.
# if duplicate rows with different value in "Value" then you can decide about how these values gets aggregated.
res.to_csv('Output.csv')
Other possibility would be pd.crosstab
(you'd also have to choose an aggfunc here) or
CodePudding user response:
I think pandas 'pivot' would help you here.
df = pd.read_csv("./source-def.csv", sep="\t")
df_op = df.pivot(index='Source', columns='Destination', values='Value').fillna(False)
df_op.head()
Destination | Item1 | Item2 | Item3 | Item4 |
---|---|---|---|---|
Source | ||||
Item1 | True | True | False | False |
Item2 | False | True | True | True |
Finally, you can use 'to_csv' like this to write the file into another CSV file again.
df_op.to_csv('pivot.csv')