I'm going through a CSV that lists cargo movements between various ports, and I'd like to collect all the unique port values into a new list.
Currently I have the code below, which adds every value under the 'Origin Ports' column. How can I make sure it adds only the unique values in that column? Thank you.
import csv

CSV_FILE = "Bitumen2021Exports.csv"
ports = []

with open(CSV_FILE, encoding="utf-8-sig") as bitumen_csv:
    bitumen_reader = csv.DictReader(bitumen_csv)
    for port in bitumen_reader:
        ports.append(port['ORIGIN PORT'])

print(ports)
The data in the CSV looks like below:
CodePudding user response:
One way based on your code:
import csv

CSV_FILE = "Bitumen2021Exports.csv"
ports = []

with open(CSV_FILE, encoding="utf-8-sig") as bitumen_csv:
    bitumen_reader = csv.DictReader(bitumen_csv)
    for port in bitumen_reader:
        if port['ORIGIN PORTS'] not in ports:
            ports.append(port['ORIGIN PORTS'])

print(ports)
Another way is to import the CSV into a pandas DataFrame (df) and call unique() on the column.
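A minimal sketch of the pandas approach, assuming pandas is installed and the column is named 'ORIGIN PORTS'. The sample rows and the DESTINATION PORTS column are made up for illustration; with the real file you would pass CSV_FILE to read_csv instead of the in-memory sample.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for Bitumen2021Exports.csv.
# With the real file: df = pd.read_csv(CSV_FILE, encoding="utf-8-sig")
sample_csv = io.StringIO(
    "ORIGIN PORTS,DESTINATION PORTS\n"
    "Rotterdam,Lagos\n"
    "Houston,Santos\n"
    "Rotterdam,Durban\n"
)

df = pd.read_csv(sample_csv)

# unique() returns the distinct values in order of first appearance;
# tolist() turns the resulting array into a plain Python list.
unique_ports = df["ORIGIN PORTS"].unique().tolist()
print(unique_ports)
```

This keeps the ports in the order they first appear in the file, the same as the loop with the "not in" check.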
CodePudding user response:
You can also skip handling the "uniqueness logic" and use Python's set, which only allows unique elements:
import csv

CSV_FILE = "Bitumen2021Exports.csv"
ports = set()

with open(CSV_FILE, encoding="utf-8-sig") as bitumen_csv:
    bitumen_reader = csv.DictReader(bitumen_csv)
    for port in bitumen_reader:
        ports.add(port['ORIGIN PORTS'])

print(ports)
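ports is a set, so you can iterate over it directly. If you need a list, convert it with list(ports). Keep in mind that a set does not preserve the order in which the values were added.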
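If the order of first appearance matters, one option that avoids both the manual check and the unordered set is dict.fromkeys, since dictionaries keep insertion order in Python 3.7+. This is only a sketch: the sample rows and the DESTINATION PORTS column are hypothetical, and with the real file you would wrap the same lines in the open(CSV_FILE, encoding="utf-8-sig") block from the question.

```python
import csv
import io

# Hypothetical sample data standing in for Bitumen2021Exports.csv.
sample_csv = io.StringIO(
    "ORIGIN PORTS,DESTINATION PORTS\n"
    "Rotterdam,Lagos\n"
    "Houston,Santos\n"
    "Rotterdam,Durban\n"
)

bitumen_reader = csv.DictReader(sample_csv)

# dict.fromkeys drops duplicate keys while keeping first-seen order;
# list() then extracts those keys as a plain list.
ports = list(dict.fromkeys(row["ORIGIN PORTS"] for row in bitumen_reader))
print(ports)
```

Like the set version, this does the membership check in constant time per row, so it scales better than "not in" on a growing list.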