I have a csv file like:
Rack | Tube | Well | sample_vol | solvent_vol |
---|---|---|---|---|
1 | 0 | A1 | 230 | 400 |
1 | 1 | B1 | 200 | 20 |
2 | 2 | G1 | 5 | 30 |
3 | 1 | A1 | 90 | 40 |
3 | 20 | A1 | 100 | 90 |
And i'm trying to make mappings between the different columns for each row, using dictionaries. But I'm stuck at how to make separate dictionaries within a single list for each of the different values of "Rack". Basically I need an output like:
print(rack_list)
[{'0':230,'1':200},{'2':5},{'1':90,'20':200}]
Where each dict in the list stores the mappings for each Rack.
This is what I have so far:
csv_reader = csv.DictReader(csvfile)
header = csv_reader.fieldnames
solvent_volume_map = {}
sample_volume_map = {}
max_rack = None
rack = None
rack_list = []
for csv_row in csv_reader:
rack = int(csv_row["Rack"])
if max_rack == None or max_rack < rack:
max_rack = rack
destination_well = csv_row['Well']
source_tube = csv_row['Tube']
source_rack = csv_row['Rack']
print(source_rack)
try:
solvent_volume = float(csv_row['solvent_vol'])
sample_volume = float(csv_row['sample_vol'])
except ValueError as e:
# blank csv entry
solvent_volume = "skip"
sample_volume = "skip"
solvent_volume_map[destination_well] = solvent_volume
for i in range(max_rack):
sample_volume_map[source_tube] = sample_volume
rack_list.append(sample_volume_map)
CodePudding user response:
You can go with pandas
package or else with csv
.
with csv
package
Source Code 1
import csv
with open("./test.csv", newline="") as f:
csv_reader = csv.DictReader(f)
header = csv_reader.fieldnames
rack_idx_map = {} # mapping of rack number and corresponding index no. in rack_list
idx = 0 # index number
rack_list = []
for csv_row in csv_reader:
rack = int(csv_row["Rack"])
if rack in rack_idx_map: # if rack number is present in rack_idx_map
rack_list[rack_idx_map[rack]][csv_row["Tube"]] = int(csv_row["sample_vol"])
else: # if new rack number then add new dict and it's mapping
rack_list.append({csv_row["Tube"]: int(csv_row["sample_vol"])})
rack_idx_map[rack] = idx
idx = 1
print(rack_list)
print(rack_idx_map) # rack 1 mapped at index 0, rack 2 mapped at index 1 and so on
OUTPUT
[{'0': 230, '1': 200}, {'2': 5}, {'1': 90, '20': 100}]
{1: 0, 2: 1, 3: 2}
Source Code 2
import csv
with open("./test.csv", newline="") as f:
csv_reader = csv.DictReader(f)
header = csv_reader.fieldnames
rack = None
rack_list = []
temp_dict = {}
prev_rack = 1
for csv_row in csv_reader:
rack = int(csv_row["Rack"])
if rack != prev_rack:
rack_list.append(temp_dict)
temp_dict = {}
temp_dict[csv_row["Tube"]] = int(csv_row["sample_vol"])
prev_rack = rack
rack_list.append(temp_dict)
rack_list
OUTPUT:
[{'0': 230, '1': 200}, {'2': 5}, {'1': 90, '20': 100}]
PS of Source Code 2:
Assuming Rack
is in sorted order and it start's from 1
with pandas
package
Source Code
import pandas as pd
df = pd.read_csv("./test.csv") # read csv file
# pandas will by default type cast the data type
df["Tube"] = df["Tube"].astype(str) # will cast Tube from int to str
df.groupby("Rack")[["Tube", "sample_vol"]].apply(lambda row: dict([*row.values])).tolist()
# grouping data based on Rack then selecting Tube and sample_vol column then converting it's row value to dict and back to list
OUTPUT:
[{'0': 230, '1': 200}, {'2': 5}, {'1': 90, '20': 100}]
CodePudding user response:
You can use pandas:
import pandas as pd
df = pd.read_csv('1.csv')
rack_list = df.groupby(['Rack'])[['Tube','sample_vol']].apply(lambda g:dict(map(tuple, g.values.tolist()))).tolist()
print(rack_list)
Output:
[{0: 230, 1: 200}, {2: 5}, {1: 90, 20: 100}]