I have a .tsv file, which I have attached to this post. Its rows (cells) are in the order A1, A2, A3, ..., A12, B1, ..., B12, ..., H1, ..., H12. I need to rearrange them into the order A1, B1, C1, D1, ..., H1, A2, B2, C2, ..., H2, ..., A12, B12, C12, ..., H12. I need to do this using Python.
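To make the two orderings concrete, here is a small sketch that generates both label sequences. It assumes a 96-well plate with rows A to H and columns 1 to 12, inferred from the labels in the question; the names `ROWS`, `COLS`, `current_order` and `desired_order` are illustrative only.

```python
# Sketch: the two label orderings for a plate with rows A-H and columns 1-12
# (plate dimensions assumed from the labels mentioned in the question).
ROWS = "ABCDEFGH"
COLS = range(1, 13)

# Current order in the file: A1, A2, ..., A12, B1, ..., H12 (letter outer, number inner)
current_order = [f"{r}{c}" for r in ROWS for c in COLS]

# Desired order: A1, B1, ..., H1, A2, ..., H12 (number outer, letter inner)
desired_order = [f"{r}{c}" for c in COLS for r in ROWS]

print(current_order[:3], desired_order[:3])
```

The only difference between the two comprehensions is which loop is the outer one, which is exactly the row-major versus column-major distinction.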
I have another .tsv file, flipped.tsv, to compare this file against. flipped.tsv contains the correct well values for each cell. In other words, I must map the well values to their correct cell lines.
From what I understand, the cell lines in the metadata are incorrectly arranged in column-major order, whereas they should be arranged in row-major order, as they are in flipped.tsv.
For example, A2 of flipped_metadata.tsv has the same well values as B1 of metadata.tsv.
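One way to check a rearranged file against the reference file is to load both into dictionaries keyed by well label and list the labels whose values differ. This is a minimal sketch, assuming the well label is in the first column and there is no header row; `read_tsv` and `mismatched_wells` are hypothetical helper names, and the tiny inline data is made up to mirror the A2/B1 example.

```python
import csv
import io


def read_tsv(lines):
    """Parse TSV lines into a dict: well label (column 0, assumed) -> remaining values."""
    return {row[0]: row[1:] for row in csv.reader(lines, delimiter="\t") if row}


def mismatched_wells(produced, reference):
    """Return the sorted labels whose values differ between two label -> values dicts."""
    return sorted(label for label in reference if produced.get(label) != reference[label])


# Hypothetical tiny example: B1 of the metadata holds "x", A2 of the reference holds "x".
metadata = read_tsv(io.StringIO("A1\t10\nA2\t20\nB1\tx\nB2\t40\n"))
reference = read_tsv(io.StringIO("A1\t10\nA2\tx\nB1\t20\nB2\t40\n"))

print(reference["A2"] == metadata["B1"])
print(mismatched_wells(metadata, reference))
```

With real files, pass an open file object instead, e.g. `with open("flipped.tsv", newline="") as f: reference = read_tsv(f)`. After a correct rearrangement, `mismatched_wells` should return an empty list.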
What is the logic that I can carry out to perform this in Python?
Answer:
You could do the following:
import csv

# Read original file (newline="" is what the csv module expects for files)
with open("file.tsv", "r", newline="") as file:
    rows = list(csv.reader(file, delimiter="\t"))

# Key function for sorting
def key_func(row):
    """ Transform row in sort key, e.g. ['A7', 1, 2] -> (7, 'A') """
    return int(row[0][1:]), row[0][0]

# Write `flipped` file: keep each row's label, take the values of the sorted row
with open("file_flipped.tsv", "w", newline="") as file:
    writer = csv.writer(file, delimiter="\t")
    writer.writerows(
        row[:1] + flipped[1:]
        for row, flipped in zip(rows, sorted(rows, key=key_func))
    )
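The same logic can be tried without any files by reading from and writing to in-memory strings. This sketch wraps the list comprehension in a hypothetical `flip_rows` helper and uses a made-up 2 x 3 plate (rows A and B, columns 1 to 3); like the code above, it assumes there is no header row.

```python
import csv
import io


def key_func(row):
    """Sort key: ['A7', ...] -> (7, 'A'); assumes a single-letter row label."""
    return int(row[0][1:]), row[0][0]


def flip_rows(rows):
    """Keep each row's label, but take the values of the row at the same position in sorted order."""
    return [row[:1] + flipped[1:] for row, flipped in zip(rows, sorted(rows, key=key_func))]


# Hypothetical input in A1, A2, A3, B1, B2, B3 order; values named after their source well.
source = "A1\ta1\nA2\ta2\nA3\ta3\nB1\tb1\nB2\tb2\nB3\tb3\n"
rows = list(csv.reader(io.StringIO(source), delimiter="\t"))

out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(flip_rows(rows))
print(out.getvalue())
```

In the output, the row labelled A2 carries `b1`, i.e. the values that were in B1, which matches the A2/B1 example from the question. If the real file has a header line, it would need to be split off before sorting (e.g. `header, *rows = rows`) and written back first.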
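The flipping is done by sorting the original rows by
- first, the integer part of their first entry, int(row[0][1:]), and
- then, the letter part of their first entry, row[0][0].
Each original row then keeps its own label but takes the values of the row that lands at the same position in the sorted order, so the second row (label A2) receives the values of B1.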
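To see the order this key produces, it is enough to sort the 96 labels on their own. The one-column rows here are a simplification of the real file, assumed for illustration. The second part shows why the number is converted with `int`: compared as strings, "10" sorts before "2".

```python
# Labels in the file's current order: A1..A12, B1..B12, ..., H12 (96-well plate assumed)
labels = [f"{r}{c}" for r in "ABCDEFGH" for c in range(1, 13)]
rows = [[label] for label in labels]  # simplified one-column rows


def key_func(row):
    """Sort by the numeric part first, then by the letter."""
    return int(row[0][1:]), row[0][0]


sorted_labels = [row[0] for row in sorted(rows, key=key_func)]
print(sorted_labels[:10])

# Without int(), the numeric part would be compared as text.
as_text = sorted(["A2", "A10"], key=lambda s: s[1:])
as_int = sorted(["A2", "A10"], key=lambda s: int(s[1:]))
print(as_text, as_int)
```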
This only works as long as the non-number part always has exactly one character.
If it always has exactly two characters, the key function has to return int(row[0][2:]), row[0][:2] instead, and so on.
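For a fixed but different prefix length, the key can be built by a small factory instead of editing the slices by hand. `make_fixed_width_key` is a hypothetical helper, and the two-letter labels below are made up for illustration.

```python
def make_fixed_width_key(prefix_len):
    """Return a sort key for labels whose letter part is always prefix_len characters long."""
    def key_func(row):
        # Numeric part first, then the fixed-width letter part
        return int(row[0][prefix_len:]), row[0][:prefix_len]
    return key_func


rows = [["AA2"], ["AB1"], ["AA1"], ["AB2"]]
ordered = [row[0] for row in sorted(rows, key=make_fixed_width_key(2))]
print(ordered)
```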
If there's more variability allowed, e.g. between 1 and 5 characters, then a regex approach would be more appropriate:
import re

# One or more letters followed by one or more digits, e.g. 'Aa7' -> ('Aa', '7')
re_key = re.compile(r"([a-zA-Z]+)(\d+)")

def key_func(row):
    """ Transform row in sort key, e.g. ['Aa7', 10, 20] -> (7, 2, 'Aa') """
    word, number = re_key.match(row[0]).group(1, 2)
    return int(number), len(word), word
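Here is the regex key applied to a few labels of varying length. One deliberate change from the answer's version: this sketch uses `re.fullmatch` and raises `ValueError` for a label that does not fit the pattern (for example a header cell), whereas `re_key.match` would return `None` and fail with an `AttributeError`. The sample rows are made up.

```python
import re

# Letters followed by digits; fullmatch rejects labels with anything extra
re_key = re.compile(r"([a-zA-Z]+)(\d+)")


def key_func(row):
    """Sort key (number, word length, word), e.g. ['Aa7', 10, 20] -> (7, 2, 'Aa')."""
    match = re_key.fullmatch(row[0])
    if match is None:
        raise ValueError(f"unexpected well label: {row[0]!r}")
    word, number = match.group(1, 2)
    return int(number), len(word), word


rows = [["Aa7", "10", "20"], ["B7", "1", "2"], ["C1", "3", "4"], ["Abc1", "5", "6"]]
ordered = [row[0] for row in sorted(rows, key=key_func)]
print(ordered)
```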
Depending on how the words have to be sorted, it might also be necessary to include the length of the word in the sort key: Python naturally sorts ['B', 'AA', 'A'] into ['A', 'AA', 'B'], not ['A', 'B', 'AA']. Adding the length, as the key function above does, gives the latter order.
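The effect of putting the length first can be checked directly on the example list:

```python
words = ["B", "AA", "A"]

# Plain string comparison: 'A' < 'AA' < 'B'
plain = sorted(words)

# Shorter words first, then alphabetical within the same length
by_length = sorted(words, key=lambda w: (len(w), w))

print(plain, by_length)
```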