Rearranging cells in a .tsv file

Time:03-19

I have a .tsv file, which I have attached to this post. Its rows (cells) are in the order A1, A2, A3, ..., A12, B1, ..., B12, ..., H1, ..., H12. I need to rearrange them into the order A1, B1, C1, D1, ..., H1, A2, B2, C2, ..., H2, ..., A12, B12, C12, ..., H12. I need to do this using Python.

I have another .tsv file, called flipped.tsv, that I can compare this file against. The flipped.tsv file contains the correct well values for each cell. In other words, I must map the well values to their correct cell lines.

From what I understand, the cell lines in the metadata are incorrectly arranged in column-major order, whereas they should be arranged in row-major order, as in the flipped.tsv file.

For example : "A2 of flipped_metadata.tsv has the same well values as that of B1 of metadata.tsv."

What is the logic that I can carry out to perform this in Python?


CodePudding user response:

You could do the following:

import csv

# Read the original file
with open("file.tsv", "r", newline="") as file:
    rows = list(csv.reader(file, delimiter="\t"))

# Key function for sorting
def key_func(row):
    """ Transform row into sort key, e.g. ['A7', 1, 2] -> (7, 'A') """
    return int(row[0][1:]), row[0][0]

# Write the flipped file: keep each original label (first column) and
# take the remaining values from the row at the same position in sorted order
with open("file_flipped.tsv", "w", newline="") as file:
    writer = csv.writer(file, delimiter="\t")
    writer.writerows(
        row[:1] + flipped[1:]
        for row, flipped in zip(rows, sorted(rows, key=key_func))
    )

The flipping is done by sorting the original rows by

  • first the integer part of their first entry, int(row[0][1:]), and
  • then the character part of their first entry row[0][0].
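
As a small in-memory illustration of this sort key, here is a sketch on a made-up 2 x 3 plate. The labels and the values v1 to v6 are hypothetical and are not taken from the attached file:

```python
# Hypothetical miniature plate: rows A-B, columns 1-3, in the original order.
rows = [
    ["A1", "v1"], ["A2", "v2"], ["A3", "v3"],
    ["B1", "v4"], ["B2", "v5"], ["B3", "v6"],
]

def key_func(row):
    """Sort key: number first, then letter, e.g. ['A7', ...] -> (7, 'A')."""
    return int(row[0][1:]), row[0][0]

# Rows ordered by number, then letter: A1, B1, A2, B2, A3, B3
by_number = sorted(rows, key=key_func)
sorted_labels = [row[0] for row in by_number]

# Keep each original label and take the values from the sorted row
# at the same position, so A2 receives the values that belonged to B1.
flipped = [row[:1] + other[1:] for row, other in zip(rows, by_number)]

for row in flipped:
    print(row)
```

With this toy input, the label A2 ends up with the values that originally belonged to B1, which matches the example given in the question.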



This only works as long as the non-number part always has exactly one character.

If the non-number part always has exactly 2 characters, then the return value of the key function has to be adjusted to int(row[0][2:]), row[0][:2], and so on for longer fixed lengths.
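
A minimal sketch of that two-character variant. The function name key_func_two_chars and the sample labels are made up for illustration:

```python
def key_func_two_chars(row):
    """Sort key for a fixed two-character prefix, e.g. ['AB7', ...] -> (7, 'AB')."""
    return int(row[0][2:]), row[0][:2]

# Hypothetical rows whose labels all have a two-letter prefix.
rows = [["AA1", 1], ["AA2", 2], ["AB1", 3], ["AB2", 4]]

ordered = [row[0] for row in sorted(rows, key=key_func_two_chars)]
print(ordered)
```

The slice positions are the only thing that changes compared to the one-character version, so this still assumes every label has the same prefix length.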

If there's more variability allowed, e.g. between 1 and 5 characters, then a regex approach would be more appropriate:

import re

re_key = re.compile(r"([a-zA-Z]+)(\d+)")

def key_func(row):
    """ Transform row in sort key, e.g. ['Aa7', 10, 20] -> (7, 2, 'Aa') """
    word, number = re_key.match(row[0]).group(1, 2)
    return int(number), len(word), word


Depending on how the words have to be sorted, it might be necessary to include the length of the word in the sort key: Python sorts ['B', 'AA', 'A'] lexicographically into ['A', 'AA', 'B'] and not ['A', 'B', 'AA']. Adding the length to the key, as in the function above, achieves the latter order.
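
To make the effect of the length component concrete, here is a short sketch comparing both orderings and then applying the regex-based key to a few made-up labels of mixed prefix length (the sample data is hypothetical):

```python
import re

words = ["B", "AA", "A"]
plain = sorted(words)                                  # lexicographic order
by_length = sorted(words, key=lambda w: (len(w), w))   # shorter words first

re_key = re.compile(r"([a-zA-Z]+)(\d+)")

def key_func(row):
    """Sort key: number, then prefix length, then prefix."""
    word, number = re_key.match(row[0]).group(1, 2)
    return int(number), len(word), word

# Hypothetical rows with one- and two-letter prefixes.
rows = [["B1"], ["AA1"], ["A1"], ["A2"]]
labels = [row[0] for row in sorted(rows, key=key_func)]

print(plain, by_length, labels)
```

Note that re_key.match returns None for a label that does not start with letters followed by digits, so a header row or malformed label would need to be handled separately.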
