Blank column appearing in .csv output, how can I remove it?-CodePudding

*Updated to add more lines of input file

I have a .csv file with header and subsequent data as follows (shown only first few rows here):

    gene_name   VarXCRep.1  VarX1Rep.1  VarX2Rep.1  VarXCRep.2  VarX3Rep.2  VarX1Rep.2  VarX2Rep.2  VarXCRep.3  VarX3Rep.3  VarX1Rep.3  VarX2Rep.3
1   Soltu.DM.01G000010  360.7000522 395.2279977 323.2595994 361.5910696 327.7380499 386.8290979 336.3997167 333.0843759 317.4954424 377.756613  396.666783
2   Soltu.DM.01G000020  91.12422371 69.30538348 77.36127164 135.060696  61.85252412 110.6099    68.21624475 108.7053612 55.31681029 56.52040232 36.14709293
3   Soltu.DM.01G000030  439.1681337 183.5656103 232.0838149 579.546161  220.9018719 179.6646995 179.2348391 291.2746216 222.4196747 266.8621527 208.321404
4   Soltu.DM.01G000040  268.3102142 185.4387288 192.0217278 301.5640936 130.9345641 237.108515  203.9799475 236.921941  92.19468382 198.1791322 38.04957151
5   Soltu.DM.01G000050  341.7158389 479.5183289 504.229717  322.2876925 528.5579334 390.4957244 470.1570594 342.8399852 554.3205365 424.9761896 634.4766049
6   Soltu.DM.01G000060  468.2772607 839.1570756 759.7982036 514.516937  886.0173261 572.6048416 579.8380803 549.1014398 1011.836655 598.8300854 1077.754113
7   Soltu.DM.01G000070  2.531228436 0   5.525805117 1.429213714 8.032795341 1.83331326  5.350293706 0   4.609734191 0   7.609914302
8   Soltu.DM.01G000090  84.79615262 54.3204357  75.97982036 98.61574626 102.0165008 83.11020113 84.26712586 108.7053612 98.53306833 80.13019064 93.2214502
9   Soltu.DM.01G000100  67.07755356 73.05162042 12.43306151 118.6247383 6.426236273 77.61026135 36.11448251 97.55609336 8.643251608 67.25212429 15.2198286
10  Soltu.DM.01G000110  1.265614218 0   1.381451279 2.143820571 0   1.22220884  4.012720279 0   2.304867095 0.715448131 0.951239288
11  Soltu.DM.01G000120  821.3836276 451.4215518 846.8296342 820.3686718 737.4106123 497.4389979 835.9833915 798.5663071 752.5391067 704.7164087 532.6940011
12  Soltu.DM.01G000130  2.531228436 3.746236945 5.525805117 2.143820571 0.803279534 0.61110442  2.00636014  1.393658477 1.728650322 2.146344392 10.46363217
13  Soltu.DM.01G000140  93.65545214 127.3720561 102.2273947 105.7618148 104.4263394 108.7765868 115.7001014 98.94975183 108.9049703 110.8944603 126.5148253
14  Soltu.DM.01G000150  112.6396654 84.29033126 91.17578444 86.46742969 154.2296705 99.61002047 111.0185944 115.6736536 111.7860541 115.187149  163.6131575
15  Soltu.DM.01G000160  644.197637  573.1742525 222.413656  760.3416958 178.3280566 761.4361074 594.551388  1053.605808 222.4196747 585.2365709 303.4453328
16  Soltu.DM.01G000170  751.7748456 841.0301941 910.3763931 773.9192261 835.4107154 820.7132361 1148.975573 804.140941  849.3435247 710.4399938 946.4830913
17  Soltu.DM.01G000190  6.328071091 1.873118472 5.525805117 6.431461713 8.836074875 5.49993978  8.694227272 11.14926781 4.609734191 7.869929438 0.951239288
18  Soltu.DM.01G000200  88.59299527 73.05162042 66.30966141 74.31911313 63.45908319 78.83247019 74.23532517 86.40682554 59.35032771 59.38219485 44.70824652
19  Soltu.DM.01G000210  108.8428228 112.3871083 85.64997932 111.4786697 73.0984376  123.4430928 113.6937412 143.5468231 67.41736254 77.26839812 86.56277518
20  Soltu.DM.01G000220  5.062456873 86.16344973 93.938687   20.72359885 507.6726655 30.555221   24.74510839 6.968292383 551.4394526 54.37405793 920.7996305

This is how the file appears in Bash shell

gene_name,VarXCRep.1,VarX1Rep.1,VarX2Rep.1,VarXCRep.2,VarX3Rep.2,VarX1Rep.2,VarX2Rep.2,VarXCRep.3,VarX3Rep.3,VarX1Rep.3,VarX2Rep.3
Soltu.DM.01G000010,360.7000522,395.2279977,323.2595994,361.5910696,327.7380499,386.8290979,336.3997167,333.0843759,317.4954424,377.756613,396.666783
Soltu.DM.01G000020,91.12422371,69.30538348,77.36127164,135.060696,61.85252412,110.6099,68.21624475,108.7053612,55.31681029,56.52040232,36.14709293
Soltu.DM.01G000030,439.1681337,183.5656103,232.0838149,579.546161,220.9018719,179.6646995,179.2348391,291.2746216,222.4196747,266.8621527,208.321404
Soltu.DM.01G000040,268.3102142,185.4387288,192.0217278,301.5640936,130.9345641,237.108515,203.9799475,236.921941,92.19468382,198.1791322,38.04957151
Soltu.DM.01G000050,341.7158389,479.5183289,504.229717,322.2876925,528.5579334,390.4957244,470.1570594,342.8399852,554.3205365,424.9761896,634.4766049
Soltu.DM.01G000060,468.2772607,839.1570756,759.7982036,514.516937,886.0173261,572.6048416,579.8380803,549.1014398,1011.836655,598.8300854,1077.754113
Soltu.DM.01G000070,2.531228436,0,5.525805117,1.429213714,8.032795341,1.83331326,5.350293706,0,4.609734191,0,7.609914302
Soltu.DM.01G000090,84.79615262,54.3204357,75.97982036,98.61574626,102.0165008,83.11020113,84.26712586,108.7053612,98.53306833,80.13019064,93.2214502
Soltu.DM.01G000100,67.07755356,73.05162042,12.43306151,118.6247383,6.426236273,77.61026135,36.11448251,97.55609336,8.643251608,67.25212429,15.2198286

I was asked to remove various types of columns and associated data which I have done successfully in the following code. I was then asked to arrange the data such that the headers show control (VarXC) repeats 1, 2 and 3 and experiment 1 (VarX1) repeats in columns next to each other which also has been done in the following code:

empty_list = []


for ln in open("FinalXVartest.csv").readlines():
    col = ln.split(",")
    del col[3]
    del col[4]
    del col[5]
    del col[6]
    del col[7]
    col.append(col.pop(2))
    col.append(col.pop(3))
    col.append(col.pop(4))
    empty_list  = col
    empty_list  = '\n'
 
file_out = open("Xtest_2Var.csv", "w")
file_out.write(','.join(empty_list))
file_out.close()

When I try to compile all this information, the output shows up like this: This is the final output

I am not sure how I am getting that space on the left side. Can someone help me remove so that all the rows shift by one cell to the left?

CodePudding user response：

You should change the code a little bit to make it work as you expect. The problem with your code is that you are constructing a single list to which you add EOL \n as elements. Therefore, when you write this list to a file

file_out.write(','.join(empty_list))

there will be a comma after each line break. I construct a list of lists and add \n right after join to avoid your problem:

empty_list = []

for ln in open("files/FinalXVartest.csv").readlines():
    col = ln.split(",")
    del col[3]
    del col[4]
    del col[5]
    del col[6]
    del col[7]
    col.append(col.pop(2))
    col.append(col.pop(3))
    col.append(col.pop(4))
    empty_list.append(col)

file_out = open("files/Xtest_2Var.csv", "w")
for item in empty_list:
    file_out.write(','.join(item)   '\n')
file_out.close()

But it's better to use csv library. It is suitable for reading and writing csv files.

CodePudding user response：

Using pandas:

import pandas as pd
import re

df = pd.read_csv('FinalXVartest.csv', index_col='gene_name')
parsed = sorted([(re.match(r'VarX(.)Rep.(\d)', k).groups()[::-1], k) for k in df.columns])
cols = [k for (i, j), k in parsed if j in {'1', 'C'}]

df.to_csv('Xtest_2Var.csv')


>>> df[cols]
                    VarX1Rep.1  VarXCRep.1  VarX1Rep.2  VarXCRep.2  VarX1Rep.3  VarXCRep.3
gene_name                                                                                 
Soltu.DM.01G000010  395.227998  360.700052  386.829098  361.591070  377.756613  333.084376
Soltu.DM.01G000020   69.305383   91.124224  110.609900  135.060696   56.520402  108.705361
Soltu.DM.01G000030  183.565610  439.168134  179.664700  579.546161  266.862153  291.274622
Soltu.DM.01G000040  185.438729  268.310214  237.108515  301.564094  198.179132  236.921941
Soltu.DM.01G000050  479.518329  341.715839  390.495724  322.287692  424.976190  342.839985
Soltu.DM.01G000060  839.157076  468.277261  572.604842  514.516937  598.830085  549.101440
Soltu.DM.01G000070    0.000000    2.531228    1.833313    1.429214    0.000000    0.000000
Soltu.DM.01G000090   54.320436   84.796153   83.110201   98.615746   80.130191  108.705361
Soltu.DM.01G000100   73.051620   67.077554   77.610261  118.624738   67.252124   97.556093