Home > Enterprise >  How to create .dat file with multiple spaces as delimiter from .xlsx file
How to create .dat file with multiple spaces as delimiter from .xlsx file

Time:12-17

I have an xlsx file, where each row corresponds to a sample with associated features in each column, as shown here: xlsx file example

I am trying to convert this xlsx file into a dat file, with multiple spaces separating the columns, as displayed in the example below:

samples      property  feature1  feature2  feature3
sample1       3.0862    0.8626    0.7043    0.6312
sample2       2.8854    0.7260    0.7818    0.6119
sample3       0.6907    0.4943    0.0044    0.4420
sample4       0.9902    0.0106    0.0399    0.9877
sample5       0.7242    0.0970    0.3199    0.5504

I have tried doing this by creating a dataframe in pandas and using dataframe.to_csv to save the file as a .dat, but it only allows me to use one character as a delimiter. Does anyone know how I might go about creating a file like the one above?

CodePudding user response:

You can use the string representation to_string of the dataframe, imported by pandas from Excel:

df = pd.read_excel('input.xlsx')
with open ('output.dat', 'w') as f:
    f.write(df.to_string(index=False))

CodePudding user response:

This is another approach to do so without using DataFrame. We will have more flexibility since we do all the structure ourselves from the ground up.

Suppose you have read the xlsx file and store it in the form of 2-d list as follows:

lines = [['sample1', 3.0862, 0.8626, 0.7043, 0.6312],
        ['sample2', 2.8854, 0.7260, 0.7818, 0.6119],
        ['sample3', 0.6907, 0.4943, 0.0044, 0.4420],
        ['sample4', 0.9902, 0.0106, 0.0399, 0.9877],
        ['sample5', 0.7242, 0.0970, 0.3199, 0.5504]]

We can make use of string methods like ljust, rjust, or center. Right here, I just show you the use of ljust that takes the length as the first argument. The length will be the total width for left justification.

One could also use f-string to do padding in the format of f'{var:^10.4f}'. The meaning of each component is:

  • ^ represents centering (can be changed to < for left justification or > for right justification)
  • 10 is the padding length
  • .4 is the number of decimal places
  • f means float

So, here is the final script.

padding1 = 12
padding2 = 10

print('samples'.ljust(padding1   1)   'property  '   'feature1  '   'feature2  '   'feature3')
for line in lines:
    text = line[0].ljust(padding1)
    for i in range(1, len(line)):
        text  = f'{line[i]:^{padding2}.4f}'
    print(text)
  • Related