I have an xlsx file, where each row corresponds to a sample with associated features in each column, as shown here: xlsx file example
I am trying to convert this xlsx file into a dat file, with multiple spaces separating the columns, as displayed in the example below:
samples property feature1 feature2 feature3
sample1 3.0862 0.8626 0.7043 0.6312
sample2 2.8854 0.7260 0.7818 0.6119
sample3 0.6907 0.4943 0.0044 0.4420
sample4 0.9902 0.0106 0.0399 0.9877
sample5 0.7242 0.0970 0.3199 0.5504
I have tried doing this by creating a dataframe in pandas and using dataframe.to_csv to save the file as a .dat, but it only allows me to use one character as a delimiter. Does anyone know how I might go about creating a file like the one above?
CodePudding user response:
You can use the string representation to_string
of the dataframe, imported by pandas from Excel:
df = pd.read_excel('input.xlsx')
with open ('output.dat', 'w') as f:
f.write(df.to_string(index=False))
CodePudding user response:
This is another approach to do so without using DataFrame
. We will have more flexibility since we do all the structure ourselves from the ground up.
Suppose you have read the xlsx file and store it in the form of 2-d list as follows:
lines = [['sample1', 3.0862, 0.8626, 0.7043, 0.6312],
['sample2', 2.8854, 0.7260, 0.7818, 0.6119],
['sample3', 0.6907, 0.4943, 0.0044, 0.4420],
['sample4', 0.9902, 0.0106, 0.0399, 0.9877],
['sample5', 0.7242, 0.0970, 0.3199, 0.5504]]
We can make use of string methods like ljust
, rjust
, or center
. Right here, I just show you the use of ljust
that takes the length as the first argument. The length will be the total width for left justification.
One could also use f-string to do padding in the format of f'{var:^10.4f}'
. The meaning of each component is:
^
represents centering (can be changed to<
for left justification or>
for right justification)10
is the padding length.4
is the number of decimal placesf
means float
So, here is the final script.
padding1 = 12
padding2 = 10
print('samples'.ljust(padding1 1) 'property ' 'feature1 ' 'feature2 ' 'feature3')
for line in lines:
text = line[0].ljust(padding1)
for i in range(1, len(line)):
text = f'{line[i]:^{padding2}.4f}'
print(text)