How to separate .csv data into different columns

Time:07-25

I have a text file with data which looks like this:

NCP_341_1834_0022.png 2 0 130 512 429

I would like to split each line into separate columns with these names:

['filename','class','xmin','ymin','xmax','ymax']

I have done this:

import pandas as pd
test_txt = pd.read_csv(r"../input/covidxct/train_COVIDx_CT-3A.txt")
test_txt.to_csv(r"../working/test/train.csv", index=None, sep='\t')
train = pd.read_csv("../working/test/train.csv")

However, when I download the .csv file, each line ends up in a single column instead of six. How can I fix this?
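A minimal sketch of what goes wrong, using only the sample line from above in an io.StringIO buffer in place of the real file. read_csv splits on commas by default, so the space-separated line stays in one field. Because header=None is not given, that line is also consumed as the column label, leaving no data rows.

```python
import io

import pandas as pd

# The sample line from the question, standing in for the real .txt file.
raw = "NCP_341_1834_0022.png 2 0 130 512 429\n"

# Default sep=",": no commas, so the whole line is one field.
# Default header=0: the first line becomes the column name.
df = pd.read_csv(io.StringIO(raw))

print(df.shape)          # (0, 1): no data rows, one column
print(list(df.columns))  # the data line itself, used as a column label
```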

Answer:

Set the right separator; read_csv splits on ',' by default:

test_txt = pd.read_csv(r"../input/covidxct/train_COVIDx_CT-3A.txt", sep=' ', header=None)

This assumes you are using test_COVIDx_CT-3A.txt (or the matching train file) from Kaggle.

Don't forget to set header=None, since the file has no header row; otherwise the first line is used as the column names. You can also pass names=['filename', 'class', 'xmin', 'ymin', 'xmax', 'ymax'] to replace the default column labels (0, 1, 2, ...).
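A self-contained sketch of this answer, assuming the six column names from the question and using the sample line in an io.StringIO buffer in place of the real file path. It also writes the parsed frame back out with the default comma separator. The question's to_csv call used sep='\t', which produces tab-separated output that spreadsheet tools may not split into columns when the file is named .csv.

```python
import io

import pandas as pd

COLUMNS = ["filename", "class", "xmin", "ymin", "xmax", "ymax"]

# Sample line from the question; with the real data, pass the file path
# instead, e.g. "../input/covidxct/train_COVIDx_CT-3A.txt".
raw = "NCP_341_1834_0022.png 2 0 130 512 429\n"

# sep=" " splits on single spaces, header=None keeps the first line as data,
# and names= supplies the column labels.
df = pd.read_csv(io.StringIO(raw), sep=" ", header=None, names=COLUMNS)

# Write back as comma-separated text (to_csv with no path returns a string).
csv_text = df.to_csv(index=False)
print(csv_text)
```

If the real file may contain runs of several spaces or tabs between fields, sep=r"\s+" is a more forgiving separator than a single space.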

Answer:

To answer my own question: you can use the .str accessor to split the single column into separate columns. I split it into six columns, one for each of my six labels:

train[['filename', 'class', 'xmin', 'ymin', 'xmax', 'ymax']] = train['NCP_96_1328_0032.png 2 9 94 512 405'].str.split(' ', n=5, expand=True)
train.head()

Then drop the original combined column, which you no longer need. drop returns a new DataFrame, so assign the result back:

train = train.drop(columns=train.columns[0])
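A fuller sketch of the str.split route, under the same assumptions (sample line in io.StringIO, the question's six column names); "line" is a hypothetical name for the combined column. It reads with header=None because, without it, the file's first line becomes the column label (which is where the long 'NCP_96_1328_0032.png 2 9 94 512 405' column name above comes from) and that row drops out of the data. It also converts the numeric fields, since str.split yields strings.

```python
import io

import pandas as pd

COLUMNS = ["filename", "class", "xmin", "ymin", "xmax", "ymax"]
raw = "NCP_341_1834_0022.png 2 0 130 512 429\n"

# Read each whole line into one column named "line" (hypothetical name);
# header=None keeps the first line as data rather than a column label.
train = pd.read_csv(io.StringIO(raw), header=None, names=["line"])

# n=5 allows at most five splits, i.e. six fields, one per label.
train[COLUMNS] = train["line"].str.split(" ", n=5, expand=True)

# str.split produces strings, so convert the numeric columns explicitly.
train[COLUMNS[1:]] = train[COLUMNS[1:]].astype(int)

# drop returns a new DataFrame, so reassign to keep the result.
train = train.drop(columns="line")
print(train.head())
```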