Read a .txt file into a pandas DataFrame when the number of spaces separating values varies

This script reads in a txt file and creates a DataFrame, but I want the 'sep' argument to handle values that may be separated by one or more spaces. When I run the script below, I get many columns filled with NaN.

code:

import pandas as pd

df = pd.read_csv(data_file, header=None, sep=' ')

example txt file

blah blahh    bl
blah3 blahhe      ble

I want there to be just 3 columns, so I get:

Col_a  Col_b   Col_c
blah   blahh    bl
blah3  blahhe   ble

CodePudding user response：

You can use regex as the delimiter:

pd.read_csv(data_file, header=None, delimiter=r"\s+", names='Col_a Col_b Col_c'.split(' '))

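Or you can use the delim_whitespace=True argument, which the pandas documentation describes as equivalent to sep=r"\s+". Note that delim_whitespace is deprecated as of pandas 2.2 in favor of sep=r"\s+":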

pd.read_csv(data_file, header=None, delim_whitespace=True, names='Col_a Col_b Col_c'.split(' '))
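
To check the behaviour without a file on disk, here is a minimal sketch that parses the two sample lines from the question. The io.StringIO buffer and the variable name sample are assumptions used only for illustration; in the real script you would pass data_file instead.

```python
import io

import pandas as pd

# The two example lines from the question, with uneven runs of spaces.
# (Hypothetical in-memory stand-in for data_file.)
sample = "blah blahh    bl\nblah3 blahhe      ble\n"

# sep=r"\s+" treats any run of one or more whitespace characters
# as a single delimiter, so no empty NaN columns are created.
df = pd.read_csv(
    io.StringIO(sample),
    header=None,
    sep=r"\s+",
    names=["Col_a", "Col_b", "Col_c"],
)

print(df)
```

With this separator the frame should have exactly three columns and two rows, with no missing values.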

Reference: How to read file with space separated values in pandas