I'm using the pandas 'read_csv' function to read a file that is not in CSV format. It contains no ',' (comma) that I could use as the delimiter; instead, the fields are separated by whitespace of varying width. The line below is one example:
Power Output 12(25%) 24(50%) 12(25%)
I would like to split such lines with pandas.read_csv(sep='') and a regex, but I'm not sure exactly how it can be done. I know I can split on whitespace, but that would put Power and Output in two different columns. I want a regex that matches every run of whitespace, regardless of its width, BUT skips the first match it finds.
I expect the resulting pandas DataFrame to look like this:
Col 1 | Col 2 | Col 3 | Col 4 |
---|---|---|---|
Power Output | 12(25%) | 24(50%) | 12(25%) |
CodePudding user response:
Your code uses sep='' (empty string). You want to use sep='\s+' (regex for one or more whitespace characters).
If you want more detail, refer to the documentation for read_csv: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
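As a quick check of what this pattern matches, here is a minimal sketch that applies it to the sample line with Python's re module instead of pandas. The variable names are only illustrative. Note that a plain whitespace split also breaks the label into two fields, which is what the question wants to avoid:

```python
import re

# Sample line from the question, with uneven spacing added for illustration.
line = "Power Output 12(25%)   24(50%) 12(25%)"

# Split on every run of one or more whitespace characters.
fields = re.split(r"\s+", line)

print(fields)       # 'Power' and 'Output' end up as separate fields
print(len(fields))  # 5 fields rather than the desired 4
```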
CodePudding user response:
You can use whitespace followed by a digit as the separator. For this, use a look-ahead regex (a regex separator requires the Python parsing engine):
df = pd.read_csv(..., sep=r'\s+(?=\d)', engine='python')
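A self-contained sketch of this approach, assuming the file has no header row. The in-memory sample and the column names Col 1 to Col 4 are stand-ins taken from the question, not from a real file:

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the real file (uneven spacing on purpose).
sample = io.StringIO("Power Output 12(25%)   24(50%) 12(25%)\n")

df = pd.read_csv(
    sample,
    sep=r"\s+(?=\d)",   # whitespace counts as a separator only when a digit follows
    engine="python",    # regex separators are handled by the Python engine
    header=None,        # assumption: the file has no header line
    names=["Col 1", "Col 2", "Col 3", "Col 4"],
)

print(df)
```

This relies on every value column starting with a digit. A label that itself contains a space followed by a digit (for example "Phase 2 Output") would be split as well, so the pattern may need adjusting for such data.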