How to skip the first whitespace match in regex [Python]?

Time:03-06

I'm using the pandas 'read_csv' function to read the lines of a file that is not in CSV format. It contains no ',' (comma) that I could use as the delimiter. Instead, the fields are separated by runs of whitespace of varying length. The line below is one example:

Power Output 12(25%)   24(50%)  12(25%)

I would like to split the fields with pandas.read_csv(sep='') using a regex, but I'm not sure exactly how it can be done. I know I can separate them on whitespace, but that will split Power Output into two different columns. I want a regex that matches every run of whitespace regardless of its length, BUT skips the first match it finds.

I'm expecting the following output in the pandas DataFrame:

Col 1         Col 2    Col 3    Col 4
Power Output  12(25%)  24(50%)  12(25%)

CodePudding user response:

Your code uses sep='' (empty string). You want to use sep='\s+' (regex for one or more whitespace characters).

If you want more detail, refer to the documentation for read_csv: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
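As a quick check of what a plain whitespace pattern does to the sample line from the question, here is a minimal sketch using re.split from the standard library (not pandas itself). The variable names are only for illustration. It suggests that a pattern such as \s+ on its own also breaks Power Output apart, which is the problem the question describes:

```python
import re

# Sample line copied from the question.
line = "Power Output 12(25%)   24(50%)  12(25%)"

# Split on every run of one or more whitespace characters.
fields = re.split(r"\s+", line)

# "Power" and "Output" end up as separate fields, giving five in total.
print(fields)
```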

CodePudding user response:

You can use whitespace followed by a digit as the separator. For this, use a lookahead regex:

df = pd.read_csv(..., sep=r'\s+(?=\d)', engine='python')
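A self-contained sketch of the lookahead approach, assuming the file has no header row and that every value column starts with a digit while the label does not. The in-memory io.StringIO buffer and the column names Col 1 to Col 4 stand in for the real file and headers, which the question does not show:

```python
import io

import pandas as pd

# Sample line from the question; in practice a file path would be passed instead.
data = io.StringIO("Power Output 12(25%)   24(50%)  12(25%)\n")

df = pd.read_csv(
    data,
    sep=r"\s+(?=\d)",  # whitespace only when the next character is a digit
    engine="python",   # regex separators are handled by the Python parser engine
    header=None,       # assumption: the file has no header row
    names=["Col 1", "Col 2", "Col 3", "Col 4"],  # illustrative column names
)

print(df)
```

Because the split depends on a digit following the whitespace, a label such as "Phase 2 Output" would be split too; in that case a stricter pattern or a different parsing step would be needed.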