replace whitespace with comma in multiline string (doc string), but keeping end-of-line

Time:09-20

I have a multiline string (and not a text file) like this:

x = '''
Index    Value         Max     Min    State
0    10    nan         nan     nan
1    20    nan         nan     nan    
2    15    nan         nan     nan     
3    25    20          10      1
4    15    25          15      2
5    10    25          15      4
6    15    20          10      3    
'''

The amount of whitespace between the columns is not uniform.

I want to replace the whitespace with a comma, but keep the end-of-line.

So the result would look like this:

Index,Value,Max,Min,State
0,10,nan,nan,nan
1,20,nan,nan,nan    
2,15,nan,nan,nan     
3,25,20,10,1
4,15,25,15,2
5,10,25,15,4
6,15,20,10,3    

...or alternatively as a pandas dataframe.

What I have tried:

  • I can use replace() with a fixed run of spaces, but then I need to count the whitespace between each pair of columns.
  • I can use the re module (re.sub), but it collapses the whole string into one line, whereas I need to keep the separate lines (the end-of-line characters).

CodePudding user response:

Try it with StringIO, which lets pandas read the string as if it were a file:

from io import StringIO
import pandas as pd


x = '''
Index    Value         Max     Min    State
0    10    nan         nan     nan
1    20    nan         nan     nan    
2    15    nan         nan     nan     
3    25    20          10      1
4    15    25          15      2
5    10    25          15      4
6    15    20          10      3    
'''

# Treat any run of two or more whitespace characters as the column separator.
df = pd.read_csv(StringIO(x), sep=r'\s\s+', engine='python')
print(df)

   Index  Value   Max   Min  State
0      0     10   NaN   NaN    NaN
1      1     20   NaN   NaN    NaN
2      2     15   NaN   NaN    NaN
3      3     25  20.0  10.0    1.0
4      4     15  25.0  15.0    2.0
5      5     10  25.0  15.0    4.0
6      6     15  20.0  10.0    3.0
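If the comma-separated text is wanted rather than the DataFrame, one possible follow-up is to read every cell as a string and write it back out with to_csv. This is only a sketch under some assumptions: the sample below is a shortened copy of x, and dtype=str with keep_default_na=False is my choice to keep 'nan' and '20' exactly as typed instead of converting them to NaN and 20.0.

```python
from io import StringIO

import pandas as pd

# Shortened copy of the string from the question (one line has trailing spaces).
x = '''
Index    Value         Max     Min    State
0    10    nan         nan     nan
1    20    nan         nan     nan    
3    25    20          10      1
'''

# dtype=str and keep_default_na=False keep each cell as its original text,
# so 'nan' is not parsed as a missing value and '20' does not become 20.0.
df = pd.read_csv(StringIO(x), sep=r'\s+', dtype=str, keep_default_na=False)

# index=False leaves the automatic row labels out of the CSV text.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Without dtype=str, the Max, Min and State columns would be written back as floats and the missing values as empty fields, which may or may not be what you want.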

CodePudding user response:

Since you tagged pandas, you can try:

out = '\n'.join(pd.Series(x.split('\n')).str.strip().str.replace(r'\s+', ',', regex=True))

Output (note that there are leading and trailing blank lines because your x has them):

Index,Value,Max,Min,State
0,10,nan,nan,nan
1,20,nan,nan,nan
2,15,nan,nan,nan
3,25,20,10,1
4,15,25,15,2
5,10,25,15,4
6,15,20,10,3
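
For completeness, the same result seems reachable with the re module alone. The likely reason re.sub collapsed everything into one line in the question is that \s also matches the newline character. A minimal sketch, assuming a shortened copy of x and that only spaces and tabs separate the columns, applies the substitution one line at a time:

```python
import re

# Shortened copy of the string from the question (one line has trailing spaces).
x = '''
Index    Value         Max     Min    State
0    10    nan         nan     nan
1    20    nan         nan     nan    
3    25    20          10      1
'''

# Work line by line so that newlines are never part of the text being replaced.
# strip() removes trailing spaces first, so no trailing comma is produced.
lines = [re.sub(r'[ \t]+', ',', line.strip()) for line in x.strip().splitlines()]
out = '\n'.join(lines)
print(out)
```

Calling x.strip() before splitlines() also drops the blank first and last lines that the triple-quoted string introduces.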