pd.read_csv column misalignment-CodePudding

I have a text file containing two columns, but when I want to read the data using Pandas, the columns and the data are misaligned.

Here's my code for reference:

import pandas as pd
filename = r'data.txt'
#read text file into pandas DataFrame
df = pd.read_csv(
    filename,
    sep="\t",
    skiprows=3,
    names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
)

df

Result:

Frequency / MHz S2,1 [SPara1]/abs,dB
0   0.1423125 -...  NaN
1   0.146625 -...   NaN
2   0.1509375 -...  NaN
3   0.15525 -...    NaN
4   0.1595625 -...  NaN
... ... ...
4059    17.64675 ...    NaN
4060    17.651063 ...   NaN
4061    17.655375 ...   NaN
4062    17.659688 ...   NaN
4063    17.664 ...  NaN

My sample data.txt:

  Frequency / MHz                S2,1 [SPara1]/abs,dB
----------------------------------------------------------------------
                   0.138                     -0.92293553
               0.1423125                     -0.93264485
                0.146625                     -0.94201416
               0.1509375                     -0.95106676
                 0.15525                     -0.95982484
               0.1595625                     -0.96830956
                0.163875                     -0.97654091
               0.1681875                     -0.98453781
                  0.1725                     -0.99231804
               0.1768125                     -0.99989837
                0.181125                      -1.0072945
               0.1854375                      -1.0145212
                 0.18975                      -1.0215921
               0.1940625                      -1.0285203
                0.198375                      -1.0353177
               0.2026875                      -1.0419956
                   0.207                      -1.0485644
               0.2113125                      -1.0550339

I want the data 'NaN' to change to its original data. What should I change in my code?

CodePudding user response：

You should change the separator that you're using to "\s\s ":

df = pd.read_csv(
    filename,
    sep="\s\s ",
    skiprows=3,
    names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
    engine="python"
)

This outputs:

    Frequency / MHz  S2,1 [SPara1]/abs,dB
0          0.142313             -0.932645
1          0.146625             -0.942014
2          0.150938             -0.951067
3          0.155250             -0.959825
4          0.159562             -0.968310
5          0.163875             -0.976541
6          0.168187             -0.984538
7          0.172500             -0.992318
8          0.176813             -0.999898
9          0.181125             -1.007294
10         0.185438             -1.014521
11         0.189750             -1.021592
12         0.194062             -1.028520
13         0.198375             -1.035318
14         0.202687             -1.041996
15         0.207000             -1.048564
16         0.211312             -1.055034

CodePudding user response：

I think it is better to use delim_whitespace=True

df = pd.read_csv(
    filename,
    delim_whitespace=True,
    skiprows=3,
    names=['Frequency / MHz', 'S2,1 [SPara1]/abs,dB'],
)

This outputs:

    Frequency / MHz  S2,1 [SPara1]/abs,dB
0          0.142313             -0.932645
1          0.146625             -0.942014
2          0.150938             -0.951067
3          0.155250             -0.959825
4          0.159562             -0.968310
5          0.163875             -0.976541
6          0.168187             -0.984538
7          0.172500             -0.992318
8          0.176813             -0.999898
9          0.181125             -1.007294
10         0.185438             -1.014521
11         0.189750             -1.021592
12         0.194062             -1.028520
13         0.198375             -1.035318
14         0.202687             -1.041996
15         0.207000             -1.048564
16         0.211312             -1.055034