Read from a txt file and convert it into a dataframe in Python

I have a txt file as following:

sub_ID: ['sub-01','sub-02']

ses_ID: ['ses-01','ses-01']

mean: [0.3456,0.446]

I want to read this and convert it into a dataframe like the one in the image (desired dataframe). Don't mind the values in the mean_e_field column; they are just an example. The values should be the same as in the txt file.

I tried the following and got the result shown in the image (dataframe), but I can't transform it into my preferred dataframe:

data = pd.read_csv(filename, sep=",", header=None)
data

I appreciate your answers in advance.

CodePudding user response：

So, several things here.

Your previous attempt, data = pd.read_csv(filename, sep=",", header=None), did not work because read_csv treats every line of the file as a row and splits it on each comma. So the line sub_ID: [ 'sub-01','sub-02' ] is split into the two fields sub_ID: [ 'sub-01' and 'sub-02' ].
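To see the splitting concretely, here is a minimal sketch that uses the standard library's csv module as a stand-in for pandas.read_csv. This assumes both apply the same comma-delimiter logic to this input; the raw string simply reproduces the lines from the question.

```python
import csv
import io

# The question's file contents, with blank lines omitted for brevity.
raw = (
    "sub_ID: ['sub-01','sub-02']\n"
    "ses_ID: ['ses-01','ses-01']\n"
    "mean: [0.3456,0.446]\n"
)

# Split every line on commas, the way a CSV parser does.
# Single quotes are not treated as quoting characters by default.
rows = list(csv.reader(io.StringIO(raw), delimiter=","))

for row in rows:
    print(row)
```

Each line ends up as two fragments, with the key and the opening bracket stuck to the first value, which is why the result cannot be used directly as a table.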

The example data you've provided seems to be in YAML format:

sub_ID: [ 'sub-01','sub-02' ]

ses_ID: [ 'ses-01','ses-01' ]

mean: [ 0.3456,0.446 ]

If it were CSV, the data would look as follows (it does not):

sub_ID,ses_ID,mean
sub-01,ses-01,0.3456
sub-02,ses-01,0.446

To read this data into a dataframe, you will either need to preprocess it into another format (e.g. csv) or read it as YAML into a dict and pass that to pandas.DataFrame.

For example:

import yaml

with open("data.txt", "r") as file:
    try:
        # safe_load parses the YAML text and returns a dict.
        data = yaml.safe_load(file)
    except yaml.YAMLError as exc:
        # Report the parse error, then re-raise: data is not defined if parsing fails.
        print(exc)
        raise

print(data)
# {'sub_ID': ['sub-01', 'sub-02'], 'ses_ID': ['ses-01', 'ses-01'], 'mean': [0.3456, 0.446]}

After that, you can create a DataFrame from this dict:

import pandas as pd

# Each dict key becomes a column; each list supplies that column's values.
df = pd.DataFrame(data)
df.head()


+-----+--------+--------+--------+
|     | sub_ID | ses_ID |  mean  |
+-----+--------+--------+--------+
|   0 | sub-01 | ses-01 | 0.3456 |
|   1 | sub-02 | ses-01 |  0.446 |
+-----+--------+--------+--------+

as desired.

If you have certain entries that are not valid YAML, you will need to preprocess the data before loading it into pandas.
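One possible way to do that preprocessing, sketched under the assumption that every non-blank line has the form key: [values] and that the bracketed part is still a valid Python literal, is to split each line at the first colon and parse the rest with ast.literal_eval. The function name parse_key_list_text is hypothetical, chosen only for this illustration.

```python
import ast


def parse_key_list_text(text):
    """Parse lines of the form `key: [v1, v2, ...]` into a dict of lists.

    Hypothetical helper: assumes each value is a valid Python literal.
    """
    columns = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip the blank lines between entries
        # Split at the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        # literal_eval evaluates literals (lists, strings, numbers) without running code.
        columns[key.strip()] = ast.literal_eval(value.strip())
    return columns


sample = """sub_ID: ['sub-01','sub-02']

ses_ID: ['ses-01','ses-01']

mean: [0.3456,0.446]
"""

data = parse_key_list_text(sample)
print(data)
# {'sub_ID': ['sub-01', 'sub-02'], 'ses_ID': ['ses-01', 'ses-01'], 'mean': [0.3456, 0.446]}
```

The resulting dict has the same shape as the one returned by yaml.safe_load, so it can be passed to pandas.DataFrame in the same way.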