Importing data from a dataframe in a txt file into a dictionary

Time:09-28

Assuming the following text file:

Some lines here
and here

chr1    111    222    More_data_I_dont_need    More_data_I_dont_need   ...
chr2    333    444    More_data_I_dont_need.   More_data_I_dont_need   ...
chr3    555    666    More_data_I_dont_need    More_data_I_dont_need   ...

(data separated by \t)

I have been trying to convert this data into a Python dictionary where each key is the value in the first column and each value is a list holding the second and third columns, e.g.:

{"chr1": [111, 222], "chr2": [333,444]}

The code I have written so far is:

d = {}
star_end_list = []

with open("file.txt") as f:
for line in f:
    if line.startswith('chr'):
        chr, start,end = line.split()
        star_end_list.append() = start
        star_end_list.append() = end
        # Then with this data I can create a dictionary. Something like
        d[chr] = star_end_list

But I don't know how to ignore the values in the fourth, fifth, and later columns.

CodePudding user response:

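You can slice the result of split() to keep only the first three items, as seen below. Your code also had a few errors: the indentation under the with statement, the way append is called (it takes the new item as an argument and cannot be assigned to), and star_end_list being initialized outside the loop, so every key would end up pointing to the same growing list. They are addressed below too.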

d = {}

with open("file.txt") as f:
    for line in f:
        if line.startswith('chr'):
            # Keep only the first three columns; the rest are discarded.
            chrom, start, end = line.split()[:3]
            # Build a fresh list per row and store the numbers as integers.
            star_end_list = [int(start), int(end)]
            d[chrom] = star_end_list
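
As a variation on the same idea, here is a minimal sketch that wraps the parsing in a function and splits explicitly on tabs. The function name read_intervals and the in-memory sample are made up for illustration, and the sketch assumes the second and third columns always hold integers. Passing maxsplit=3 to split() stops splitting after the third tab, so the unwanted columns are left as one string that is simply ignored.

```python
import io


def read_intervals(lines):
    """Map the first column of each 'chr...' row to [start, end] as integers."""
    intervals = {}
    for line in lines:
        if not line.startswith("chr"):
            continue  # skip header text and blank lines
        # maxsplit=3 stops after the third tab; anything left over
        # is collected by *_ and ignored.
        name, start, end, *_ = line.rstrip("\n").split("\t", 3)
        intervals[name] = [int(start), int(end)]
    return intervals


# Hypothetical sample mirroring the layout of the file in the question.
sample = io.StringIO(
    "Some lines here\n"
    "and here\n"
    "\n"
    "chr1\t111\t222\textra\textra\n"
    "chr2\t333\t444\textra\textra\n"
)
result = read_intervals(sample)
print(result)  # {'chr1': [111, 222], 'chr2': [333, 444]}
```

Because the function accepts any iterable of lines, it can also be called with an open file, for example read_intervals(f) inside a with open("file.txt") as f: block.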