Picking out a specific column in a table

Time:11-13

My goal is to import a table of astrophysical data that I have saved to my computer (obtained from matching 2 other tables in TOPCAT, if you know it), and extract certain relevant columns. I hope to then do further manipulations on these columns. I am a complete beginner in python, so I apologise for basic errors. I've done my best to try and solve my problem on my own but I'm a bit lost.

This is the script I have written so far:

import pandas as pd
input_file = "location\\filename"
dataset = pd.read_csv(input_file,skiprows=12,usecols=[1])

The file that I'm trying to import is listed as having file type "File" in my drive. I've looked at this file in Notepad and it has a lot of descriptive bumf in the first few rows, so to try and get rid of this I've used "skiprows" as you can see. The data in the file is separated column-wise by vertical lines, or at least that's how it appears in Notepad.

The problem is that when I try to extract the first column using "usecols", it instead returns what appears to be the first row in the command window, along with a load of vertical bars between each value. I assume it is somehow not interpreting the table correctly, and not understanding what's a column and what's a row?

What I've tried: Modifying the file and saving it in a different filetype. This gives the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'location\\filename'

Despite the fact that the new file is saved in exactly the same location.

I've tried using "pd.read_table" instead of csv, but this doesn't seem to change anything (nor does it give me an error).

When I've tried to extract multiple columns (i.e. "usecols=[1,2]") I get the following error:

ValueError: Usecols do not match columns, columns expected but not found: [1, 2]

My hope is that someone with experience can give some insight into what's likely going on to cause these problems.

CodePudding user response:

Maybe you can try dataset.iloc[:, 0]. With iloc you can select columns or rows by integer position (among other things); [:, 0] means all rows of the first column.
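Here is a minimal sketch of that. It assumes the vertical bars in your output mean the file is pipe-delimited, which is only a guess since the file itself isn't shown. The sample text and the column names (ra, dec, mag) are made up for illustration. Note that iloc can only pick out a column once pandas has actually split the line into columns, hence the sep argument:

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real file: pipe-separated values.
sample = "ra|dec|mag\n10.5|-3.2|17.1\n11.0|-3.4|16.8\n"

# With the default separator (a comma), every line ends up in one wide column.
wrong = pd.read_csv(io.StringIO(sample))

# With sep="|", each field becomes its own column.
dataset = pd.read_csv(io.StringIO(sample), sep="|")

# All rows of the first column (position 0).
first_column = dataset.iloc[:, 0]
```

If the real file is delimited some other way, only the sep value would need to change; the iloc call stays the same.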

CodePudding user response:

The file is incorrectly named.

I expect that you are reading a csv, xlsx or txt file, so the (Windows) path would look similar to this:

import pandas as pd
input_file = "C:\\python\\tests\\test_csv.csv"
dataset = pd.read_csv(input_file,skiprows=12,usecols=[1])

The error message tells you this: No such file or directory: 'location\\filename'
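One way to narrow this down is to check the path before handing it to pandas. The sketch below is only an illustration: load_table is a hypothetical helper name, not part of pandas, and it simply forwards any extra keyword arguments to pd.read_csv.

```python
from pathlib import Path
import pandas as pd

def load_table(path_text, **read_kwargs):
    """Read a table with pandas, failing early with a clear message if the path is wrong."""
    path = Path(path_text)
    if not path.is_file():
        # Show the fully resolved path so a missing extension or folder is easy to spot.
        raise FileNotFoundError(f"No file at {path.resolve()}")
    return pd.read_csv(path, **read_kwargs)
```

It could then be called as load_table("C:\\python\\tests\\test_csv.csv", skiprows=12, usecols=[1]). If the error still appears, compare the printed path with the real file name, including its extension, since Windows Explorer may hide extensions.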
