I need to perform some calculations with values read from a .csv file, using the column headings as keys to the values in each column. However, the columns of the file may be jumbled up or swapped around, so simply indexing the values by position wouldn't work. Also keep in mind that I can't import any modules such as csv. Here's a sample of what the csv file would look like, except with many more rows and more AdultIDs:
AdultID | Landmark | X | Y | Z |
---|---|---|---|---|
R7033 | Ex_L | -32 | -39 | -4.6 |
R7033 | En_L | -1.8 | -41 | 6.7 |
R7033 | N | 12 | -34 | 22.6 |
R7033 | En_R | 30.1 | -43 | 8.3 |
So effectively, I need a dictionary like this: {AdultID: [R7033, R7033, R7033, R7033], Landmark: [Ex_L, En_L, N, En_R], ...} and so on.
CodePudding user response:
Assume that your input file (let's call it luh.csv) is a well-formed CSV using commas as delimiters, like this:
AdultID,Landmark,X,Y,Z
R7033,Ex_L,-32,-39,-4.6
R7033,En_L,-1.8,-41,6.7
R7033,N,12,-34,22.6
R7033,En_R,30.1,-43,8.3
Then:
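with open('luh.csv') as csv:
    # The header row supplies the dictionary keys, so column order does not matter
    columns = next(csv).strip().split(',')
    dict_ = {}
    for line in map(str.strip, csv):
        # Pair each value with the heading of the column it came from
        for col, val in zip(columns, line.split(',')):
            dict_.setdefault(col, []).append(val)
print(dict_)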
Output:
{'AdultID': ['R7033', 'R7033', 'R7033', 'R7033'], 'Landmark': ['Ex_L', 'En_L', 'N', 'En_R'], 'X': ['-32', '-1.8', '12', '30.1'], 'Y': ['-39', '-41', '-34', '-43'], 'Z': ['-4.6', '6.7', '22.6', '8.3']}
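Note that every value in that output is still a string, so the calculations need a conversion step first. Here is a minimal sketch of one way to do it without any imports, assuming the numeric columns are X, Y and Z as in the sample. The helper names read_columns and to_floats are made up for illustration, and the code assumes the file has a header row and no quoted fields containing commas.

```python
def read_columns(path, delimiter=","):
    """Read a delimited text file into {heading: [values, ...]} without imports."""
    with open(path) as handle:
        # The header row supplies the keys, so the column order in the file is irrelevant.
        headers = [name.strip() for name in next(handle).split(delimiter)]
        table = {name: [] for name in headers}
        for line in handle:
            line = line.strip()
            if not line:  # skip blank lines, e.g. a trailing empty line
                continue
            for name, cell in zip(headers, line.split(delimiter)):
                table[name].append(cell.strip())
    return table


def to_floats(table, numeric_columns):
    """Return a shallow copy of table with the named columns converted to float."""
    converted = dict(table)
    for name in numeric_columns:
        converted[name] = [float(cell) for cell in table[name]]
    return converted
```

With the sample file above, to_floats(read_columns('luh.csv'), ['X', 'Y', 'Z'])['X'] would give [-32.0, -1.8, 12.0, 30.1], while AdultID and Landmark stay as strings.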
CodePudding user response:
Why not use explicit label-based indexing? Then it doesn't matter what order the columns are in. In a dataframe you can also replace the default row index with your own labels; I used letters for clarity, but numbers work too. This assumes I have understood you correctly.
import pandas as pd

df = pd.DataFrame({'abc': [1, 2, 3, 4], 'Landmark': ['Ex_L', 'En_R', 'N', 'En_R'], 'AdultID': ['R7033', 'R7033', 'R7033', 'R7033']},
                  index=['a', 'b', 'c', 'd'])
print(df.loc['a', 'AdultID'])
print(df.loc[:, ['abc', 'AdultID']])
Output of df.loc['a', 'AdultID']:
R7033
Output of df.loc[:, ['abc', 'AdultID']]:
   abc  AdultID
a    1    R7033
b    2    R7033
c    3    R7033
d    4    R7033
This is how to read the file and create the dataframe in one step; header=0 tells pandas to take the column names from the first line of the file.
df = pd.read_csv('name_file.csv', header=0)