Home > OS >  In Pandas, how can I extract certain value using the key off of a dataframe imported from a csv file
In Pandas, how can I extract certain value using the key off of a dataframe imported from a csv file

Time:01-03

Using Pandas, I'm trying to extract value using the key but I keep failing to do so. Could you help me with this?

There's a csv file like below:

value
"{""id"":""1234"",""currency"":""USD""}"
"{""id"":""5678"",""currency"":""EUR""}"

I imported this file in Pandas and made a DataFrame out of it:

dataframe from a csv file

However, when I tried to extract the value using a key (e.g. df["id"]), I'm facing an error message.

I'd like to see a value 1234 or 5678 using df["id"]. Which step should I take to get it done? This may be a very basic question but I need your help. Thanks.

CodePudding user response:

You need to extract each row of your dataframe using json.loads() or eval()

something like this:

import json


for row in df.iteritems():
    print(json.loads(row.value))
    # OR
    print(eval(row.value))

CodePudding user response:

The csv file isn't being read in correctly.

You haven't set a delimiter; pandas can automatically detect a delimiter but hasn't done so in your case. See the read_csv documentation for more on this. Because the , the pandas dataframe has a single column, value, which has entire lines from your file as individual cells - the first entry is "{""id"":""1234"",""currency"":""USD""}". So, the file doesn't have a column id, and you can't select data by id.

The data aren't formatted as a pandas df, with row titles and columns of data. One option is to read in this data is to manually process each row, though there may be slicker options.

file = 'test.dat'
f = open(file,'r')

id_vals = []
currency = []

for line in f.readlines()[1:]:
    
    ## remove obfuscating characters
    for c in '"{}\n':
        line = line.replace(c,'')
    line = line.split(',')

    ## extract values to two lists
    id_vals.append(line[0][3:])
    currency.append(line[1][9:])

CodePudding user response:

You just need to clean up the CSV file a little and you are good. Here is every step:

# open your csv and read as a text string
with open('My_CSV.csv', 'r') as f:
    my_csv_text = f.read()

    # remove problematic strings 
    find_str = ['{', '}', '"', 'id:', 'currency:','value']
    replace_str = ''
    for i in find_str:
        my_csv_text = re.sub(i, replace_str, my_csv_text)

# Create new csv file and save cleaned text
new_csv_path = './my_new_csv.csv' # or whatever path and name you want
with open(new_csv_path, 'w') as f:
    f.write(my_csv_text)
    
# Create pandas dataframe
df = pd.read_csv('my_new_csv.csv', sep=',', names=['ID', 'Currency'])
print(df)

Output df:

     ID Currency
0  1234      USD
1  5678      EUR

  • Related