Home > Enterprise >  Data extraction for specific string
Data extraction for specific string

Time:11-09

I have a long list of JSON data, with repeats of contents similar to followings.

Due to the original JSON file is too long, I will just shared the hyperlinks here. This is a result generated from a database called RegulomeDB.

Direct link to the JSON file

I would like to extract specific data (eQTLs) from "method": "eQTLs" and "value": "xxxx", and put them into 2 columns (tab delimited) exactly like below. Note: "value":"xxxx" is extracted right after "method": "eQTLs"is detected.

eQTLs   firstResult, secondResult, thirdResult, ...

In this example, the desired output is:

eQTLs   EIF3S8, EIF3CL

I've tried using a python script but was unsuccessful.

import json
with open('file.json') as f:
    f_json = json.load(f)
    print 'f_json[0]['"method": "eQTLs"']   "\t"   f_json[0]["value"]

Thank you for your kind help.

CodePudding user response:

Try this:

cat file.json | grep -iE '"method":\s*"eQTLs"[^}]*' -o | cut -d ',' -f 1,5 | sed -r 's/"|:|method|value//gi' | sed 's/\s*eqtls,\s*//gi' | tr '\n' ',' | sed 's/,$/\n/g' | sed 's/,/, /g' | xargs echo -e 'eQTLs\x09'
  • Related