Python: Generate Dictionary From Form Data That Comes In As A Single String

When working with the Asana API, a submitted form comes back as a single string in a notes field (sample below). I'm working in a Jupyter Notebook via Anaconda with Python 3.9.

My goal is to create a dict where each form question is the key and its answer(s) the value, e.g. {"Name": "Internal Requestor", "Name of Project": "Dummy Test Project", ...}, so that I can ultimately store it as a pandas DataFrame. Some of the questions have multi-part answers; those can stay under a single key, e.g. {"What teams will be involved?": ["Content/SEO", "Creative", "Ops", "PR", "Internal Comms", ...]}.

Example string:

"Name:\nInternal Requestor\n\nName of Project:\nDummy Test Project\n\nWhich Marketing O-Team does this project belong to?:\nIncrease traffic\n\nProject Description:\nThis is a project test to see if we can get the fields from this form into a dataframe\n\nWhat is the strategy driving this project?:\nEfficiency is the name of the game!\n\nWhat is the expected impact of the project?:\nBe more efficient than we currently are\n\nPlease rank size of expected impact (H, M, L):\nHigh\n\nWhat are the project objectives?:\nBe more efficient\nClearer creative direction\neasier to stack rank\n\nHow confident are you this project will meet these objectives?:\nHigh\n\nPlease point the size of this project:\n8\n\nWhat teams will be involved?:\nContent/SEO\nCreative\nOps\nPR\nInternal Comms\nSocial\nDemandGen\nOwner\nGuest\nExternal Stakeholders\n\nIf including external stakeholders, please note below:\nRes Ops\n\nPlease point the external stakeholders expected involvement in the project:\n5\n\nWhich content teams?:\nContent\nSEO\n\nPlease point content's expected involvement in the project:\n5\n\nPlease point SEO's expected involvement in the project:\n5\n\nWhich creative teams?:\nDesign\nCopy\n\nPlease point Design's expected involvement in the project:\n8\n\nPlease point Copy's expected involvement in the project:\n8\n\nWhich Ops Teams?:\nAnalytics\nMartech\n\nPlease point Analytics expected involvement in projects:\n8\n\nPlease point Martech's expected involvement in the project:\n8\n\nPlease point PR's expected involvement in the project:\n1\n\nPlease point Internal Comms expected involvement in the project:\n5\n\nPlease point Social's expected involvement in the project:\n5\n\nPlease point DemandGen's expected involvement in the project:\n3\n\nPlease point Owner's expected involvement in the project:\n8\n\nPlease point Guest's expected involvement in the project:\n1\n\nPlease provide project milestones:\nScoping 9/19 - 9/20\nExecution 9/21 - 9/25\nLaunch 9/31\n\nIs this a hard or soft deadline?:\nHard\n\nWhat is driving this deadline?:\nEfficiency\n\nWhich manager approved this request submission?:\nBilly Bob\n"

I tried .splitlines(), but I wasn't sure how to turn the list it returned into a dict, particularly for the questions that have multiple answers (described above). I'm new to Stack Overflow, so I'm happy to add more details as needed.

Answer:

Break the problem up into smaller pieces.

Based on your data, start by splitting on blank lines (double newlines) to get one block per question, then split each block on single newlines.

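>>> raw = "YOUR LONG STRING"
>>> qa = {}
>>> # strip() drops the trailing newline so no empty block is produced
>>> for group in raw.strip().split('\n\n'):
...     question, answers = group.split('\n', 1)   # first line is the question
...     qa[question.rstrip(':')] = answers.splitlines()
...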

which gives (abridged; pretty-printed, so the keys appear in sorted order)

{'How confident are you this project will meet these objectives?': ['High'],
 'If including external stakeholders, please note below': ['Res Ops'],
 'Is this a hard or soft deadline?': ['Hard'],
 'Name': ['Internal Requestor'],
 'Name of Project': ['Dummy Test Project'],
 'Please point Analytics expected involvement in projects': ['8'],
 "Please point Copy's expected involvement in the project": ['8'],
 "Please point DemandGen's expected involvement in the project": ['3'],
 "Please point Design's expected involvement in the project": ['8'],
 "Please point Guest's expected involvement in the project": ['1'],
 'Please point Internal Comms expected involvement in the project': ['5'],
 "Please point Martech's expected involvement in the project": ['8'],
 "Please point Owner's expected involvement in the project": ['8'],
 "Please point PR's expected involvement in the project": ['1'],
 "Please point SEO's expected involvement in the project": ['5'],
...
 'What is the strategy driving this project?': ['Efficiency is the name of the '
                                                'game!'],
 'What teams will be involved?': ['Content/SEO',
                                  'Creative',
                                  'Ops',
                                  'PR',
                                  'Internal Comms',
                                  'Social',
                                  'DemandGen',
                                  'Owner',
                                  'Guest',
                                  'External Stakeholders'],
...

If you want single answers to be plain strings rather than one-element lists, you'll have to decide that rule yourself, since the data is ambiguous on that point. Also watch out for duplicate questions: a later one will silently overwrite an earlier one in the dict.
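A minimal sketch covering both caveats, assuming you want one-element answers unwrapped to plain strings and want a warning (instead of a silent overwrite) when a question repeats. parse_form is a hypothetical helper name, not part of any library, and merging duplicate answers into one list is just one possible choice.

```python
import warnings


def parse_form(raw: str) -> dict:
    """Parse blank-line-separated 'Question:' blocks into a dict (hypothetical helper)."""
    qa = {}
    for group in raw.strip().split("\n\n"):
        # First line is the question; the remaining lines are its answers
        question, _, answers = group.partition("\n")
        key = question.rstrip(":")
        values = answers.splitlines()
        # Assumption: a single answer becomes a plain string, several stay a list
        value = values[0] if len(values) == 1 else values
        if key in qa:
            # Duplicate question: warn and merge both answers into one list
            warnings.warn(f"duplicate question: {key!r}")
            existing = qa[key] if isinstance(qa[key], list) else [qa[key]]
            value = existing + (value if isinstance(value, list) else [value])
        qa[key] = value
    return qa


sample = "Name:\nInternal Requestor\n\nWhich content teams?:\nContent\nSEO\n"
print(parse_form(sample))
# {'Name': 'Internal Requestor', 'Which content teams?': ['Content', 'SEO']}
```

If you'd rather keep the first answer for a duplicate question, replace the merge with a plain continue after the warning.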

Answer:

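I assume your string is in a file; if it isn't, skip the file-reading step:

with open('yourfile.txt') as f:
    file_as_string = f.read()

outdict = {}
# Question/answer blocks are separated by blank lines
for block in file_as_string.strip().split('\n\n'):
    # Each question ends with ":\n"; split only on the first occurrence
    question, answer = block.split(':\n', 1)
    outdict[question] = answer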

Results:

{'Name': 'Internal Requestor',
 'Name of Project': 'Dummy Test Project',
 'Which Marketing O-Team does this project belong to?': 'Increase traffic',
 'Project Description': 'This is a project test to see if we can get the fields from this form into a dataframe',
 'What is the strategy driving this project?': 'Efficiency is the name of the game!',
 'What is the expected impact of the project?': 'Be more efficient than we currently are',
 'Please rank size of expected impact (H, M, L)': 'High',
 'What are the project objectives?': 'Be more efficient\nClearer creative direction\neasier to stack rank',
 'How confident are you this project will meet these objectives?': 'High',
 'Please point the size of this project': '8',
 'What teams will be involved?': 'Content/SEO\nCreative\nOps\nPR\nInternal Comms\nSocial\nDemandGen\nOwner\nGuest\nExternal Stakeholders',
 'If including external stakeholders, please note below': 'Res Ops',
 'Please point the external stakeholders expected involvement in the project': '5',
 'Which content teams?': 'Content\nSEO',
 "Please point content's expected involvement in the project": '5',
 "Please point SEO's expected involvement in the project": '5',
 'Which creative teams?': 'Design\nCopy',
 "Please point Design's expected involvement in the project": '8',
 "Please point Copy's expected involvement in the project": '8',
 'Which Ops Teams?': 'Analytics\nMartech',
 'Please point Analytics expected involvement in projects': '8',
 "Please point Martech's expected involvement in the project": '8',
 "Please point PR's expected involvement in the project": '1',
 'Please point Internal Comms expected involvement in the project': '5',
 "Please point Social's expected involvement in the project": '5',
 "Please point DemandGen's expected involvement in the project": '3',
 "Please point Owner's expected involvement in the project": '8',
 "Please point Guest's expected involvement in the project": '1',
 'Please provide project milestones': 'Scoping 9/19 - 9/20\nExecution 9/21 - 9/25\nLaunch 9/31',
 'Is this a hard or soft deadline?': 'Hard',
 'What is driving this deadline?': 'Efficiency',
 'Which manager approved this request submission?': 'Billy Bob'}
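The values above are still raw strings, so multi-part answers such as 'What teams will be involved?' keep their embedded \n separators. If you want them as lists (the shape asked for in the question), one option is a post-processing pass like this sketch; split_multiline is a hypothetical name, and keeping single-line answers as plain strings is an assumption.

```python
def split_multiline(outdict: dict) -> dict:
    """Turn multi-line answers into lists; leave single-line answers as strings."""
    return {
        question: answer.split("\n") if "\n" in answer else answer
        for question, answer in outdict.items()
    }


example = {
    "Name": "Internal Requestor",
    "Which content teams?": "Content\nSEO",
}
print(split_multiline(example))
# {'Name': 'Internal Requestor', 'Which content teams?': ['Content', 'SEO']}
```

From there, pd.DataFrame([result]) should give a one-row DataFrame with one column per question; list-valued answers are stored as Python lists inside their cells.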

Answer:

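If s is your string:

d = {
    # First line of each block is the question (minus its trailing colon);
    # the remaining lines are the answers, joined with commas
    entry.split("\n")[0].rstrip(":"): ",".join(entry.split("\n")[1:])
    for entry in s.strip().split("\n\n")
}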

This breaks if an answer ever contains two consecutive newlines, but in that case the string itself is ambiguous and I don't see any way around it.
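Since the question started from .splitlines(), here is a sketch of building the dict directly from that list instead of splitting on blank lines first. Like the answers above, it assumes a blank line ends each answer and the next non-blank line is a question ending in ":"; from_lines is a hypothetical helper name.

```python
def from_lines(raw: str) -> dict:
    """Build {question: [answers]} by walking raw.splitlines() (hypothetical helper)."""
    qa = {}
    question = None  # the question whose answers are currently being collected
    for line in raw.splitlines():
        if not line:
            # A blank line closes the current block
            question = None
        elif question is None:
            # First line of a block is the question; drop its trailing colon
            question = line.rstrip(":")
            qa[question] = []
        else:
            qa[question].append(line)
    return qa


lines_demo = "Name:\nInternal Requestor\n\nWhich Ops Teams?:\nAnalytics\nMartech\n"
print(from_lines(lines_demo))
# {'Name': ['Internal Requestor'], 'Which Ops Teams?': ['Analytics', 'Martech']}
```

This version tolerates several blank lines between blocks, but it has the same limitation as above: a blank line inside an answer would make the following line look like a new question.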