It's hard to search for what I'm having trouble describing, so I apologize if this has been answered elsewhere.
I have a list of dictionaries from a cli tool (timewarrior, if you're familiar!) that I want to organize into a hierarchical structure for the purposes of printing a table, or exporting a CSV.
The hierarchy depends on the order of a list contained in each element, called "tags". Each list of tags has some tracked time associated with it. I want to sum the time spent up the hierarchy, from the bottom to the top.
Here is an oversimplified example of the data I am dealing with:
data = [{"tags": ["Project A", "Task 1"], "time": 50},
{"tags": ["Project A", "Task 2"], "time": 20},
{"tags": ["Do a thing"], "time": 10},
{"tags": ["Project B", "Do a thing"], "time": 50}]
With that data, I am hoping to create one of the following two structures for a recursive function call:
A nested list:
outcome_a = [["Project A", 70, [["Task 1", 50], ["Task 2", 20]]],
             ["Do a thing", 10],
             ["Project B", 50, [["Do a thing", 50]]]]
Or a nested dictionary:
outcome_b = {
    "Project A": {
        "time": 70,
        "sub": {
            "Task 1": {
                "time": 50
            },
            "Task 2": {
                "time": 20
            }
        }
    },
    "Do a thing": {
        "time": 10
    },
    "Project B": {
        "time": 50,
        "sub": {
            "Do a thing": {
                "time": 50
            }
        }
    }
}
It feels simple enough to iterate the dictionaries then iterate the tags inside. What's tripping me up is how to elegantly keep track of the time context of each element once iteration has moved past the first layer of nested data.
I obviously don't want to iterate the list ad nauseam and try to re-discover the context. The best solution I can think of is to iterate the data and somehow pass the context along to the next tags element. I think some sort of recursive function or reducer could do the trick.
At this point I have hammered away a couple of free-time hours to juggle how I could go about this. I'm sure I'm over-thinking things. I am open to suggestions :)
CodePudding user response:
If you're continuously accessing it, I would avoid a nested list. Your dict result is easy enough if you utilize dict.get.
data = [{"tags": ["Project A", "Task 1"], "time": 50},
{"tags": ["Project A", "Task 2"], "time": 20},
{"tags": ["Do a thing"], "time": 10},
{"tags": ["Project B", "Do a thing"], "time": 50}]
result = {}
for tags in data:
    # First tag is the project; any remaining tags land in the list `task`.
    project, *task = tags['tags']
    task_time = tags['time']
    if task:
        result[project] = result.get(project, {'time': 0, 'sub': {}})
        result[project]['sub'][task[0]] = {'time': task_time}
        # Accumulate so the project total covers all of its tasks.
        result[project]['time'] += task_time
    else:
        result[project] = {'time': task_time}
print(result)
# {
#     'Project A': {
#         'time': 70,
#         'sub': {
#             'Task 1': {'time': 50},
#             'Task 2': {'time': 20}
#         }
#     },
#     'Do a thing': {'time': 10},
#     'Project B': {
#         'time': 50,
#         'sub': {
#             'Do a thing': {'time': 50}
#         }
#     }
# }
Some highlights:
This unpacks the 'tags' key into 2 variables. If only one item exists in the list under that key, an empty [] is assigned to task.
project, *task = tags['tags']
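To see the starred unpacking in isolation, here is a small sketch. The sample lists are made up for illustration:

```python
# Starred assignment: the first item goes to `project`,
# everything after it goes to `task`, which is always a list.
project, *task = ["Project A", "Task 1"]
print(project, task)  # Project A ['Task 1']

# With a single tag, the starred name receives an empty list, which is falsy.
solo_project, *solo_task = ["Do a thing"]
print(solo_project, solo_task)  # Do a thing []
```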
We know in this branch the 'sub' key is needed. dict.get() is used here to retrieve the project key. If it doesn't exist, a default dict structure is assigned to it.
result[project] = result.get(project, {'time': 0, 'sub': {}})
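A quick sketch of how dict.get behaves with a default, using a hypothetical totals dict. Note that get on its own does not store the default, which is why the line above assigns the result back. dict.setdefault is a standard alternative that does both steps in one call:

```python
totals = {}

# get() returns the default when the key is missing, but does not store it.
entry = totals.get("Project A", {"time": 0, "sub": {}})
print(entry)   # {'time': 0, 'sub': {}}
print(totals)  # {}

# Assigning the result back is what actually stores the default.
totals["Project A"] = totals.get("Project A", {"time": 0, "sub": {}})

# setdefault() looks up and stores in one call, returning the stored value.
other = totals.setdefault("Project B", {"time": 0, "sub": {}})
other["time"] += 5
print(totals["Project B"]["time"])  # 5
```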
Since the initial value of result[project]['time'] is set to 0, we can keep adding to that key.
result[project]['time'] += task_time
And finally, for single tasks, when task is "falsy" (an empty list).
result[project] = {'time': task_time}
Edit:
For something that is agnostic to the number of tags, you can use the same dict.get trick to nest arbitrarily deep. It also has the added bonus of removing the if statement.
data = [{"tags": ["Project A", "Task 1"], "time": 50},
{"tags": ["Project A", "Task 2", "Part 1"], "time": 10},
{"tags": ["Project A", "Task 2", "Part 2"], "time": 10},
{"tags": ["Do a thing"], "time": 10},
{"tags": ["Project B", "Do a thing"], "time": 50}]
result = {}
for tags in data:
    project, *tasks = tags['tags']
    task_time = tags['time']
    result[project] = result.get(project, {'time': 0})
    # Add this entry's time to the project total.
    result[project]['time'] += task_time
    current_dict = result[project]
    for task in tasks:
        # Descend one level, creating 'sub' and the task node on first visit.
        current_dict['sub'] = current_dict.get('sub', {})
        current_dict = current_dict['sub']
        current_dict[task] = current_dict.get(task, {'time': 0})
        # Every level on the path to the leaf receives the same time.
        current_dict[task]['time'] += task_time
        current_dict = current_dict[task]
print(result)
# {
#     'Project A': {
#         'time': 70,
#         'sub': {
#             'Task 1': {'time': 50},
#             'Task 2': {
#                 'time': 20,
#                 'sub': {
#                     'Part 1': {'time': 10},
#                     'Part 2': {'time': 10}
#                 }
#             }
#         }
#     },
#     'Do a thing': {'time': 10},
#     'Project B': {
#         'time': 50,
#         'sub': {
#             'Do a thing': {'time': 50}
#         }
#     }
# }
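Since the question mentions printing a table or exporting a CSV, here is one way the nested result could be walked recursively. This is only a sketch: flatten is a hypothetical helper name, and the row layout (depth, name, time) is an assumption rather than something from the question.

```python
import csv
import io

# Same shape as the result built above (trimmed for brevity).
result = {
    "Project A": {
        "time": 70,
        "sub": {
            "Task 1": {"time": 50},
            "Task 2": {
                "time": 20,
                "sub": {"Part 1": {"time": 10}, "Part 2": {"time": 10}},
            },
        },
    },
    "Do a thing": {"time": 10},
}


def flatten(tree, depth=0):
    """Yield (depth, name, time) rows, parents before their children."""
    for name, node in tree.items():
        yield depth, name, node["time"]
        # Leaves have no 'sub' key, so fall back to an empty dict.
        yield from flatten(node.get("sub", {}), depth + 1)


rows = list(flatten(result))

# Indented text table.
for depth, name, time in rows:
    print(f"{'  ' * depth}{name:<20}{time:>5}")

# CSV export into an in-memory buffer (swap in an open file as needed).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["depth", "name", "time"])
writer.writerows(rows)
print(buffer.getvalue())
```

Keeping the depth in each row lets the same generator feed both the indented table and the CSV, so the tree only has to be walked once.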