I have a list of headings and subheadings of a document.
test_list = ['heading', 'heading','sub-heading', 'sub-heading', 'heading', 'sub-heading', 'sub-sub-heading', 'sub-sub-heading', 'sub-heading', 'sub-heading', 'sub-sub-heading', 'sub-sub-heading','sub-sub-heading', 'heading']
I want to assign a unique index to each heading and subheading, as follows:
seg_ids = ['1', '2', '2_1', '2_2', '3', '3_1', '3_1_1', '3_1_2', '3_2', '3_3', '3_3_1', '3_3_2', '3_3_3', '4']
This is my code to create this result, but it is messy and restricted to depth 3. A document with a sub-sub-sub-heading would make the code even more complicated. Is there a more Pythonic way to do this?
seg_ids = []
for idx, an_ele in enumerate(test_list):
    head_id = 0
    subh_id = 0
    subsubh_id = 0
    if an_ele == 'heading' and idx == 0:  # if it is the first element
        head_id = '1'
        seg_ids.append(head_id)
    else:
        last_seg_ids = seg_ids[idx - 1].split('_')  # find the depth of the last element
        head_id = last_seg_ids[0]
        if len(last_seg_ids) == 2:
            subh_id = last_seg_ids[1]
        elif len(last_seg_ids) == 3:
            subh_id = last_seg_ids[1]
            subsubh_id = last_seg_ids[2]
        if an_ele == 'heading':
            head_id = str(int(head_id) + 1)
            subh_id = 0  # reset sub_heading index
            subsubh_id = 0  # reset sub_sub_heading index
        elif an_ele == 'sub-heading':
            subh_id = str(int(subh_id) + 1)
            subsubh_id = 0  # reset sub_sub_heading index
        elif an_ele == 'sub-sub-heading':
            subsubh_id = str(int(subsubh_id) + 1)
        else:
            print('ERROR')
        if subsubh_id == 0:
            if subh_id != 0:
                seg_ids.append(head_id + '_' + subh_id)
            else:
                seg_ids.append(head_id)
        if subsubh_id != 0:
            seg_ids.append(str(head_id) + '_' + str(subh_id) + '_' + str(subsubh_id))
print(seg_ids)
CodePudding user response:
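def get_level(s):
    # 'heading' -> 0, 'sub-heading' -> 1, 'sub-sub-heading' -> 2, ...
    return s.count('-')

def translate(test_list):
    seg_ids = []
    levels = [0] * 9  # one counter per depth
    last_level = 99
    for an_ele in test_list:
        level = get_level(an_ele)
        if level <= last_level:
            levels[level] += 1  # same depth or shallower: next sibling
        else:
            levels[level] = 1  # one step deeper: restart numbering
        seg_ids.append('_'.join(str(k) for k in levels[:level + 1]))
        last_level = level
    return seg_ids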
print(translate(['heading', 'heading','sub-heading', 'sub-heading', 'heading', 'sub-heading', 'sub-sub-heading', 'sub-sub-heading', 'sub-heading', 'sub-heading', 'sub-sub-heading', 'sub-sub-heading','sub-sub-heading', 'heading']))
Output:
['1', '2', '2_1', '2_2', '3', '3_1', '3_1_1', '3_1_2', '3_2', '3_3', '3_3_1', '3_3_2', '3_3_3', '4']
This fixes the maximum number of levels at 9. You could lift that limit by starting with levels = [0] and extending the list whenever a new level falls beyond its end, but this gets the point across.
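For reference, the growable variant mentioned above could look roughly like the sketch below. The name translate_unbounded is made up for this example, and it assumes, as in the code above, that the depth of an entry equals the number of hyphens in its label and that the list never skips a level on the way down.

```python
def translate_unbounded(test_list):
    """Number headings like translate(), but without a fixed depth limit."""
    seg_ids = []
    levels = []  # levels[d] is the current counter at depth d
    for an_ele in test_list:
        level = an_ele.count('-')  # assumption: depth == number of hyphens
        # Forget the counters of anything deeper than this entry.
        del levels[level + 1:]
        # Grow the list if this entry is deeper than the list is long.
        # (A skipped level would show up as a 0 in the id.)
        while len(levels) <= level:
            levels.append(0)
        levels[level] += 1
        seg_ids.append('_'.join(str(k) for k in levels))
    return seg_ids


print(translate_unbounded(['heading', 'sub-heading', 'sub-sub-heading',
                           'sub-sub-sub-heading', 'heading']))
```

Because the deeper counters are deleted rather than kept around, last_level is no longer needed: a counter that does not exist yet simply starts again from 1.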
CodePudding user response:
You may use the split('-') method to find the level of the heading:
subs_amount = an_ele.split('-')
You can deduce the level of the heading from the length of the subs_amount list. If the length is 1, then it is a "heading". If it's 3, it is a "sub-sub-heading". Etc.
Then, have a list store_levels to store the indexes of the previous headings of greater level, like Tim Roberts says in their comment:
if len(subs_amount) > len(store_levels):
    store_levels.append(1)  # add a sub-level
elif len(subs_amount) == len(store_levels):
    store_levels[-1] += 1  # add a heading of the same level
else:
    del store_levels[len(subs_amount):]  # go back to the level(s) above
    store_levels[-1] += 1  # and count the new heading at that level
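Now, to build your output, you just have to "_".join(map(str, store_levels)) and append it to the output. The map(str, ...) is needed because store_levels holds integers and join only accepts strings.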
Sorry for not using the same variable names as you; I did that to avoid confusing them with yours or changing how yours are used. I hope the code is clear enough for you to adapt it to your own.
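Put together, the pieces above might be assembled as in the following sketch. The function name number_headings is hypothetical, and the code assumes that the list starts at the top level and never jumps more than one level deeper at a time.

```python
def number_headings(headings):
    """Return ids such as '3_1_2' for a flat list of heading labels."""
    store_levels = []  # counters for the current heading and its ancestors
    seg_ids = []
    for an_ele in headings:
        # 'heading' -> 1, 'sub-heading' -> 2, 'sub-sub-heading' -> 3, ...
        depth = len(an_ele.split('-'))
        if depth > len(store_levels):
            # One level deeper (assumption: never more than one at a time).
            store_levels.append(1)
        else:
            # Same level or shallower: drop deeper counters, then count up.
            del store_levels[depth:]
            store_levels[-1] += 1
        seg_ids.append('_'.join(map(str, store_levels)))
    return seg_ids


test_list = ['heading', 'heading', 'sub-heading', 'sub-heading', 'heading',
             'sub-heading', 'sub-sub-heading', 'sub-sub-heading',
             'sub-heading', 'sub-heading', 'sub-sub-heading',
             'sub-sub-heading', 'sub-sub-heading', 'heading']
print(number_headings(test_list))
```

Merging the "same level" and "level above" cases into one branch works because deleting from index depth onward is a no-op when the list already has exactly depth entries.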