Remove keys with empty strings in all the dictionaries in a list (essentially remove empty columns)

Time:10-03

  • I have a list of dictionaries.
  • Some keys in a dictionary (an element of the list) may have missing values.
  • I want to remove a key entirely from all the dictionaries in the list if its value is an empty string in every dictionary of the list (see the code for a clearer explanation).
  • The key-value structure of each dictionary stays the same.
  • Preferably in an optimized way, without third-party libraries. I have a working solution, but it loops over the data more than I would like.

Code example

dict_data = [
  {"a": "lorem", "b": "ipsum", "c": ""},
  {"a": "lorem2", "b": "ipsum1", "c": ""},
  {"a": "", "b": "ipsum3", "c": ""},
  {"a": "lore3", "b": "", "c": ""}
]

In this situation, I want to remove the key "c" from all the dictionaries, because its value is an empty string in every dictionary of the list. If you view the list as a table, "c" is a column with no values.

Expected Result

The result will look something like this:

dict_data = [
  {"a": "lorem", "b": "ipsum"},
  {"a": "lorem2", "b": "ipsum1"},
  {"a": "", "b": "ipsum3"},
  {"a": "lore3", "b": ""}
]

Only the "c" key is removed, and it is removed from every dictionary in the list.

What I have tried so far:

It works, but I am not satisfied with the number of for loops.

# maps each key with missing values to the number of rows in which it is missing
missing_values_dict = {}

for row in dict_data:
    for key, value in row.items():
        if not value:
            if key in missing_values_dict:
                missing_values_dict[key] += 1
            else:
                missing_values_dict[key] = 1

# missing_values_dict ==> {'c': 4, 'a': 1, 'b': 1}

for key, value in missing_values_dict.items():
    # if the value is equal to the length of the list
    # it means it is missing values on all the rows/dictionaries
    if value == len(dict_data):
        for row in dict_data:
            row.pop(key, None)

# dict_data
## [{'a': 'lorem', 'b': 'ipsum'}, {'a': 'lorem2', 'b': 'ipsum1'}, {'a': '', 'b': 'ipsum3'}, {'a': 'lore3', 'b': ''}]
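
The counting approach above can be written more compactly with collections.Counter from the standard library. This is only a sketch under the same assumptions as the attempt above: the function name drop_all_empty_keys_by_count is made up for illustration, and "empty" means any falsy value (so 0 and None count too), exactly as `if not value` does.

```python
from collections import Counter


def drop_all_empty_keys_by_count(rows):
    """Remove, in place, every key whose value is falsy in all rows."""
    # Count, per key, how many rows hold an empty (falsy) value.
    empty_counts = Counter(
        key for row in rows for key, value in row.items() if not value
    )
    # A key is an "empty column" only if it was empty in every row.
    for key, count in empty_counts.items():
        if count == len(rows):
            for row in rows:
                row.pop(key, None)
    return rows
```

This still visits every value once, so it shortens the code rather than reducing the work done.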

I would appreciate any help. Thank you.

CodePudding user response:

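To find the keys that are empty in ALL rows, you don't need to check every key in every row. Start with all keys as candidates and, on each row, check only the keys that are still candidates (the ones that were empty in every row seen so far).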

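# Start by assuming every key is an empty column.
empty = set(dict_data[0].keys())
for d in dict_data:
    # Iterate over a copy because the set shrinks inside the loop.
    for k in empty.copy():
        if d[k] != "":
            empty.discard(k)

# Whatever survives was empty in every row, so drop it everywhere.
for k in empty:
    for d in dict_data:
        d.pop(k, None)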
        
print(dict_data)

Output:

[{'a': 'lorem', 'b': 'ipsum'},
 {'a': 'lorem2', 'b': 'ipsum1'},
 {'a': '', 'b': 'ipsum3'},
 {'a': 'lore3', 'b': ''}]
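
If the original list should stay untouched, or if some rows might lack a key, the same candidate-narrowing idea can be wrapped in a function that builds new dictionaries. This is a hedged sketch, not part of the answer above: the name without_empty_columns is hypothetical, and treating a missing key like an empty string is an assumption that may not suit every data set.

```python
def without_empty_columns(rows):
    """Return new dicts with the all-empty keys dropped; the input is not modified."""
    if not rows:
        return []
    # Every key seen in any row starts out as a candidate for removal.
    candidates = set().union(*rows)
    for row in rows:
        # Keep only candidates that are "" in this row.
        # Assumption: a key missing from a row counts as empty.
        candidates = {k for k in candidates if row.get(k, "") == ""}
        if not candidates:
            break  # nothing left to remove, so stop scanning early
    return [
        {k: v for k, v in row.items() if k not in candidates}
        for row in rows
    ]
```

Calling without_empty_columns(dict_data) on the example data returns the four dictionaries without "c" and leaves dict_data itself unchanged.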