Condense a Python Dictionary with Similar Keys

I currently have a dictionary with several keys that are similar but formatted differently (Visual Studio, Visual studio / JavaScript, Javascript, javascript).

How would I condense the dictionary so that each such key appears only once (Visual Studio, JavaScript, etc.) rather than once per variant as in the example above?

Note: Elements such as Vue and Vue.js are meant to be separate keys.

Is there something obvious that I'm missing?

Code for reference:

def getVal(keys, data):
    techCount = dict()
    other = 0
    remList = []

    # Initialize Dictionary with Keys
    for item in keys:
        techCount[item] = 0

    # Load Values into Dictionary
    for item in data:
        techCount[item] += 1

    # Creates the 'Other' field
    for key, val in techCount.items():
        if val <= 1:
            other += 1
            remList.append(key)

    techCount['Other'] = other

    # Remove Redundant Keys
    for item in remList:
        techCount.pop(item)

    
    # Sort the Dictionary
    techCount = {key: val for key, val in sorted(
        techCount.items(), key=lambda ele: ele[1])}

    # Break up the Data
    keys = techCount.keys()
    techs = techCount.values()

    return keys, techs

Full List:

JavaScript: 3
C#: 9
Visual studio: 2
Docker: 4       
Azure: 4        
AngularJs: 2
Java: 3
Visual Studio: 5
SQL: 4
Javascript: 5
Typescript: 3
AngularJS: 3
WordPress: 2
Zoho: 3
Drupal: 2
CSS: 9
.NET: 3
Python: 6
ReactJS: 3
HTML: 8
ASP.NET: 2
PHP: 2
Jira: 2
Other: 43

CodePudding user response:

How you solve this really depends on how the data is structured: is it a list, a dictionary, or a string? Here I'll assume the data is in a dict, which seems the most likely given that it looks like:

JavaScript: 3
C#: 9
Visual studio: 2
Docker: 4       
Azure: 4        
AngularJs: 2
Java: 3
Visual Studio: 5

It seems like the problem is solely one of mixed-case characters. If you convert all the keys to lower case, some of them will collide, and those are the ones whose counts you want to add together. Here is one way:

tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5, 'Javascript': 5}

consolidated = dict()

for item in tech_count.items():
    norm_key = item[0].lower()
    if norm_key not in consolidated:
        consolidated[norm_key] = item[1]
    else:
        consolidated[norm_key] += item[1]

print(consolidated)

Or, if you want to do this more succinctly, as suggested by @juanpa.arrivillaga, you could use dict.get with a default of 0:

tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5, 'Javascript': 5}

consolidated = dict()

for item in tech_count.items():
    norm_key = item[0].lower()
    consolidated[norm_key] = consolidated.get(norm_key, 0) + item[1]

print(consolidated)

A more specialized data structure for this sort of thing is collections.Counter, which ships with Python. One benefit of a Counter is that looking up a key you have not yet seen returns 0 instead of raising a KeyError, which means fewer edge cases to handle.

With a Counter, one way would look like this:

from collections import Counter
tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5, 'Javascript': 5}

consolidated = Counter()

for item in tech_count.items():
    norm_key = item[0].lower()
    consolidated[norm_key] = consolidated.get(norm_key, 0) + item[1]

print(consolidated)
consolidated['assembly'] # returns 0

Now consolidated holds the sum of the counts from the colliding key-value pairs in the original dictionary. If the keys need further normalization beyond lowercasing, you could write a separate function that takes a string and returns the normalized key, and use it in place of item[0].lower().
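
As a minimal sketch of that idea: the helper normalize_key below is hypothetical, and its whitespace-collapsing rule is an assumption added for illustration, not something the question asks for. Punctuation is left alone, so Vue and Vue.js stay separate. The two Vue entries in the sample dict are made up to show that.

```python
from collections import Counter


def normalize_key(key):
    """Hypothetical helper: collapse runs of whitespace, then lowercase."""
    return " ".join(key.split()).lower()


tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5,
              'Javascript': 5, 'Vue': 1, 'Vue.js': 2}

consolidated = Counter()
for key, count in tech_count.items():
    # A Counter treats missing keys as 0, so += needs no membership check
    consolidated[normalize_key(key)] += count

print(dict(consolidated))
# {'javascript': 8, 'visual studio': 7, 'vue': 1, 'vue.js': 2}
```

Keeping the rule in one function means any later change to how keys are matched happens in a single place.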

CodePudding user response:

If you standardize the different spellings of the same word (which differ only in capitalization), you can properly "condense" the dictionary. How can we achieve this? Simply make every key lowercase when building your dictionary:

# Initialize Dictionary with Keys
for item in keys:
    techCount[item.lower()] = 0

# Load Values into Dictionary
for item in data:
    techCount[item.lower()] += 1
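
One caveat is that lowercasing every key also changes how the keys are displayed ('visual studio' rather than 'Visual Studio'). Here is a minimal sketch of one possible way around that, assuming data is a flat list of strings as in the question: group by the lowercase form, but label each group with its most frequent original spelling. The function name count_case_insensitive and the sample list are made up for illustration.

```python
from collections import Counter, defaultdict


def count_case_insensitive(data):
    """Count items ignoring case; label each group with its most frequent spelling."""
    totals = Counter()                # lowercase key -> total count
    spellings = defaultdict(Counter)  # lowercase key -> counts of each original spelling
    for item in data:
        norm = item.lower()
        totals[norm] += 1
        spellings[norm][item] += 1
    # most_common(1) returns a one-element list: [(spelling, count)]
    return {spellings[norm].most_common(1)[0][0]: total
            for norm, total in totals.items()}


# Made-up sample data for illustration
data = ['Visual Studio', 'Visual studio', 'Visual Studio',
        'JavaScript', 'Javascript', 'Javascript', 'Vue', 'Vue.js']
print(count_case_insensitive(data))
# {'Visual Studio': 3, 'Javascript': 3, 'Vue': 1, 'Vue.js': 1}
```

Picking the most frequent spelling is only one choice. If you would rather control the display names yourself, a hand-written mapping from lowercase key to preferred name would work as well.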