I currently have a dictionary with several keys that are the same item but formatted differently (Visual Studio, Visual studio / JavaScript, Javascript, javascript).
How would I condense the dictionary so there's only one key for each item (Visual Studio, JavaScript, etc.) rather than the variants in the example above?
Note: Elements such as Vue and Vue.js are meant to be separate keys.
Is there something obvious that I'm missing?
Code for reference:
def getVal(keys, data):
    techCount = dict()
    other = 0
    remList = []
    # Initialize Dictionary with Keys
    for item in keys:
        techCount[item] = 0
    # Load Values into Dictionary
    for item in data:
        techCount[item] += 1
    # Creates the 'Other' field
    for key, val in techCount.items():
        if val <= 1:
            other += 1
            remList.append(key)
    techCount['Other'] = other
    # Remove Redundant Keys
    for item in remList:
        techCount.pop(item)
    # Sort the Dictionary
    techCount = {key: val for key, val in sorted(
        techCount.items(), key=lambda ele: ele[1])}
    # Break up the Data
    keys = techCount.keys()
    techs = techCount.values()
    return keys, techs
Full List:
JavaScript: 3
C#: 9
Visual studio: 2
Docker: 4
Azure: 4
AngularJs: 2
Java: 3
Visual Studio: 5
SQL: 4
Javascript: 5
Typescript: 3
AngularJS: 3
WordPress: 2
Zoho: 3
Drupal: 2
CSS: 9
.NET: 3
Python: 6
ReactJS: 3
HTML: 8
ASP.NET: 2
PHP: 2
Jira: 2
Other: 43
CodePudding user response:
How you solve this really depends on how data is structured: is it a list, a dictionary, or a string? Here I'll assume the data are in a dict(), which seems the most likely given that the data look like:
JavaScript: 3
C#: 9
Visual studio: 2
Docker: 4
Azure: 4
AngularJs: 2
Java: 3
Visual Studio: 5
It seems like the problem is solely one of mixed-case characters. If you convert all keys to lower case you'll get some collisions, and the counts of the colliding keys need to be added together. Here is one way:
tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5, 'Javascript': 5}
consolidated = dict()
for item in tech_count.items():
    norm_key = item[0].lower()
    if norm_key not in consolidated:
        # First time this normalized key is seen
        consolidated[norm_key] = item[1]
    else:
        # Collision: add to the running total
        consolidated[norm_key] += item[1]
print(consolidated)
Or, if you want to do this more succinctly, as suggested by @juanpa.arrivillaga, you could do it like this:
tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5, 'Javascript': 5}
consolidated = dict()
for item in tech_count.items():
    norm_key = item[0].lower()
    # get() returns 0 for a key that has not been seen yet
    consolidated[norm_key] = consolidated.get(norm_key, 0) + item[1]
print(consolidated)
A more specialized data structure for this sort of thing is collections.Counter, which ships with Python. One benefit of Counter is that querying for keys you have not yet seen returns 0, which can make for fewer edge-case considerations.
With Counter, one way would look like this:
from collections import Counter
tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5, 'Javascript': 5}
consolidated = Counter()
for item in tech_count.items():
    norm_key = item[0].lower()
    consolidated[norm_key] = consolidated.get(norm_key, 0) + item[1]
print(consolidated)
consolidated['assembly'] # returns 0
Now consolidated will have the sum of the counts from the colliding key-value pairs in the original dictionary. If the keys need further transformations of this kind, you could write a separate function that takes a string as input and returns the normalized key, and call it in place of item[0].lower().
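As a rough sketch of that idea: the helper name normalize_key and the extra whitespace clean-up below are assumptions for illustration, not something taken from the question. The sample dictionary also adds Vue and Vue.js to show that lowercasing alone keeps them as separate keys.

```python
from collections import Counter


def normalize_key(name):
    """Hypothetical helper: collapse repeated whitespace and ignore case."""
    return " ".join(name.split()).lower()


# Sample counts; 'Vue' and 'Vue.js' are added here only for illustration.
tech_count = {'JavaScript': 3, 'Visual studio': 2, 'Visual Studio': 5,
              'Javascript': 5, 'Vue': 1, 'Vue.js': 2}

consolidated = Counter()
for name, count in tech_count.items():
    # Counter treats missing keys as 0, so += works without a membership check
    consolidated[normalize_key(name)] += count

print(dict(consolidated))
# {'javascript': 8, 'visual studio': 7, 'vue': 1, 'vue.js': 2}
```

Keeping the normalization in one function means any later rule (stripping punctuation, mapping aliases) only has to be added in one place.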
CodePudding user response:
If you standardize the different spellings of the same word (the ones that differ only in capitalization), you can condense the dictionary. How can we achieve this? Simply make every key lowercase when building your dictionary:
# Initialize Dictionary with Keys
for item in keys:
    techCount[item.lower()] = 0
# Load Values into Dictionary
for item in data:
    techCount[item.lower()] += 1
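One side effect of lowercasing is that the final keys read "visual studio" rather than "Visual Studio". If the display spelling matters, one possible approach is to group by the lowercase form but label each group with its most frequent original spelling. This is only a sketch: the function name count_with_display_names is hypothetical, and it assumes data is a list of strings with one entry per mention, as in the question's loop.

```python
from collections import Counter, defaultdict


def count_with_display_names(data):
    """Count case-insensitively; label each group with its most common spelling."""
    # lowercase key -> Counter of the original spellings seen for it
    spellings = defaultdict(Counter)
    for item in data:
        spellings[item.lower()][item] += 1

    result = {}
    for variants in spellings.values():
        display, _ = variants.most_common(1)[0]    # most frequent spelling
        result[display] = sum(variants.values())   # total across all spellings
    return result


# Illustrative input only, not the asker's real data
data = ['JavaScript', 'Javascript', 'Javascript', 'Visual Studio',
        'Visual studio', 'Visual Studio', 'Vue', 'Vue.js']
print(count_with_display_names(data))
# {'Javascript': 3, 'Visual Studio': 3, 'Vue': 1, 'Vue.js': 1}
```

The most frequent spelling is not necessarily the canonical one (here "Javascript" wins over "JavaScript"), so a small hand-written mapping of preferred names may still be needed.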