I have a program I'm working on that needs to read a .txt file containing multiple rows of data that look like this:
[ABC/DEF//25GHI////JKLM//675//]
My program below can print each sequence on a new line for analysis; however, the function is where I'm having issues. I can get it to remove the purely numerical values like "675" while leaving the alphanumeric ones (it removes 675 from the sample):
import re

a = "string.txt"
with open(a, "r") as file:
    lines = [line.rstrip("\n") for line in file]

print(*lines, sep="\n")

cleaned_data = []

def split_lines(lines, delimiter, remove="[0-9]+$"):
    for line in lines:
        tokens = line.split(delimiter)
        tokens = [re.sub(remove, "", token) for token in tokens]
        clean_list = list(filter(lambda e: e.strip(), tokens))
        cleaned_data.append(clean_list)
        print(clean_list)  # Quick check that the function works

split_lines(lines, "/")
This then prints out the separated rows like this, with the empty strings (where "/" was) and the numerical values removed:
["ABC", "DEF", "25GHI", "JKLM"]
What I'm trying to do next is use the "cleaned_data" list that contains these newly delimited rows and quantify them, producing output like this:
4x ["ABC", "DEF", "25GHI", "JKLM"]
What can I do next using "cleaned_data" to read each row and print a count of duplicate rows?
CodePudding user response:
unique_data = {}

cleaned_data = [1, 2, 3, 4, 5, 'a', 'b', 'c', 'd', 3, 4, 5, 'a', 'b',
                [1, 2], [1, 2]]

for item in cleaned_data:
    key = str(item)  # Convert mutable objects like lists to an immutable string key.
    if not unique_data.get(key):  # Key does not exist yet
        unique_data[key] = 1, item  # Store a count of 1 and the data
    else:  # A duplicate has been encountered
        # Increment the count
        unique_data[key] = (unique_data[key][0] + 1), item

for k, v in unique_data.items():
    print(f"{v[0]}:{v[1]}")
Output:
1:1
1:2
2:3
2:4
2:5
2:a
2:b
1:c
1:d
2:[1, 2]
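To get exactly the "4x [...]" output from the question, the same tally can be written more compactly with `collections.Counter`, converting each row to a tuple so it is hashable. A minimal sketch, assuming `cleaned_data` holds lists of strings (the sample rows here are made up for illustration):

```python
from collections import Counter

# Hypothetical sample mirroring the question's cleaned rows.
cleaned_data = [["ABC", "DEF", "25GHI", "JKLM"]] * 4 + [["XYZ"]]

# Lists are unhashable, so convert each row to a tuple before counting.
counts = Counter(tuple(row) for row in cleaned_data)

for row, count in counts.items():
    print(f"{count}x {list(row)}")
# prints:
# 4x ['ABC', 'DEF', '25GHI', 'JKLM']
# 1x ['XYZ']
```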
CodePudding user response:
If you just need to get rid of duplicates:
deduped_row_of_cleaned_data = list(set(row_of_cleaned_data))
If you need to know how many duplicates there are, just subtract len(deduped_row_of_cleaned_data) from len(row_of_cleaned_data).
If you need a count of every duplicate, build a dictionary with one empty list per unique item. Note that dict.fromkeys(keys, []) would make every key share the same list, so use a comprehension instead:
empty_dict = {item: [] for item in set(row_of_cleaned_data)}
Then loop through the list to add each value:
for item in row_of_cleaned_data:
    empty_dict[item].append(item)
Then loop through the dictionary to get the counts:
for key, value in empty_dict.items():
    empty_dict[key] = len(value)
After that, you have the deduped data in
list(empty_dict.keys())
and counts of each item in
list(empty_dict.values()).
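Putting the steps above together, here is a runnable sketch (the row contents are hypothetical):

```python
row_of_cleaned_data = ["ABC", "DEF", "ABC", "JKLM", "ABC"]

# One *separate* empty list per unique item -- dict.fromkeys(keys, [])
# would share a single list object across every key.
empty_dict = {item: [] for item in set(row_of_cleaned_data)}

# Bucket each occurrence under its own key.
for item in row_of_cleaned_data:
    empty_dict[item].append(item)

# Replace each bucket with its length to get the counts.
for key, value in empty_dict.items():
    empty_dict[key] = len(value)

print(empty_dict)
```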