I have a list of lists of strings in Python, read from a .DAT file, like the following:
datContent = [['\x00\x00\x00\x00\x00\x00NGDUID\x00\x00\x00\x00\x00C\\SAMPLEx00\x00\x00\x00', 'x00\x00\x00\x00NGDUID\x00\x00\x00\x00\x00C\\SAMPLE2x00\x00\x00\x00'],
              ['\x00\x00x00\x00CY\x0059British', 'Columbia', '/', 'Colombie-Britannique\x00\x00\x00', '\x00\x00\x00\x00212TroisRivieres-Montreal\x00\x00\x00\x00'],
              ...]  # each sublist contains strings
I am trying to clean datContent so that all of the \x00 terms are removed. This is what I have tried so far:
for i in range(len(datContent)):
    datContent[i] = [s.replace("\\x00\\", "") for s in datContent[i]]
This code doesn't remove those terms. Ideally, I would like a list of lists containing every element except the x00 elements:
datContent = [['NGDUID', 'SAMPLE', 'NGDUID', 'SAMPLE2'], ['CY', '59BritishColumbia/Colombie-Britannique', 'TroisRivieres-Montreal'], ...]
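A small check, using a hypothetical sample string shaped like the .DAT data, shows why the loop above has no effect: "\\x00\\" is five characters of plain text (backslash, x, 0, 0, backslash), while the data contains single NUL characters written as '\x00'.

```python
# The pattern used in the attempt: backslash, "x", "0", "0", backslash (plain text).
pattern = "\\x00\\"
# What '\x00' means inside an ordinary string literal: one NUL character.
nul = "\x00"

# Hypothetical sample shaped like the .DAT data.
sample = "\x00\x00NGDUID\x00"

print(len(pattern))                           # 5
print(len(nul))                               # 1
print(sample.replace(pattern, "") == sample)  # True: the pattern never occurs
print(sample.replace(nul, ""))                # NGDUID
```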
When I loop over the first sublist and print each element, the output looks right:
for i in datContent[0]:
    print(i)  # this prints the expected text (every x00 element seems to be skipped)
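The printed output can be misleading: print() writes the NUL characters as well, and terminals typically don't draw anything for them. A minimal sketch with a hypothetical sample string shows that repr() makes them visible and that they are still part of the string.

```python
# Hypothetical sample string containing NUL characters.
sample = "\x00\x00\x00NGDUID\x00\x00"

# print() emits the NULs too; they are usually just invisible in a terminal.
print(sample)
# repr() shows each NUL as the escape sequence \x00.
print(repr(sample))
# The NULs still count toward the length: 3 + 6 + 2 = 11.
print(len(sample))
```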
Any suggestions?
Answer:
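The "\\x00\\" pattern in your code searches for a literal backslash, x00 and another backslash, which never occurs in the data; '\x00' in a normal string literal is a single NUL character, so that is what needs replacing. A step-by-step approach creates a new list of lists as follows: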
datContent = [
    ['\x00\x00\x00\x00\x00\x00NGDUID\x00\x00\x00\x00\x00C\\SAMPLEx00\x00\x00\x00',
     '\x00\x00\x00\x00NGDUID\x00\x00\x00\x00\x00C\\SAMPLE2x00\x00\x00\x00'],
    ['\x00\x00\x00\x00CY\x0059British', 'Columbia', '/', 'Colombie-Britannique\x00\x00\x00',
     '\x00\x00\x00\x00212TroisRivieres-Montreal\x00\x00\x00\x00'],
]
newDatContent = []
for row in datContent:
    newRow = []
    for string in row:
        # Remove every NUL character from the string
        newRow.append(string.replace('\x00', ''))
    newDatContent.append(newRow)
print(newDatContent)
Output:
[['NGDUIDC\\SAMPLEx00', 'NGDUIDC\\SAMPLE2x00'], ['CY59British', 'Columbia', '/', 'Colombie-Britannique', '212TroisRivieres-Montreal']]
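The same loops can be written as a nested list comprehension; this sketch assumes the same sample data and produces the same output.

```python
datContent = [
    ['\x00\x00\x00\x00\x00\x00NGDUID\x00\x00\x00\x00\x00C\\SAMPLEx00\x00\x00\x00',
     '\x00\x00\x00\x00NGDUID\x00\x00\x00\x00\x00C\\SAMPLE2x00\x00\x00\x00'],
    ['\x00\x00\x00\x00CY\x0059British', 'Columbia', '/', 'Colombie-Britannique\x00\x00\x00',
     '\x00\x00\x00\x00212TroisRivieres-Montreal\x00\x00\x00\x00'],
]

# Strip every NUL character from every string, keeping the list-of-lists shape.
newDatContent = [[s.replace('\x00', '') for s in row] for row in datContent]
print(newDatContent)
```

Note that this only removes the NUL characters; the literal text x00 and the C\ prefix stay, as the output above shows.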
Answer:
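To get a list of lists with the x00 elements removed and the remaining pieces split apart, you can use the x00 patterns as delimiters. Calling str() on a list uses repr() for each element, so every NUL character appears in that text as the four characters \x00, which is what the regular expression matches.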
A step-by-step approach using the re module:
import re

def convertDat(datContent):
    result = []
    for dat in datContent:
        # Convert the sublist to its string representation (NULs become the text \x00)
        dat = str(dat)
        # Remove the list delimiter characters: , ' [ ] and spaces
        dat = re.sub(r"[,'\[\] ]", "", dat)
        # Replace the \x00C\, \x00, x00 and backslash patterns with a comma delimiter
        dat = re.sub(r"\\x00C\\|\\x00|x00|\\", ",", dat)
        # Rebuild the list by splitting on the delimiter
        dat = dat.split(",")
        # Remove empty strings
        dat = list(filter(None, dat))
        result.append(dat)
    return result

newContent = convertDat(datContent)
print(newContent)
Output:
[['NGDUID', 'SAMPLE', 'NGDUID', 'SAMPLE2'], ['CY', '59BritishColumbia/Colombie-Britannique', '212TroisRivieres-Montreal']]
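A variant that skips the str() round-trip is sketched below. It assumes the same separators as the regex above (\x00C\, a NUL, the text x00, or a backslash), but matches them against the real characters instead of their repr() text. The function name convert_dat is hypothetical.

```python
import re

datContent = [
    ['\x00\x00\x00\x00\x00\x00NGDUID\x00\x00\x00\x00\x00C\\SAMPLEx00\x00\x00\x00',
     '\x00\x00\x00\x00NGDUID\x00\x00\x00\x00\x00C\\SAMPLE2x00\x00\x00\x00'],
    ['\x00\x00\x00\x00CY\x0059British', 'Columbia', '/', 'Colombie-Britannique\x00\x00\x00',
     '\x00\x00\x00\x00212TroisRivieres-Montreal\x00\x00\x00\x00'],
]

# Assumed separators: NUL + "C" + backslash, a single NUL, the text "x00",
# or a lone backslash. A run of them counts as one separator.
SEPARATOR = re.compile(r"(?:\x00C\\|\x00|x00|\\)+")

def convert_dat(dat_content):
    """Hypothetical helper: split each row on separator runs, dropping empties."""
    result = []
    for row in dat_content:
        # Concatenate the row's pieces, as the str() approach effectively does
        # by deleting the commas, quotes and spaces between elements.
        joined = "".join(row)
        result.append([part for part in SEPARATOR.split(joined) if part])
    return result

print(convert_dat(datContent))
```

Because the values are never converted to their repr() text, spaces, commas or quotes inside a real value are kept, whereas the first re.sub in convertDat deletes them.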