Comparing two large data sets - CodePudding

I have two large data sets, each with two columns: a tag column and a value column. I want to compare the two sets and list separately every tag whose value differs between them.
Every approach I have tried is very slow. Maybe I am using the tools wrong. Is there a better or faster way?

1. The data format is as follows:
e39aa0582fb3d25b42a77fb9cb09472d 99

2. Querying MySQL directly ran for a day without responding:
SELECT m1.tag, m1.value, s2.value
FROM main m1 LEFT JOIN sub s2
ON m1.tag = s2.tag
WHERE m1.value <> s2.value;

3. A shell script is also very slow. The data here is the two tables exported to two text files:
#!/bin/bash
# For each line of sub.csv, grep the tag in main.csv and compare the values.
while read line
do
    NS1=`echo $line | awk '{print $1}'`
    NS2=`echo $line | awk '{print $NF}'`
    NS3=`grep $NS1 main.csv | awk '{print $NF}'`
    if [ "$NS2" != "$NS3" ]
    then
        echo "MSGID=$NS1 SUBSEQ=$NS2 COUNT=$NS3" >> o1.out
    fi
    echo "MSGID=$NS1"
done < sub.csv

4. Python is also very slow:
import re
import os

mainfilename = 'main.csv'
subfilename = 'sub.csv'

# Read main.csv once; a file object is exhausted after the first readlines().
mainlines = open(mainfilename, 'r').readlines()

def search1(searchstr, var1):
    # Scan every line of main.csv for the tag (one full pass per call).
    for mcontent in mainlines:
        m = re.findall(searchstr, mcontent)
        if m:
            print("----", m, "_")

subfile = open(subfilename, "r")
contents = subfile.readlines()
for content in contents:
    tag = content.split('\t')[0].replace(" ", "")
    value = content.split('\t')[1].replace(" ", "")
    search1(tag, value)

CodePudding user response:

The complexity of what you are doing is far too high. Two methods come to mind easily: (1) sort both data sets and then compare them; (2) hash both data sets into dictionaries and then look the tags up. The second method is simpler to code and can be done in a few lines.
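A minimal sketch of the second method, in Python. It assumes each line is "tag value" separated by whitespace or a tab, that tags contain no whitespace, and that main.csv fits in memory. The names load_table and diff_by_dict are made up for illustration. Only the main data is put in a dictionary; the sub data is streamed against it, which gives the same result with less memory. A tag missing from main is reported with a value of None, roughly what the shell script above does with an empty grep result.

```python
def load_table(lines):
    """Build {tag: value} from whitespace-separated 'tag value' lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            table[parts[0]] = parts[-1]
    return table


def diff_by_dict(main_lines, sub_lines):
    """Return (tag, sub_value, main_value) for every sub row whose value
    differs from main, or whose tag is absent from main (main_value None)."""
    main = load_table(main_lines)  # one pass over main, O(1) lookups after
    mismatches = []
    for line in sub_lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        tag, sub_value = parts[0], parts[-1]
        main_value = main.get(tag)  # None when the tag is not in main
        if main_value != sub_value:
            mismatches.append((tag, sub_value, main_value))
    return mismatches


if __name__ == "__main__":
    # Tiny in-memory demo; the tags and values here are invented.
    main_demo = ["aaa\t99\n", "bbb\t5\n", "ccc\t7\n"]
    sub_demo = ["aaa\t99\n", "bbb\t6\n", "ddd\t1\n"]
    for tag, sub_value, main_value in diff_by_dict(main_demo, sub_demo):
        print(f"MSGID={tag} SUBSEQ={sub_value} COUNT={main_value}")
```

With the real files, the two arguments can be the open file objects themselves, for example with open('main.csv') as m, open('sub.csv') as s: diff_by_dict(m, s). Each file is then read once, instead of main.csv being rescanned for every line of sub.csv. If the data does not fit in memory, the first method (sort both files, then walk them side by side) is probably the better fit.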