I have data about a directory structure of unknown (and massive) size, and data about the same structure from Perforce. Using Python, I need to match the local data against the Perforce data and generate a list of files that reflects everything in the user's workspace (local directory), including all of the files missing from Perforce, as well as everything in the depot that is missing from the workspace.
Local Directory Structure Data:
- I have full control over how I mine out that data (currently using os.walk)
Perforce Data:
- Not much control over how the data is returned
- Currently comes as a list of dictionaries
- Data returns very fast regardless of size.
# This list has hundreds of thousands of entries.
p4data_example = [{
    'depotFile': '//Path/To/Data/file.extension',
    'clientFile': 'X:\\Path\\To\\Data\\file.extension',
    'isMapped': '',
    'headAction': 'add',
    'headType': 'text',
    'headTime': '00000',
    'headRev': '1',
    'headChange': '0000',
    'headModTime': '00000',
    'haveRev': '',
    'otherOpen': ['stuff'],
    'otherAction': ['move/delete'],
    'otherChange': ['00000'],
    'otherOpens': '1',
}]
I need to operate on the local directory files whether or not they have matching p4 data.
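import os

# p4 is assumed to be an already-connected P4Python instance.
path_to_data = r"X:\Path\To\Data"
p4data = p4.run('fstat', r"%s\..." % path_to_data)
for root, dirs, files in os.walk(path_to_data, topdown=False):
    for file in files:
        file_name = os.path.join(root, file)
        matchingp4 = None
        # Linear scan of the entire fstat list for every local file.
        for p4item in p4data:
            if p4item['clientFile'] == file_name:
                matchingp4 = p4item
                break
        # Runs for every local file; matchingp4 stays None when there is no match.
        do_stuff_with_data(file_name, matchingp4)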
I am confident this is not the most efficient way to handle this.
The extended time seems to come from:
- Getting all of the local data
- Needing to loop over the data so many times to find matches.
I need this to run as fast as possible. Ideally it would finish in just a couple of seconds, but I understand that the run time will vary because the size of the data set is not known in advance.
CodePudding user response:
Using Python, I need to be able to match the local data with the perforce data and find all of the local files missing from perforce and all of the perforce data that differs from the local data.
(snip)
I am confident this is not the most efficient way to handle this.
Correct. Just run p4 reconcile and Perforce will do all of this automatically. :)
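From Python, the same command can be issued through the P4Python connection already used in the question. This is a minimal sketch, assuming p4 is an already-connected P4Python instance. The helper name preview_reconcile is hypothetical, and the -n flag asks reconcile to preview its result instead of opening any files:

```python
def preview_reconcile(p4, path_to_data):
    """Return what `p4 reconcile -n` reports for everything under path_to_data.

    p4 is assumed to be a connected P4Python instance; this helper name is
    hypothetical and only wraps the generic run() call.
    """
    # -n previews the adds, edits, and deletes without opening any files.
    # The "\..." wildcard mirrors the local-syntax path used in the question.
    return p4.run('reconcile', '-n', r"%s\..." % path_to_data)
```

Dropping the -n would let reconcile actually open the files for add, edit, or delete in a pending changelist.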
reconcile does essentially what you're trying to do, but much more efficiently: the client walks the local tree, sends a list of files to the server, and then instead of doing an NxN comparison the server uses the mapping information to directly request additional client checks (i.e. checksumming to detect differences) as appropriate for individual files.
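If the matching does have to stay in Python, the NxN scan can at least be replaced by one dictionary lookup per local file. This is a rough sketch, not something reconcile itself exposes. It assumes every fstat record that matters carries a clientFile key, as in the example data. The function name match_local_to_p4 and its three return values are hypothetical:

```python
import os


def match_local_to_p4(local_paths, p4data):
    """Pair local files with fstat records using one dict lookup per file.

    Returns (matched, local_only, depot_only). All names here are
    illustrative and not part of P4Python.
    """
    # Index the fstat records once, keyed by normalized client path.
    # normcase lowercases and unifies separators on Windows; it is a no-op elsewhere.
    p4_by_path = {
        os.path.normcase(item['clientFile']): item
        for item in p4data
        if 'clientFile' in item
    }

    matched = {}
    local_only = []
    for path in local_paths:
        # pop() removes matched records, so whatever is left is depot-only.
        record = p4_by_path.pop(os.path.normcase(path), None)
        if record is None:
            local_only.append(path)
        else:
            matched[path] = record

    depot_only = list(p4_by_path.values())
    return matched, local_only, depot_only
```

The local_paths argument can be fed from the existing os.walk loop, for example with a generator that yields os.path.join(root, file) for each file. Building the index is a single pass over the fstat list, so total work grows with the number of local files plus the number of records rather than with their product.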