Unable to extract a nested tar within a zip file, i.e. a .tar file inside a zip file and so on

Time:08-03

I have gone through the link on how to extract a .tar file and several related questions on SOF (Stack Overflow) using Java. However, I didn't find any that address my concern, which is a multilevel or nested .tar/.tgz/.zip file. My concern is with something like the structure below:

Abc.tar.gz
    --DEF.tar
          --sample1.txt
          --sample2.txt 
    --FGH.tgz
          --sample3.txt
-sample4.txt    

This is a simple example. In practice the nesting can be any combination of compressed formats and folders, such as a .tar inside a .tar inside a .gz, then a .tgz, and so on.

My problem is that I can only extract the first level using the Apache Commons Compress library. That is, when Abc.tar.gz is extracted, only DEF.tar is available in the destination/output folder. Beyond that, my extraction does not work.

I tried to feed the output of the first extraction as the input to the second on the fly, but I got stuck with a FileNotFoundException: at that point the output file has not been written yet, so the second extraction cannot find it.

Pseudocode:

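import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;

public class CommonExtraction {

    public static void extract(String sourcePath, String destPath) {
        String lower = sourcePath.trim().toLowerCase();
        if (lower.endsWith(".tar.gz") || lower.endsWith(".tgz")) {
            try {
                TarArchiveInputStream tar = new TarArchiveInputStream(
                        new GzipCompressorInputStream(
                                new BufferedInputStream(new FileInputStream(sourcePath))));
                extractTar(tar, destPath);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public static void extractTar(TarArchiveInputStream tar, String outputFolder) {
        try {
            TarArchiveEntry entry;
            while ((entry = tar.getNextTarEntry()) != null) {
                if (entry.getName().trim().toLowerCase().endsWith(".tar")) {
                    final String path = outputFolder + File.separator + entry.getName();
                    // Fails with FileNotFoundException: the nested .tar has not been
                    // written to the destination path yet at this point.
                    tar = new TarArchiveInputStream(new BufferedInputStream(new FileInputStream(path)));
                    extractTar(tar, outputFolder);
                }
                extractEntry(entry, tar, outputFolder);
            }
            tar.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    public static void extractEntry(TarArchiveEntry entry, InputStream tar, String outputFolder)
            throws IOException {
        File target = new File(outputFolder, entry.getName());
        if (entry.isDirectory()) {
            target.mkdirs();
        } else {
            // Create the parent directory for the file if it does not exist.
            target.getParentFile().mkdirs();
            // Copy the current entry's bytes until the end of the entry is reached.
            Files.copy(tar, target.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}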

P.S. Please ignore any syntax errors in the code.

CodePudding user response:

Try this

try (InputStream fi = file.getInputStream();
    InputStream bi = new BufferedInputStream(fi);
    InputStream gzi = new GzipCompressorInputStream(bi, false);
    ArchiveInputStream archive = new TarArchiveInputStream(gzi)) {

        withArchiveStream(archive, result::appendEntry);
}

As far as I can see, .tar.gz and .tgz are the same format. My method withArchiveStream is:

private void withArchiveStream(ArchiveInputStream archInStream, BiConsumer<ArchiveInputStream, ArchiveEntry> entryConsumer) throws IOException {
    ArchiveEntry entry;
    while((entry = archInStream.getNextEntry()) != null) {
        entryConsumer.accept(archInStream, entry);
    }
}

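private void appendEntry(ArchiveInputStream archive, ArchiveEntry entry) {

    if (!archive.canReadEntryData(entry)) {
        // BiConsumer.accept cannot throw checked exceptions, so wrap the IOException
        throw new UncheckedIOException(new IOException("Can't read archive entry"));
    }

    if (entry.isDirectory()) {
        return;
    }

    // For example, print the entry's content
    try {
        String content = new String(archive.readAllBytes(), StandardCharsets.UTF_8);
        System.out.println(content);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
}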
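
To read a nested archive without writing it to disk first, the stream positioned at the current entry can itself be wrapped in another archive stream. Below is a minimal sketch of that idea using only java.util.zip from the JDK rather than Commons Compress, so it covers nested .zip files only. The class NestedZipStreamDemo and its methods walk and zipOf are hypothetical names made up for this illustration. With Commons Compress, the analogous step would presumably be wrapping the outer TarArchiveInputStream in a new TarArchiveInputStream when an entry name ends in .tar.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: walks zip-inside-zip archives entirely in memory.
class NestedZipStreamDemo {

    // Reads every entry of the zip stream; an entry whose name ends in ".zip"
    // is opened in place by wrapping the outer stream, which is positioned on it.
    static void walk(InputStream in, String prefix, Map<String, String> found) throws IOException {
        // Not closed here: closing a nested stream would also close the outer one.
        ZipInputStream zip = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            String name = prefix + entry.getName();
            if (name.toLowerCase().endsWith(".zip")) {
                walk(zip, name + "!/", found); // recurse into the nested archive
            } else {
                // The stream reports end-of-data at the end of the current entry.
                found.put(name, new String(zip.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
    }

    // Helper that builds a zip archive in memory (only needed for the demo).
    static byte[] zipOf(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey()));
                zos.write(e.getValue());
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> inner = new LinkedHashMap<>();
        inner.put("sample1.txt", "one".getBytes(StandardCharsets.UTF_8));
        Map<String, byte[]> outer = new LinkedHashMap<>();
        outer.put("DEF.zip", zipOf(inner));
        outer.put("sample4.txt", "four".getBytes(StandardCharsets.UTF_8));

        Map<String, String> found = new LinkedHashMap<>();
        try (InputStream in = new ByteArrayInputStream(zipOf(outer))) {
            walk(in, "", found);
        }
        found.forEach((name, text) -> System.out.println(name + " -> " + text));
    }
}
```

Only the outermost stream is closed, by the caller. The nested streams are left open because they all share the same underlying input.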

CodePudding user response:

You have a recursive problem, so you can use recursion to solve it. Here is some pseudocode to show how it can be done:

public class ArchiveExtractor
{
    public void extract(File file)
    {
        List<File> files = new ArrayList<>(); // list of extracted files; stays empty for unknown formats

        if(isZip(file))
            files = extractZip(file);
        else if(isTGZ(file))
            files = extractTGZ(file);
        else if(isTar(file))
            files = extractTar(file);
        else if(isGZip(file))
            files = extractGZip(file);

        for(File f : files)
        {
            if(isArchive(f))
                extract(f); // recursive call
        }
    }

    private List<File> extractZip(File file)
    {
        // extract archive and return list of extracted files
    }

    private List<File> extractTGZ(File file)
    {
        // extract archive and return list of extracted files
    }

    private List<File> extractTar(File file)
    {
        // extract archive and return list of extracted files
    }

    private List<File> extractGZip(File file)
    {
        // extract archive and return list of extracted files
    }
}
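
One way to fill in the pseudocode above, sketched with only the JDK's java.util.zip: each archive is extracted completely to disk and its stream is closed, and only then are the extracted files checked for further archives. That ordering avoids opening a nested file before it exists. This sketch handles .zip and plain .gz only (the JDK has no tar reader, so .tar and .tgz would still need Commons Compress). The class name RecursiveExtractorSketch, the ".d" suffix for nested output folders, and the helper names are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch: extract to disk first, then recurse into extracted archives.
class RecursiveExtractorSketch {

    static boolean isArchive(Path p) {
        String name = p.getFileName().toString().toLowerCase();
        return name.endsWith(".zip") || name.endsWith(".gz");
    }

    // Extracts the archive into destDir and returns all non-archive files found at any depth.
    static List<Path> extract(Path archive, Path destDir) throws IOException {
        if (!isArchive(archive)) {
            throw new IllegalArgumentException("Unsupported format: " + archive);
        }
        destDir = destDir.toAbsolutePath().normalize();
        Files.createDirectories(destDir);
        boolean zip = archive.getFileName().toString().toLowerCase().endsWith(".zip");
        List<Path> extracted = zip ? extractZip(archive, destDir) : extractGzip(archive, destDir);

        List<Path> leaves = new ArrayList<>();
        for (Path p : extracted) {
            if (isArchive(p)) {
                // The nested archive is fully written by now, so it is safe to open.
                leaves.addAll(extract(p, p.resolveSibling(p.getFileName() + ".d")));
            } else {
                leaves.add(p);
            }
        }
        return leaves;
    }

    static List<Path> extractZip(Path archive, Path destDir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = destDir.resolve(entry.getName()).normalize();
                if (!target.startsWith(destDir)) { // guard against "../" in entry names
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                Files.createDirectories(target.getParent());
                Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                files.add(target);
            }
        }
        return files;
    }

    static List<Path> extractGzip(Path archive, Path destDir) throws IOException {
        String name = archive.getFileName().toString();
        Path target = destDir.resolve(name.substring(0, name.length() - 3)); // drop ".gz"
        try (InputStream in = new GZIPInputStream(Files.newInputStream(archive))) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        List<Path> files = new ArrayList<>();
        files.add(target);
        return files;
    }
}
```

A call such as RecursiveExtractorSketch.extract(Paths.get("Abc.zip"), Paths.get("out")) would place the contents of a nested DEF.zip under out/DEF.zip.d/ in this sketch.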