
Q: How can I tail a zipped file without reading its entire contents?

I want to emulate the functionality of gzcat | tail -n.

This would be helpful when dealing with huge files (a few GB or so). Can I tail the last few lines of such a file without reading it from the beginning? I suspect this won't be possible, since I'd guess that for gzip the encoding depends on all the preceding text.

But I'd still like to hear whether anyone has tried something similar, or knows of a compression algorithm that could provide such a feature.

answer1:

No, you can't. The zipping algorithm works on streams and adapts its internal codings to what the stream contains to achieve its high compression ratio.

Without knowing what the contents of the stream are before a certain point, it's impossible to know how to go about de-compressing from that point on.

Any algorithm that allows you to decompress arbitrary parts of the output will require multiple passes over the data while compressing it.

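Python's zlib shows the point directly: DEFLATE output taken from the middle of a stream cannot be decoded on its own. A tiny demonstration (the sample payload is arbitrary):

    import zlib

    blob = zlib.compress(b"some repetitive text " * 1000)
    try:
        # Try to resume decoding halfway into the stream.
        zlib.decompressobj().decompress(blob[len(blob) // 2:])
    except zlib.error as exc:
        # Decoding depends on everything before this point, so it fails.
        print("cannot start mid-stream:", exc)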

answer2:

BGZF is the blocked gzip format used for the indexed, compressed BAM files created by Samtools. These are randomly accessible.

http://samtools.sourceforge.net/

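As a rough sketch of what that random access looks like in practice, here is one way to use Biopython's Bio.bgzf reader; the file name example.bgz and the one-pass offset scan are illustrative assumptions, not part of Samtools itself:

    # Minimal sketch, assuming Biopython is installed (pip install biopython).
    from Bio import bgzf

    # Write a BGZF file first so the example is self-contained.
    out = bgzf.BgzfWriter("example.bgz", "wb")
    for i in range(100000):
        out.write(f"line {i}\n".encode())
    out.close()

    # One pass (e.g. at write time): record the BGZF "virtual offset"
    # of every line so we can jump back later.
    reader = bgzf.BgzfReader("example.bgz", "r")
    offsets = []
    line = True
    while line:
        offsets.append(reader.tell())
        line = reader.readline()
    reader.close()

    # Later: seek straight to the last line; only the BGZF block that
    # contains it is decompressed.
    reader = bgzf.BgzfReader("example.bgz", "r")
    reader.seek(offsets[-2])        # offsets[-1] is the end-of-file offset
    print(reader.readline(), end="")
    reader.close()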

answer3:

If you have control over what goes into the file in the first place, and if it's anything like a ZIP file, you could store chunks of a predetermined size under filenames in increasing numerical order and then just decompress the last chunk/file, as in the sketch below.

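A minimal sketch of that idea with Python's zipfile module; the archive name log.zip and the chunk-naming scheme are made up for illustration. ZIP's central directory lets you locate the last member directly, so only that chunk is decompressed:

    import zipfile

    # Write fixed-size chunks as separately named, numbered members.
    with zipfile.ZipFile("log.zip", "w", zipfile.ZIP_DEFLATED) as zf:
        for i in range(1000):
            chunk = "".join(f"line {i * 100 + j}\n" for j in range(100))
            zf.writestr(f"chunk-{i:08d}", chunk)

    # Tailing: open only the final member via the central directory.
    with zipfile.ZipFile("log.zip") as zf:
        last = sorted(zf.namelist())[-1]
        tail_lines = zf.read(last).decode().splitlines()[-10:]
        print("\n".join(tail_lines))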

answer4:

If it's an option, then bzip2 might be a better compression algorithm to use for this purpose.

Bzip2 uses a block compression scheme. So if you take a chunk from the end of your file that you are sure is large enough to contain the entire last block, you can recover it with bzip2recover.

The block size is selectable at the time the file is written. In fact, that's what you're setting with the -1 (or --fast) through -9 (or --best) compression options, which correspond to block sizes of 100k to 900k. The default is 900k.

The bzip2 command-line tools don't give you a nice friendly way to do this with a pipeline, but then, given that bzip2 is not stream-oriented, perhaps that's not surprising.

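A hedged sketch of that recovery trick in Python: copy a tail of the compressed file that must contain at least one complete block, let bzip2recover split it into standalone .bz2 blocks, and decompress the last one. The file names, the 2 MB tail size, and the rec*... output naming are assumptions here:

    import bz2
    import glob
    import os
    import subprocess

    SRC = "big.bz2"           # hypothetical large bzip2 file
    TAIL = 2 * 1024 * 1024    # comfortably more than one 900k block

    # Copy only the tail of the compressed file.
    with open(SRC, "rb") as src, open("tail.bz2", "wb") as dst:
        src.seek(max(0, os.path.getsize(SRC) - TAIL))
        dst.write(src.read())

    # bzip2recover scans for block boundaries and writes each complete
    # block it finds as a standalone file (rec00001tail.bz2, ...).
    subprocess.run(["bzip2recover", "tail.bz2"])

    # The highest-numbered block is the end of the original file.
    last_block = sorted(glob.glob("rec*tail.bz2"))[-1]
    print(bz2.open(last_block, "rt").read())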

answer5:

zindex creates and queries an index on a compressed, line-based text file in a time- and space-efficient way.

https://github.com/mattgodbolt/zindex

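For a feel of the underlying technique, here is a small sketch (not zindex's actual file format) of making a deflate stream seekable by inserting zlib full-flush points and remembering their offsets; the checkpoint interval of 10,000 lines is arbitrary:

    import zlib

    lines = [f"line {i}\n".encode() for i in range(100000)]

    comp = zlib.compressobj(9, zlib.DEFLATED, -15)   # raw deflate
    blob = bytearray()
    checkpoints = []                                 # (line no, offset)
    for i, line in enumerate(lines):
        if i % 10000 == 0:
            blob += comp.flush(zlib.Z_FULL_FLUSH)    # safe restart point
            checkpoints.append((i, len(blob)))
        blob += comp.compress(line)
    blob += comp.flush()

    # "Tail": restart decompression at the last checkpoint instead of
    # at the beginning of the stream.
    line_no, offset = checkpoints[-1]
    data = zlib.decompressobj(-15).decompress(bytes(blob[offset:]))
    print(data.decode().splitlines()[-1])            # -> "line 99999"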

Tags: algorithm, compression