Global Information Lookup Global Information

Compressed data structure information


The term compressed data structure arises in the computer science subfields of algorithms, data structures, and theoretical computer science. It refers to a data structure whose operations are roughly as fast as those of a conventional data structure for the problem, but whose size can be substantially smaller. The size of the compressed data structure is typically highly dependent upon the information entropy of the data being represented.

Important examples of compressed data structures include the compressed suffix array[1][2] and the FM-index,[3] both of which can represent an arbitrary text of characters T for pattern matching. Given any input pattern P, they support the operation of finding if and where P appears in T. The search time is proportional to the sum of the length of pattern P, a very slow-growing function of the length of the text T, and the number of reported matches. The space they occupy is roughly equal to the size of the text T in entropy-compressed form, such as that obtained by Prediction by Partial Matching or gzip. Moreover, both data structures are self-indexing, in that they can reconstruct the text T in a random access manner, and thus the underlying text T can be discarded. In other words, they simultaneously provide a compressed and quickly searchable representation of the text T. They represent a substantial space improvement over the conventional suffix tree and suffix array, which occupy many times more space than the size of T. They also support searching for arbitrary patterns, as opposed to the inverted index, which can support only word-based searches. In addition, inverted indexes do not have the self-indexing feature.

An important related notion is that of a succinct data structure, which uses space roughly equal to the information-theoretic minimum, which is a worst-case notion of the space needed to represent the data. In contrast, the size of a compressed data structure depends upon the particular data being represented. When the data are compressible, as is often the case in practice for natural language text, the compressed data structure can occupy space very close to the information-theoretic minimum, and significantly less space than most compression schemes.[example needed][citation needed]

  1. ^ Grossi, Roberto; Vitter, Jeffrey Scott (January 2005). "Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching" (PDF). SIAM Journal on Computing. 35 (2): 378–407. doi:10.1137/S0097539702402354. hdl:1808/18962.
  2. ^ R. Grossi, A. Gupta, and J. S. Vitter, High-Order Entropy-Compressed Text Indexes, Proceedings of the 14th Annual SIAM/ACM Symposium on Discrete Algorithms, January 2003, 841-850.
  3. ^ Ferragina, P.; Manzini, G. (2000). "Opportunistic data structures with applications". Proceedings 41st Annual Symposium on Foundations of Computer Science. pp. 390–398. doi:10.1109/SFCS.2000.892127. ISBN 0-7695-0850-2. S2CID 12530704.

and 21 Related for: Compressed data structure information

Request time (Page generated in 0.8467 seconds.)

Compressed data structure

Last Update:

The term compressed data structure arises in the computer science subfields of algorithms, data structures, and theoretical computer science. It refers...

Word Count : 471

Succinct data structure

Last Update:

that of a compressed data structure, insofar as the size of the stored or encoded data similarly depends upon the specific content of the data itself. Suppose...

Word Count : 2896

List of data structures

Last Update:

Brodal queue In these data structures each tree node compares a bit slice of key values. Radix tree Suffix tree Suffix array Compressed suffix array FM-index...

Word Count : 911

Compressed suffix array

Last Update:

science, a compressed suffix array is a compressed data structure for pattern matching. Compressed suffix arrays are a general class of data structure that...

Word Count : 744

Compressed sensing

Last Update:

Compressed sensing (also known as compressive sensing, compressive sampling, or sparse sampling) is a signal processing technique for efficiently acquiring...

Word Count : 5864

Trie

Last Update:

digital tree or prefix tree, is a type of k-ary search tree, a tree data structure used for locating specific keys from within a set. These keys are most...

Word Count : 3395

Compressive strength

Last Update:

In mechanics, compressive strength (or compression strength) is the capacity of a material or structure to withstand loads tending to reduce size (as...

Word Count : 2417

Structure

Last Update:

minerals and chemicals. Abstract structures include data structures in computer science and musical form. Types of structure include a hierarchy (a cascade...

Word Count : 2140

Audio file format

Last Update:

coding format and can be uncompressed, or compressed to reduce the file size, often using lossy compression. The data can be a raw bitstream in an audio coding...

Word Count : 918

Dictionary coder

Last Update:

lossless data compression algorithms which operate by searching for matches between the text to be compressed and a set of strings contained in a data structure...

Word Count : 592

Compressed air

Last Update:

Compressed air is air kept under a pressure that is greater than atmospheric pressure. Compressed air is an important medium for transfer of energy in...

Word Count : 1624

Binary decision diagram

Last Update:

compressed representations, operations are performed directly on the compressed representation, i.e. without decompression. Similar data structures include...

Word Count : 2937

List of file formats

Last Update:

AT3 – Sony's UMD data compression ARC – pre-Zip data compression ARC – Nintendo U8 Archive (mostly Yaz0 compressed) ARJ – ARJ compressed file ASS, SSA –...

Word Count : 13920

Quadtree

Last Update:

A quadtree is a tree data structure in which each internal node has exactly four children. Quadtrees are the two-dimensional analog of octrees and are...

Word Count : 4710

Sparse matrix

Last Update:

efficient access and matrix operations, such as CSR (Compressed Sparse Row) or CSC (Compressed Sparse Column). DOK consists of a dictionary that maps...

Word Count : 3182

Quantile

Last Update:

Computing approximate quantiles from data arriving from a stream can be done efficiently using compressed data structures. The most popular methods are t-digest...

Word Count : 2912

Circular buffer

Last Update:

is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering data streams...

Word Count : 1436

Exif

Last Update:

PCM or ITU-T G.711 μ-law PCM for uncompressed audio data, and IMA-ADPCM for compressed audio data). It does not support JPEG 2000 or GIF encoded images...

Word Count : 3447

Archive file

Last Update:

are used to collect multiple data files together into a single file for easier portability and storage, or simply to compress files to use less storage space...

Word Count : 901

Audio codec

Last Update:

digital data stream (a codec) that encodes or decodes audio. In software, an audio codec is a computer program implementing an algorithm that compresses and...

Word Count : 349

DriveSpace

Last Update:

the compressed contents of a compressed drive was stored in a single file implied the possibility of a user accidentally deleting all of their data by...

Word Count : 2076

PDF Search Engine © AllGlobal.net