Data processing technique to eliminate duplicate copies of repeating data
In computing, data deduplication is a technique for eliminating duplicate copies of repeating data. Successful implementation of the technique can improve storage utilization, which may in turn lower capital expenditure by reducing the overall amount of storage media required to meet storage capacity needs. It can also be applied to network data transfers to reduce the number of bytes that must be sent.
The deduplication process requires comparison of data 'chunks' (also known as 'byte patterns') which are unique, contiguous blocks of data. These chunks are identified and stored during a process of analysis, and compared to other chunks within existing data. Whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times (the match frequency is dependent on the chunk size), the amount of data that must be stored or transferred can be greatly reduced.[1][2]
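The chunk-and-reference process described above can be sketched in a few lines of Python. This is an illustrative toy, not any product's implementation: it uses fixed-size chunks and SHA-256 digests as the chunk identifiers (real systems often use content-defined chunk boundaries and much larger chunk sizes).

```python
import hashlib

def deduplicate(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks; store each unique chunk only once."""
    store = {}       # digest -> chunk bytes (the single stored copy)
    references = []  # ordered digests that reconstruct the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a repeated chunk adds no new storage
        references.append(digest)        # only a small reference is recorded
    return store, references

def reconstruct(store, references):
    """Rebuild the original byte stream by following the references."""
    return b"".join(store[d] for d in references)

store, refs = deduplicate(b"ABCDABCDABCDXYZW")
# 16 bytes of input yield 4 references but only 2 unique stored chunks
assert reconstruct(store, refs) == b"ABCDABCDABCDXYZW"
```

The savings grow with the match frequency: each additional occurrence of an already-stored chunk costs only one reference rather than another full copy.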
A related technique is single-instance (data) storage, which replaces multiple copies of content at the whole-file level with a single shared copy. While it is possible to combine single-instance storage with other forms of data compression and deduplication, it is distinct from newer approaches to data deduplication, which can operate at the segment or sub-block level.
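A minimal sketch of whole-file single-instance storage, again purely illustrative: files with identical content share one stored blob, keyed by a content hash, and each filename holds only a small reference to it.

```python
import hashlib

class SingleInstanceStore:
    """Whole-file single-instance storage: identical files share one copy."""

    def __init__(self):
        self._blobs = {}  # content digest -> file bytes (one shared copy)
        self._files = {}  # filename -> content digest (a small reference)

    def put(self, name: str, content: bytes):
        digest = hashlib.sha256(content).hexdigest()
        self._blobs.setdefault(digest, content)  # store content only once
        self._files[name] = digest

    def get(self, name: str) -> bytes:
        return self._blobs[self._files[name]]

store = SingleInstanceStore()
attachment = b"quarterly report attachment"
store.put("alice/report.doc", attachment)  # first copy is physically stored
store.put("bob/report.doc", attachment)    # duplicate becomes a reference
assert store.get("bob/report.doc") == attachment
```

Because matching happens only on whole files, a single changed byte produces an entirely new copy; sub-file (segment or block-level) deduplication avoids that cost.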
Deduplication is different from data compression algorithms, such as LZ77 and LZ78. Whereas compression algorithms identify redundant data inside individual files and encode this redundant data more efficiently, the intent of deduplication is to inspect large volumes of data and identify large sections – such as entire files or large sections of files – that are identical, and replace them with a shared copy.
^"Understanding Data Deduplication". Druva. 2009-01-09. Archived from the original on 2019-08-06. Retrieved 2019-08-06.