What is Deduplication?
In information technology, deduplication, or data deduplication, is a process that identifies redundant data (duplicate detection) and eliminates it before it is written to non-volatile storage. Like other compression methods, it thereby reduces the amount of data that has to be sent from a transmitter to a receiver. The efficiency of deduplication algorithms is almost impossible to predict, because it depends on the structure of the data and its rate of change. However, where a pattern is observable from one backup cycle to the next, deduplication is currently the most efficient way to reduce data.
The main area of application for deduplication is backup, where compression ratios of around 1:12 from cycle to cycle can be achieved in practice. In principle, deduplication is useful in any application area where data is copied repeatedly. It is particularly advantageous for Hyper-V virtual machine backups and database server backups, because such systems are backed up cyclically and store their data in block-oriented structures.
How it Works
Deduplication systems work differently from classic compression methods: they use only a few pattern-matching techniques and operate at the so-called “block level”, i.e. files are divided into a number of blocks of equal size (usually a power of two). This is also what distinguishes deduplication from Single Instance Storage (SIS), which eliminates only identical whole files (a related concept is content-addressed storage, CAS).
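A minimal sketch of block-level deduplication, assuming fixed-size 4096-byte blocks and SHA-256 as the block fingerprint (both illustrative choices, not taken from any particular product). Each block is hashed, only the first copy of each hash is kept, and the file is represented as an ordered list of hashes. The function names `split_fixed` and `deduplicate` are hypothetical.

```python
import hashlib

# Illustrative sketch; names and parameters are hypothetical.
BLOCK_SIZE = 4096  # assumed block size (a power of two)


def split_fixed(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into equal-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Return (store, refs): unique blocks keyed by SHA-256 digest, plus the
    ordered list of digests needed to rebuild the input."""
    store: dict[str, bytes] = {}
    refs: list[str] = []
    for block in split_fixed(data, block_size):
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy of each block
        refs.append(digest)              # every block position becomes a reference
    return store, refs


data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store, refs = deduplicate(data)
print(len(refs), "blocks,", len(store), "unique")
```

Single Instance Storage would apply the same idea to whole files instead of blocks, so a one-byte change would make the entire file count as new.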
An important function of deduplication is “fingerprinting”, in which files are split into segments of varying size (chunks). The files are scanned at the byte level to find the segments with the highest rate of repetition, because replacing these with references to the original elements yields the greatest data reduction.
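The text does not name a specific chunking algorithm. The sketch below uses one common technique, content-defined chunking with a gear-style rolling hash: the hash is updated for every byte, and a chunk boundary is placed wherever the low bits of the hash are zero, so chunk sizes vary with the content. Each chunk is then fingerprinted with SHA-256 and repetitions are counted. All parameters (`MIN_SIZE`, `MASK`, `MAX_SIZE`, the seeded `GEAR` table) and function names are hypothetical.

```python
import hashlib
import random
from collections import Counter

# Illustrative sketch; parameters and names are hypothetical.
MIN_SIZE = 64      # smallest allowed chunk
MASK = 0x3FF       # cut when the low 10 bits of the hash are zero (~1 KiB average)
MAX_SIZE = 8192    # upper bound so a chunk cannot grow indefinitely

# One pseudo-random 32-bit value per possible byte value (fixed seed for repeatability).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk_cdc(data: bytes) -> list[bytes]:
    """Split data into variable-size chunks whose boundaries depend on content."""
    chunks: list[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Rolling hash updated byte by byte; because older contributions are
        # shifted out of the 32-bit value, it depends only on the last 32 bytes.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if size >= MIN_SIZE and ((h & MASK) == 0 or size >= MAX_SIZE):
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def repetition_counts(chunks: list[bytes]) -> Counter:
    """Fingerprint each chunk (SHA-256) and count how often each fingerprint occurs."""
    return Counter(hashlib.sha256(c).hexdigest() for c in chunks)


rng = random.Random(1)
segment = bytes(rng.getrandbits(8) for _ in range(20000))
chunks = chunk_cdc(segment * 5)  # the same content repeated five times
counts = repetition_counts(chunks)
print(len(chunks), "chunks,", len(counts), "distinct fingerprints")
```

Fingerprints with high counts mark the chunks whose replacement by references saves the most space.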
For example, when data is backed up from disk to tape, the proportion of new or modified data relative to unmodified data between two full backups is usually relatively low. Without deduplication, two full backups still need at least twice the storage space on tape. Deduplication detects the identical parts of the data set and stores them only once: the unique segments are recorded in a list, and repeated data blocks are stored only as references.
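To make the two-full-backups example concrete, here is a hedged sketch assuming a simple block store with 4096-byte fixed blocks and SHA-256 fingerprints; the function name `backup` is hypothetical. A second full backup that differs from the first in a single byte adds only one new block to the shared store; all other blocks are recorded as references.

```python
import hashlib
import random

# Illustrative sketch; names and parameters are hypothetical.
BLOCK_SIZE = 4096  # assumed block size


def backup(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Add one full backup to a shared block store; return its list of references."""
    refs = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:   # new or modified block: store it once
            store[digest] = block
        refs.append(digest)       # unmodified block: only a reference is recorded
    return refs


rng = random.Random(0)
first = bytearray(rng.getrandbits(8) for _ in range(64 * BLOCK_SIZE))
second = bytearray(first)
second[10 * BLOCK_SIZE] ^= 0xFF   # modify a single byte in block 10

store: dict[str, bytes] = {}
refs1 = backup(bytes(first), store)
refs2 = backup(bytes(second), store)

raw = len(first) + len(second)
stored = sum(len(b) for b in store.values())
print(f"without dedup: {raw} bytes, with dedup: {stored} bytes")
```

Here the two full backups total 524,288 bytes, while the shared store holds 65 unique blocks (266,240 bytes), not counting the reference lists.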
These references (pointers) take up much less space than the byte sequences they point to. When a file is restored, each shared data block is read only once but written out as many times as it is referenced. An index structure records which parts are unique and how the components fit together, so that the original file can be recreated.
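The restore path can be sketched as follows, assuming an in-memory dict as the block store and 32-byte SHA-256 digests as pointers (both hypothetical simplifications, as are the function names). `restore` walks the index in order, reads each unique block only once, and writes it out every time it is referenced.

```python
import hashlib

# Illustrative sketch; names and parameters are hypothetical.
BLOCK_SIZE = 4096  # assumed block size


def build_index(data: bytes):
    """Store each unique block once; the index is the ordered list of references."""
    store: dict[bytes, bytes] = {}
    index: list[bytes] = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        ref = hashlib.sha256(block).digest()  # 32-byte pointer to the block
        store.setdefault(ref, block)
        index.append(ref)
    return store, index


def restore(index: list[bytes], read_block) -> bytes:
    """Recreate the original data by following the references in order."""
    cache: dict[bytes, bytes] = {}
    out = bytearray()
    for ref in index:
        if ref not in cache:
            cache[ref] = read_block(ref)  # each unique block is read only once...
        out += cache[ref]                 # ...but written out every time it is referenced
    return bytes(out)


data = b"X" * BLOCK_SIZE * 10 + b"Y" * BLOCK_SIZE
store, index = build_index(data)

reads: list[bytes] = []


def read_block(ref: bytes) -> bytes:
    """Stand-in for reading a block from backup storage; records each read."""
    reads.append(ref)
    return store[ref]


restored = restore(index, read_block)
print(len(index), "references,", len(reads), "block reads")
print("index:", len(index) * 32, "bytes; data:", len(data), "bytes")
```

In this example the index for 11 blocks occupies 352 bytes, while the data it describes is 45,056 bytes.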
However, when deduplication is used, backups are no longer independent full backups. If an increment is lost, data is lost and the affected file can no longer be restored.
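The following sketch, which uses deliberately tiny 4-byte blocks and hypothetical `backup` and `can_restore` helpers, illustrates why deduplicated backups are no longer independent: once a block shared by two backups is lost, neither backup can be restored completely.

```python
import hashlib

# Illustrative sketch; names and parameters are hypothetical.
BLOCK_SIZE = 4  # tiny blocks, chosen only to keep the example readable


def backup(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Record one backup in the shared store and return its references."""
    refs = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        ref = hashlib.sha256(block).hexdigest()
        store.setdefault(ref, block)
        refs.append(ref)
    return refs


def can_restore(refs: list[str], store: dict[str, bytes]) -> bool:
    """A backup is restorable only if every block it references still exists."""
    return all(ref in store for ref in refs)


store: dict[str, bytes] = {}
monday = backup(b"AAAABBBBCCCC", store)
tuesday = backup(b"AAAABBBBDDDD", store)  # only the last block changed

# Simulate losing one block that both backups share.
del store[hashlib.sha256(b"AAAA").hexdigest()]

print("Monday restorable:", can_restore(monday, store))
print("Tuesday restorable:", can_restore(tuesday, store))
```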
Methods
There are two ways to build a file index. The “reverse referencing” method stores the first occurrence of a shared block, and all later identical blocks receive a reference to it. “Forward referencing” always stores the most recent occurrence of a shared block, and the previously encountered copies are replaced by references to it. It is disputed which of the two methods allows data to be restored faster.
Further processing strategies, “in-band” and “out-of-band”, differ in whether the data stream is analyzed on the fly or only after it has been stored at the destination. In the first case, only one data stream needs to exist; in the second, the data can be examined in parallel using multiple data streams.
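The difference between the two referencing methods can be modelled in a few lines. In this hypothetical sketch each backup version is a list of block fingerprints, and `place_blocks` records which version holds the physical copy of each block; every other occurrence is only a reference. Real products manage this on disk and may differ in detail.

```python
# Illustrative sketch; names and data are hypothetical.


def place_blocks(versions: list[list[str]], mode: str) -> dict[str, int]:
    """For each unique block, return the index of the backup version that holds
    the physical copy; every other occurrence is only a reference.

    mode="reverse": the first (oldest) occurrence is kept, later copies point back.
    mode="forward": the newest occurrence is kept, older copies point forward.
    """
    if mode not in ("reverse", "forward"):
        raise ValueError(f"unknown mode: {mode}")
    owner: dict[str, int] = {}
    for version, blocks in enumerate(versions):
        for block in blocks:
            if mode == "reverse":
                owner.setdefault(block, version)  # keep the first version that stored it
            else:
                owner[block] = version            # move the copy to the newest version
    return owner


# Three successive backups, each listed as block fingerprints (made-up labels).
versions = [["a", "b", "c"], ["a", "b", "d"], ["a", "e", "d"]]
print("reverse:", place_blocks(versions, "reverse"))
print("forward:", place_blocks(versions, "forward"))
```

In this model, forward referencing leaves the newest version holding all of its own blocks, while reverse referencing makes it point back to older versions. Whether that translates into faster restores depends on the system, which is consistent with the dispute over restore speed.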
Chunking (fingerprinting)
Fingerprinting attempts to determine how the incoming data stream can be broken into pieces so that as many identical data blocks as possible are produced. This process is called chunking.
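As an illustration of choosing how to cut the stream, the sketch below tries several fixed block sizes on data with a 96-byte repeating pattern and picks the size with the smallest total: unique block bytes plus one index entry per block. The 32-byte cost per index entry (`REF_SIZE`) and the function names are assumptions made for this example.

```python
import hashlib
import random

# Illustrative sketch; names and parameters are hypothetical.
REF_SIZE = 32  # assumed cost of one index entry (a SHA-256 digest)


def stored_cost(data: bytes, block_size: int) -> int:
    """Bytes needed for the unique blocks plus one reference per block."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest(): len(b) for b in blocks}
    return sum(unique.values()) + REF_SIZE * len(blocks)


def best_block_size(data: bytes, candidates: list[int]) -> int:
    """Pick the candidate block size with the smallest total stored size."""
    return min(candidates, key=lambda bs: stored_cost(data, bs))


rng = random.Random(7)
unit = bytes(rng.getrandbits(8) for _ in range(96))
data = unit * 64  # 6144 bytes with a 96-byte repeating pattern
sizes = [32, 64, 128, 256, 512, 1024]
for bs in sizes:
    print(bs, stored_cost(data, bs))
print("best:", best_block_size(data, sizes))
```

Smaller blocks find more identical blocks but need more references; larger blocks need fewer references but store more unique data. For this input the total is lowest at 256 bytes.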
Identification of Blocks
Depending on how changes to a file are made and how precisely they can be detected, less redundant data ends up in the backup. However, a more sophisticated detection algorithm also makes the block index more complex. It is therefore crucial to choose the block identification method that best finds common blocks for the kind of data at hand.
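To see how the identification method affects the result, the sketch below compares fixed-size blocks with content-defined chunks produced by a gear-style rolling hash after a single byte is inserted at the start of a file. All parameters and function names are hypothetical. Fixed-size blocks all shift and no longer match, whereas content-defined boundaries realign shortly after the insertion, so most of the new version is recognized as already stored.

```python
import hashlib
import random

# Illustrative sketch; names and parameters are hypothetical.
BLOCK_SIZE = 1024                             # fixed-size method
MIN_SIZE, MAX_SIZE, MASK = 64, 8192, 0x3FF    # content-defined method
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk_fixed(data: bytes) -> list[bytes]:
    """Cut at fixed offsets regardless of content."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def chunk_cdc(data: bytes) -> list[bytes]:
    """Cut where a rolling hash over the most recent bytes matches a bit pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if size >= MIN_SIZE and ((h & MASK) == 0 or size >= MAX_SIZE):
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def reused_fraction(old: bytes, new: bytes, chunker) -> float:
    """Share of the new version's bytes that are already stored from the old version."""
    known = {hashlib.sha256(c).digest() for c in chunker(old)}
    reused = sum(len(c) for c in chunker(new) if hashlib.sha256(c).digest() in known)
    return reused / len(new)


rng = random.Random(1)
old = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
new = b"!" + old  # insert one byte at the very beginning

print("fixed-size blocks:", reused_fraction(old, new, chunk_fixed))
print("content-defined chunks:", reused_fraction(old, new, chunk_cdc))
```

The price of the content-defined method is extra work per byte and an index of irregular, variable-size entries, which is the trade-off the text describes.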
Source
Wikipedia
BackupChain (Backup Software for Windows & Hyper-V offering Deduplication)