What is Deduplication?
In information technology, deduplication, or data deduplication, is a process that identifies redundant data (duplicate detection) and eliminates it before it is written to non-volatile storage. Like other compression methods, it reduces the amount of data that has to be sent from a transmitter to a receiver. The efficiency of deduplication algorithms is almost impossible to predict because it depends on the structure of the data and its rate of change. Where a pattern is observable from backup cycle to backup cycle, however, deduplication is currently the most efficient way to reduce data.
The main area of application for deduplication is backup, where it can achieve compression ratios of 1:12 in practice from cycle to cycle. Deduplication algorithms are useful in essentially every application area where data is copied repeatedly. Deduplication is especially advantageous for Hyper-V virtual machine backups and database server backups, because such systems are backed up cyclically and have a block-oriented data structure.
How it Works
Deduplication systems operate differently than classic compression methods, using only a few pattern matching methods on the so-called “block level”, i.e. files are divided into a number of blocks of equal size (usually powers of two). This is also what distinguishes deduplication from Single Instance Storage (SIS), which eliminates identical files (also known as content-addressed storage, CAS).
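The block-level idea can be illustrated with a minimal sketch. It assumes a block size of 4096 bytes and uses SHA-256 digests to decide whether two blocks are identical; the function names and both of these choices are illustrative assumptions, not details taken from any particular product.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size (a power of two)


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into blocks of equal size; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def count_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Return (total blocks, unique blocks), comparing blocks by SHA-256 digest."""
    blocks = split_blocks(data, block_size)
    digests = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks), len(digests)
```

For input made of four blocks of which three are identical, `count_blocks` reports four blocks in total but only two unique ones, so only two blocks would have to be stored.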
An important function of deduplication is “fingerprinting”. Files are split into segments of varying size (chunks). They are scanned at the byte level to find the segments with the highest rate of repetition, which in turn provides maximum data reduction when repeats are replaced by references to the original elements.
For example, when backing up data from disk to tape media, the ratio of new or modified data to unmodified data between two full backups is usually relatively low. Without deduplication, two full backups still need at least twice the storage space on tape. Deduplication detects identical parts in the data set and skips them. The unique segments are recorded in a list, and repeated data blocks are stored only as references.
These pointers take up much less space than the byte sequences they reference. When the file is restored, each data block is read only once and written out as many times as needed. An index structure indicates which parts are unique and how the components are connected, so that the original file can be recreated.
However, when deduplication is used, backups are no longer independent full backups. If one increment is lost, data is lost and the file can no longer be restored.
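As a rough illustration of the list of unique segments and the index of references, the following sketch keeps every distinct block once and describes each backup as a list of digests. The class name `DedupStore`, the fixed 4096-byte block size, and the use of SHA-256 are assumptions made for this example only.

```python
import hashlib


class DedupStore:
    """Toy block store: unique blocks are kept once, backups are lists of references."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks: dict[bytes, bytes] = {}  # digest -> block content, stored once

    def backup(self, data: bytes) -> list[bytes]:
        """Store any new blocks and return the index (list of digests) for this backup."""
        index = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            digest = hashlib.sha256(block).digest()
            self.blocks.setdefault(digest, block)  # skip blocks already stored
            index.append(digest)
        return index

    def restore(self, index: list[bytes]) -> bytes:
        """Rebuild the original data by following the references in order."""
        return b"".join(self.blocks[digest] for digest in index)

    def stored_bytes(self) -> int:
        """Bytes of block data actually kept, ignoring the size of the indexes."""
        return sum(len(block) for block in self.blocks.values())
```

If two full backups of three blocks each differ in a single block, the store ends up holding four blocks rather than six, and both backups can still be restored from their indexes. A restore raises `KeyError` as soon as a referenced block is missing from the store.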
Methods
There are two ways to create a file index. The “reverse referencing” method stores the first common element, and all other identical blocks get a reference to the first. “Forward referencing” always stores the most recent shared data block and references the previously encountered items. There is some controversy about which of the two methods restores data faster. Additional processing strategies, such as “in-band” and “out-band”, differ in whether parsers process the data stream “on the fly” or only after it has been stored at the destination. In the first case, only one data stream needs to exist. In the latter case, the file may be examined in parallel using multiple data streams.
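The difference between the two indexing methods can be sketched on a short sequence of blocks. This is one possible reading of the description above, not a reference implementation: each position holds either ("data", block) or ("ref", position of the stored copy), and all function names are hypothetical.

```python
def reverse_referencing(blocks: list[bytes]) -> list[tuple]:
    """Keep the first occurrence of each block; later duplicates point back to it."""
    first_seen: dict[bytes, int] = {}
    entries = []
    for pos, block in enumerate(blocks):
        if block in first_seen:
            entries.append(("ref", first_seen[block]))
        else:
            first_seen[block] = pos
            entries.append(("data", block))
    return entries


def forward_referencing(blocks: list[bytes]) -> list[tuple]:
    """Keep the most recent occurrence of each block; earlier ones point forward to it."""
    last_seen = {block: pos for pos, block in enumerate(blocks)}
    return [
        ("data", block) if last_seen[block] == pos else ("ref", last_seen[block])
        for pos, block in enumerate(blocks)
    ]


def resolve(entries: list[tuple]) -> list[bytes]:
    """Follow every reference to recover the original block sequence."""
    return [value if kind == "data" else entries[value][1] for kind, value in entries]
```

For the blocks a, b, a, c, a, reverse referencing stores the data at position 0 and turns positions 2 and 4 into references to it, while forward referencing stores the data at position 4 and turns positions 0 and 2 into references. Both indexes resolve to the same sequence.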
Chunking (fingerprinting)
Fingerprinting attempts to determine how the incoming data stream can be broken into pieces so that as many identical blocks of data as possible are produced. This process is called chunking.
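One common way to obtain chunks of varying size is content-defined chunking with a rolling hash. The sketch below is a simplified illustration of that general technique and is not taken from the source: a polynomial hash over the last `window` bytes is updated byte by byte, and a chunk boundary is placed wherever the low bits of the hash are zero. The constants `BASE` and `MOD`, the window size, and the mask are arbitrary assumptions.

```python
BASE = 257             # assumed polynomial base
MOD = (1 << 31) - 1    # assumed modulus


def chunk(data: bytes, window: int = 16, mask: int = 0x3F) -> list[bytes]:
    """Split data where the rolling hash of the last `window` bytes matches the mask."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, window - 1, MOD)  # weight of the byte that leaves the window
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        if i + 1 >= window and (h & mask) == 0:
            chunks.append(data[start:i + 1])        # boundary right after byte i
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                 # trailing remainder
    return chunks
```

Because a boundary depends only on the bytes inside the window, inserting a byte at the start of the data shifts the later boundaries along with the content, so the chunks after the insertion point stay identical. With blocks of equal size, the same insertion would change every block.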
Identification of Blocks
The more precisely changes to a file can be detected, the less redundant data ends up in the backup file. However, the complexity of the block index also increases when a more complex detection algorithm is used. It is therefore crucial to select the block identification method that finds common blocks best for the nature of the data.
Source
Wikipedia
BackupChain (Backup Software for Windows & Hyper-V offering Deduplication)