ABSTRACT
The Reversible Data Hiding technique is a new technique that allows hiding of information within a digital image and also allows the extraction of the hidden data without distortion of the image. The new technique solves several problems faced by digital image users, particularly in sensitive military, legal and medical applications. There are several algorithms based on this technique. Of all these algorithms the latest one is the most efficient and it utilizes the zero or the minimum points of the histogram of an image and slightly modifies the pixel grayscale values to embed data into the image.
The latest algorithm can embed more data than many of the existing reversible data hiding algorithms. The latest algorithm has got less computational complexity and short execution time
INTRODUCTION
DATA HIDING is the process to hide data
(representing some information) into cover media. That
is, the data hiding process links two sets of data, a set of the
media. That is, some permanent distortion has occurred to the
In some applications, such as medical diagnosis and law enforcement, it is critical to reverse the marked media back to
sensing and high-energy particle physical experimental investigation, it is also desired that the original cover media can be
recovered because of the required high-precision nature. The
marking techniques satisfying this requirement are referred to
as reversible, lossless, distortion-free, or invertible data hiding
techniques. Reversible data hiding facilitates immense possibility
of applications to link two sets of data in such a way that
the cover media can be losslessly recovered after the hidden data
have been extracted out, thus providing an additional avenue of
handling two different sets of data.
Obviously, most of the existing data hiding techniques are not reversible. For instance, the widely utilized spread-spectrum based data hiding methods are not invertible owing to truncation (for the purpose to prevent over/underflow) error and round-off error. The well-known least significant bit plane (LSB) based schemes are not lossless owing to bit replacement without “memory.” Another category of data hiding techniques, quantization-index-modulation (QIM) based schemes, are not distortion-free owing
to quantization error.
Some of reversible marking techniques are:-
Honsinger’s Algorithm is carried out
in the spatial domain. It uses modulo 256 addition (assuming
that eight-bit grayscale images are considered) to embed the hash
value of the original image for authentication. The embedding
formula is Iw=I+Wmod(256), in which I denotes the
original image,Iw the marked image, and W=W(H(I),k)
the watermark, where H(I) denotes the hash function operated
on the original image I, and k the secret key. Because of
using modulo 256 addition, the over/underflow is prevented and
the reversibility is achieved. Some annoying salt-and-pepper
noise, however, is generated owing to possible grayscale value
flipping over between 0 and 255 in either direction during the
modulo 256 addition.
The second reversible marking technique(Macq and Deweyend) was developed in the transform domain which is based on a lossless multiresolution transform and the idea of patchwork. It also uses modulo 256 addition. Another spatial domain technique was Fridrich algorithm that losslessly compresses some selected bit plane(s) to leave space for data embedding.
Because the necessary bookkeeping data are also embedded in the cover media as an overhead, the method is reversible. Since these techniques aimed at authentication,
the amount of hidden data were limited. The capacity of Vleeschouwer method, which is based on the idea of patchwork and modulo 256 addition, is also limited except that the hidden data exhibit some robustness against high quality JPEG compression. Since it uses modulo 256 addition, it also suffers from salt-and-pepper noise. As a result, the technique cannot be utilized in many applications.This observation was valid to all lossless data hiding algorithms that use modulo 256 addition to achieve reversibility.
The first reversible marking technique that is suitable for a large amount of data hiding was the Goljan’s algorithm This technique first segments an image into nonoverlapping blocks, and then introduces a discriminating function to classify these blocks into three groups: R(egular), S(ingular), and U(nusable). It further introduces a flipping operation, which can convert an R-block to an S-block and vice versa. A U-block remains intact after the flipping operation. By
assigning, say, binary 1 to an R-block and binary 0 to an S-block, all R- and S-blocks are scanned in a chosen sequential order, resulting in a biased (meaning that the binary numbers of 1 and 0 are not balanced) binary sequence. This biased binary sequence is losslessly compressed to leave space for data embedding and the compressed bit sequence is embedded into the cover media as an
overhead for later reconstruction of the original image. In data
embedding, the R- and S-blocks are scanned once again and the
flipping operation is applied whenever necessary to make the
changed R- and S-block sequence coincident with the to-be-embedded data followed by the overhead data mentioned above.
While it is novel and successful in reversible data hiding, the
payload is still not large enough for some applications. Specifically, the embedding capacity ranged from
3 to 41 kb for a 512 x 512 x 8 cover grayscale image when
the embedding amplitude is 4 (the estimated average PSNR
of the marked image versus the original image is 39 dB) .
Another problem with the method is that when the embedding
strength increases in order to increase the payload, the visual
quality of the marked image will drop severely due to annoying
artifacts.
To increase the payload dramatically, a new lossless
data hiding technique (Xuan algorithm) based on integer wavelet transform (IWT) (a second generation wavelet transform, which has avoided round-off errors) was developed. Because
of the superior decorrelation capability of wavelet transform,
the selected bit plane compression of IWT coefficients in
high frequency subbands creates more space for data hiding,
resulting in a two to five times payload as large as that in goljan’s method. Specifically, its payload ranges from 15 to 94 kb for a
512 x 512 x 8 grayscale image at the same (39 dB) PSNR
of the marked images compared with the original images. To
achieve reversible data hiding, a histogram modification is
applied in its pre-processing to prevent over/underflow. This
histogram modification causes, however, a relatively low PSNR
of the marked image versus the original image though there
are no annoying artifacts.
Another technique is the Celik algorithm. The main idea is that in the embedding phase, the host signal,S is quantized and the residual,r is obtained. Then the CALIC lossless image compression Algorithm is adopted, with the quantized values as side information, to efficiently compress the quantization residuals to create high capacity for the payload data . The compressed residual and the payload data are concatenated and embedded into the host signal via generalized-LSB modification method. The payload of this technique is from 15 to 143 kb for a 512 x 512 x 8 grayscale image while the PSNR is 38 dB. Even though the payload is high, the PSNR is still not high enough.
A new reversible data embedding
technique, which can embed a large amount of data (5–80 kb
for a 512 x 512 x 8 grayscale image) while keeping a very
high visual quality for all natural images has been introduced. The PSNR of the marked image versus the original image is higher than 48 dB. It utilizes the zero or the minimum point
of the histogram and slightly modifies the pixel
grayscale values to embed data. This technique can be applied to
virtually all types of images. It can be successfully
applied on different types of images, including some commonly
used images, medical images, texture images, aerial images, and
all of the 1096 images in CorelDRAW database. The computation
of the latest technique is quite simple and the execution
time is rather short. Although the latest data hiding
technique is applied to still images, it is also applicable to videos
which consist of a sequence of images.
WATERMARKING
Before the introduction of reversible data hiding technique, for military, legal and medical applications, the concerned people had to choose between an image that is been watermarked to establish its trustworthiness and one that isn’t watermarked but preserves all the original information, allowing it to be enlarged or enhanced to show detail. When information is embedded using the reversible data hiding method, authorized users can do both.
The watermarking is a form of digital rights management that uses data embedded into a recording to secure the music(image) by identifying the computer or device attempting to play it and determining whether a license to play the music(image) exists on that device. These watermark programs can also transmit information about the consumer- for example, credit card numbers-back to the content owner, such as a record company or website. Before the introduction of reversible data hiding techniques watermarking of the images were done.
Digital watermarking is a technique which allows an individual to add hidden copyright notices or other verification messages to digital audio, video or image signals and documents. Such hidden message is a group of bits describing information pertaining to the signal or to the author of the signal (name, place etc).While the addition of the hidden message to the signal does not restrict that signal’s use, it provides a mechanism to track the signal to the original owner.
A watermark can be classified into two subtypes – visible and invisible. Visible watermarks change the signal altogether such that the watermarked signal is totally different from the actual signal. eg:-adding an image as a watermark to another image. Invisible watermarks do not change the signal to a perceptually great extent that is there are only minor variations in the output signal. An example of an individual watermark is when some bits are added to an image modifying only its least significant bits.
Histogram
In an image processing context, the histogram of an image normally refers to a histogram of the pixel intensity values. This histogram is a graph showing the number of pixels in an image at each different intensity value found in that image. For an 8-bit gray scale image there are 256 different possible intensities and so the histogram will graphically display 256 numbers showing the distribution of pixels amongst those gray scale values. The image is scanned in a single pass and a running count of the number of pixels found at each intensity value is kept. Then it is used to construct a suitable histogram.
ALGORITHM
The latest algorithm having high data embedding capacity and high PSNR ratio can be illustrated using the
For a given grayscale image, say, the
512 x 8 , first its histogram is generated.
A. Illustration of Embedding Algorithm Using an Example
With One Zero Point and
1) In the histogram, first find a zero point, and then a
peak point. A zero point corresponds to the grayscale
value which no pixel in the given image assumes, e.g.,h(255)
as shown in Fig. 1.Apeak point corresponds to the
grayscale value which the maximum number of pixels in
the given image assumes, e.g.,h(154) as shown in fig:-1. For the sake of notational simplicity, only one zero point and one peak point are used in this example to illustrate
the principle of the algorithm. The objective of finding
the peak point is to increase the embedding capacity as
large as possible since in this algorithm, the number of bits that can be embedded into an image
equals to the number of pixels which are associated with
the peak point.
2) The whole image is scanned in a sequential order, say,
row-by-row, from top to bottom, or, column-by-column,
from left to right. The grayscale value of pixels between
155 (including 155) and 254 (including 254) is incremented
by “1.” This step is equivalent to shifting the range
of the histogram, [155 254], to the right-hand side by 1
unit, leaving the grayscale value 155 empty.
3) The whole image is scanned once again in the same sequential order. Once a pixel with grayscale value of 154
is encountered, we check the to-be-embedded data sequence.
If the corresponding to-be-embedded bit in the
sequence is binary “1,” the pixel value is incremented by
1. Otherwise, the pixel value remains intact. (Note that
this step may be included into Step 2, described above.
For illustration purposes, we choose to present the embedding
algorithm in these three steps.)
The above three steps complete the data embedding process. Now it can be observed that the data embedding capacity of this algorithm when only one pair of zero and peak points is used equals to the number of pixels that assume the grayscale value of the peak point as mentioned in Step 1.
B. Pseudocode Embedding Algorithm
The zero point defined above may not exist for some image histograms so the concept of minimum point is more
general. By minimum point, we mean such a grayscale value, ,
that a minimum number of pixels assume this value, i.e.,h(b)
is minimum. Accordingly, the peak point discussed above is referred to as maximum point.
1) Pseudocode Embedding AlgorithmWith One Pair of Maximum and Minimum Points:
For an M x N image, each pixel grayscale value
x € [0,255].
1) Generate its histogram H(x).
2) In the histogram H(x), find the maximum point h(a),a € [255]
and the minimum point zero h(b) b € [0,255] .
3) If the minimum point h(b) > 0, recode the coordinate (i,j)
of those pixels and the pixel grayscale value b as
overhead bookkeeping information (referred to as overhead
information for short). Then set h(b) = 0.
4) Without loss of generality, assume a <>
part of the histogram H(x) with x € (a,b) to the right
by 1 unit. This means that all the pixel grayscale values
(satisfying x € (a,b)) are added by 1.
5) Scan the image, once meet the pixel (whose grayscale
value is a ), check the to-be-embedded bit. If the to-be-embedded
bit is “1”, the pixel grayscale value is changed
to a+1 . If the bit is “0”, the pixel value remains a.
2) Actual Data Embedding Capacity (Pure Payload):
In this way, the actual data embedding capacity,C, is calculated as follows:
C=h(a)-O
where O denotes the amount of data used to represent the overhead information. It is also referred to as pure payload .Clearly, if the required payload is greater than the actual capacity,more pairs of maximum point and minimum point need
to be used. The embedding algorithm with multiple pairs of
maximum point and minimum point is as below
3) Pseudocode Embedding Algorithm With Multiple Pairs of Maximum and Minimum Points:
Without loss of generality, only a pseudocode embedding algorithm for the case of three pairs of maximum and minimum points is shown. It is straightforward to generate this code to handle the cases where any other number of multiple pairs of maximum and minimum points is used.
For an M x N image with pixel grayscale values x € [0,255].
1) Generate its histogram H(x) .
2) In the histogram H(x), find three minimum point
h(b1), h(b2),h(b3) . Without loss of generality, assume
three minimum points satisfy the following condition:
0
3) In the intervals of (0,b1) and (b3,255), find the maximum
point h(a1) ,h(a3), respectively, and assume a1 € (0,b1),a3 € (b3,255).
4) In the intervals (b1,b2) and (b2,b3), find the maximum
points in each interval. Assume they are h(a12),h(a21) ,
b1 <>
5) Find a point having a larger histogram value in each of the
following three maximum point pairs (h(a1),h(a12)),(h(a21),h(a23)), and (h(a32),h(a3)), respectively.
Without loss of generality, assume h(a1),h(a23),h(a3) are the three selected maximum points.
6) Then (h(a1),h(b1)),(h(a23),h(b2)),(h(a3),h(b3)) are
the three pairs of maximum and minimum points. For each
of these three pairs, Steps 3-5 of Pseudocode Embedding AlgorithmWith One Pair of Maximum and Minimum Points is applied That is, each of these three pairs is treated as a case
of one pair of maximum and minimum points.
C. Pseudocode Extraction Algorithm
Only the simple case of one pair of minimum point and maximum point is described since , the general cases of multiple pairs of maximum and minimum points can be decomposed as a few one pair cases. That is, the multiple pair case can be treated as the multiple repetition of the data extraction for one pair case. Assume the grayscale value of the maximum point and the minimum points are a and b, respectively.Without loss of generality, assume a <>
each pixel grayscale value x € [0,255] .
1) Scan the marked image in the same sequential order as
that used in the embedding procedure. If a pixel with its
grayscale value a+1 is encountered, a bit “1” is extracted.
If a pixel with its value a is encountered, a bit “0” is extracted.
2) Scan the image again, for any pixel whose grayscale value
x € (a,b), the pixel value is subtracted by 1.
3) If there is overhead bookkeeping information found in
the extracted data, set the pixel grayscale value (whose
coordinate (i,j)is saved in the overhead) as b.
In this way, the original image can be recovered without any distortion.
D. Embedding and Extraction Flow Charts
The latest reversible data hiding and extraction algorithms can be illustrated by the flow charts shown in
Figs. 4 and 5, respectively.
E. Lower Bound of the PSNR of a Marked Image Versus the Original Image
The lower bound of the PSNR of a marked image generated by the algorithm versus the original image is larger than 48 dB. It can be clearly observed from the embedding algorithm that the pixels whose grayscale values are between the minimum point and the maximum point will be either added or subtracted by 1 in the data embedding process. Therefore, in the worst case, the grayscale values of all pixels will be either incremented or decremented by 1, implying that the resultant mean square error (MSE) is at most equal to one, i.e., MSE = 1.
much higher than that of all reversible data hiding techniques.
F. Applicability
Where the maximum points and minimum points of a given image histogram are located, the right side, the left side or the middle, is not important for the applicability of the latest
reversible data hiding algorithm. Instead, the key issue is if the
histogram has maximum and minimum points, i.e., if the histogram
changes up-and-down enough. An extreme example in
which the this algorithm does not work is an image having
an exactly horizontal histogram. One of such a man-made image
G. Computational Complexity
The computational load of this algorithm is light
since it does not need to apply any transform such as discrete cosine transform (DCT), discrete wavelet transform (DWT), and
fast Fourier transform (FFT). All the processing is in the spatial
histogram, determining minimum and maximum (and possibly
subminimum and submaximum) points, scanning pixels, and
adding or subtracting pixel grayscale values by one in the spatial
domain. Hence, the execution time of the algorithm is rather
the data embedding. Hence, the computational complexity is O(3MN). Since the multiple pair case is just a multiple repetition
of the one pair case, the total computation complexity is O(3kMN)suppose there are k pairs.
COMPARITIVE STUDY
The latest reversible data hiding algorithm can be applied to many different types of images, including some commonly used images, medical images, texture images, aerial images, and all of the 1096 images in the CorelDRAW database. Overall comparison between the existing reversible marking techniques and the latest technique in terms of pure payload and the PSNR is shown in Table :-1. Comparison results on two typically different images, Lena and Baboon, which often lead to different data embedding performances, are shown in Table :-2. From these comparisons, it is observed that the latest technique has got the highest lower bound of the PSNR with a quite large data embedding capacity.
CONCLUSION
The latest reversible data hiding technique is able to embed about 5–80 kb into a 512 x 512 x 8 grayscale image while the PSNR of the marked image versus the
original image is above 48 dB. In addition, this algorithm can be applied to virtually all types of images. It can be applied to many frequently used images, medical images, texture images, aerial images, and all of the 1096 images in the CorelDRAW database. Furthermore, this algorithm is quite simple, and the execution time is rather short. Therefore, its overall performance is better than many existing reversible data hiding algorithms. This reversible data hiding technique can be deployed for a wide range of applications in the areas such as secure medical image data systems, and image authentication in the medical field and law enforcement, and the other fields where the rendering of the original images is required or desired.