Reversible Data Hiding

ABSTRACT

The Reversible Data Hiding technique is a new technique that allows hiding of information within a digital image and also allows the extraction of the hidden data without distortion of the image. The new technique solves several problems faced by digital image users, particularly in sensitive military, legal and medical applications. There are several algorithms based on this technique. Of all these algorithms the latest one is the most efficient and it utilizes the zero or the minimum points of the histogram of an image and slightly modifies the pixel grayscale values to embed data into the image.

The latest algorithm can embed more data than many of the existing reversible data hiding algorithms. The latest algorithm has got less computational complexity and short execution time

INTRODUCTION

DATA HIDING is the process to hide data

(representing some information) into cover media. That

is, the data hiding process links two sets of data, a set of the embedded data and another set of the cover media data. The relationship between these two sets of data characterizes different applications. For instance, in covert communications, the hidden data may often be irrelevant to the cover media. In authentication, however, the embedded data are closely related to the cover media. In these two types of applications, invisibility of hidden data is an important requirement. In most cases of data hiding, the cover media will experience some distortion due to data hiding and cannot be inverted back to the original

media. That is, some permanent distortion has occurred to the cover media even after the hidden data have been extracted out.

In some applications, such as medical diagnosis and law enforcement, it is critical to reverse the marked media back to the original cover media after the hidden data are retrieved for some legal considerations. In other applications, such as remote

sensing and high-energy particle physical experimental investigation, it is also desired that the original cover media can be

recovered because of the required high-precision nature. The

marking techniques satisfying this requirement are referred to

as reversible, lossless, distortion-free, or invertible data hiding

techniques. Reversible data hiding facilitates immense possibility

of applications to link two sets of data in such a way that

the cover media can be losslessly recovered after the hidden data

have been extracted out, thus providing an additional avenue of

handling two different sets of data.

Obviously, most of the existing data hiding techniques are not reversible. For instance, the widely utilized spread-spectrum based data hiding methods are not invertible owing to truncation (for the purpose to prevent over/underflow) error and round-off error. The well-known least significant bit plane (LSB) based schemes are not lossless owing to bit replacement without “memory.” Another category of data hiding techniques, quantization-index-modulation (QIM) based schemes, are not distortion-free owing

to quantization error.

Some of reversible marking techniques are:-

Honsinger’s Algorithm is carried out

in the spatial domain. It uses modulo 256 addition (assuming

that eight-bit grayscale images are considered) to embed the hash

value of the original image for authentication. The embedding

formula is Iw=I+Wmod(256), in which I denotes the

original image,Iw the marked image, and W=W(H(I),k)

the watermark, where H(I) denotes the hash function operated

on the original image I, and k the secret key. Because of

using modulo 256 addition, the over/underflow is prevented and

the reversibility is achieved. Some annoying salt-and-pepper

noise, however, is generated owing to possible grayscale value

flipping over between 0 and 255 in either direction during the

modulo 256 addition.

The second reversible marking technique(Macq and Deweyend) was developed in the transform domain which is based on a lossless multiresolution transform and the idea of patchwork. It also uses modulo 256 addition. Another spatial domain technique was Fridrich algorithm that losslessly compresses some selected bit plane(s) to leave space for data embedding.

Because the necessary bookkeeping data are also embedded in the cover media as an overhead, the method is reversible. Since these techniques aimed at authentication,

the amount of hidden data were limited. The capacity of Vleeschouwer method, which is based on the idea of patchwork and modulo 256 addition, is also limited except that the hidden data exhibit some robustness against high quality JPEG compression. Since it uses modulo 256 addition, it also suffers from salt-and-pepper noise. As a result, the technique cannot be utilized in many applications.This observation was valid to all lossless data hiding algorithms that use modulo 256 addition to achieve reversibility.

The first reversible marking technique that is suitable for a large amount of data hiding was the Goljan’s algorithm This technique first segments an image into nonoverlapping blocks, and then introduces a discriminating function to classify these blocks into three groups: R(egular), S(ingular), and U(nusable). It further introduces a flipping operation, which can convert an R-block to an S-block and vice versa. A U-block remains intact after the flipping operation. By

assigning, say, binary 1 to an R-block and binary 0 to an S-block, all R- and S-blocks are scanned in a chosen sequential order, resulting in a biased (meaning that the binary numbers of 1 and 0 are not balanced) binary sequence. This biased binary sequence is losslessly compressed to leave space for data embedding and the compressed bit sequence is embedded into the cover media as an

overhead for later reconstruction of the original image. In data

embedding, the R- and S-blocks are scanned once again and the

flipping operation is applied whenever necessary to make the

changed R- and S-block sequence coincident with the to-be-embedded data followed by the overhead data mentioned above.

While it is novel and successful in reversible data hiding, the

payload is still not large enough for some applications. Specifically, the embedding capacity ranged from

3 to 41 kb for a 512 x 512 x 8 cover grayscale image when

the embedding amplitude is 4 (the estimated average PSNR

of the marked image versus the original image is 39 dB) .

Another problem with the method is that when the embedding

strength increases in order to increase the payload, the visual

quality of the marked image will drop severely due to annoying

artifacts.

To increase the payload dramatically, a new lossless

data hiding technique (Xuan algorithm) based on integer wavelet transform (IWT) (a second generation wavelet transform, which has avoided round-off errors) was developed. Because

of the superior decorrelation capability of wavelet transform,

the selected bit plane compression of IWT coefficients in

high frequency subbands creates more space for data hiding,

resulting in a two to five times payload as large as that in goljan’s method. Specifically, its payload ranges from 15 to 94 kb for a

512 x 512 x 8 grayscale image at the same (39 dB) PSNR

of the marked images compared with the original images. To

achieve reversible data hiding, a histogram modification is

applied in its pre-processing to prevent over/underflow. This

histogram modification causes, however, a relatively low PSNR

of the marked image versus the original image though there

are no annoying artifacts.

Another technique is the Celik algorithm. The main idea is that in the embedding phase, the host signal,S is quantized and the residual,r is obtained. Then the CALIC lossless image compression Algorithm is adopted, with the quantized values as side information, to efficiently compress the quantization residuals to create high capacity for the payload data . The compressed residual and the payload data are concatenated and embedded into the host signal via generalized-LSB modification method. The payload of this technique is from 15 to 143 kb for a 512 x 512 x 8 grayscale image while the PSNR is 38 dB. Even though the payload is high, the PSNR is still not high enough.

A new reversible data embedding

technique, which can embed a large amount of data (5–80 kb

for a 512 x 512 x 8 grayscale image) while keeping a very

high visual quality for all natural images has been introduced. The PSNR of the marked image versus the original image is higher than 48 dB. It utilizes the zero or the minimum point

of the histogram and slightly modifies the pixel

grayscale values to embed data. This technique can be applied to

virtually all types of images. It can be successfully

applied on different types of images, including some commonly

used images, medical images, texture images, aerial images, and

all of the 1096 images in CorelDRAW database. The computation

of the latest technique is quite simple and the execution

time is rather short. Although the latest data hiding

technique is applied to still images, it is also applicable to videos

which consist of a sequence of images.

WATERMARKING

Before the introduction of reversible data hiding technique, for military, legal and medical applications, the concerned people had to choose between an image that is been watermarked to establish its trustworthiness and one that isn’t watermarked but preserves all the original information, allowing it to be enlarged or enhanced to show detail. When information is embedded using the reversible data hiding method, authorized users can do both.

The watermarking is a form of digital rights management that uses data embedded into a recording to secure the music(image) by identifying the computer or device attempting to play it and determining whether a license to play the music(image) exists on that device. These watermark programs can also transmit information about the consumer- for example, credit card numbers-back to the content owner, such as a record company or website. Before the introduction of reversible data hiding techniques watermarking of the images were done.

Digital watermarking is a technique which allows an individual to add hidden copyright notices or other verification messages to digital audio, video or image signals and documents. Such hidden message is a group of bits describing information pertaining to the signal or to the author of the signal (name, place etc).While the addition of the hidden message to the signal does not restrict that signal’s use, it provides a mechanism to track the signal to the original owner.

A watermark can be classified into two subtypes – visible and invisible. Visible watermarks change the signal altogether such that the watermarked signal is totally different from the actual signal. eg:-adding an image as a watermark to another image. Invisible watermarks do not change the signal to a perceptually great extent that is there are only minor variations in the output signal. An example of an individual watermark is when some bits are added to an image modifying only its least significant bits.

Histogram

In an image processing context, the histogram of an image normally refers to a histogram of the pixel intensity values. This histogram is a graph showing the number of pixels in an image at each different intensity value found in that image. For an 8-bit gray scale image there are 256 different possible intensities and so the histogram will graphically display 256 numbers showing the distribution of pixels amongst those gray scale values. The image is scanned in a single pass and a running count of the number of pixels found at each intensity value is kept. Then it is used to construct a suitable histogram.

ALGORITHM

The latest algorithm having high data embedding capacity and high PSNR ratio can be illustrated using the Lena image as an example. The data embedding and extracting of this algorithm is represented in terms of pseudocode.

For a given grayscale image, say, the Lena image 512 x

512 x 8 , first its histogram is generated.

A. Illustration of Embedding Algorithm Using an Example

With One Zero Point and One Peak Point

1) In the histogram, first find a zero point, and then a

peak point. A zero point corresponds to the grayscale

value which no pixel in the given image assumes, e.g.,h(255)

as shown in Fig. 1.Apeak point corresponds to the

grayscale value which the maximum number of pixels in

the given image assumes, e.g.,h(154) as shown in fig:-1. For the sake of notational simplicity, only one zero point and one peak point are used in this example to illustrate

the principle of the algorithm. The objective of finding

the peak point is to increase the embedding capacity as

large as possible since in this algorithm, the number of bits that can be embedded into an image

equals to the number of pixels which are associated with

the peak point.

2) The whole image is scanned in a sequential order, say,

row-by-row, from top to bottom, or, column-by-column,

from left to right. The grayscale value of pixels between

155 (including 155) and 254 (including 254) is incremented

by “1.” This step is equivalent to shifting the range

of the histogram, [155 254], to the right-hand side by 1

unit, leaving the grayscale value 155 empty.

3) The whole image is scanned once again in the same sequential order. Once a pixel with grayscale value of 154

is encountered, we check the to-be-embedded data sequence.

If the corresponding to-be-embedded bit in the

sequence is binary “1,” the pixel value is incremented by

1. Otherwise, the pixel value remains intact. (Note that

this step may be included into Step 2, described above.

For illustration purposes, we choose to present the embedding

algorithm in these three steps.)

The above three steps complete the data embedding process. Now it can be observed that the data embedding capacity of this algorithm when only one pair of zero and peak points is used equals to the number of pixels that assume the grayscale value of the peak point as mentioned in Step 1.

B. Pseudocode Embedding Algorithm

The zero point defined above may not exist for some image histograms so the concept of minimum point is more

general. By minimum point, we mean such a grayscale value, ,

that a minimum number of pixels assume this value, i.e.,h(b)

is minimum. Accordingly, the peak point discussed above is referred to as maximum point.

1) Pseudocode Embedding AlgorithmWith One Pair of Maximum and Minimum Points:

For an M x N image, each pixel grayscale value

x € [0,255].

1) Generate its histogram H(x).

2) In the histogram H(x), find the maximum point h(a),a € [255]

and the minimum point zero h(b) b € [0,255] .

3) If the minimum point h(b) > 0, recode the coordinate (i,j)

of those pixels and the pixel grayscale value b as

overhead bookkeeping information (referred to as overhead

information for short). Then set h(b) = 0.

4) Without loss of generality, assume a <>

part of the histogram H(x) with x € (a,b) to the right

by 1 unit. This means that all the pixel grayscale values

(satisfying x € (a,b)) are added by 1.

5) Scan the image, once meet the pixel (whose grayscale

value is a ), check the to-be-embedded bit. If the to-be-embedded

bit is “1”, the pixel grayscale value is changed

to a+1 . If the bit is “0”, the pixel value remains a.

2) Actual Data Embedding Capacity (Pure Payload):

In this way, the actual data embedding capacity,C, is calculated as follows:

C=h(a)-O

where O denotes the amount of data used to represent the overhead information. It is also referred to as pure payload .Clearly, if the required payload is greater than the actual capacity,more pairs of maximum point and minimum point need

to be used. The embedding algorithm with multiple pairs of

maximum point and minimum point is as below

3) Pseudocode Embedding Algorithm With Multiple Pairs of Maximum and Minimum Points:

Without loss of generality, only a pseudocode embedding algorithm for the case of three pairs of maximum and minimum points is shown. It is straightforward to generate this code to handle the cases where any other number of multiple pairs of maximum and minimum points is used.

For an M x N image with pixel grayscale values x € [0,255].

1) Generate its histogram H(x) .

2) In the histogram H(x), find three minimum point

h(b1), h(b2),h(b3) . Without loss of generality, assume

three minimum points satisfy the following condition:

0<255.

3) In the intervals of (0,b1) and (b3,255), find the maximum

point h(a1) ,h(a3), respectively, and assume a1 € (0,b1),a3 € (b3,255).

4) In the intervals (b1,b2) and (b2,b3), find the maximum

points in each interval. Assume they are h(a12),h(a21) ,

b1 <>

5) Find a point having a larger histogram value in each of the

following three maximum point pairs (h(a1),h(a12)),(h(a21),h(a23)), and (h(a32),h(a3)), respectively.

Without loss of generality, assume h(a1),h(a23),h(a3) are the three selected maximum points.

6) Then (h(a1),h(b1)),(h(a23),h(b2)),(h(a3),h(b3)) are

the three pairs of maximum and minimum points. For each

of these three pairs, Steps 3-5 of Pseudocode Embedding AlgorithmWith One Pair of Maximum and Minimum Points is applied That is, each of these three pairs is treated as a case

of one pair of maximum and minimum points.

C. Pseudocode Extraction Algorithm

Only the simple case of one pair of minimum point and maximum point is described since , the general cases of multiple pairs of maximum and minimum points can be decomposed as a few one pair cases. That is, the multiple pair case can be treated as the multiple repetition of the data extraction for one pair case. Assume the grayscale value of the maximum point and the minimum points are a and b, respectively.Without loss of generality, assume a <>

each pixel grayscale value x € [0,255] .

1) Scan the marked image in the same sequential order as

that used in the embedding procedure. If a pixel with its

grayscale value a+1 is encountered, a bit “1” is extracted.

If a pixel with its value a is encountered, a bit “0” is extracted.

2) Scan the image again, for any pixel whose grayscale value

x € (a,b), the pixel value is subtracted by 1.

3) If there is overhead bookkeeping information found in

the extracted data, set the pixel grayscale value (whose

coordinate (i,j)is saved in the overhead) as b.

In this way, the original image can be recovered without any distortion.

D. Embedding and Extraction Flow Charts

The latest reversible data hiding and extraction algorithms can be illustrated by the flow charts shown in

Figs. 4 and 5, respectively.

E. Lower Bound of the PSNR of a Marked Image Versus the Original Image

The lower bound of the PSNR of a marked image generated by the algorithm versus the original image is larger than 48 dB. It can be clearly observed from the embedding algorithm that the pixels whose grayscale values are between the minimum point and the maximum point will be either added or subtracted by 1 in the data embedding process. Therefore, in the worst case, the grayscale values of all pixels will be either incremented or decremented by 1, implying that the resultant mean square error (MSE) is at most equal to one, i.e., MSE = 1.

This resultant lower bound of PSNR is

much higher than that of all reversible data hiding techniques.

F. Applicability

Where the maximum points and minimum points of a given image histogram are located, the right side, the left side or the middle, is not important for the applicability of the latest

reversible data hiding algorithm. Instead, the key issue is if the

histogram has maximum and minimum points, i.e., if the histogram

changes up-and-down enough. An extreme example in

which the this algorithm does not work is an image having

an exactly horizontal histogram. One of such a man-made image

G. Computational Complexity

The computational load of this algorithm is light

since it does not need to apply any transform such as discrete cosine transform (DCT), discrete wavelet transform (DWT), and

fast Fourier transform (FFT). All the processing is in the spatial domain. The required processing mainly lies on generating

histogram, determining minimum and maximum (and possibly

subminimum and submaximum) points, scanning pixels, and

adding or subtracting pixel grayscale values by one in the spatial

domain. Hence, the execution time of the algorithm is rather short. Assume the image height is M and the width is N. For one pair of minimum and maximum points, there is a need to scan the whole image three times in the embedding as in

the data embedding. Hence, the computational complexity is O(3MN). Since the multiple pair case is just a multiple repetition

of the one pair case, the total computation complexity is O(3kMN)suppose there are k pairs.

COMPARITIVE STUDY

The latest reversible data hiding algorithm can be applied to many different types of images, including some commonly used images, medical images, texture images, aerial images, and all of the 1096 images in the CorelDRAW database. Overall comparison between the existing reversible marking techniques and the latest technique in terms of pure payload and the PSNR is shown in Table :-1. Comparison results on two typically different images, Lena and Baboon, which often lead to different data embedding performances, are shown in Table :-2. From these comparisons, it is observed that the latest technique has got the highest lower bound of the PSNR with a quite large data embedding capacity.

CONCLUSION

The latest reversible data hiding technique is able to embed about 5–80 kb into a 512 x 512 x 8 grayscale image while the PSNR of the marked image versus the

original image is above 48 dB. In addition, this algorithm can be applied to virtually all types of images. It can be applied to many frequently used images, medical images, texture images, aerial images, and all of the 1096 images in the CorelDRAW database. Furthermore, this algorithm is quite simple, and the execution time is rather short. Therefore, its overall performance is better than many existing reversible data hiding algorithms. This reversible data hiding technique can be deployed for a wide range of applications in the areas such as secure medical image data systems, and image authentication in the medical field and law enforcement, and the other fields where the rendering of the original images is required or desired.