Steganography means hiding a message inside a harmless-looking document. The idea is not new, but what about hiding a message inside HTML code?
Let’s talk about the HTML steganography technique. HTML is a simplified application of SGML used to publish information on the web over HTTP. It grew out of Tim Berners-Lee's 1989 proposal for the World Wide Web.
Steganography can be broadly classified into three types according to the cover medium used: text steganography, image steganography, and audio steganography. A technique that uses text as the cover medium is called text steganography. It is one of the most difficult types, because text files contain very little redundant data in which to hide a secret message.
Research on text steganography techniques.
There has been tremendous research in the field of text steganography. Some of the text steganography works are listed below.
Moerland proposed a text steganography technique that uses specific characters from words. In this method, particular characters from certain words are selected and used to carry the secret information. For example, the first character of the first word of each paragraph can hide the secret message one character at a time, so that placing these characters side by side yields the whole message.
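The first-character idea above can be sketched in a few lines of Python. This is only an illustration, not Moerland's own implementation: the function name `extract_first_letters`, the sample cover text, and the assumption that paragraphs are separated by a blank line are all made up for the example.

```python
def extract_first_letters(text: str) -> str:
    """Join the first character of the first word of each paragraph.

    Assumption: paragraphs are separated by a blank line.
    """
    paragraphs = [p.strip() for p in text.split("\n\n")]
    return "".join(p[0] for p in paragraphs if p)


# Hypothetical cover text: each paragraph starts with one letter of the secret.
cover = (
    "Help arrived late in the evening.\n\n"
    "Inside, the lights were already off.\n\n"
    "Days passed before anyone noticed.\n\n"
    "Eventually the letter was found."
)

print(extract_first_letters(cover))  # HIDE
```

The capacity is one character per paragraph, which shows how little room plain text offers for hidden data.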
Moerland also discussed a text steganography approach that uses punctuation marks. The idea is to use punctuation marks such as the comma (,), semicolon (;), and quotation marks (") in the text to encode a secret message. Punctuation is so common in normal English text that it is difficult for an intruder to recognize the presence of a secret message in the document. This accounts for the security of the technique.
Low, Maxemchuk, Brassil, O'Gorman, and Alattar proposed a text steganography technique based on line shifting. In this method, lines of text are shifted up or down by a very small amount, say 1/300 inch, and the information is hidden in the resulting pattern of shifted lines, which gives the text a unique shape.
A great deal of other research has been done on text steganography.
Now we move on to the HTML steganography technique.
The proposed technique hides information in HTML tags and their attributes. It is based on the ordering of attributes within a tag. Because the order of attributes does not affect how a page is rendered, the hidden data has no visible impact on the document or file.
The key component of the technique is the generation of the key file. The key file is essentially a collection of key combinations stored in the form of rows and columns. These combinations are generated by thoroughly scanning HTML documents: the attribute combinations that occur in the HTML tags are used to build the key file.
The key file contains two types of attributes, corresponding to two columns: the primary attribute is in the first column and the secondary attribute is in the second column. These attribute combinations aid the hiding process.
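As a rough sketch of what generating such a key file might look like, the Python below scans some HTML documents with the standard-library `html.parser` and records attribute pairs that occur together in a tag. The class name `KeyFileBuilder`, the function `build_key_file`, and the sample documents are hypothetical. The rule that each attribute appears in at most one row is a simplifying assumption of this sketch (it keeps the hidden bits independent of each other), not something the technique itself states.

```python
from html.parser import HTMLParser
from itertools import combinations


class KeyFileBuilder(HTMLParser):
    """Collect (primary, secondary) attribute pairs seen together in a tag."""

    def __init__(self):
        super().__init__()
        self.used = set()   # attributes already assigned to a row
        self.rows = []      # key file rows: (primary, secondary)

    def handle_starttag(self, tag, attrs):
        names = sorted({name for name, _ in attrs})
        for primary, secondary in combinations(names, 2):
            # Assumption: an attribute may appear in only one row.
            if primary not in self.used and secondary not in self.used:
                self.used.update((primary, secondary))
                self.rows.append((primary, secondary))


def build_key_file(html_documents):
    """Return the key file as a list of two-column rows."""
    builder = KeyFileBuilder()
    for document in html_documents:
        builder.feed(document)
    builder.close()
    return builder.rows


# Hypothetical sample documents used only to illustrate the scan.
docs = [
    '<img src="a.png" alt="A" width="10">',
    '<a href="/" title="Home">home</a>',
]
for primary, secondary in build_key_file(docs):
    print(primary, secondary)
```

For the two sample tags this yields the rows `alt src` and `href title`; `width` is left out because `alt` and `src` are already paired.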
Basic procedure of HTML steganography:
- The hiding process scans each attribute of each HTML tag and checks whether that attribute exists in the primary attribute column of the key file.
- If it does, the corresponding secondary attribute is searched for in the same HTML tag. If it is found, this attribute combination is used to hide a bit. If not, the attribute is skipped.
- The hidden bit is determined by the order of the two attributes in the combination, which the hider rearranges as needed. If the primary attribute comes before the secondary attribute, the pair encodes a 1; otherwise it encodes a 0.
- The extractor program recovers the message from the stego text by first identifying the attribute combinations that hide a bit and then reading the bit from the order of those attributes.
- If the primary attribute comes before the secondary attribute, a 1 is detected; otherwise a 0 is detected.
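The procedure above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the original authors' program. The names `hide`, `extract`, and `key_rows` are hypothetical. Only tags whose attributes are all written as `name="value"` are handled. Each attribute appears in at most one key row. Pairs are visited in key-file order so that hider and extractor agree even after attributes have been swapped. The extractor is told the message length in bits, because the steps above do not say how the end of the message is marked.

```python
import re

# Assumption: only start tags whose attributes all look like name="value".
TAG_RE = re.compile(r'<(\w+)((?:\s+[\w-]+="[^"]*")+)\s*(/?)>')
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def _carriers(names, key_rows):
    """Yield (primary_index, secondary_index) for each key row found in a tag."""
    for primary, secondary in key_rows:
        if primary in names and secondary in names:
            yield names.index(primary), names.index(secondary)


def hide(html, bits, key_rows):
    """Reorder attribute pairs so that each carrier pair encodes one bit."""
    pending = list(bits)

    def rewrite(match):
        tag, attr_text, slash = match.groups()
        attrs = ATTR_RE.findall(attr_text)
        names = [name.lower() for name, _ in attrs]
        for i, j in _carriers(names, key_rows):
            if not pending:
                break
            bit = pending.pop(0)
            primary_first = i < j
            # Bit 1 wants primary before secondary, bit 0 the reverse.
            if primary_first != (bit == "1"):
                attrs[i], attrs[j] = attrs[j], attrs[i]
                names[i], names[j] = names[j], names[i]
        rebuilt = " ".join(f'{name}="{value}"' for name, value in attrs)
        return f"<{tag} {rebuilt}{' /' if slash else ''}>"

    stego = TAG_RE.sub(rewrite, html)
    if pending:
        raise ValueError(f"cover too small: {len(pending)} bits left over")
    return stego


def extract(stego, n_bits, key_rows):
    """Read one bit per carrier pair from the attribute order."""
    bits = []
    for match in TAG_RE.finditer(stego):
        names = [name.lower() for name, _ in ATTR_RE.findall(match.group(2))]
        for i, j in _carriers(names, key_rows):
            bits.append("1" if i < j else "0")
    return "".join(bits[:n_bits])


# Hypothetical key file and cover document.
key_rows = [("alt", "src"), ("href", "title")]
cover = '<a href="/" title="Home"><img src="logo.png" alt="Logo"></a>'

stego = hide(cover, "01", key_rows)
print(stego)    # <a title="Home" href="/"><img alt="Logo" src="logo.png"></a>
print(extract(stego, 2, key_rows))  # 01
```

Each carrier pair holds a single bit, so the capacity of a page is limited by how many of its tags contain a pair listed in the key file. A regular expression is used here only to keep the sketch short; a real tool would need a proper HTML parser to cope with unquoted values, single quotes, and boolean attributes.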