Sunday 20 December 2015

DATA COMPRESSION Text and video compression


WHAT IS DATA COMPRESSION:

“Reducing the amount of space needed to store a piece of data”.
WHY DO WE NEED TO COMPRESS DATA:
·         In the past we needed to keep data small because of storage limitations.
·         Today, computer storage is relatively cheap, but we have an even more pressing reason to shrink our data: the need to share it with others.
·         The Web and its underlying networks have inherent bandwidth restrictions.
Bandwidth:
The maximum number of bits or bytes that can be transmitted from one place to another in a fixed amount of time.
Compression ratio:
·         The compression ratio gives an indication of how much compression occurs. It is the size of the compressed data divided by the size of the original data. The result is typically a number between 0 and 1; the closer it is to 0, the tighter the compression.
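As a quick illustration of the definition above, here is a minimal sketch of the calculation. The function name and the byte counts are made up for the example.

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Return the compressed size divided by the original size."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return compressed_size / original_size


# Hypothetical sizes in bytes: a 1000-byte file that shrinks to 250 bytes.
ratio = compression_ratio(original_size=1000, compressed_size=250)
print(ratio)  # 0.25
```

A ratio of 0.25 means the compressed data takes one quarter of the original space.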
Lossless Compression:
A data compression technique can be lossless, which means the data can be retrieved without losing any of the original information.
Lossy Compression:
A data compression technique can also be lossy, which means some information is lost in the process of compaction. Although we never want to lose information, in some cases the loss is acceptable.
Text Compression:
·         Alphabetic information (text) is a fundamental type of data.
·         Therefore, it is important that we find ways to store text efficiently and to transmit it efficiently between one computer and another.
·         The following sections examine three types of text compression:
o   keyword encoding
o   run-length encoding
o   Huffman encoding
Keyword Encoding:
Consider how often you use words such as “the,” “and,” “which,” “that,” and “what.” If these words took up less space (that is, had fewer characters), our documents would shrink in size. Even though the savings on each word would be small, they are used so often in a typical document that the combined savings would add up quickly. One fairly straightforward method of text compression is called keyword encoding, in which frequently used words are replaced with a single character. To decompress the document, you reverse the process: replace the single characters with the appropriate full word.
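The replace-and-reverse idea can be sketched in a few lines of Python. The substitution table below is hypothetical, and the sketch assumes that none of the chosen symbols appear anywhere in the original text; otherwise decoding would be ambiguous. Only lowercase whole words are replaced.

```python
import re

# Hypothetical table: each frequent word maps to a symbol that is
# assumed never to occur in the text being compressed.
KEYWORDS = {"the": "~", "and": "+", "which": "&", "that": "$", "what": "#"}


def keyword_encode(text, table=KEYWORDS):
    """Replace each whole-word keyword with its single-character symbol."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda match: table[match.group(1)], text)


def keyword_decode(text, table=KEYWORDS):
    """Reverse the process: expand every symbol back to its full word."""
    reverse = {symbol: word for word, symbol in table.items()}
    return "".join(reverse.get(ch, ch) for ch in text)


original = "the cat and the dog that ran"
encoded = keyword_encode(original)
print(encoded)                   # ~ cat + ~ dog $ ran
print(keyword_decode(encoded))   # the cat and the dog that ran
```

Words that merely contain a keyword, such as "theme", are left alone because the pattern only matches whole words.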
Run-Length Encoding:
In some situations, a single character may be repeated over and over again in a long sequence. This type of repetition doesn’t generally take place in English text, but often occurs in large data streams, such as DNA sequences. A text compression technique called run-length encoding, sometimes called recurrence coding, capitalizes on these situations. In run-length encoding, a sequence of repeated characters is replaced by a flag character, followed by the repeated character, followed by a single digit that indicates how many times the character is repeated. For example, consider the following string of seven repeated ‘A’ characters:
AAAAAAA
If we use the ‘*’ character as our flag, this string would be encoded as:
*A7
The flag character is the indication that the three-character sequence (the flag, the character, and the count) should be decoded into the corresponding run of repeated characters. All other text is treated as ordinary text.
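The scheme just described can be sketched as follows. This is only one possible reading of it, with some assumptions added for the example: the flag never appears in the input, runs longer than nine are split into several pieces because the count is a single digit, and runs shorter than four are left as-is since the encoded form already takes three characters (the `min_run` parameter is hypothetical).

```python
from itertools import groupby

FLAG = "*"  # flag character, assumed never to occur in the input


def rle_encode(text, min_run=4):
    """Encode runs of repeated characters as FLAG + character + digit."""
    if FLAG in text:
        raise ValueError("input must not contain the flag character")
    out = []
    for ch, group in groupby(text):
        remaining = len(list(group))
        while remaining > 0:
            chunk = min(remaining, 9)  # the count must fit in one digit
            if chunk >= min_run:
                out.append(f"{FLAG}{ch}{chunk}")
            else:
                out.append(ch * chunk)  # too short to be worth encoding
            remaining -= chunk
    return "".join(out)


def rle_decode(encoded):
    """Expand every FLAG + character + digit triple; copy everything else."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == FLAG:
            out.append(encoded[i + 1] * int(encoded[i + 2]))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


print(rle_encode("AAAAAAA"))  # *A7
print(rle_decode("*A7"))      # AAAAAAA
```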
Huffman Encoding:
Another text compression technique, called Huffman encoding, is named after its creator, Dr. David Huffman. Why should the character “X”, which is seldom used in text, take up the same number of bits as the blank, which is used very frequently? Huffman codes address this question by using variable-length bit strings to represent each character. That is, a few characters may be represented by five bits, another few by six bits, yet another few by seven bits, and so forth. This approach is contrary to the idea of a character set, in which each character is represented by a fixed-length bit string (such as 8 or 16 bits). The idea behind this approach is that if we use only a few bits to represent characters that appear often and reserve longer bit strings for characters that don’t appear often, the overall size of the document being represented is smaller.
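To make the variable-length idea concrete, here is a minimal sketch that builds a Huffman code table from the character frequencies of a piece of text. It uses the standard construction of repeatedly merging the two least frequent groups; the function names and the sample sentence are made up for the example, and bits are kept as a string of '0' and '1' characters for readability rather than packed into bytes.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each character to its bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # a lone symbol still needs one bit
    # Heap entries: (weight, unique tiebreak, {character: code so far}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two least frequent groups
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def huffman_encode(text, codes):
    return "".join(codes[ch] for ch in text)


def huffman_decode(bits, codes):
    """Decode by reading bits until they match a code (codes are prefix-free)."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)


text = "this is a test of the huffman idea"
codes = huffman_codes(text)
bits = huffman_encode(text, codes)
print(len(bits), "bits instead of", 8 * len(text))
print(huffman_decode(bits, codes) == text)  # True
```

In this sample the blank is the most frequent character, so it receives one of the shortest codes, while rare letters such as "m" receive longer ones.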
VIDEO COMPRESSION:
Video information is one of the most complex types of information to capture and compress to get a result that makes sense to the human eye. Video clips contain the equivalent of many still images, each of which must be compressed.

Video Codecs:
Codec stands for COmpressor/DECompressor. A video codec refers to the methods used to shrink the size of a movie to allow it to be played on a computer or over a network. Almost all video codecs use lossy compression to minimize the huge amounts of data associated with video. The goal therefore is not to lose information that affects the viewer’s senses.
Temporal compression:
Temporal compression looks for differences between consecutive frames. If most of an image in two frames hasn’t changed, why should we waste space to duplicate all of the similar information? A keyframe is chosen as the basis to compare the differences, and its entire image is stored. For consecutive images, only the changes (called delta frames) are stored. Temporal compression is effective in video that changes little from frame to frame, such as a scene that contains little movement.
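The keyframe and delta frame idea can be sketched on toy data. The representation here is an assumption made for illustration only: each frame is a flat list of pixel values of the same length, the first frame is the keyframe, and each delta records the pixels that changed relative to the previous frame. Real codecs are far more elaborate.

```python
def make_delta(previous, current):
    """Record only the pixels that differ from the previous frame."""
    return {i: new
            for i, (old, new) in enumerate(zip(previous, current))
            if old != new}


def apply_delta(previous, delta):
    """Rebuild a frame from the previous frame plus its recorded changes."""
    frame = list(previous)
    for index, value in delta.items():
        frame[index] = value
    return frame


def compress_clip(frames):
    """Store the first frame whole (keyframe) and the rest as deltas."""
    keyframe = list(frames[0])
    deltas = []
    previous = keyframe
    for frame in frames[1:]:
        deltas.append(make_delta(previous, frame))
        previous = frame
    return keyframe, deltas


def decompress_clip(keyframe, deltas):
    frames = [list(keyframe)]
    for delta in deltas:
        frames.append(apply_delta(frames[-1], delta))
    return frames


# Three tiny 4-pixel frames in which only one pixel changes each time.
clip = [[0, 0, 0, 0], [0, 0, 9, 0], [0, 5, 9, 0]]
keyframe, deltas = compress_clip(clip)
print(deltas)  # [{2: 9}, {1: 5}]
print(decompress_clip(keyframe, deltas) == clip)  # True
```

When little moves between frames, each delta holds only a handful of entries, which is where the savings come from.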
Spatial compression:
Spatial compression removes redundant information within a frame. This problem is essentially the same as that faced when compressing still images. Spatial video compression often groups pixels into blocks (rectangular areas) that have the same color, such as a portion of a clear blue sky. Instead of storing each pixel, the color and the coordinates of the area are stored. This idea is similar to the run-length encoding described earlier.
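A rough sketch of the block idea follows. It assumes, purely for illustration, that a frame is a list of rows of color values and that blocks are fixed-size square tiles; a tile whose pixels all share one color is stored as its position plus that color, and any other tile is stored pixel by pixel. The tuple layout and the `block` parameter are hypothetical.

```python
def compress_frame(frame, block=2):
    """Split the frame into tiles; store uniform tiles as a single color."""
    height, width = len(frame), len(frame[0])
    blocks = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [row[left:left + block] for row in frame[top:top + block]]
            colors = {pixel for row in tile for pixel in row}
            if len(colors) == 1:
                blocks.append(("solid", top, left, colors.pop()))
            else:
                blocks.append(("raw", top, left, tile))
    return height, width, blocks


def decompress_frame(height, width, blocks, block=2):
    """Paint every stored tile back into an empty frame."""
    frame = [[None] * width for _ in range(height)]
    for kind, top, left, payload in blocks:
        for dy in range(min(block, height - top)):
            for dx in range(min(block, width - left)):
                pixel = payload if kind == "solid" else payload[dy][dx]
                frame[top + dy][left + dx] = pixel
    return frame


# "B" = blue sky, "G" = grass, "W" = white; only one tile has mixed colors.
frame = [
    ["B", "B", "B", "B"],
    ["B", "B", "B", "B"],
    ["G", "G", "B", "W"],
    ["G", "G", "W", "B"],
]
height, width, blocks = compress_frame(frame)
print(sum(1 for b in blocks if b[0] == "solid"))          # 3
print(decompress_frame(height, width, blocks) == frame)   # True
```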


