DATA COMPRESSION
Text and video compression
WHAT IS DATA COMPRESSION:
“Reducing the amount of space
needed to store a piece of data”.
WHY DO WE NEED TO COMPRESS DATA:
· In the past we needed to keep data small because of storage limitations.
· Today, computer storage is relatively cheap, but now we have an even more pressing reason to shrink our data: the need to share it with others.
The Web and its underlying networks have inherent bandwidth restrictions that define the maximum number of bits or bytes that can be transmitted from one place to another in a fixed amount of time.
Compression ratio:
· The compression ratio gives an indication of how much compression occurs. It is the size of the compressed data divided by the size of the original data.
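As a small illustration of this definition, the ratio can be computed directly from the two sizes. The function name and the byte counts below are hypothetical, chosen only for the example. Under this definition, a value close to 0 means the data shrank a lot, and a value close to 1 means it barely shrank.

```python
def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Return the compressed size divided by the original size."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return compressed_size / original_size


# Hypothetical sizes in bytes, used only for illustration.
ratio = compression_ratio(original_size=1000, compressed_size=250)
print(ratio)  # 0.25
```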
Lossless Compression:
A data compression technique can be lossless, which means the data can be retrieved without losing any of the original information.
Lossy Compression:
Alternatively, a technique can be lossy, in which case some information is lost in the process of compaction. Although we never want to lose information, in some cases the loss is acceptable.
TEXT COMPRESSION:
· Alphabetic information (text) is a fundamental type of data.
· Therefore, it is important that we find ways to store text efficiently and transmit it efficiently between one computer and another.
· The following sections examine three types of text compression:
o keyword encoding
o run-length encoding
o Huffman encoding
Keyword Encoding:
Consider how often you use words such as “the,” “and,” “which,” “that,” and “what.” If these words took up less space (that is, had fewer characters), our documents would shrink in size. Even though the savings on each word would be small, these words are used so often in a typical document that the combined savings would add up quickly. One fairly straightforward method of text compression is called keyword encoding, in which frequently used words are replaced with a single character. To decompress the document, you reverse the process: replace each single character with the appropriate full word.
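The idea can be sketched in a few lines of Python. The word-to-symbol table below is a made-up example, not a standard one, and the sketch assumes the chosen symbols never appear in the original text (it refuses to encode if they do). It also matches lowercase whole words only.

```python
import re

# Hypothetical substitution table: each frequent word maps to one symbol.
KEYWORDS = {"the": "~", "and": "+", "which": "^", "that": "$", "what": "&"}
SYMBOLS = {symbol: word for word, symbol in KEYWORDS.items()}

# Match any keyword, but only as a whole word ("then" must not match "the").
_WORD_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b")


def keyword_encode(text: str) -> str:
    """Replace each keyword with its single-character symbol."""
    for symbol in SYMBOLS:
        if symbol in text:
            raise ValueError(f"text already contains the symbol {symbol!r}")
    return _WORD_PATTERN.sub(lambda match: KEYWORDS[match.group(1)], text)


def keyword_decode(encoded: str) -> str:
    """Reverse the process: replace each symbol with its full word."""
    for symbol, word in SYMBOLS.items():
        encoded = encoded.replace(symbol, word)
    return encoded


encoded = keyword_encode("the cat and the dog")
print(encoded)                  # ~ cat + ~ dog
print(keyword_decode(encoded))  # the cat and the dog
```

The check for existing symbols matters: if the original text already contained a “~”, decoding would wrongly turn it into “the”.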
Run-Length Encoding:
In some situations, a single character may be repeated over and over again in a long sequence. This type of repetition doesn’t generally take place in English text, but often occurs in large data streams, such as DNA sequences. A text compression technique called run-length encoding capitalizes on these situations. Run-length encoding is sometimes called recurrence coding. In run-length encoding, a sequence of repeated characters is replaced by a flag character, followed by the repeated character, followed by a single digit that indicates how many times the character is repeated. For example, consider the following string of seven repeated ‘A’ characters:
AAAAAAA
If we use the ‘*’ character as our flag, this string would
be encoded as:
*A7
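The flag character is the indication that the series of three characters (including the flag) should be decoded into the appropriate repeated string. All other text is treated as ordinary characters.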
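A minimal Python sketch of this scheme follows. It uses ‘*’ as the flag and a single digit for the count, as described above. Two details are assumptions of this sketch rather than part of the description: runs shorter than four characters are left unencoded (the three-character code would save nothing), and runs longer than nine are split into several pieces. The sketch also assumes the flag character never appears in the input.

```python
from itertools import groupby

FLAG = "*"
MAX_RUN = 9  # a single digit can count at most nine repeats
MIN_RUN = 4  # assumption: shorter runs are kept as-is, since "*A3" saves nothing


def rle_encode(text: str) -> str:
    """Replace long runs of one character with flag + character + digit."""
    if FLAG in text:
        raise ValueError("input must not contain the flag character")
    pieces = []
    for char, group in groupby(text):
        run = len(list(group))
        while run > 0:
            chunk = min(run, MAX_RUN)
            if chunk >= MIN_RUN:
                pieces.append(f"{FLAG}{char}{chunk}")
            else:
                pieces.append(char * chunk)
            run -= chunk
    return "".join(pieces)


def rle_decode(encoded: str) -> str:
    """Expand every flag + character + digit triple; copy everything else."""
    pieces = []
    i = 0
    while i < len(encoded):
        if encoded[i] == FLAG:
            char, count = encoded[i + 1], int(encoded[i + 2])
            pieces.append(char * count)
            i += 3
        else:
            pieces.append(encoded[i])
            i += 1
    return "".join(pieces)


print(rle_encode("AAAAAAA"))  # *A7
print(rle_decode("*A7"))      # AAAAAAA
```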
Huffman Encoding:
Another text compression technique, called Huffman encoding, is named after its creator, Dr. David Huffman. Why should the character “X”, which is seldom used in text, take up the same number of bits as the blank, which is used very frequently? Huffman codes address this question by using variable-length bit strings to represent each character. That is, a few characters may be represented by five bits, another few by six bits, yet another few by seven bits, and so forth. This approach is contrary to the idea of a character set, in which each character is represented by a fixed-length bit string (such as 8 or 16 bits). The idea behind this approach is that if we use only a few bits to represent characters that appear often and reserve longer bit strings for characters that don’t appear often, the overall size of the document being represented is smaller.
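To make the idea concrete, here is a short Python sketch of the usual way a Huffman code table is built: count how often each character occurs, then repeatedly merge the two least frequent entries into a small tree. The function names and the sample text are illustrative assumptions, and the bits are kept as a string of "0" and "1" characters for readability rather than packed into bytes.

```python
import heapq
from collections import Counter


def build_huffman_codes(text: str) -> dict:
    """Return a {character: bit string} table built from character counts."""
    counts = Counter(text)
    if not counts:
        return {}
    # Heap entries are (frequency, tie_breaker, tree); a tree is either a
    # single character (leaf) or a (left, right) tuple. The unique
    # tie_breaker keeps Python from ever comparing two trees.
    heap = [(count, i, char) for i, (char, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a lone character still needs one bit
    next_id = len(heap)
    while len(heap) > 1:
        freq1, _, left = heapq.heappop(heap)
        freq2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (freq1 + freq2, next_id, (left, right)))
        next_id += 1

    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix

    assign(heap[0][2], "")
    return codes


def huffman_encode(text: str, codes: dict) -> str:
    return "".join(codes[char] for char in text)


def huffman_decode(bits: str, codes: dict) -> str:
    # No code is a prefix of another, so the first match is always right.
    lookup = {code: char for char, code in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            decoded.append(lookup[current])
            current = ""
    return "".join(decoded)


sample = "a banana bandana"
table = build_huffman_codes(sample)
bits = huffman_encode(sample, table)
print(len(bits), "bits instead of", 8 * len(sample))
print(huffman_decode(bits, table) == sample)  # True
```

Frequent characters end up near the top of the tree and so receive short codes, while rare characters sit deeper and receive longer ones.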
VIDEO COMPRESSION:
Video information is one of the most complex types of
information to capture and compress to get a result that makes sense to the
human eye. Video clips contain the equivalent of many still images, each of
which must be compressed.
Video Codecs:
Codec stands for COmpressor/DECompressor. A video codec refers to the methods used to shrink the size of a movie to allow it to be played on a computer or over a network. Almost all video codecs use lossy compression to minimize the huge amounts of data associated with video. The goal therefore is not to lose information that affects the viewer’s senses.
Temporal compression:
Temporal compression looks for differences between consecutive frames. If most of an image in two frames hasn’t changed, why should we waste space to duplicate all of the similar information? A keyframe is chosen as the basis to compare the differences, and its entire image is stored. For the images that follow, only the changes (called delta frames) are stored. Temporal compression is effective in video that changes little from frame to frame, such as a scene that contains little movement.
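The keyframe and delta-frame idea can be illustrated with a toy Python sketch. Here a frame is simply a flat list of pixel values, and each delta records only the positions that differ from the previous frame. That representation, the function names, and the choice to compare against the previous frame are all assumptions made for illustration. The sketch is also lossless, whereas real codecs are far more elaborate and usually lossy.

```python
def encode_frames(frames):
    """Store the first frame whole, then only the changes for each later frame."""
    keyframe = list(frames[0])
    deltas = []
    previous = keyframe
    for frame in frames[1:]:
        if len(frame) != len(previous):
            raise ValueError("all frames must have the same number of pixels")
        # Delta frame: {pixel index: new value} for pixels that changed.
        delta = {
            index: new
            for index, (old, new) in enumerate(zip(previous, frame))
            if old != new
        }
        deltas.append(delta)
        previous = frame
    return keyframe, deltas


def decode_frames(keyframe, deltas):
    """Rebuild every frame by applying each delta to the frame before it."""
    frames = [list(keyframe)]
    for delta in deltas:
        frame = list(frames[-1])
        for index, value in delta.items():
            frame[index] = value
        frames.append(frame)
    return frames


# Three tiny 4-pixel frames: one pixel changes, then nothing changes.
video = [[0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 9, 0]]
keyframe, deltas = encode_frames(video)
print(deltas)  # [{2: 9}, {}]
```

A scene with little movement produces small or empty deltas, which is where the savings come from.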
Spatial compression:
Spatial compression removes redundant information within a frame. This problem is essentially the same as that faced when compressing still images. Spatial video compression often groups pixels into blocks (rectangular areas) that have the same color, such as a portion of a clear blue sky. Instead of storing each pixel, the color and the coordinates of the area are stored. This idea is similar to the run-length encoding described earlier.
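As a rough sketch of the block idea, the Python below cuts a frame into fixed-size blocks and, when every pixel in a block has the same color, stores just the block’s position, size, and color. The fixed block size, the record layout, and the function names are assumptions of this example. Blocks with mixed colors are stored pixel by pixel, so this toy version loses no information.

```python
def encode_blocks(frame, block_size=2):
    """Describe a frame (list of pixel rows) as a list of block records."""
    height, width = len(frame), len(frame[0])
    records = []
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            rows = [row[x:x + block_size] for row in frame[y:y + block_size]]
            colors = {pixel for row in rows for pixel in row}
            if len(colors) == 1:
                # Uniform block: store coordinates, size, and one color.
                records.append(("solid", x, y, len(rows[0]), len(rows), rows[0][0]))
            else:
                # Mixed block: fall back to storing every pixel.
                records.append(("raw", x, y, rows))
    return width, height, records


def decode_blocks(width, height, records):
    """Rebuild the frame from the block records."""
    frame = [[None] * width for _ in range(height)]
    for record in records:
        if record[0] == "solid":
            _, x, y, block_width, block_height, color = record
            for row in range(y, y + block_height):
                for col in range(x, x + block_width):
                    frame[row][col] = color
        else:
            _, x, y, rows = record
            for offset, row in enumerate(rows):
                frame[y + offset][x:x + len(row)] = row
    return frame


# "B" stands for blue sky; the lower-right corner holds other colors.
picture = [
    ["B", "B", "B", "B"],
    ["B", "B", "B", "B"],
    ["B", "B", "G", "G"],
    ["B", "B", "G", "W"],
]
width, height, records = encode_blocks(picture)
print(sum(1 for record in records if record[0] == "solid"))  # 3
```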