Data compression

In signal processing, data compression, source coding,[1] or bit-rate reduction is the process of encoding information using fewer bits than the original representation.[2] Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy; no information is lost. Lossy compression reduces bits by removing unnecessary or less important information.[3]

The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding: encoding is done at the source of the data before it is stored or transmitted.[4] Source coding should not be confused with channel coding, which is used for error detection and correction, or with line coding, the means of mapping data onto a signal.

Compression is useful because it reduces the resources required to store and transmit data. Computational resources are consumed in the compression process and, usually, in the reversal of the process (decompression). Data compression is subject to a space–time complexity trade-off. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed, and the option to decompress the video in full before watching it may be inconvenient or require additional storage. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (when using lossy data compression), and the computational resources required to compress and decompress the data.[5][6]

Lossless compression

Lossless compression algorithms usually exploit statistical redundancy to represent data without losing any information, so that the process is reversible. Lossless compression is possible because most real-world data exhibits statistical redundancy. For example, an image may have areas of color that do not change over several pixels; instead of coding "red pixel, red pixel, ..." the data may be encoded as "279 red pixels". This is a basic example of run-length encoding; there are many schemes to reduce file size by eliminating redundancy.
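
As an illustration, a minimal run-length codec can be sketched in a few lines of Python; the function names and the pixel example are invented here for illustration:

    from itertools import groupby

    def rle_encode(data):
        # Collapse each run of identical symbols into a (count, symbol) pair.
        return [(len(list(group)), symbol) for symbol, group in groupby(data)]

    def rle_decode(pairs):
        # Expand each (count, symbol) pair back into its run of symbols.
        return [symbol for count, symbol in pairs for _ in range(count)]

    pixels = ["red"] * 279 + ["blue"] * 3
    assert rle_decode(rle_encode(pixels)) == pixels  # the round trip loses nothing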

Lempel–Ziv (LZ) compression methods are among the most popular algorithms for lossless storage.[7] DEFLATE is a variation on LZ optimized for decompression speed and compression ratio, but compression can be slow. In the mid-1980s, following work by Terry Welch, the Lempel–Ziv–Welch (LZW) algorithm rapidly became the method of choice for most general-purpose compression systems. LZW is used in GIF images, programs such as PKZIP, and hardware devices such as modems.[8] LZ methods use a table-based compression model in which table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input; the table itself is often Huffman encoded. Grammar-based codes, whose basic task is constructing a context-free grammar that derives a single string, can compress highly repetitive input extremely effectively, for instance a biological data collection of the same or closely related species, a huge versioned document collection, or an internet archive. Practical grammar compression algorithms include Sequitur and Re-Pair. A sketch of the LZW table-building idea follows.
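
The following is a textbook-style LZW sketch in Python. It is a simplified illustration that emits integer codes as a plain list; real implementations pack the codes into bits and bound the table size:

    def lzw_compress(text):
        # The table starts with all single bytes; entries for longer repeated
        # strings are added dynamically as they are seen in the input.
        table = {chr(i): i for i in range(256)}
        current, codes = "", []
        for ch in text:
            candidate = current + ch
            if candidate in table:
                current = candidate
            else:
                codes.append(table[current])
                table[candidate] = len(table)  # new entry for the repeated string
                current = ch
        if current:
            codes.append(table[current])
        return codes

    def lzw_decompress(codes):
        # Rebuild the same table on the fly from the code stream alone.
        table = {i: chr(i) for i in range(256)}
        prev = table[codes[0]]
        out = [prev]
        for code in codes[1:]:
            entry = table[code] if code in table else prev + prev[0]
            out.append(entry)
            table[len(table)] = prev + entry[0]
            prev = entry
        return "".join(out)

    s = "TOBEORNOTTOBEORTOBEORNOT"
    assert lzw_decompress(lzw_compress(s)) == s  # lossless round trip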

The strongest modern lossless compressors use probabilistic models, such as prediction by partial matching. The Burrows–Wheeler transform can also be viewed as an indirect form of statistical modelling.[9] In a further refinement of the direct use of probabilistic modelling, statistical estimates can be coupled to an algorithm called arithmetic coding. Arithmetic coding is a more modern coding technique that uses the mathematical calculations of a finite-state machine to produce a string of encoded bits from a series of input data symbols. It can achieve superior compression compared to other techniques such as the better-known Huffman algorithm. It uses an internal memory state to avoid the need to perform a one-to-one mapping of individual input symbols to distinct representations that use an integer number of bits, and it clears out the internal memory only after encoding the entire string of data symbols. Arithmetic coding applies especially well to adaptive data compression tasks where the statistics vary and are context-dependent, as it can be easily coupled with an adaptive model of the probability distribution of the input data. An early example of the use of arithmetic coding was in an optional (but not widely used) feature of the JPEG image coding standard.[10] It has since been applied in various other designs including H.263, H.264/MPEG-4 AVC and HEVC for video coding.[11]
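
The interval-narrowing idea behind arithmetic coding can be sketched with exact fractions. This is an idealized, infinite-precision toy, and the three-symbol model is invented for illustration; production coders use fixed-width integer arithmetic with renormalization:

    from fractions import Fraction

    # Toy model: each symbol owns a sub-interval of [0, 1) sized by its probability.
    MODEL = {"a": (Fraction(0), Fraction(1, 2)),
             "b": (Fraction(1, 2), Fraction(3, 4)),
             "!": (Fraction(3, 4), Fraction(1))}

    def arith_encode(message):
        # Repeatedly narrow [low, high) to the current symbol's sub-interval.
        low, high = Fraction(0), Fraction(1)
        for sym in message:
            s_low, s_high = MODEL[sym]
            width = high - low
            low, high = low + width * s_low, low + width * s_high
        return low  # any number in [low, high) identifies the whole message

    def arith_decode(code, length):
        # Reverse the narrowing: find which sub-interval the code falls into.
        out, low, high = [], Fraction(0), Fraction(1)
        for _ in range(length):
            width = high - low
            point = (code - low) / width
            for sym, (s_low, s_high) in MODEL.items():
                if s_low <= point < s_high:
                    out.append(sym)
                    low, high = low + width * s_low, low + width * s_high
                    break
        return "".join(out)

    msg = "abba!"
    assert arith_decode(arith_encode(msg), len(msg)) == msg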

Lossy compression

In the late 1980s, digital images became more common, and standards for lossless image compression emerged. In the early 1990s, lossy compression methods began to be widely used.[8] In these schemes, some loss of information is accepted, as dropping nonessential detail can save storage space. There is a corresponding trade-off between preserving information and reducing size. Lossy data compression schemes are designed by research on how people perceive the data in question. For example, the human eye is more sensitive to subtle variations in luminance than it is to variations in color. JPEG image compression works in part by rounding off nonessential bits of information.[12] A number of popular compression formats exploit these perceptual differences, including psychoacoustics for sound and psychovisuals for images and video.
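
The "rounding off nonessential bits" step corresponds to quantizing transform coefficients. A minimal sketch follows, with a step size chosen arbitrarily rather than taken from the JPEG standard's quantization tables:

    def quantize(coefficients, step):
        # Lossy step: divide by the step size and round, discarding fine detail.
        return [round(c / step) for c in coefficients]

    def dequantize(levels, step):
        # Reconstruction multiplies back; the rounding error is gone for good.
        return [q * step for q in levels]

    coeffs = [151.2, -3.7, 0.4, 12.9]
    restored = dequantize(quantize(coeffs, step=10), step=10)
    print(restored)  # [150, 0, 0, 10] -- close to, but not equal to, the input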

Lossy image compression is used in digital cameras to increase storage capacity with minimal degradation of picture quality. Similarly, DVDs use the lossy MPEG-2 video coding format for video compression.

In lossy audio compression, methods of psychoacoustics are used to remove non-audible (or less audible) components of the audio signal. Compression of human speech is often performed with even more specialized techniques; speech coding, or voice coding, is sometimes distinguished as a separate discipline from audio compression. Different audio and speech compression standards are listed under audio coding formats. Voice compression is used in internet telephony, for example, while audio compression is used for CD ripping and is decoded by audio players.[9]

Theory

The theoretical background of compression is provided by information theory (which is closely related to algorithmic information theory) for lossless compression, and by rate–distortion theory for lossy compression. These areas of study were essentially created by Claude Shannon, who published fundamental papers on the topic in the late 1940s and early 1950s. Coding theory is also related. The idea of data compression is also deeply connected with statistical inference.[13]
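
Shannon's source coding theorem makes this concrete: no lossless code can, on average, use fewer than H(X) = -Σ p(x) log2 p(x) bits per symbol. A small sketch computing that bound from empirical symbol frequencies, under an i.i.d. assumption about the source:

    from collections import Counter
    from math import log2

    def entropy_bits_per_symbol(data):
        # Empirical Shannon entropy: a lower bound (in bits per symbol) on the
        # average code length of any lossless code for this symbol distribution.
        counts = Counter(data)
        total = len(data)
        return -sum((n / total) * log2(n / total) for n in counts.values())

    print(entropy_bits_per_symbol("aaaaaaab"))  # about 0.544 bits/symbol, far below 8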

Machine learning

There is a close connection between machine learning and compression: a system that predicts the posterior probabilities of a sequence given its entire history can be used for optimal data compression (by using arithmetic coding on the output distribution) while an optimal compressor can be used for prediction (by finding the symbol that compresses best, given the previous history). This equivalence has been used as a justification for using data compression as a benchmark for "general intelligence."[14][15][16]
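
The compressor-as-predictor direction can be demonstrated with any off-the-shelf lossless compressor: score each candidate continuation by how little it inflates the compressed history. The sketch below uses zlib purely as a stand-in for a strong compressor:

    import zlib

    def predict_next(history, alphabet):
        # Score each candidate next symbol by the compressed size of
        # history + symbol; smaller means the compressor "expected" it.
        # (Ties resolve to the first candidate in the alphabet.)
        return min(alphabet, key=lambda s: len(zlib.compress((history + s).encode())))

    history = "abab" * 50
    print(predict_next(history, "ab"))  # 'a', the symbol that continues the pattern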

References

  1. ^ Wade, Graham (1994). Signal coding and processing (2nd ed.). Cambridge University Press. p. 34. ISBN 978-0-521-42336-6. Retrieved 22 December 2011. "The broad objective of source coding is to exploit or remove 'inefficient' redundancy in the PCM source and thereby achieve a reduction in the overall source rate R."
  2. ^ Mahdi, O.A.; Mohammed, M.A.; Mohamed, A.J. (November 2012). "Implementing a Novel Approach an Convert Audio Compression to Text Coding via Hybrid Technique" (PDF). International Journal of Computer Science Issues. 9 (6, No. 3): 53–59. Retrieved 6 March 2013.
  3. ^ Pujar, J.H.; Kadlaskar, L.M. (May 2010). "A New Lossless Method of Image Compression and Decompression Using Huffman Coding Techniques" (PDF). Journal of Theoretical and Applied Information Technology. 15 (1): 18–23.
  4. ^ Salomon, David (2008). A Concise Introduction to Data Compression. Berlin: Springer. ISBN 9781848000728.
  5. ^ S. Mittal; J. Vetter (2015). "A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems". IEEE Transactions on Parallel and Distributed Systems. IEEE.
  6. ^ Tank, M.K. (2011). Implementation of Limpel-Ziv algorithm for lossless compression using VHDL. Thinkquest 2010: Proceedings of the First International Conference on Contours of Computing Technology. Berlin: Springer. pp. 275–283.
  7. ^ Navqi, Saud; Naqvi, R.; Riaz, R.A.; Siddiqui, F. (April 2011). "Optimized RTL design and implementation of LZW algorithm for high bandwidth applications" (PDF). Electrical Review. 2011 (4): 279–285.
  8. ^ a b Wolfram, Stephen (2002). A New Kind of Science. Wolfram Media, Inc. p. 1069. ISBN 1-57955-008-8.
  9. ^ a b Mahmud, Salauddin (March 2012). "An Improved Data Compression Method for General Data" (PDF). International Journal of Scientific & Engineering Research. 3 (3): 2. Retrieved 6 March 2013.
  10. ^ Lane, Tom. "JPEG Image Compression FAQ, Part 1". Internet FAQ Archives. Independent JPEG Group. Retrieved 6 March 2013.
  11. ^ G. J. Sullivan; J.-R. Ohm; W.-J. Han; T. Wiegand (December 2012). "Overview of the High Efficiency Video Coding (HEVC) Standard" (PDF). IEEE Transactions on Circuits and Systems for Video Technology. IEEE. 22 (12). Retrieved 12 August 2017.
  12. ^ Arcangel, Cory. "On Compression" (PDF). Retrieved 6 March 2013.
  13. ^ Marak, Laszlo. "On image compression" (PDF). University of Marne la Vallee. Archived from the original (PDF) on 28 May 2015. Retrieved 6 March 2013.
  14. ^ Mahoney, Matt. "Rationale for a Large Text Compression Benchmark". Florida Institute of Technology. Retrieved 5 March 2013.
  15. ^ Shmilovici A.; Kahiri Y.; Ben-Gal I.; Hauser S. (2009). "Measuring the Efficiency of the Intraday Forex Market with a Universal Data Compression Algorithm" (PDF). Computational Economics. 33 (2): 131–154.
  16. ^ I. Ben-Gal (2008). "On the Use of Data Compression Measures to Analyze Robust Designs" (PDF). IEEE Transactions on Reliability. 54 (3): 381–388.
