AIRS Data Compression (Preliminary Work)
The first part of our work was to determine a good baseline for our compression goals. We try to answer two questions: can the data be compressed, and can we improve on current algorithms?
Preliminary Processing
We did not want any of our algorithms to break the dimensionality of the data, so as not to disturb the natural continuity of the images. The 90x135 images represent readings from geographically adjacent (or nearby) locations, and the 2378 channels are readings from adjacent frequencies, so we expect continuity along all three dimensions. The video (below) shows the 90x135 scan image as the channel varies, and it supports our hypothesis.
[Video: the 90x135 scan image as the channel varies (1:27 MPEG, 10.89 MB)]
The data contained 51 discontinuous channels (see channel 964 for an example); we assumed that these were so-called "radiometric reference channels". Our group performed a simple calculation (the square root of the integral of the squared differences between two adjacent channels) to detect these "bad" channels and store them separately.
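As a rough illustration of that distance measure (a sketch only, not the project's actual code; the array layout, the names, and the two-sided flagging rule are our assumptions):

    import numpy as np

    def adjacent_channel_distance(cube):
        """L2 distance between each pair of spectrally adjacent images.

        cube : ndarray of shape (channels, rows, cols), e.g. (2378, 135, 90).
        Returns d of length channels-1 with
        d[k] = sqrt( sum over pixels of (cube[k+1] - cube[k])**2 ).
        """
        diff = np.diff(cube.astype(np.float64), axis=0)
        return np.sqrt((diff ** 2).sum(axis=(1, 2)))

    def flag_bad_channels(cube, threshold):
        """Flag interior channels that jump away from BOTH spectral
        neighbours (the threshold value is left to the caller)."""
        d = adjacent_channel_distance(cube)      # d[k] compares channels k and k+1
        bad = (d[:-1] > threshold) & (d[1:] > threshold)
        return np.where(bad)[0] + 1              # indices of flagged channels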
Preliminary Work
We applied a 2D Discrete Cosine Transform to the original 90x135 image and noticed that many of the coefficients were negligible (as seen by the blue in the image below). After applying the discrete cosine transform to each 90x135 image in our cube, we looked at the new cube along the channels (Figure 4). Because of the structure of that image, we chose the Haar transform as a preliminary transform along the channel dimension.
[Figures: Original Slice | Slice after 2D DCT | Slice after Haar]
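A minimal sketch of this transform pipeline, assuming the cube is stored as a NumPy array of shape (channels, rows, cols) and using SciPy's DCT routine (the function names and the single Haar level are illustrative choices, not the project code):

    import numpy as np
    from scipy.fftpack import dct

    def haar_level(x, axis=0):
        """One level of an orthonormal Haar transform along `axis`:
        scaled pairwise sums followed by scaled pairwise differences."""
        x = np.moveaxis(np.asarray(x, dtype=np.float64), axis, 0)
        n = x.shape[0] - x.shape[0] % 2              # ignore a possible odd tail
        a = (x[0:n:2] + x[1:n:2]) / np.sqrt(2.0)     # approximation coefficients
        d = (x[0:n:2] - x[1:n:2]) / np.sqrt(2.0)     # detail coefficients
        out = np.concatenate([a, d, x[n:]], axis=0)
        return np.moveaxis(out, 0, axis)

    def transform_cube(cube):
        """cube has shape (channels, rows, cols), e.g. (2378, 135, 90).
        2D DCT-II on every spatial slice, then one Haar level along channels."""
        coeffs = dct(dct(cube, axis=1, norm='ortho'), axis=2, norm='ortho')
        return haar_level(coeffs, axis=0)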
We then used an adapted Huffman algorithm to code the transformed image. Due to project constraints (mostly NASA's error-correction demands), our group will focus on fixed-length codes for the final product. However, Huffman coding is provably optimal among prefix codes (it is a form of entropy coding) and thus serves as a good benchmark for our future work.
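For reference, such a benchmark can be reproduced in spirit with a textbook Huffman construction over quantized coefficients. The sketch below is ours (the quantization step and helper names are assumptions), not the adapted algorithm used by the group:

    import heapq
    from collections import Counter
    import numpy as np

    def huffman_code_lengths(symbols):
        """Build a Huffman tree over the symbol histogram and return
        {symbol: code length in bits} (standard textbook construction)."""
        freq = Counter(symbols)
        if len(freq) == 1:                      # degenerate one-symbol source
            return {next(iter(freq)): 1}
        # heap items: (weight, tie-breaker, {symbol: depth so far})
        heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            w1, _, d1 = heapq.heappop(heap)
            w2, _, d2 = heapq.heappop(heap)
            merged = {s: d + 1 for s, d in d1.items()}
            merged.update({s: d + 1 for s, d in d2.items()})
            heapq.heappush(heap, (w1 + w2, counter, merged))
            counter += 1
        return heap[0][2]

    def huffman_benchmark_bits(coeffs, step):
        """Quantize transform coefficients with step `step` (our choice)
        and report the Huffman-coded size in bits."""
        q = np.round(np.asarray(coeffs).ravel() / step).astype(np.int64).tolist()
        lengths = huffman_code_lengths(q)
        freq = Counter(q)
        return sum(freq[s] * lengths[s] for s in freq)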
Preliminary Results
We achieved a 10:1 preliminary compression ratio (lossy).
[Figures: Original | Compressed and decompressed | Absolute error: ±0.2]
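A small sketch of how such figures can be checked, assuming the original and reconstructed cubes and the size of the coded bitstream are available (the function and parameter names are ours):

    import numpy as np

    def report(original, reconstructed, compressed_bytes):
        """Compression ratio and worst-case reconstruction error for one granule;
        `compressed_bytes` is the size of the coded bitstream on disk."""
        ratio = original.nbytes / compressed_bytes
        max_err = np.max(np.abs(original.astype(np.float64)
                                - reconstructed.astype(np.float64)))
        print(f"compression ratio {ratio:.1f}:1, max absolute error {max_err:.3g}")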
Ongoing Work
- Adaptive Thresholding: We currently use a constant threshold to remove the "bad" channels. This is not suitable, since the range of the data varies from one section of the spectrum to another, and the impact of bad channels on the compression ratio is significant. We will therefore study more adaptive thresholding techniques (one possible scheme is sketched after this list).
- Non-separable wavelet transforms: Because of the physical properties of the signal, we believe that wavelets will deliver the best transformation of the data (with respect to entropy). Many current applications of wavelets use so-called "separable wavelet bases", which impose a structure on the plane that is artificial for most images and leads to damage that becomes obvious at the compression ratios demanded by the new hyperspectral instruments.
- Fixed-Length Coding Algorithms: The electromagnetic environment is unstable, and error correction is therefore a key issue. To minimize the impact of error correction on our compression ratio, we will be looking at non-entropy coding techniques (a simple fixed-length quantizer is sketched after this list).
- Assembly Programming: The software will reside onboard the real-time satellite system and must therefore be optimized to operate in this environment. The upper bound on the entire procedure is 12 seconds, during which time it must process 1 granule (6 minutes, or 231.1416 MB, of data), a sustained rate of roughly 19 MB/s. For the new GOES-R system, the data rate is 34 times greater. Therefore, some assembly will be written along key execution paths (especially in the advanced mathematical subroutines, such as eigenvector calculation).
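One possible adaptive thresholding scheme (an illustration only, not a technique the group has selected) compares each channel's discontinuity measure, such as the adjacent-channel distance from the earlier sketch, against a robust statistic of its spectral neighbourhood rather than a single global constant:

    import numpy as np

    def adaptive_bad_channel_mask(distances, window=101, factor=5.0):
        """Flag entries whose value exceeds `factor` times the local median
        within a `window`-wide neighbourhood along the spectrum.
        (Window size and factor are illustrative, not project-chosen values.)"""
        distances = np.asarray(distances, dtype=np.float64)
        n = len(distances)
        half = window // 2
        mask = np.zeros(n, dtype=bool)
        for k in range(n):
            lo, hi = max(0, k - half), min(n, k + half + 1)
            mask[k] = distances[k] > factor * np.median(distances[lo:hi])
        return mask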
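Likewise, a minimal fixed-length alternative to entropy coding is a uniform scalar quantizer that emits equal-width codewords, so a single bit error corrupts only one sample. The parameters below are illustrative assumptions:

    import numpy as np

    def fixed_length_encode(coeffs, bits=12):
        """Uniform scalar quantizer with 2**bits levels; every coefficient
        maps to a codeword of exactly `bits` bits."""
        x = np.asarray(coeffs, dtype=np.float64).ravel()
        lo, hi = x.min(), x.max()
        scale = (2 ** bits - 1) / (hi - lo) if hi > lo else 1.0
        codes = np.round((x - lo) * scale).astype(np.uint32)
        return codes, (lo, scale)

    def fixed_length_decode(codes, params, shape):
        """Invert the quantizer back to approximate coefficient values."""
        lo, scale = params
        return (codes.astype(np.float64) / scale + lo).reshape(shape)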
Notes
*We were provided Level 1B AIRS data. The data is shared as an HDF-EOS file (a standard format for scientific data that can be accessed via MATLAB's hdfsw gateway API). Each HDF file (or Level 1B "granule") represents 6 minutes of data (135 scans for each of the 2378 channels).
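For readers working outside MATLAB, a granule can also be opened in Python with the pyhdf package; the file name below is hypothetical, and the field name "radiances" is our assumption about the granule layout, to be checked against the file's own metadata:

    from pyhdf.SD import SD, SDC   # pyhdf: Python interface to HDF4

    # Hypothetical file name; "radiances" is an assumed field name.
    granule = SD("AIRS.2002.09.01.001.L1B.AIRS_Rad.hdf", SDC.READ)
    field = granule.select("radiances")
    cube = field[:]                # expected shape roughly (135, 90, 2378)
    granule.end()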