
On "Layers" (Rewritten Edition)

Author: Ma Jian
Email: stronghorse@tom.com
Homepage: http://stronghorse.yeah.net
Published: 2007.01.10
Last updated: 2009.09.22

Contents
I. The MRC Model
II. Layers in DjVu
III. Layers in PDF
IV. Layers in PDG


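This article first appeared in early 2007 as a set of reading notes recording my understanding of several common electronic document formats, but the content was somewhat scattered. This edition is organized around the MRC model and has been rewritten from scratch.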


I. The MRC Model

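MRC stands for Mixed Raster Content. It is an effective model for representing mixed raster images and has become an international standard, ISO/IEC 16485-2000. The "Introduction and background" section of the standard describes the background of MRC: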

The Mixed Raster Content (MRC) Recommendation is a way of describing raster-oriented (scanned and/or rasterized synthetic images) documents with both bi-level (text and/or line-art) and multi-level (colour/continuous-tone) data within a page. The goal of this MRC Recommendation is to make exchange of raster-oriented mixed content colour documents among users with varied communication systems possible with higher speed, higher image quality and modest computing resources (memory, storage and processing power).

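It then introduces the basic idea of the MRC model, which I would sum up as "split the content into layers, compress each layer as it needs":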

The best approach to achieve high compression ratios and retain quality is to compress the different segments of the raster data according to their individual attributes. Text and line-art data (bi-level data) would be compressed with an approach that puts high emphasis on maintaining the detail and structure of the input. Pictures and colour gradients (multi-level data) would be compressed using an approach that puts a high emphasis on maintaining the smoothness and accuracy of the colours. These different data types (bi-level and multi-level) are often conceptualized as being on separate layers/planes within the page.

This separation of the data by importance of content (spatial detail vs. colour) also directly implies that it is advantageous to use different resolutions for the different data, with a high spatial resolution used for text/line-art and high colour resolution for images/gradients.
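A back-of-the-envelope calculation shows why different resolutions pay off even before any compression is applied. The page size and resolutions below are illustrative assumptions (an A4 page, a 300 dpi bi-level mask, a 100 dpi 24-bit background), not values from ISO/IEC 16485, and the foreground colour layer is left out for simplicity.

```python
# Illustrative assumptions: an A4 page (210 x 297 mm), a 300 dpi bi-level mask
# and a 100 dpi 24-bit background. These numbers are not taken from the standard.
MM_PER_INCH = 25.4


def pixel_dims(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Page size in pixels when sampled at `dpi`."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def raw_bytes(width_mm: float, height_mm: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes, rounded up to a whole byte."""
    w, h = pixel_dims(width_mm, height_mm, dpi)
    return (w * h * bits_per_pixel + 7) // 8


A4 = (210, 297)
single_colour_image = raw_bytes(*A4, dpi=300, bits_per_pixel=24)  # one image for everything
mask = raw_bytes(*A4, dpi=300, bits_per_pixel=1)                  # high spatial, low colour
background = raw_bytes(*A4, dpi=100, bits_per_pixel=24)           # low spatial, high colour

print(single_colour_image, mask, background, mask + background)
# 26099520 1087480 2900289 3987769
```

Under these assumptions the mask plus background come to 3,987,769 bytes, roughly 15% of the 26,099,520 bytes of a single 300 dpi colour image, and each layer can then be handed to a compressor suited to its kind of data.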

This concept of data separation by importance of content has led to development of the base mode 3-layer model on which the MRC Recommendation is built...The base mode 3-layer model identifies three basic data types that may be contained within a page. These are multi-level data associated with contone colour (continuous-tone and/or palletized colour) image for which mid-to-low spatial and high colour resolution is typically appropriate for good reproduction; bi-level data associated with high detail of text/line-art for which high spatial and low colour resolution is typically appropriate for good reproduction; multi-level data associated with multi-level colours of the text/line-art data for which mid-to-high spatial and mid-colour resolution is typically appropriate for good reproduction...The process of image regeneration is controlled by the middle bi-level layer that acts as a mask or selector to select whether pixels from the background contone layer or foreground text/line-art colour layer will be reproduced. Due to its selection function this layer is referenced as the mask or selector layer, throughout this Recommendation the middle layer will be referenced as the mask layer. When the value of a mask layer pixel is one (1), the corresponding pixel from the foreground is selected and reproduced. When the value of the mask layer pixel is zero (0) the corresponding pixel from the background is selected and reproduced.
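The selection rule quoted above is simple enough to write down directly. This is a minimal sketch that assumes all three layers have already been brought to the same resolution and are stored as nested Python lists; `mrc_compose` is a hypothetical name, not something defined by the standard.

```python
Pixel = tuple[int, int, int]  # (R, G, B)


def mrc_compose(mask: list[list[int]],
                foreground: list[list[Pixel]],
                background: list[list[Pixel]]) -> list[list[Pixel]]:
    """Recombine the base mode 3-layer MRC model pixel by pixel.

    A mask value of 1 selects the foreground pixel, 0 selects the background
    pixel. All three layers are assumed to share one resolution here; the
    standard allows them to differ, which would require resampling first.
    """
    if not (len(mask) == len(foreground) == len(background)):
        raise ValueError("layers must have the same number of rows")
    page = []
    for m_row, f_row, b_row in zip(mask, foreground, background):
        if not (len(m_row) == len(f_row) == len(b_row)):
            raise ValueError("layers must have the same number of columns")
        page.append([f if m == 1 else b for m, f, b in zip(m_row, f_row, b_row)])
    return page


RED, WHITE = (255, 0, 0), (255, 255, 255)
print(mrc_compose([[1, 0], [0, 1]],
                  [[RED, RED], [RED, RED]],
                  [[WHITE, WHITE], [WHITE, WHITE]]))
# [[(255, 0, 0), (255, 255, 255)], [(255, 255, 255), (255, 0, 0)]]
```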

The standard then illustrates "The base mode 3-layer model" with the following colour figure:

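Incidentally, many of the figures in the original ISO/IEC 16485-2000 are in colour; in pirated copies they turned into grayscale and are somewhat hard to read.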

Section 7.4, "Layer combination", of ISO/IEC 16485-2000 gives recommendations on how the three-layer model is rendered:

Image layers are rendered sequentially in ascending order of layer number (i.e. layer 1 then 3). The background layer (i.e. layer 1), if present, shall be rendered first...In event of an image layer (i.e. layer 3), or portion thereof, without a corresponding mask layer, the image layer shall be rendered on top of any previously rendered layer.
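The rendering order of section 7.4 can be sketched as follows. The dictionary mapping a layer number to `(image, mask)` is a hypothetical representation, and every layer is assumed to cover the whole page so that offsets can be ignored.

```python
def render_layers(width, height, layers, fill=None):
    """Render image layers in ascending layer number (e.g. layer 1, then layer 3).

    `layers` maps an image-layer number to (image, mask); mask is None when the
    layer has no corresponding mask layer. A layer without a mask is painted
    on top of everything rendered before it; otherwise only pixels whose mask
    value is 1 are painted.
    """
    canvas = [[fill] * width for _ in range(height)]
    for number in sorted(layers):
        image, mask = layers[number]
        for y in range(height):
            for x in range(width):
                if mask is None or mask[y][x] == 1:
                    canvas[y][x] = image[y][x]
    return canvas


bg, fg = [["b", "b"]], [["f", "f"]]
print(render_layers(2, 1, {3: (fg, None), 1: (bg, None)}))      # [['f', 'f']]
print(render_layers(2, 1, {3: (fg, [[0, 1]]), 1: (bg, None)}))  # [['b', 'f']]
```

The dictionary is deliberately built with layer 3 first: the sort by layer number, not insertion order, is what guarantees the background is rendered first.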

The basic three-layer model is not set in stone; ISO/IEC 16485-2000 makes flexible provisions:

Given limited device memory in many facsimile implementations and that mixed content pages often have a mixture of: text/line-art (monochrome or coloured) regions; contone image regions; text/line-art (monochrome or coloured) and contone image regions. There are provisions to subdivide the page into horizontal stripes that span the entire width of the page and isolate individual regions, see Figure 3. Stripes are composed of one or more layers as determined by the image type within the stripe.

The 3-layer model has 3 types of horizontal stripes that are implemented according to the type of data being addressed:

  • 3-layer stripe (3LS), so referenced since it contains all three of the foreground, mask and background layers as in Figure 1...;
  • 2-layer stripe (2LS), so referenced since it contains coded data for two of the three layers (the third is set to a fixed value). The two layers may be mask and background...or mask and foreground layers...;
  • 1-layer stripe (1LS), so referenced since it contains coded data for only one of the three layers (the other two are set to fixed values). The one layer may be mask, background  or foreground...
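Choosing a stripe type amounts to counting which layers actually carry coded data. This sketch uses a hypothetical `Stripe` class; the check that a two-layer stripe must include the mask follows the two combinations listed above, and everything else is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Stripe:
    """A horizontal stripe; a layer left as None is not coded and takes a fixed value."""
    foreground: Optional[Any] = None
    mask: Optional[Any] = None
    background: Optional[Any] = None


def stripe_type(stripe: Stripe) -> str:
    """Classify a stripe as 3LS, 2LS or 1LS by how many layers carry coded data."""
    coded = [name for name in ("foreground", "mask", "background")
             if getattr(stripe, name) is not None]
    if not coded:
        raise ValueError("a stripe must code at least one layer")
    if len(coded) == 2 and "mask" not in coded:
        # The quoted text lists only mask+background and mask+foreground for a 2LS.
        raise ValueError("a 2-layer stripe pairs the mask with one image layer")
    return f"{len(coded)}LS"


page = [Stripe(mask="text"),                                        # text-only band
        Stripe(background="photo"),                                 # picture-only band
        Stripe(foreground="ink", mask="text", background="photo")]  # mixed band
print([stripe_type(s) for s in page])  # ['1LS', '1LS', '3LS']
```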

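In other words, depending on the actual content, the basic three-layer model can be reduced to two layers or one. In section "A.6.4 N-layer stripe (NLS)" of the standard's annex, the three-layer model is extended to N layers: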

N-layer stripes (NLS), where N is an integer, are an extension to the basic structure of Recommendation T.44, as defined in this annex. The NLS contains more than three (3) layers; see Figure A.1. It provides a means to transfer one or more multi-level image layers (background, foreground, layer 5, layer 7, …) and one or more bi-level mask layers (layers 2, 4, 6, …) that define layer recombination on the same page. Beyond layer 1 (background), the layers occur in pairs, 2 and 3, 4 and 5, etc. The main mask layer (layer 2) must span the full dimension of the stripe while other layers (i.e. layer 1, 3, 4, 5, …) may have an offset and dimensions that are less than those of the stripe. The offset and dimensions of the masks need not be the same as those of the corresponding image layers, see Figure A.1. This capability enables representation of richly coloured text, graphics, and line-art together with contone image using a combination of multi-level and bi-level coding methods.
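The pairing of masks with image layers, each pair with its own offset, can be sketched as below. `Plane`, `sample` and `compose_nls` are hypothetical names; how to treat a pixel that a mask selects but its image layer does not cover, and stripe areas outside the background, are not settled by the quoted text, so the rules used here are assumptions.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence, Tuple


@dataclass
class Plane:
    """One layer placed inside the stripe with its top-left corner at (x0, y0)."""
    data: List[List[Any]]
    x0: int = 0
    y0: int = 0


def sample(plane: Plane, x: int, y: int) -> Optional[Any]:
    """Value of `plane` at stripe coordinates (x, y), or None outside the plane."""
    px, py = x - plane.x0, y - plane.y0
    if 0 <= py < len(plane.data) and 0 <= px < len(plane.data[py]):
        return plane.data[py][px]
    return None


def compose_nls(background: Plane, pairs: Sequence[Tuple[Plane, Plane]],
                width: int, height: int, fill: Any = None) -> List[List[Any]]:
    """Recombine an N-layer stripe: layer 1, then the pairs (2, 3), (4, 5), ...

    Assumptions of this sketch: a mask pixel of 1 lets its image layer show
    only where that image layer has data, and stripe areas not covered by
    the background take `fill`.
    """
    if pairs:
        main = pairs[0][0]
        full = ((main.x0, main.y0) == (0, 0) and len(main.data) == height
                and all(len(row) == width for row in main.data))
        if not full:
            raise ValueError("the main mask (layer 2) must span the whole stripe")
    stripe = []
    for y in range(height):
        row = []
        for x in range(width):
            value = sample(background, x, y)
            if value is None:
                value = fill
            for mask, image in pairs:  # later pairs are drawn over earlier ones
                pixel = sample(image, x, y)
                if sample(mask, x, y) == 1 and pixel is not None:
                    value = pixel
            row.append(value)
        stripe.append(row)
    return stripe


bg = Plane([["b", "b", "b"]])
layers_2_3 = (Plane([[1, 0, 1]]), Plane([["f", "f", "f"]]))
layers_4_5 = (Plane([[1]], x0=1), Plane([["g"]], x0=1))  # a small, offset pair
print(compose_nls(bg, [layers_2_3, layers_4_5], width=3, height=1))  # [['f', 'g', 'f']]
```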

In short, ISO/IEC 16485-2000 describes MRC in detail, from theory through practice, and offers important guidance for digitizing documents by scanning.

II. Layers in DjVu

According to sections 3.1 and 3.2 of Lizardtech's 2005 "Lizardtech DjVu Reference DjVu V3", DjVu is built on the MRC base mode 3-layer model:

The principal imaging model used in DjVu is the "Mixed Raster Content" (MRC) model described in ITU-T Recommendation T.44, ISO/IEC 16485. In this model, an image is decomposed into foreground and background layers. To select whether a particular pixel comes from the foreground or background a bitonal "selection" or "mask" layer is provided. These three layers are compressed separately using techniques which are optimized for each type of data.

The foreground and background layers are compressed using a wavelet-based continuous-tone image compression technique known as IW44.

The mask layer is compressed using a bitonal image compression technique that takes advantage of repetitions of nearly identical shapes on the page (such as characters) to efficiently compress text images.

A DjVu image need not contain all three layers and alternative compression techniques are available for each layer.

DjVu Documents can be single- or multi-page. Each page consists of a DjVu image as described above (photo, bitonal or an MRC-based composition). Such a page, by itself is a valid DjVu Document. Multipage Documents can take either of two forms: Bundled or Indirect.

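Section 7.1 of the same document expands on the statement "A DjVu image need not contain all three layers": it classifies DjVu images into the single-layer Photo DjVu Image and Bi-level DjVu Image and the multi-layer Compound DjVu Image, and describes each in detail. The appendices describe the IW44 and JB2 compressed data streams in detail, but say nothing about where these two algorithms came from.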
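The three kinds can be told apart by which layers are present. The helper below is hypothetical and deliberately simplified: it assumes a mask alone means Bi-level, a background alone means Photo, and a mask together with any colour layer means Compound. The exact rules should be taken from section 7.1 of the reference itself.

```python
def classify_djvu_image(has_mask: bool, has_foreground: bool, has_background: bool) -> str:
    """Map the layers present to one of the three DjVu image kinds (simplified)."""
    if has_mask and not (has_foreground or has_background):
        return "Bi-level DjVu Image"   # bitonal mask only
    if has_background and not (has_mask or has_foreground):
        return "Photo DjVu Image"      # continuous-tone layer only
    if has_mask and (has_foreground or has_background):
        return "Compound DjVu Image"   # MRC-style composition
    raise ValueError("layer combination not covered by this sketch")


print(classify_djvu_image(has_mask=True, has_foreground=True, has_background=True))
# Compound DjVu Image
```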

Chapter 2 of the paper "A General Segmentation Scheme For DjVu Document Compression", co-authored by Patrick Haffner, Leon Bottou and Yann LeCun of AT&T and Luc Vincent of Lizardtech, explains where the JB2 algorithm came from:

The mask image is encoded with a new bi-level image compression algorithm called JB2 or DjVuBitonal. It is a variation on AT&T's proposal to the emerging JBIG2 standard. The basic idea of JB2 is to locate individual shapes on the page (such as characters), and use a shape clustering algorithm to find similarities between shapes. Shapes that are representative of each cluster (or in a cluster by themselves) are coded as individual bitmaps with a method similar to JBIG1.
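The clustering idea (find shapes, group similar ones, code one representative per group) can be sketched with a simple pixel-difference threshold. This is only an illustration under stated assumptions: the Hamming-distance criterion, the greedy strategy and `max_diff` are invented for the example, the JBIG1-like bitmap coding is not reproduced, and a lossless coder would also have to record how each shape differs from its prototype.

```python
Bitmap = tuple[tuple[int, ...], ...]  # rows of 0/1 pixels


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Number of pixels that differ between two bitmaps of the same size."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def same_size(a: Bitmap, b: Bitmap) -> bool:
    return len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))


def cluster_shapes(shapes: list[Bitmap], max_diff: int = 1):
    """Greedy clustering of the shapes found on a page.

    Each shape joins the first existing prototype of the same size that differs
    in at most `max_diff` pixels; otherwise it becomes a new prototype.
    Returns (prototypes, refs), where refs[i] is the prototype index of shapes[i].
    """
    prototypes: list[Bitmap] = []
    refs: list[int] = []
    for shape in shapes:
        for index, proto in enumerate(prototypes):
            if same_size(proto, shape) and hamming(proto, shape) <= max_diff:
                refs.append(index)
                break
        else:  # no close-enough prototype: this shape starts a new cluster
            prototypes.append(shape)
            refs.append(len(prototypes) - 1)
    return prototypes, refs


I1 = ((1,), (1,), (1,))
I2 = ((1,), (0,), (1,))   # a slightly damaged copy of the same glyph
DASH = ((1, 1, 1),)
protos, refs = cluster_shapes([I1, I2, DASH, I1])
print(len(protos), refs)  # 2 [0, 0, 1, 0]
```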

In my article "DjVu to PDF" I build on this to discuss lossless conversion of JB2-compressed DjVu into JBIG2-compressed PDF, and how to implement the reverse.

III. Layers in PDF

ISO 32000-1:2008 is based on Adobe's 2006 "PDF Reference 6th edition", but it drops the latter's Chapter 2, "Overview", and it is precisely section 2.1 of that chapter that explains PDF's Imaging Model:

At the heart of PDF is its ability to describe the appearance of sophisticated graphics and typography. This ability is achieved through the use of the Adobe imaging model, the same high-level, device-independent representation used in the PostScript page description language.

Although application programs could theoretically describe any page as a full-resolution pixel array, the resulting file would be bulky, device-dependent, and impractical for high-resolution devices. A high-level imaging model enables applications to describe the appearance of pages containing text, graphical shapes, and sampled images in terms of abstract graphical elements rather than directly in terms of device pixels. Such a description is economical and device-independent, and can be used to produce high-quality output on a broad range of printers, displays, and other output devices.

Section 2.1.2 describes the Adobe imaging model in more detail:

The Adobe imaging model is a simple and unified view of two-dimensional graphics borrowed from the graphic arts. In this model, “paint” is placed on a page in selected areas:

  • The painted figures can be in the form of character shapes (glyphs), geometric shapes, lines, or sampled images such as digital representations of photographs.
  • The paint may be in color or in black, white, or any shade of gray. It may also take the form of a repeating pattern (PDF 1.2) or a smooth transition between colors (PDF 1.3).
  • Any of these elements may be clipped to appear within other shapes as they are placed onto the page.

...PDF 1.3 and earlier versions use an opaque imaging model in which each new graphics object painted onto a page completely obscures the previous contents of the page at those locations (subject to the effects of certain optional parameters that may modify this behavior; see Section 4.5.6, “Overprint Control”). No matter what color an object has—white, black, gray, or color—it is placed on the page as if it were applied with opaque paint. PDF 1.4 introduces a transparent imaging model in which objects painted on the page are not required to be fully opaque. Instead, newly painted objects are composited with the previously existing contents of the page, producing results that combine the colors of the object and its backdrop according to their respective opacity characteristics. The transparent imaging model is described in Chapter 7.
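The difference between the two models can be shown for a single point. This is a simplified sketch, assuming an opaque backdrop and the Normal blend mode only, for which the PDF compositing formula reduces to a linear mix; groups, soft masks, shape and the other blend modes of the full transparency model are not modelled.

```python
RGB = tuple[float, float, float]


def paint_opaque(backdrop: RGB, source: RGB) -> RGB:
    """PDF 1.3 and earlier: the new object completely obscures the backdrop."""
    return source


def paint_transparent(backdrop: RGB, source: RGB, alpha: float) -> RGB:
    """Simplified PDF 1.4 compositing, Normal blend mode over an opaque backdrop:
    result = (1 - alpha) * backdrop + alpha * source, per colour component."""
    return tuple((1 - alpha) * b + alpha * s for b, s in zip(backdrop, source))


grey, green = (200, 200, 200), (0, 100, 40)
print(paint_opaque(grey, green))             # (0, 100, 40)
print(paint_transparent(grey, green, 0.25))  # (150.0, 175.0, 160.0)
```

With `alpha` equal to 1 the transparent result matches the opaque one, which is why the opaque model can be seen as a special case.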

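Judging from the above, the Adobe imaging model is more ambitious and wider in scope than MRC. No wonder a search of both ISO 32000-1:2008 and "PDF Reference 6th edition" turns up no mention of "MRC": Adobe apparently regards the two as quite different kinds of things. As section 2.1.1 of "PDF Reference 6th edition" explains, PDF is built on a page description language, so a PDF file must spell out in that language how the objects relate to one another and how they are drawn; MRC, by contrast, is built on layered images, and the relationships among the layers and the way they are rendered are fixed in advance by the specification. The original text of section 2.1.1: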

Among its other roles, PDF serves as a page description language, a language for describing the graphical appearance of pages with respect to an imaging model. An application program produces output through a two-stage process:

1. The application generates a device-independent description of the desired output in the page description language.
2. A program controlling a specific output device interprets the description and renders it on that device.
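The two-stage process can be illustrated with a toy example: one description, rendered by two "devices" at different resolutions. The tuple format and the `render` function are hypothetical stand-ins for a real PDF content stream and rasteriser; only the point-based units (1/72 inch) with a bottom-left origin follow PDF.

```python
# Stage 1: a device-independent page description. Coordinates are in points
# (1/72 inch) with the origin at the bottom-left, as in PDF; the tuple format
# ("fill_rect", x, y, width, height) is a hypothetical stand-in for real operators.
PAGE_W_PT, PAGE_H_PT = 144, 72
description = [("fill_rect", 0, 0, 72, 36)]  # a 1 x 0.5 inch box


def render(description, width_pt, height_pt, dpi):
    """Stage 2: a 'device' rasterises the same description at its own resolution."""
    scale = dpi / 72
    w, h = round(width_pt * scale), round(height_pt * scale)
    raster = [[0] * w for _ in range(h)]
    for op, x, y, rw, rh in description:
        if op != "fill_rect":
            raise ValueError(f"unsupported operator: {op!r}")
        for row in range(round(y * scale), round((y + rh) * scale)):
            for col in range(round(x * scale), round((x + rw) * scale)):
                raster[h - 1 - row][col] = 1  # flip: raster rows run top to bottom
    return raster


for dpi in (72, 144):
    raster = render(description, PAGE_W_PT, PAGE_H_PT, dpi)
    print(dpi, len(raster[0]), len(raster), sum(map(sum, raster)))
# 72 144 72 2592
# 144 288 144 10368
```

The description never changes; only the device's resolution does, which is the "device-independent" property the quotation describes.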

The two stages may be executed in different places and at different times. The page description language serves as an interchange standard for the compact, device-independent transmission and storage of printable or displayable documents.

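If the objects in a PDF were limited to images, with no paths, text and so on, the transparent imaging model supported since PDF 1.4 would look very much like MRC. Chapter 7 of "PDF Reference 6th edition" and chapter 11 of ISO 32000-1:2008 explain this model in detail. The original text of section 7.1 of "PDF Reference 6th edition":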

Under the transparent imaging model, all of the objects on a page can potentially contribute to the result. Objects at a given point can be thought of as forming a transparency stack (or stack for short). The objects are arranged from bottom to top in the order in which they are specified. The color of the page at each point is determined by combining the colors of all enclosing objects in the stack according to compositing rules defined by the transparency model.
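The stack idea and its resemblance to MRC can be made concrete: fold the stack from bottom to top, and MRC recombination becomes the special case in which the foreground's alpha is the mask bit (0 or 1). This is a sketch assuming the Normal blend mode and an opaque bottom object; the function names are hypothetical.

```python
RGB = tuple[float, float, float]


def composite_stack(stack: list[tuple[RGB, float]]) -> RGB:
    """Fold a transparency stack at one point, bottom object first.

    Each entry is (colour, alpha). The bottom object is treated as opaque, and
    only the Normal blend mode is modelled, a simplification of the PDF model.
    """
    (colour, _), *rest = stack
    for source, alpha in rest:
        colour = tuple((1 - alpha) * b + alpha * s for b, s in zip(colour, source))
    return colour


def mrc_pixel(mask_bit: int, foreground: RGB, background: RGB) -> RGB:
    """MRC recombination as a two-object stack: the mask bit acts as the alpha."""
    return composite_stack([(background, 1.0), (foreground, float(mask_bit))])


RED, WHITE = (255, 0, 0), (255, 255, 255)
print(mrc_pixel(1, RED, WHITE), mrc_pixel(0, RED, WHITE))
# (255.0, 0.0, 0.0) (255.0, 255.0, 255.0)
```

In an actual PDF, one possible way to express this is to paint the background image and then the foreground image with the mask attached through the image's /Mask or /SMask entry; this only illustrates the correspondence and is not necessarily how "DjVu to PDF" does it.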

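Doesn't this look a lot like the description of MRC's N-layer model? In my article "DjVu to PDF" I build on this similarity to discuss converting three-layer DjVu to PDF, and the guidance the MRC model offers for developing image-to-PDF software. That, in fact, is the real reason I went to the trouble of writing these two articles.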

IV. Layers in PDG

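PDG is a proprietary file format. To date I have not seen any officially published format specification; all my information comes from unofficial sources, so take it or leave it.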

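Early V1 PDG files had a very simple structure: the file header held only basic information such as the version number and image dimensions, and there were only two formats. Black-and-white pages were compressed with a variant of CCITT G4 and wrapped as V1 PDG; everything else was stored as standard JPG.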

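Such a simple format is certainly easy to implement, but it had no encryption or usage restrictions of any kind, which led to rampant piracy. As a result V1 was short-lived and was soon replaced by V2. As I understand it, the main improvements in V2 are: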

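  • Support for multiple encryption schemes, from the simplest, 0xH, to the most complex so far, 6xH. Combined with permission control in the dedicated reader, encryption effectively curbed the frenzied piracy of the V1 era.
  • Support for multiple formats: besides V1's CCITT G4 variant, JPG, DjVu, PDF and others can all be wrapped in V2 PDG, and colour expanded from V1's black and white to 24-bit true colour. However, the DjVu found in PDG is always single-layer; I have never seen a multi-layer one.
  • Support for separating the text layer from illustration layers, each compressed with a different format: the text layer uses the CCITT G4 variant or black-and-white DjVu (actually JB2), while illustrations can be colour or grayscale JPG or DjVu (IW44-compressed).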

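This last point is in fact an application of the MRC idea of "split the content into layers, compress each layer as it needs": it keeps the text sharp while effectively reducing the final file size. Compared with MRC's base three-layer model, however, V2 PDG looks more like an N-layer model extended from MRC's two-layer model: there is no mask layer (or the mask can be thought of as generated on the fly from the foreground layer), and at display time the illustration layers are drawn directly over the text layer, which serves as the background. Multiple illustrations may overlap, forming an N-layer stack.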
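Following the description above, the display step can be sketched like this. Since there is no public specification, the `(x0, y0, image)` representation of an illustration and both function names are hypothetical; the second function shows the mask that could be derived from the illustrations if one wanted to map the page onto an MRC-style model.

```python
def compose_pdg_v2(text_layer, illustrations):
    """Draw illustrations, in order, straight over the text layer.

    No explicit mask is stored: every pixel of an illustration's rectangle
    replaces what lies underneath, so later illustrations cover earlier ones
    (the N-layer stack). The input text layer is not modified.
    """
    page = [row[:] for row in text_layer]
    for x0, y0, image in illustrations:
        for dy, row in enumerate(image):
            for dx, pixel in enumerate(row):
                y, x = y0 + dy, x0 + dx
                if 0 <= y < len(page) and 0 <= x < len(page[y]):
                    page[y][x] = pixel
    return page


def implicit_mask(width, height, illustrations):
    """The mask one could derive from the illustrations: 1 inside any rectangle."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, image in illustrations:
        for dy, row in enumerate(image):
            for dx in range(len(row)):
                y, x = y0 + dy, x0 + dx
                if 0 <= y < height and 0 <= x < width:
                    mask[y][x] = 1
    return mask


text = [["t", "t", "t"], ["t", "t", "t"]]
pictures = [(1, 0, [["a", "a"]]), (2, 0, [["b"], ["b"]])]
print(compose_pdg_v2(text, pictures))  # [['t', 'a', 'b'], ['t', 't', 'b']]
print(implicit_mask(3, 2, pictures))   # [[0, 1, 1], [0, 0, 1]]
```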

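From the analysis of the MRC model above, if a conversion from PDG to PDF or DjVu ignores how similar these formats are in MRC terms, or even destroys the original MRC layering (for example by printing a layered PDG to PDF), image quality or file size may suffer; respecting the layering avoids most of these losses. This is another reason I am interested in MRC.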

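As for how the image and text layers in V2 PDG are separated, though, I doubt it is done automatically by software; it looks more like someone judged by eye and did it by hand. Consider these two images*: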

   

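In the first image the text was not cut out completely; in the second the cut has a staircase edge. Either way, it looks as if someone cut them out with a mouse.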

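*These two images come from the clear (清晰) edition of Georges Jean (France), translated by Cao Jinqing and Ma Zhenpin, 文字与书写:思想的符号 [Writing and Script: Symbols of Thought], Shanghai Bookstore Publishing House, 2001 (SSID=10406715), pages 13 and 115, displayed in SSREADER 4.0 build 070511 with the default background texture.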