Automatic Paragraph Detection for Accessible PDF Documents



InIT Institute of Applied Information Technology, ZHAW Zurich University of Applied Sciences, Winterthur, Switzerland

Abstract. This paper describes a new algorithm for the automatic detection and tagging of paragraphs in PDF documents. This is an important feature of the PDF Accessibility Validation Engine (PAVE) [1], which is an open-source web application for the analysis and semi-automatic correction of accessibility issues in PDF documents. The tool is currently used by a large number of users, and their feedback is collected and evaluated. The evaluation so far has revealed some major usability issues, mainly due to the missing paragraph detection functionality. After an introduction to PDF accessibility, this paper discusses the current usability issues with PAVE and describes the newly proposed algorithm to alleviate them. A first evaluation and conclusion of the results will be provided in the final paper.

Keywords: Accessible PDF · Tagged PDF · Visual impairment · Algorithm · Screen readers · Document accessibility

1 Introduction

Modern assistive technologies improve the ability of people with disabilities to work effectively with current software. People with impaired vision specifically face the challenge that information is commonly presented visually on a screen. Screen readers [2] are used in such situations to render the content of the screen using speech synthesis or braille output devices. To allow users to navigate software interfaces and documents, screen readers must expose structural information to the user. For documents, a screen reader will provide key combinations that enumerate headings, figures or other elements. Listening to these items provides the user with an overview of the document, and selecting an item makes the screen reader navigate to that item and read the text of the document aloud starting from that position.

Some document formats have all content embedded in the kind of structural information needed by screen readers. The PDF format, however, is primarily designed to allow precise visual positioning of elements within a document, and these elements do not carry any information about their structural relationship to other elements in the document. To introduce this kind of structural information, the PDF format allows a tree of structural information that is kept separate from the content. This structural information, however, may be absent or incomplete depending on the software that generated the PDF file.

© Springer International Publishing Switzerland 2016 K. Miesenberger et al. (Eds.): ICCHP 2016, Part I, LNCS 9758, pp. 367–372, 2016. DOI: 10.1007/978-3-319-41264-1_50
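The idea of a structure tree that a screen reader can query, as described in the introduction, can be illustrated with a minimal Python sketch. This is not PAVE code and does not use any PDF library: the class StructElem and the functions walk and headings are hypothetical names introduced only for illustration. The tag names (Document, H1, P) follow the standard structure types of tagged PDF.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class StructElem:
    """Hypothetical node of a document structure tree (not a real PDF API)."""
    tag: str                      # role such as "Document", "H1", "P", "Figure"
    text: str = ""                # text content associated with this node
    children: List["StructElem"] = field(default_factory=list)


def walk(node: StructElem) -> Iterator[StructElem]:
    """Yield the node and its descendants depth-first (logical reading order)."""
    yield node
    for child in node.children:
        yield from walk(child)


def headings(root: StructElem) -> List[str]:
    """Collect heading texts, similar to a screen reader's heading list."""
    return [n.text for n in walk(root) if n.tag in {"H1", "H2", "H3"}]


# Illustrative tree with made-up content.
doc = StructElem("Document", children=[
    StructElem("H1", "First heading"),
    StructElem("P", "Text of the first paragraph."),
    StructElem("H1", "Second heading"),
    StructElem("P", "Text of the second paragraph."),
])

print(headings(doc))
```

If the tree is absent, a query such as headings has nothing to enumerate, which is the situation that tagging tools try to repair.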


A. Darvishy et al.

There are a number of tools available to create accessible PDF documents [3], but PAVE [1] is the only available open-source, web-based application for validating and fixing accessibility issues directly in PDF files. It was awarded first prize at the 2014 Conference on Computers Helping People with Disabilities (ICCHP) [4]. It performs automat