Automatic Paragraph Detection for Accessible PDF Documents


InIT Institute of Applied Information Technology, ZHAW Zurich University of Applied Sciences, Winterthur, Switzerland

Abstract. This paper describes a new algorithm for the automatic detection and tagging of paragraphs in PDF documents. This is an important feature of the PDF Accessibility Validation Engine (PAVE) [1], which is an open-source web application for the analysis and semi-automatic correction of accessibility issues in PDF documents. The tool is currently used by a large number of users, and their feedback is collected and evaluated. The evaluation so far has revealed some major usability issues, mainly due to the missing paragraph detection functionality. After an introduction to PDF accessibility, this paper discusses the current usability issues with PAVE and describes the newly proposed algorithm to alleviate them. A first evaluation and conclusion of the results will be provided in the final paper.

Keywords: Accessible PDF · Tagged PDF · Visual impairment · Algorithm · Screen readers · Document accessibility

1 Introduction

Modern assistive technologies improve the ability of people with disabilities to work effectively with current software. People with impaired vision specifically face the challenge that information is commonly presented visually on a screen. Screen readers [2] are used in such situations to render the content of the screen using speech synthesis or braille output devices. To allow users to navigate software interfaces and documents, screen readers must expose structural information to the user. For documents, a screen reader will provide key combinations that enumerate headings, figures or other elements. Listening to these items provides the user with an overview of the document, and selecting an item makes the screen reader navigate to that item and read the text of the document aloud starting from that position.

Some document formats have all content embedded in the kind of structural information needed by screen readers. The PDF format, however, is primarily designed to allow precise visual positioning of elements within a document, and these elements do not carry any information about their structural relationship to other elements in the document. To introduce this kind of structural information, the PDF format allows a tree of structural information that is separate from the content. This structural information, however, may be absent or incomplete depending on the software that generated the PDF file.

© Springer International Publishing Switzerland 2016. K. Miesenberger et al. (Eds.): ICCHP 2016, Part I, LNCS 9758, pp. 367–372, 2016. DOI: 10.1007/978-3-319-41264-1_50
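The separation between positioned content and a structure tree described above can be pictured with a small sketch. It is a simplified illustration, not the PDF file format itself and not PAVE's implementation: the class names `ContentItem` and `StructElem`, the helper functions, and the sample data are all hypothetical. The tag names follow the standard structure types of tagged PDF, and `mcid` stands in for a marked-content identifier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContentItem:
    """Hypothetical page content: text plus its visual position only."""
    mcid: int      # stand-in for a marked-content identifier
    text: str
    x: float
    y: float


@dataclass
class StructElem:
    """Hypothetical structure-tree node that refers to content by id."""
    tag: str                                   # e.g. "H1", "P", "Figure"
    mcids: List[int] = field(default_factory=list)
    children: List["StructElem"] = field(default_factory=list)


def enumerate_by_tag(root: StructElem, tag: str) -> List[StructElem]:
    """Depth-first walk collecting all elements with the given tag."""
    found = [root] if root.tag == tag else []
    for child in root.children:
        found.extend(enumerate_by_tag(child, tag))
    return found


def text_of(elem: StructElem, content: List[ContentItem]) -> str:
    """Resolve the content items an element refers to and join their text."""
    by_id = {item.mcid: item for item in content}
    return " ".join(by_id[m].text for m in elem.mcids)


# Content knows only where it is drawn, not what role it plays.
content = [
    ContentItem(0, "Introduction", 72, 700),
    ContentItem(1, "Screen readers render content as speech.", 72, 680),
    ContentItem(2, "Related Work", 72, 500),
]

# The structure tree is kept separately and assigns roles to the content.
tree = StructElem("Document", children=[
    StructElem("H1", [0]),
    StructElem("P", [1]),
    StructElem("H1", [2]),
])

# What a screen reader's "list headings" command could rely on.
headings = [text_of(h, content) for h in enumerate_by_tag(tree, "H1")]
print(headings)  # ['Introduction', 'Related Work']
```

If `tree` were missing or contained no `P` elements, the content would still display correctly, but a heading or paragraph enumeration like the one above would have nothing to work with. This is the situation that arises when the structural information is absent or incomplete.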

A. Darvishy et al.

There are a number of tools available to create accessible PDF documents [3], but PAVE [1] is the only available open-source web-based application for validating and fixing accessibility issues directly in PDF files. It was awarded first prize at the 2014 Conference on Computers Helping People with Disabilities (ICCHP) [4]. It performs automat
