Linguistic Summaries

Abstract In many tasks, users are not interested in the data stored in relational databases themselves, but in summarized relational knowledge and “abstracts” of the data, expressed in a useful and understandable way by linguistic terms. Linguistic Summaries (LSs) express the knowledge contained in the data concisely and in a form easily understood by users. LSs are quantified sentences of natural language such as “most municipalities of high altitude and low pollution have a small number of respiratory diseases”. The truth value of a summary takes values from the unit interval, as is common in fuzzy logic. We start with simple LSs and continue with more complex ones. Along the way, the selection of appropriate t-norms for aggregation and of quality measures is discussed. Furthermore, a system for calculating summaries will not work properly if it uses ill-defined membership functions, so attention is also given to constructing these functions for summarizers, restrictions and quantifiers. Quality measures are analysed as well, because a high truth value of a sentence is not always a sufficient measure. Finally, possible applications are considered.

3.1 Benefits and Protoforms of Linguistic Summarization

Retrieving tuples from databases is the topic of Chap. 2. In many other tasks, users are not interested in the data themselves, but in summaries which briefly explain the data and the relations among attributes. Summarization can be realized by statistical methods, which condense the essential information from a data set into a few numbers [9]. Measures such as means, medians and deviations provide valuable information, e.g. in 2011 municipalities produced on average 217.9184 kg of waste per inhabitant, with a standard deviation of 189.2839. However, interpreting data in this way is practicable only for rather small, specialized groups of people. Moreover, when the quality of the collected data is not high (e.g. errors in data collection or rough estimation), the calculations have to cope with these issues. Hence, the following quotation holds: “... method of summarization would be especially practicable if it could provide us with summaries that are not as terse as the mean” [48].
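As a minimal sketch of what such a statistical summary looks like in practice, the following Python snippet condenses a small list of values into a mean, a median and a standard deviation. The figures and the function name `statistical_summary` are hypothetical and chosen only for illustration; they are not the 2011 waste data cited above.

```python
from statistics import mean, median, pstdev

# Hypothetical waste figures in kg per inhabitant; illustrative values only,
# not the 2011 data cited in the text.
waste_kg_per_inhabitant = [120.0, 150.0, 180.0, 210.0, 900.0]


def statistical_summary(values):
    """Condense a list of numbers into a few descriptive statistics."""
    return {
        "mean": mean(values),
        "median": median(values),
        "std_dev": pstdev(values),  # population standard deviation
    }


summary = statistical_summary(waste_kg_per_inhabitant)
print(summary)
```

With these assumed values, a single outlier pulls the mean (312 kg) well above the median (180 kg). Reading that gap correctly requires some statistical background, which is the sense in which such terse summaries suit mainly specialized users.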

© Springer International Publishing Switzerland 2016 M. Hudec, Fuzziness in Information Systems, DOI 10.1007/978-3-319-42518-4_3

Expressing data summarization by two-valued logic is limited. The truth value of a sentence (predicate) created by the universal quantifier (∀) is 1 only if all tuples meet the requirement (condition), e.g. all territorial units have length of local roads < 200 km. If the truth value is 0, we do not know whether 1 % or 99 % of the tuples fail to meet the requirement. The same comment about data quality made for statistical methods holds here. If the person responsible for data collection rounds a value of 198.92 km to 200 km, then the truth value of this sentence is 0, even if no other territorial unit has a length of roads greater than or equal to 200 km. Keeping aforementioned facts in mind, data s