Medical reporting using speech recognition

Introduction

The wish to control machines by means of speech is older than the ancestors of our computers. But only in recent years have computers gained the power to turn this dream into a useful tool for our daily work. There are different types of speech-controlled systems today: for example, some systems simplify the lives of handicapped people by operating lights or electrical appliances with the human voice. Other systems enable text to be entered into word processors; these speech recognition systems may be combined with speech-control functions, e.g. to enable the computer to recognize commands such as "Print" or "Save" during dictation. Speech recognition systems (SRS) have been spreading rapidly ever since some fundamental problems were solved. Among the first professions to recognize the advantages of this revolutionary technology were lawyers and medical doctors. Above all, radiologists are today using speech recognition in multi-workstation networks, but there are also software packages for surgical reports, general medical reports, neurologists, cardiologists, internists, orthopedists, and other medical areas [2-4,6]. In the meantime many dictation languages are extensively supported, e.g. English UK, English US, German, French and Dutch; some vendors even offer Italian, Chinese, Arabic, Spanish and Swedish.

W. Hruby (ed.), Digital (R)Evolution in Radiology © Springer-Verlag Wien 2001

The underlying technology

Speech recognition poses a great challenge for computers, since there are many and various problems to be solved. First, the relationship between written letters and spoken words is only a very indirect one. Every language has its own set of sound units, the so-called phonemes. This means that the computer must analyze the sound vibrations recorded by a microphone according to an "acoustic model" and convert these into individual phonemes; these phonemes are then combined into sets and assigned to suitable words taken from a large database. However, recognizing phonemes is not as easy as it may look. On the one hand, unwanted background noise such as ringing telephones or banging doors must be recognized and suppressed. On the other hand, people have different voices, accents and, above all, speaking styles. For every spoken word, the software chooses the most likely entry from several entries stored in the electronic dictionary, and then compares the word combinations found in the text with a "language model". This model reflects the fact that certain words frequently appear together in medical reports while other word combinations are extremely unlikely. Only after this analysis can the final transcription be performed by the speech recognition software [7,8]. Finally, varying speaking speeds must also be taken into consideration. Older systems required the speaker to pause between one word and the next [5], which made speech quite unnatural. Advanced software programs now allow natural dictation without artificial pauses, and even strongly varying speaking speeds are no problem. Varying s
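The two-stage decision described above — acoustic candidates from the dictionary, re-ranked by a language model — can be sketched in a few lines of Python. This is a minimal illustration, not how any commercial product is implemented: the vocabulary, the acoustic scores, and the bigram probabilities below are all invented for the example, and real systems search efficiently (e.g. with Viterbi or beam decoding) rather than enumerating every combination.

```python
import math
from itertools import product

# Acoustic candidates per spoken position: word -> P(sound | word).
# These scores stand in for the output of an acoustic model; the
# values are made up for illustration.
acoustic = [
    {"no": 0.6, "know": 0.4},
    {"acute": 0.5, "cute": 0.5},
    {"fracture": 0.7, "fractures": 0.3},
]

# Toy bigram "language model": P(word | previous word), reflecting
# which word pairs are plausible in a medical report. "<s>" marks
# the sentence start; unseen pairs get a tiny fallback probability.
bigram = {
    ("<s>", "no"): 0.3, ("<s>", "know"): 0.01,
    ("no", "acute"): 0.2, ("no", "cute"): 0.001,
    ("know", "acute"): 0.001, ("know", "cute"): 0.001,
    ("acute", "fracture"): 0.4, ("acute", "fractures"): 0.1,
    ("cute", "fracture"): 0.001, ("cute", "fractures"): 0.001,
}

def score(words):
    """Combined log-probability of acoustic and language-model scores."""
    total = 0.0
    prev = "<s>"
    for pos, word in enumerate(words):
        total += math.log(acoustic[pos][word])          # acoustic evidence
        total += math.log(bigram.get((prev, word), 1e-6))  # context plausibility
        prev = word
    return total

# Brute-force search over every candidate combination; real decoders
# use dynamic programming instead.
best = max(product(*(d.keys() for d in acoustic)), key=score)
print(" ".join(best))  # -> no acute fracture
```

Note how the language model settles what the acoustics alone cannot: "no" and "know" (or "acute" and "cute") sound alike, but only one of each pair forms a likely word combination in a radiology report.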