The National Library of Medicine announces its adoption of the Anglo-American standard for the formulation of journal title abbreviations according to the American National Standard for the Abbreviation of Titles of Periodicals (1969), with individual words abbreviated, in turn, according to the International List of Periodical Title Word Abbreviations (1970).
Objective. The growth of the biomedical literature presents special challenges for both human readers and automatic algorithms. One such challenge derives from the common and uncontrolled use of abbreviations in the literature. Each additional abbreviation increases the effective size of the vocabulary for a field. Therefore, to create an automatically generated and maintained lexicon of abbreviations, we have developed an algorithm to match abbreviations in text with their expansions.
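Matching an abbreviation with its expansion can be illustrated with a minimal sketch. It is not the algorithm this abstract refers to; it follows the widely used Schwartz and Hearst style heuristic of scanning right to left for the characters of a parenthesized short form. The names `find_long_form`, `extract_pairs`, and `PAREN` are hypothetical, as is the restriction to short forms of 2 to 10 characters.

```python
import re

# Assumption: short forms appear in parentheses and are 2 to 10 characters long.
PAREN = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def find_long_form(short_form, preceding_text):
    """Return the shortest trailing span of preceding_text that contains the
    characters of short_form in order, or None if there is no such span."""
    s_idx = len(short_form) - 1
    l_idx = len(preceding_text) - 1
    while s_idx >= 0:
        ch = short_form[s_idx].lower()
        if not ch.isalnum():
            s_idx -= 1  # skip hyphens and similar characters in the short form
            continue
        # Move left until the character matches; the first character of the
        # short form must additionally sit at the start of a word.
        while l_idx >= 0 and (
            preceding_text[l_idx].lower() != ch
            or (s_idx == 0 and l_idx > 0 and preceding_text[l_idx - 1].isalnum())
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    # l_idx now points just before the first matched character.
    start = preceding_text.rfind(" ", 0, l_idx + 1) + 1
    return preceding_text[start:]


def extract_pairs(sentence):
    """Collect (short form, long form) pairs from parenthetical definitions."""
    pairs = []
    for match in PAREN.finditer(sentence):
        short = match.group(1)
        words = sentence[: match.start()].split()
        # Candidate window: at most min(|A| + 5, 2 * |A|) words before "(".
        window = min(len(short) + 5, 2 * len(short))
        long_form = find_long_form(short, " ".join(words[-window:]))
        if long_form:
            pairs.append((short, long_form))
    return pairs


print(extract_pairs("The Unified Medical Language System (UMLS) is large."))
```

A heuristic like this only handles abbreviations defined in the form "long form (SF)"; undefined abbreviations need a different approach.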
Objective: To help biomedical researchers recognize dynamically introduced abbreviations in biomedical literature, such as gene and protein names, we have constructed a support system called ALICE (Abbreviation LIfter using Corpus-based Extraction). ALICE aims to extract all types of abbreviations with their expansions from a target paper on the fly.
Abbreviations are widely used in medicine. The understanding of abbreviations is important for medical language processing and information retrieval systems. The Unified Medical Language System (UMLS) contains a large number of abbreviations. We hypothesized that extracting and studying the UMLS abbreviations can be helpful for understanding the characteristics of abbreviations in medicine. In this paper, we describe a method for extracting abbreviations from the UMLS. We evaluated the method and studied the ambiguous nature of the abbreviations. In addition, the coverage of the UMLS abbreviations in medical reports was studied. Using our method, we extracted 163,666 unique (abbreviation, full form) pairs from the UMLS with a precision of 97.5%, and a recall of 96%. The UMLS abbreviations were highly ambiguous: 33.1% of abbreviations with six characters or less had multiple meanings; the average number of different full forms for all abbreviations with six characters or less was 2.28. The coverage of the UMLS abbreviations in medical reports was over 66%.
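The two ambiguity figures reported above (the share of abbreviations with multiple meanings and the average number of full forms per abbreviation) can be computed from a list of (abbreviation, full form) pairs roughly as follows. This is a sketch under assumptions: the function name `ambiguity_stats` is hypothetical, full forms are compared case-insensitively, and the sample pairs are invented for illustration rather than taken from the UMLS.

```python
from collections import defaultdict


def ambiguity_stats(pairs, max_len=6):
    """Return (fraction of ambiguous abbreviations, mean full forms per
    abbreviation) for abbreviations of at most max_len characters."""
    senses = defaultdict(set)
    for abbr, full_form in pairs:
        if len(abbr) <= max_len:
            senses[abbr].add(full_form.lower())
    if not senses:
        return 0.0, 0.0
    n_abbr = len(senses)
    n_ambiguous = sum(1 for forms in senses.values() if len(forms) > 1)
    n_forms = sum(len(forms) for forms in senses.values())
    return n_ambiguous / n_abbr, n_forms / n_abbr


# Invented toy data, not UMLS content.
toy_pairs = [
    ("CA", "calcium"),
    ("CA", "cancer"),
    ("CA", "cardiac arrest"),
    ("BP", "blood pressure"),
    ("MI", "myocardial infarction"),
    ("MI", "mitral insufficiency"),
]
rate, mean_forms = ambiguity_stats(toy_pairs)
print(f"ambiguous: {rate:.1%}, mean full forms: {mean_forms:.2f}")
```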
Abbreviations are widely used in writing, and the understanding of abbreviations is important for natural language processing applications. Abbreviations are not always defined in a document and they are highly ambiguous. A knowledge base that consists of abbreviations with their associated senses and a method to resolve the ambiguities are needed. In this paper, we studied the UMLS coverage, textual variants of senses, and the ambiguity of abbreviations in MEDLINE abstracts. We restricted our study to three-letter abbreviations which were defined using parenthetical expressions. When grouping similar expansions together and representing senses using groups, we found that after ignoring senses where the total number of occurrences within the corresponding group was less than 100, 82.8% of the senses matched the UMLS, covered over 93% of occurrences that were considered, and had an average of 7.74 expansions for each sense. Abbreviations are highly ambiguous: 81.2% of the abbreviations were ambiguous, and had an average of 16.6 senses. However, after ignoring senses with occurrences of less than 5, 64.6% of the abbreviations were ambiguous, and had an average of 4.91 senses.
Various natural language processing (NLP) systems have been developed to unlock patient information from narrative clinical notes in order to support knowledge based applications such as error detection, surveillance and decision support. In many clinical notes, abbreviations are widely used without mention of their definitions, which is very different from the use of abbreviations in the biomedical literature. Thus, it is critical, but more challenging, for NLP systems to correctly interpret abbreviations in these notes. In this paper we describe a study of a two-step model for building a clinical abbreviation database: first, abbreviations in a text corpus were detected and then a sense inventory was built for those that were found. Four detection methods were developed and evaluated. Results showed that the best detection method had a precision of 91.4% and recall of 80.3%. A simple method was used to build sense inventories from two different knowledge sources: the Unified Medical Language System (UMLS) and a MEDLINE abbreviation database (ADAM). Evaluation showed the inventory from the UMLS appeared to be the more appropriate of the two for defining the sense of abbreviations, but was not ideal. It covered 35% of the senses and had an ambiguity rate of 40% for those that were covered. However...
Many abbreviations are used in the literature especially in the life sciences, and polysemous abbreviations appear frequently, making it difficult to read and understand scientific papers that are outside of a reader’s expertise. Thus, we have developed Allie, a database and a search service of abbreviations and their long forms (a.k.a. full forms or definitions). Allie searches for abbreviations and their corresponding long forms in a database that we have generated based on all titles and abstracts in MEDLINE. When a user query matches an abbreviation, Allie returns all potential long forms of the query along with their bibliographic data (i.e. title and publication year). In addition, for each candidate, co-occurring abbreviations and a research field in which it frequently appears in the MEDLINE data are displayed. This function helps users learn about the context in which an abbreviation appears. To deal with synonymous long forms, we use a dictionary called GENA that contains domain-specific terms such as gene, protein or disease names along with their synonymic information. Conceptually identical domain-specific terms are regarded as one term, and then conceptually identical abbreviation-long form pairs are grouped taking into account their appearance in MEDLINE. To keep up with new abbreviations that are continuously introduced...
The processing of abbreviations in reading was examined with an eye movement experiment. Abbreviations were of two distinct types: Acronyms (abbreviations that can be read with the normal grapheme-phoneme correspondence rules, such as NASA) and initialisms (abbreviations in which the grapheme-phoneme correspondences are letter names, such as NCAA). Parafoveal and foveal processing of these abbreviations was assessed with the use of the boundary change paradigm (Rayner, 1975). Using this paradigm, previews of the abbreviations were either identical to the abbreviation (NASA or NCAA), orthographically legal (NUSO or NOBA), or illegal (NRSB or NRBA). The abbreviations were presented as capital letter strings within normal, predominantly lowercase sentences and also sentences in all capital letters such that the abbreviations would not be visually distinct. The results indicate that acronyms and initialisms undergo different processing during reading, and that readers can modulate their processing based on low-level visual cues (distinct capitalization) in parafoveal vision. In particular, readers may be biased to process capitalized letter strings as initialisms in parafoveal vision when the rest of the sentence is normal, lower case letters.
Clinical text is rich in acronyms and abbreviations, and they are highly ambiguous. As a pre-processing step before subsequent NLP analysis, we are developing and evaluating clinical abbreviation disambiguation methods. The evaluation of two sequential steps, the detection and the disambiguation of abbreviations, is reported here for various types of clinical notes. For abbreviation detection, our results indicated that the SPECIALIST Lexicon LRABR needs to be revised. Our semi-supervised method, which uses training data generated by expanded-form matching for 12 frequent abbreviations in our clinical notes, reached over 90% accuracy in five-fold cross-validation, and an unsupervised approach produced results comparable to those of the semi-supervised method.
Recognition and identification of abbreviations is an important, challenging task in clinical natural language processing (NLP). A comprehensive lexical resource comprising all common, useful clinical abbreviations would have great applicability. The authors present a corpus-based method to create a lexical resource of clinical abbreviations using machine-learning (ML) methods, and test its ability to automatically detect abbreviations in hospital discharge summaries. Domain experts manually annotated abbreviations in seventy discharge summaries, which were randomly divided into a training set (40 documents) and a test set (30 documents). We implemented and evaluated several ML algorithms using the training set and a list of pre-defined features. The subsequent evaluation using the test set showed that the Random Forest classifier had the highest F-measure of 94.8% (precision 98.8% and recall 91.2%). When a voting scheme was used to combine output from various ML classifiers, the system achieved an even higher F-measure of 95.7%.
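The reported F-measure can be checked against the reported precision and recall, assuming the balanced F1 score (the harmonic mean of the two) is meant; the abstract does not state the weighting. The function name `f_measure` and the `beta` parameter are illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Random Forest figures quoted in the abstract: precision 98.8%, recall 91.2%.
print(round(f_measure(0.988, 0.912), 3))  # 0.948
```

Under that assumption the computed value, 0.948, agrees with the 94.8% given for the Random Forest classifier.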
Acronyms and abbreviations within electronic clinical texts are widespread and often associated with multiple senses. Automated acronym sense disambiguation (WSD), a task of assigning the context-appropriate sense to ambiguous clinical acronyms and abbreviations, represents an active problem for medical natural language processing (NLP) systems. In this paper, fifty clinical acronyms and abbreviations with 500 samples each were studied using supervised machine-learning techniques (Support Vector Machines (SVM), Naïve Bayes (NB), and Decision Trees (DT)) to optimize the window size and orientation and determine the minimum training sample size needed for optimal performance. Our analysis of window size and orientation showed best performance using a larger left-sided and smaller right-sided window. To achieve an accuracy of over 90%, the minimum required training sample size was approximately 125 samples for SVM classifiers with inverted cross-validation. These findings support future work in clinical acronym and abbreviation WSD and require validation with other clinical texts.
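An asymmetric context window of the kind described (larger on the left, smaller on the right) might be turned into classifier features along the following lines. This is only a sketch: the function name `window_features`, the `L:`/`R:` prefixes, and the window sizes 4 and 2 are placeholders, since the abstract does not report the optimal values.

```python
def window_features(tokens, index, left=4, right=2):
    """Bag-of-words features from an asymmetric window around tokens[index].

    Tokens are lowercased and tagged with their side so that a classifier
    can weight left and right context differently.
    """
    lo = max(0, index - left)
    left_ctx = tokens[lo:index]
    right_ctx = tokens[index + 1 : index + 1 + right]
    return [f"L:{t.lower()}" for t in left_ctx] + [f"R:{t.lower()}" for t in right_ctx]


tokens = "patient has history of MI and was started on aspirin".split()
print(window_features(tokens, tokens.index("MI")))
```

The resulting feature lists could then be vectorized and passed to any of the classifiers the abstract names.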
Abbreviations are widely used in clinical notes and are often ambiguous. Word sense disambiguation (WSD) for clinical abbreviations therefore is a critical task for many clinical natural language processing (NLP) systems. Supervised machine learning based WSD methods are known for their high performance. However, it is time consuming and costly to construct annotated samples for supervised WSD approaches and sense frequency information is often ignored by these methods. In this study, we proposed a profile-based method that used dictated discharge summaries as an external source to automatically build sense profiles and applied them to disambiguate abbreviations in hospital admission notes via the vector space model. Our evaluation using a test set containing 2,386 annotated instances from 13 ambiguous abbreviations in admission notes showed that the profile-based method performed better than two baseline methods and achieved a best average precision of 0.792. Furthermore, we developed a strategy to combine sense frequency information estimated from a clustering analysis with the profile-based method. Our results showed that the combined approach largely improved the performance and achieved a highest precision of 0.875 on the same test set...
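The profile-based idea can be sketched as follows: each sense gets a bag-of-words profile built from contexts in which the long form occurs, and an ambiguous occurrence is assigned to the sense whose profile is closest by cosine similarity in the vector space model. This is a simplified illustration, not the study's implementation; the function names, raw term-count weighting, and example sentences are all assumptions.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens with raw counts (an assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(examples):
    """Merge the contexts of each sense into a single profile vector."""
    profiles = {}
    for sense, context in examples:
        profiles.setdefault(sense, Counter()).update(bag_of_words(context))
    return profiles


def disambiguate(context, profiles):
    """Pick the sense whose profile is most similar to the context."""
    vec = bag_of_words(context)
    return max(profiles, key=lambda sense: cosine(vec, profiles[sense]))


# Invented contexts standing in for text where the long form is written out.
profiles = build_profiles([
    ("patient", "the patient was admitted with chest pain"),
    ("physical therapy", "referred to physical therapy for gait training"),
])
print(disambiguate("pt admitted with pain", profiles))  # patient
```

Sense frequency information, which the abstract reports combining with the profiles, is not modeled here.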
Clinical Natural Language Processing (NLP) systems extract clinical information from narrative clinical texts in many settings. Previous research mentions the challenges of handling abbreviations in clinical texts, but provides little insight into how well current NLP systems recognize and interpret abbreviations. In this paper, we compared the performance of three existing clinical NLP systems in handling abbreviations: MetaMap, MedLEE, and cTAKES. The evaluation used an expert-annotated gold standard set of clinical documents (derived from 32 de-identified patient discharge summaries) containing 1,112 abbreviations. The existing NLP systems achieved suboptimal performance in abbreviation identification, with F-scores ranging from 0.165 to 0.601. MedLEE achieved the best F-score of 0.601 for all abbreviations and 0.705 for clinically relevant abbreviations. This study suggested that accurate identification of clinical abbreviations is a challenging task and that more advanced abbreviation recognition modules might improve existing clinical NLP systems.
Abbreviations are widely used in clinical documents and they are often ambiguous. Building a list of possible senses (also called a sense inventory) for each ambiguous abbreviation is the first step in automatically identifying the correct meanings of abbreviations in given contexts. Clustering-based methods have been used to detect senses of abbreviations from a clinical corpus. However, rare senses remain challenging and existing algorithms are not good enough to detect them. In this study, we developed a new two-phase clustering algorithm called Tight Clustering for Rare Senses (TCRS) and applied it to sense generation of abbreviations in clinical text. Using manually annotated sense inventories from a set of 13 ambiguous clinical abbreviations, we evaluated and compared TCRS with the existing Expectation Maximization (EM) clustering algorithm for sense generation, at two different levels of annotation cost (10 vs. 20 instances for each abbreviation). Our results showed that the TCRS-based method could detect 85% of senses on average, while the EM-based method found only 75% of senses, when similar annotation effort (about 20 instances) was used. Further analysis demonstrated that the improvement by the TCRS method was mainly from additionally detected rare senses...
Abbreviations are used to improve the speed of note keeping and to simplify patient notes. However, studies have shown that they can reduce clarity, increase mistakes and cause confusion in management plans. Our review highlights the misuse of abbreviations in surgical note keeping.
Access to Web content is a right of everyone in the information society, so it must be ensured that all people can access information regardless of their access characteristics. Accessibility barriers affect more groups of users than users with disabilities alone, and it is essential to work on providing Web accessibility that allows equitable access for all. To achieve this, it is essential that there be technology to support the professional who designs, develops, and maintains an accessible website.
This project presents the application ERTAUWWA, which stands for Evaluation and Repair Tool for Abbreviations and Unusual Words for Web Accessibility. The application is available online, and its main function is to provide automatic support for complying with Web accessibility guidelines 3.1.3 and 3.1.4, established by the Web Accessibility Initiative (WAI) of the World Wide Web Consortium (W3C) and published in the Web Content Accessibility Guidelines 2.0 (WCAG 2.0).
Guidelines 3.1.3 Unusual Words and 3.1.4 Abbreviations of WCAG 2.0 concern making textual web content readable and understandable through the treatment of unusual words and abbreviations. Complying with these guidelines is not easy for the professional...
The taxonomy of the family Filoviridae (marburgviruses and ebolaviruses) has changed several times since the discovery of its members, resulting in a plethora of species and virus names and abbreviations. The current taxonomy has only been partially accepted by most laboratory virologists. Confusion likely arose for several reasons: species names that consist of several words or which (should) contain diacritical marks, the current orthographic identity of species and virus names, and the similar pronunciation of several virus abbreviations in the absence of guidance for the correct use of vernacular names. To rectify this problem, we suggest (1) to retain the current species names Reston ebolavirus, Sudan ebolavirus, and Zaire ebolavirus, but to replace the name Cote d'Ivoire ebolavirus [sic] with Taï Forest ebolavirus and Lake Victoria marburgvirus with Marburg marburgvirus; (2) to revert the virus names of the type marburgviruses and ebolaviruses to those used for decades in the field (Marburg virus instead of Lake Victoria marburgvirus and Ebola virus instead of Zaire ebolavirus); (3) to introduce names for the remaining viruses reminiscent of jargon used by laboratory virologists but nevertheless different from species names (Reston virus...
The article describes an original method for creating a dictionary of abbreviations based on the Google Books Ngram Corpus. The dictionary is designed for Russian, yet because its methodology is universal it can be applied to any language. The dictionary can be used to determine the function of the period during text segmentation in various applied text-processing systems. The article describes difficulties encountered during its construction as well as ways to overcome them. A model for estimating the probability of type I and type II errors (extraction accuracy and completeness) is constructed. Some statistical data on the use of abbreviations are provided. Comment: 5 pages, 3 figures
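How an abbreviation dictionary helps decide the function of a period during segmentation can be illustrated with a deliberately naive sketch. It is not the article's method: the function name `split_sentences`, the whitespace tokenization, and the English sample dictionary are assumptions made for illustration (the article's dictionary targets Russian).

```python
def split_sentences(text, abbreviations):
    """Split text at tokens ending in a period, unless the token is listed
    in the abbreviation dictionary (entries are stored with their period)."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith(".") and token.lower() not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


abbreviations = {"fig.", "e.g.", "etc."}  # hypothetical sample entries
print(split_sentences("See fig. 3 for details. Results follow.", abbreviations))
```

A rule this simple misses sentence boundaries when an abbreviation happens to end a sentence, which is one reason error probabilities of the kind the article models matter.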
The original publication is available at: http://www.sedom.es/3_papeles/index.jsp; [Introduction] Abbreviations are used in all healthcare documents, creating serious communication problems among professionals and with patients. The aim of this work is to analyze the abbreviations that appear in the documents used to exchange information between the different levels of care: hospital emergency department reports, hospital discharge reports, specialist referral forms, and specialist clinical reports.
[Material and methods] Five physicians from the Xàtiva Health Center analyzed 87 documents from April 11 to May 11, 2005, identifying the abbreviations they contained. For each abbreviation, its frequency was calculated and its meaning, origin (primary, specialized, or hospital care), and department were determined, as well as the existence of polysemous abbreviations.
[Results] A total of 433 different abbreviations were collected among the 1,253 recorded, of which 25 appeared 10 or more times. The most frequent were “h”, “AP”, and “a”. Most came from the Hospital Emergency Department (72%), Internal Medicine-Hospital admission (6...