Page 1 of results: 89 digital items found in 0.009 seconds

Extração de metadados utilizando uma ontologia de domínio; Metadata extraction using a domain ontology

Oliveira, Luis Henrique Gonçalves de
Source: Universidade Federal do Rio Grande do Sul Publisher: Universidade Federal do Rio Grande do Sul
Type: Master's thesis Format: application/pdf
Portuguese
Search relevance
68.522153%
The goal of the Semantic Web is to provide semantic descriptions of resources through machine-processable metadata. This semantic layer extends the existing Web, adding facilities for more complex search, filtering, summarization, and knowledge exchange. In this context, digital libraries are among the applications now beginning to add semantic annotations to the information available on the Web. A digital library can be defined as a collection of digital resources selected according to defined criteria, with some logical organization, accessible for distributed retrieval over a network. To facilitate retrieval, metadata are used to describe the stored content. Manual metadata generation, however, is a complex, time-consuming, and error-prone task, so automatic or semi-automatic extraction of such metadata would be a great help to authors, removing one task from the document publication process. The research in this dissertation addresses this problem by developing a metadata extractor that populates a document ontology and classifies each document within a predefined hierarchy. The OntoDoc document ontology was created to store and publish the extracted metadata...

OpenAIRE guidelines for data source managers: aiming for metadata harmonization

Príncipe, Pedro; Schirrwagen, Jochen
Source: Universidade do Minho Publisher: Universidade do Minho
Type: Conference or Conference Object
Published in 2015 Portuguese
Search relevance
27.924102%
Poster presented at the CERN Workshop on Innovations in Scholarly Communication (OAI9), Geneva, Switzerland, June 2015.; OpenAIRE2020 is the European Union initiative for the Open Access Infrastructure for Research in Europe, which supports open scholarly communication and access to the research outputs of European funded projects. The infrastructure operates on three levels: gathering research outputs, policy harmonization, and community outreach. The current OpenAIRE infrastructure and services, resulting from the FP7 projects OpenAIRE and OpenAIREplus, build on Open Access research results from a wide range of repositories and other data sources: institutional or thematic publication repositories, Open Access journals, data repositories, Current Research Information Systems (CRIS), and aggregators. The poster briefly outlines three complementary areas of the process of gathering Open Access and funded content: 1) the set of guidelines available for OpenAIRE data source managers (for Literature Repositories, Data Archives, and CRIS managers), 2) the metadata curation and enrichment process that follows the OpenAIRE data curation policy, 3) the validator tool available for testing data sources' compatibility with the guidelines and registering them into the infrastructure. OpenAIRE collects research outputs from a network of data sources. To facilitate this...

The low availability of metadata elements for evaluating the quality of medical information on the World Wide Web.

Shon, J.; Musen, M. A.
Source: American Medical Informatics Association Publisher: American Medical Informatics Association
Type: Journal article
Published in 1999 Portuguese
Search relevance
27.759045%
A great barrier to the use of Internet resources for patient education is the concern over the quality of information available. We conducted a study to determine what information was available in Web pages, both within text and metadata source code, that could be used in the assessment of information quality. Analysis of pages retrieved from 97 unique sites using a simple keyword search for "breast cancer treatment" on a generic and a health-specific search engine revealed that basic publishing elements were present in low frequency: authorship (20%), attribution/references (32%), disclosure (41%), and currency (35%). Only one page retrieved contained all four elements. Automated extraction of metadata elements from the source code of 822 pages retrieved from five popular generic search engines revealed even less information. We discuss the design of a metadata-based system for the evaluation of quality of medical content on the World Wide Web that addresses current limitations in ensuring quality.
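As a rough illustration of the kind of automated extraction the study performed, the sketch below scans a page's source for quality-related `<meta>` elements using Python's standard `html.parser`; the element names checked are assumptions for illustration, not the study's actual criteria.

```python
from html.parser import HTMLParser

# Quality-related elements to look for; these names are illustrative
# assumptions, not the exact elements audited by Shon and Musen.
QUALITY_KEYS = {"author", "date", "description", "copyright"}

class MetaTagAuditor(HTMLParser):
    """Collect <meta name=... content=...> pairs from a page's source."""
    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        if name in QUALITY_KEYS and a.get("content"):
            self.found[name] = a["content"]

def audit(html: str) -> dict:
    parser = MetaTagAuditor()
    parser.feed(html)
    return parser.found

page = ('<html><head><meta name="author" content="J. Shon">'
        '<meta name="date" content="1999-06-01"></head></html>')
print(audit(page))  # {'author': 'J. Shon', 'date': '1999-06-01'}
```

In practice most pages in the study carried few or none of these elements, which is exactly the availability problem the article documents.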

A System for Automated Extraction of Metadata from Scanned Documents using Layout Recognition and String Pattern Search Models

Misra, Dharitri; Chen, Siyuan; Thoma, George R.
Source: PubMed Publisher: PubMed
Type: Journal article
Published in 2009 Portuguese
Search relevance
27.759045%
One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques.

Metadata Extraction Routines for Improving Infobutton Performance

Hulse, Nathan C.; Haug, Peter J.
Source: American Medical Informatics Association Publisher: American Medical Informatics Association
Type: Journal article
Portuguese
Search relevance
48.39376%
Infobuttons have been proven as an effective means for providing quick, context-specific links to pertinent information resources at the point of care. Current infobutton manager implementations, however, lack the ability to exchange metadata, are limited to a relatively small set of information providers, and are targeted primarily for a clinician audience. As part of a local effort to implement infobuttons for patient use via a tethered personal health record, we present a series of metadata extraction routines. These routines were constructed to extract key pieces of information from health information providers on the Internet, including content coverage, language availability, and readability scores. The extraction routines were tested using thirty different disease conditions against eight different providers. The routines yielded 183 potential infobutton targets and associated metadata for each. The capabilities of the extraction routines will be expanded to cover new types of metadata in the future.
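Readability scoring, one of the metadata items these routines extract, can be sketched with the classic Flesch Reading Ease formula; the syllable counter below is a crude vowel-group heuristic, and nothing here is taken from the actual AMIA implementation.

```python
import re

def syllables(word: str) -> int:
    # Crude heuristic: count groups of vowels; every word gets at least one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease:
    206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syl = sum(syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syl / len(words))

print(round(flesch_reading_ease("The cat sat. The dog ran."), 2))  # 119.19
```

Higher scores mean easier text; consumer-health material is often targeted at scores of 60 or above, which is why a routine like this is useful when choosing infobutton targets for patients.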

EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal

Baker, Ed
Source: Pensoft Publishers Publisher: Pensoft Publishers
Type: Journal article
Published 16 September 2013 Portuguese
Search relevance
38.21599%
Many institutions and individuals use embedded metadata to aid in the management of their image collections. Many desktop image management solutions such as Adobe Bridge and online tools such as Flickr also make use of embedded metadata to describe, categorise and license images. Until now Scratchpads (a data management system and virtual research environment for biodiversity) have not made use of these metadata, and users have had to manually re-enter this information if they wanted to display it on their Scratchpad site. The Drupal module described here allows users to map metadata embedded in their images to the associated fields in the Scratchpads image form using one or more customised mappings. The module works seamlessly with the bulk image uploader used on Scratchpads, so it is possible to upload hundreds of images easily, with automatic metadata (EXIF, XMP and IPTC) extraction and mapping.
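The mapping idea can be sketched in a few lines of Python; the metadata keys and form-field names below are invented for illustration and are not the module's actual configuration.

```python
# Hypothetical mapping from embedded-metadata keys (EXIF/IPTC/XMP style)
# to image-form fields; the names are illustrative, not taken from the
# actual EXIF Custom module.
DEFAULT_MAPPING = {
    "EXIF:DateTimeOriginal": "field_date_captured",
    "IPTC:By-line": "field_photographer",
    "XMP:Rights": "field_licence",
}

def map_metadata(embedded: dict, mapping: dict = DEFAULT_MAPPING) -> dict:
    """Translate whatever embedded keys are present into form-field values."""
    return {field: embedded[key]
            for key, field in mapping.items() if key in embedded}

image_meta = {"EXIF:DateTimeOriginal": "2013:09:16 10:21:00",
              "IPTC:By-line": "Ed Baker"}
print(map_metadata(image_meta))
# {'field_date_captured': '2013:09:16 10:21:00', 'field_photographer': 'Ed Baker'}
```

Keeping the mapping as data rather than code is what lets users customise it per site, which is the design choice the module's abstract describes.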

Automatic classification of documents with an in-depth analysis of information extraction and automatic summarization

Hohm, Joseph Brandon, 1982-
Source: Massachusetts Institute of Technology Publisher: Massachusetts Institute of Technology
Type: Doctoral thesis Format: 92 leaves; 3024644 bytes; 3024453 bytes; application/pdf; application/pdf
Portuguese
Search relevance
37.393555%
Today, annual information fabrication per capita exceeds two hundred and fifty megabytes. As the amount of data increases, classification and retrieval methods become more necessary to find relevant information. This thesis describes a .Net application (named I-Document) that establishes an automatic classification scheme in a peer-to-peer environment that allows free sharing of academic, business, and personal documents. A Web service architecture for metadata extraction, Information Extraction, Information Retrieval, and text summarization is depicted. Specific details regarding the coding process, competition, business model, and technology employed in the project are also discussed.; by Joseph Brandon Hohm.; Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2004.; Includes bibliographical references (leaves 78-80).

Metadados de Bancos de Dados Relacionais: Extração e Exposição com o Protocolo OAI-PMH; Metadata of Relational Databases: Extraction and Exposition with the OAI-PMH Protocol

KOWATA, Elisabete Tomomi
Source: Universidade Federal de Goiás; BR; UFG; Master's in Computer Science; Exact and Earth Sciences - Computer Science Publisher: Universidade Federal de Goiás; BR; UFG; Master's in Computer Science; Exact and Earth Sciences - Computer Science
Type: Master's thesis Format: application/pdf
Portuguese
Search relevance
38.050935%
Information about a particular subject can be stored in different repositories such as databases, digital libraries, spreadsheets, text files, web pages, etc. In this context of heterogeneous data sources, querying (possibly in natural language), integrating information, and promoting interoperability are tasks that depend, among other factors, on the prior knowledge a user has of each information source: its location, owner, content description, and so on. In the specific case of databases, this information is usually not stored in the catalogue of the database management system, and obtaining it requires resorting to the database administrator's knowledge. Another factor is the absence of web search engines for databases that can access and expose the information in those repositories; access to such data is restricted by the organizations themselves. In a shared information environment, it is highly relevant to make accessible the metadata that describe a data source, regardless of the device and format in which it is stored. This study describes a mechanism to promote the interoperability of relational databases with other sources of information through the extraction and exposure of metadata using OAI-PMH.
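A minimal sketch of the exposure step, assuming table names and column lists have already been read from the DBMS catalogue: wrap them as an oai_dc Dublin Core record body of the kind OAI-PMH harvesters consume. The field choices are illustrative, not the dissertation's actual crosswalk.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

def table_to_oai_dc(table: str, columns: list, description: str) -> str:
    """Wrap relational-schema metadata in an oai_dc record body.

    Sketch of the idea: expose what the DBMS catalogue knows about a
    table as Dublin Core, ready to embed in an OAI-PMH response.
    """
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    ET.SubElement(root, f"{{{DC}}}title").text = table
    ET.SubElement(root, f"{{{DC}}}description").text = description
    for col in columns:
        # Mapping columns to dc:subject is an assumption for illustration.
        ET.SubElement(root, f"{{{DC}}}subject").text = col
    return ET.tostring(root, encoding="unicode")

xml_out = table_to_oai_dc("customers", ["id", "name"], "Customer master data")
print(xml_out)
```

A real deployment would wrap such bodies in the full OAI-PMH envelope (`ListRecords`, `Identify`, resumption tokens), which the protocol specifies.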

Automated metadata extraction

Migletz, James J.
Source: Monterey, California. Naval Postgraduate School Publisher: Monterey, California. Naval Postgraduate School
Type: Doctoral thesis
Portuguese
Search relevance
38.050322%
Metadata is data that describes data. There are many computer forensic uses of metadata, and being able to extract metadata automatically has positive forensic implications. This thesis presents a new technique for batch processing disk images and automatically extracting metadata from files and file contents. The technique is embodied in a program called fiwalk that has a plug-in architecture allowing new metadata extractors to be readily incorporated. Output from fiwalk can be provided in multiple formats such as ARFF and text. The plug-ins created for this thesis include one created by Simson Garfinkel for extracting metadata from .jpeg files, two for Microsoft Office documents (one for documents prior to the Office 2007 release and one for the Office 2007 release), and a default plug-in for extracting metadata from .gif, .pdf, and .mp3 files. To better understand the metadata available in common file formats such as .doc, .docx, .odt, .pdf, .mp3, .mp4, .jpeg, .tiff, and .gif, an examination of these formats is provided.
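The plug-in dispatch described above can be sketched as an extension-keyed registry with a default fallback; the function and extractor names are hypothetical, not fiwalk's actual API.

```python
from pathlib import Path

# Sketch of fiwalk-style plug-in dispatch: extractors register the file
# extensions they handle; unmatched files fall through to a default plug-in.
_PLUGINS = {}

def plugin(*extensions):
    def register(func):
        for ext in extensions:
            _PLUGINS[ext] = func
        return func
    return register

@plugin(".jpeg", ".jpg")
def jpeg_metadata(path):
    # A real plug-in would parse EXIF here; this stub just tags the file.
    return {"extractor": "jpeg", "file": path.name}

def default_metadata(path):
    return {"extractor": "default", "file": path.name}

def extract(path_str):
    path = Path(path_str)
    handler = _PLUGINS.get(path.suffix.lower(), default_metadata)
    return handler(path)

print(extract("evidence/photo.JPG"))   # routed to the jpeg plug-in
print(extract("evidence/report.pdf"))  # falls back to the default
```

The registry pattern is what lets new extractors be "readily incorporated": adding a format means registering one more function, with no change to the batch-processing loop.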

Preservation Metadata – adapting or adopting PREMIS for APSR

Lee, Bronwyn; Clifton, Gerard; Langley, Somaya
Source: Australian National University Publisher: Australian National University
Type: Conference or Conference Object
Portuguese
Search relevance
37.924102%
APSR aims to establish a centre of excellence in sustainable digital resource management, and partner universities are developing demonstrator repositories built on sustainability principles. This presentation outlines the results of a project commissioned by APSR to specify requirements for the collection of metadata needed for long-term continuity of access to digital collections. The project was called PRESTA (PREMIS Requirements Statement), but it took a broader view than PREMIS alone. The presentation highlights the following areas of the project report:
• recommended preservation metadata elements, including 'mandatory' elements;
• recommended supported file formats;
• recommended tools for automatic metadata extraction and their capabilities;
• gaps in preservation metadata collected in selected partner repositories and recommendations for enhancements;
• functional specifications and use cases for preservation events and event ('history') logging, a significant piece of the digital preservation framework not yet included in partner repositories;
• a draft METS profile for exchanging preservation metadata.
PowerPoint presentation made at APSR event Long-term Repositories: Taking the Shock out of the Future...

Tools and Techniques for Preservation Metadata Extraction and Collection

Black, Matthew
Source: Australian National University Publisher: Australian National University
Type: Conference or Conference Object
Portuguese
Search relevance
38.146775%
Presentation topics include: the tools and techniques that can be used to automatically extract preservation metadata embedded in or associated with digital objects; what to do with the extracted metadata; what can and can't be done with current tools (including the National Library of New Zealand's Preservation Metadata Extract Tool); future technologies; workflows; validation/extraction stacks; and the importance of good metadata structure for effective preservation reporting and risk analysis.; PowerPoint presentation made at APSR event Long-term Repositories: Taking the Shock out of the Future, August-September 2006.

Generating recommendations based on robust term extraction from users' reviews

D'Addio, Rafael Martins; Conrado, Merley da Silva; Rezende, Solange Oliveira; Manzato, Marcelo Garcia
Source: Universidade Federal da Paraíba – UFPB; Núcleo de Pesquisa e Extensão em Aplicações de Vídeo Digital - LAViD; Sociedade Brasileira de Computação – SBC; João Pessoa Publisher: Universidade Federal da Paraíba – UFPB; Núcleo de Pesquisa e Extensão em Aplicações de Vídeo Digital - LAViD; Sociedade Brasileira de Computação – SBC; João Pessoa
Type: Conference or Conference Object
Portuguese
Search relevance
37.393555%
In this paper, we propose a technique to automatically describe items based on users' reviews, for use by recommender systems. We extract items' features using a robust term extraction method that applies transductive semi-supervised learning to automatically identify aspects that represent the different subjects of the reviews. Then, we apply sentiment analysis at the sentence level to indicate polarities, yielding a consensus of users regarding the features of items. Our approach is evaluated using a collaborative filtering method, and comparisons against baselines using structured metadata show promising results.; FAPESP (process numbers 2013/10756-5, 2009/16142-3, and 2013/22547-1)

Video metadata extraction in a videoMail system

Moskovchuk, Serhiy
Source: Universidade Nova de Lisboa Publisher: Universidade Nova de Lisboa
Type: Master's dissertation
Published in May 2015 Portuguese
Search relevance
37.387668%
Currently the world is swiftly adapting to visual communication. Online services like YouTube and Vine show that video is no longer the domain of broadcast television alone. Video is used for different purposes such as entertainment, information, education, and communication. The rapid growth of today's video archives, with sparsely available editorial data, makes retrieval a major problem. Humans see a video as a complex interplay of cognitive concepts, so there is a need to build a bridge between numeric values and semantic concepts, a connection that will facilitate video retrieval by humans. The critical aspect of this bridge is video annotation. The process can be done manually or automatically. Manual annotation is tedious, subjective, and expensive; therefore automatic annotation is being actively studied. In this thesis we focus on automatic annotation of multimedia content, namely the use of information-retrieval analysis techniques to automatically extract metadata from video in a videomail system, including the identification of text, people, actions, spaces, and objects (including animals and plants). It will thus be possible to align multimedia content with the text presented in the email message, and to create applications for semantic video database indexing and retrieval.

Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications : Berlin, 22-26 September 2008 : DC 2008 : Berlin, Germany; Proceedings of the International Conference on Dublin Core and Metadata Applications DC-2008, Berlin

Greenberg, Jane (Ed.); Wolfgang, Klas (Ed.)
Source: Singapore : Dublin Core Metadata Initiative ; [Göttingen] : Universitätsverlag Göttingen, 2008. Publisher: Singapore : Dublin Core Metadata Initiative ; [Göttingen] : Universitätsverlag Göttingen, 2008.
Type: Book Format: application/pdf
Portuguese
Search relevance
58.522524%
vi, 217 p. : ill. ; 30 cm.; Metadata is a key aspect of our evolving infrastructure for information management, social computing, and scientific collaboration. DC-2008 will focus on metadata challenges, solutions, and innovation in initiatives and activities underlying semantic and social applications. Metadata is part of the fabric of social computing, which includes the use of wikis, blogs, and tagging for collaboration and participation. Metadata also underlies the development of semantic applications and the Semantic Web: the representation and integration of multimedia knowledge structures on the basis of semantic models. These two trends flow together in applications such as Wikipedia, where authors collectively create structured information that can be extracted and used to enhance access to and use of information sources. Recent discussion has focused on how existing bibliographic standards can be expressed as Semantic Web vocabularies to facilitate the integration of library and cultural heritage data with other types of data. Harnessing the efforts of content providers and end-users to link, tag, edit, and describe their information in interoperable ways ("participatory metadata") is a key step towards providing knowledge environments that are scalable...

A Simple Extraction Procedure for Bibliographical Author Field

Constans, Pere
Source: Cornell University Publisher: Cornell University
Type: Journal article
Published 4 February 2009 Portuguese
Search relevance
37.24066%
A procedure for extracting bibliographic author metadata from scholarly texts is presented. Author segments are identified based on capitalization and line-break patterns. Two main author layout templates, which can handle a varied set of title pages, are provided. Additionally, several disambiguation rules are described.
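The capitalization and line-break idea can be sketched with a single regular expression applied line by line to a title page; the pattern below is an illustrative approximation, not one of Constans's actual templates.

```python
import re

# An author segment on a title page is modelled here as a short line of
# capitalized tokens (optionally with initials), delimited by line breaks.
# This pattern is an illustrative approximation of the paper's idea.
AUTHOR_LINE = re.compile(
    r"^\s*(?:[A-Z][a-z]+|[A-Z]\.),?(?:\s+(?:[A-Z][a-z]+|[A-Z]\.|and),?){1,6}\s*$"
)

def candidate_author_lines(page_text: str) -> list:
    """Return title-page lines that match the author-segment pattern."""
    return [line.strip() for line in page_text.splitlines()
            if AUTHOR_LINE.match(line)]

page = """A Simple Extraction Procedure
Pere Constans
February 2009"""
print(candidate_author_lines(page))  # ['Pere Constans']
```

Note how the title line and the date line are both rejected: the title starts with a bare capital "A" and the date ends in digits, neither of which fits the capitalized-token pattern. This is exactly why disambiguation rules are still needed for harder cases (all-caps surnames, affiliations that look like names).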

An Agent based Approach towards Metadata Extraction, Modelling and Information Retrieval over the Web

Ahmed, Zeeshan; Gerhard, Detlef
Source: Cornell University Publisher: Cornell University
Type: Journal article
Published 7 August 2010 Portuguese
Search relevance
37.24066%
Web development is a challenging research area because of its creativity and complexity. A key challenge in current web technology development is presenting data in a machine-readable and machine-processable format, to take advantage of knowledge-based information extraction and maintenance. Currently it is not possible to search for and extract optimized results using full-text queries, because no mechanism exists that can fully extract the semantics from full-text queries and then look for the corresponding knowledge-based information.; Comment: In the proceedings of the First International Workshop on Cultural Heritage on the Semantic Web, in conjunction with the 6th International Semantic Web Conference and the 2nd Asian Semantic Web Conference 2007 (ISWC + ASWC 2007), p. 117, 12-15 November 2007

Conversion and metadata extraction frameworks

Droogmans, Lieven; Bosman, Ben
Source: University of Cambridge Publisher: University of Cambridge
Type: Other
Portuguese
Search relevance
37.387668%

Enhanced display of scientific articles using extended metadata

Roderic D. M. Page
Source: Nature Precedings Publisher: Nature Precedings
Type: Manuscript
Portuguese
Search relevance
37.924102%
Although the Web has transformed science publishing, scientific papers themselves are still essentially "black boxes", with much of their content intended for human readers only. Typically, computer-readable metadata associated with an article is limited to bibliographic details. By expanding article metadata to include taxonomic names, identifiers for cited material (e.g., publications, sequences, specimens, and other data), and geographical coordinates, publishers could greatly increase the scientific value of their digital content. At the same time this will provide novel ways for users to discover and navigate through this content, beyond the relatively limited linkage provided by bibliographic citation. As a proof of concept, my entry in the Elsevier Grand Challenge extracted extended metadata from a set of articles from the journal _Molecular Phylogenetics and Evolution_ and used it to populate an entity-attribute-value database. A simple web interface to this database enables an enhanced display of the content of an article, including a map of localities mentioned either explicitly or implicitly (through links to geotagged data), taxonomic coverage, and both data and citation links. Metadata extraction was limited to information listed in tables in the articles (such as GenBank sequences and specimen codes)...

DBClear: A Generic System for Clearinghouses

Hellweg, Heiko; Hermes, Bernd; Stempfhuber, Maximilian; Enderle, W.; Fischer, T.
Source: euroCRIS; Kassel University Press Publisher: euroCRIS; Kassel University Press
Type: Conference Paper
Portuguese
Search relevance
37.976436%
Presented at the CRIS2002 Conference in Kassel.-- 9 pages.-- Contains: Conference paper (PDF) and PPT presentation.; Clearinghouses – or subject gateways – are domain-specific collections of links to resources on the Internet. The links are described with metadata and structured according to a domain-specific subject hierarchy. Users access the information by searching in the metadata or by browsing the subject hierarchy.; The standards for metadata vary across existing clearinghouses and different technologies for storing and accessing the metadata are used. This makes it difficult to distribute the editorial or administrative work involved in maintaining a clearinghouse, or to exchange information with other systems.; DBClear is a generic, platform-independent clearinghouse system, whose metadata schema can be adapted to different standards. The data is stored in a relational database. It includes a workflow component to support distributed maintenance and automation modules for link checking and metadata extraction. The presentation of the clearinghouse on the Web can be modified to allow seamless integration into existing web sites.

Treatment of Semantic Heterogeneity using Metadata Extraction and Query Translation

Strötgen, Robert
Source: euroCRIS; Kassel University Press Publisher: euroCRIS; Kassel University Press
Type: Conference Paper
Portuguese
Search relevance
58.050938%
Presented at the CRIS2002 Conference in Kassel.-- 9 pages.-- Contains: Conference paper (PDF) + PPT presentation.; The project CARMEN ("Content Analysis, Retrieval and Metadata: Effective Networking") aimed – among other goals – at improving the expansion of searches in bibliographic databases into Internet searches. We pursued a set of different approaches to the treatment of semantic heterogeneity (metadata extraction, query translation using statistic relations and cross-concordances). This paper describes the concepts and implementation of these approaches and the evaluation of the impact for the retrieval result.; The CARMEN Project was funded by the German Federal Ministry of Education and Research in the context of the programme “Global Info”, FKZ 08SFC08 3.