Page 1 of results: 1063 digital items found in 0.040 seconds

Critérios de seleção de sistemas de gerenciamento de banco de dados não relacionais em organizações privadas; Selection criteria for non-relational database management systems in private organizations

Souza, Alexandre Morais de
Source/Publisher: Biblioteca Digital de Teses e Dissertações da USP
Type: Master's dissertation. Format: application/pdf
Published on 31/10/2013. Language: Portuguese
Search relevance: 470.1173%
Non-Relational Database Management Systems (NoSQL DBMSs) are software packages for managing data using a non-relational model. Given the current growth in data generation and organizations' need to collect large amounts of information on customers, scientific research, sales, and other subjects for future analysis, it is important to rethink how a suitable DBMS is chosen, taking into account the organization's economic, technical, and strategic factors. This research studies the new database management model known as NoSQL and contributes selection criteria to help consumers of database services in private organizations select a NoSQL DBMS. To meet this objective, a literature review was carried out on the selection process for software and for DBMSs, surveying the criteria used for this purpose. With the bibliographic survey complete, the research method was defined as a Delphi panel in the ranking-form modality. Through the panel it was possible to determine, after two rounds and with the participation of a mixed group of specialists composed of managers...

The Challenge of Big Data in Public Health: An Opportunity for Visual Analytics

Ola, Oluwakemi; Sedig, Kamran
Source/Publisher: University of Illinois at Chicago Library
Type: Scientific journal article
Published on 05/02/2014. Language: Portuguese
Search relevance: 482.1861%
Public health (PH) data can generally be characterized as big data. The efficient and effective use of this data determines the extent to which PH stakeholders can sufficiently address societal health concerns as they engage in a variety of work activities. As stakeholders interact with data, they engage in various cognitive activities such as analytical reasoning, decision-making, interpreting, and problem solving. Performing these activities with big data is a challenge for the unaided mind as stakeholders encounter obstacles relating to the data’s volume, variety, velocity, and veracity. Such being the case, computer-based information tools are needed to support PH stakeholders. Unfortunately, while existing computational tools are beneficial in addressing certain work activities, they fall short in supporting cognitive activities that involve working with large, heterogeneous, and complex bodies of data. This paper presents visual analytics (VA) tools, a nascent category of computational tools that integrate data analytics with interactive visualizations, to facilitate the performance of cognitive activities involving big data. Historically, PH has lagged behind other sectors in embracing new computational technology. In this paper...

Big Data Analytics in Immunology: A Knowledge-Based Approach

Zhang, Guang Lan; Sun, Jing; Chitkushev, Lou; Brusic, Vladimir
Source/Publisher: Hindawi Publishing Corporation
Type: Scientific journal article
Language: Portuguese
Search relevance: 471.27105%
With the vast amount of immunological data available, immunology research is entering the big data era. These data vary in granularity, quality, and complexity and are stored in various formats, including publications, technical reports, and databases. The challenge is to make the transition from data to actionable knowledge and wisdom and to bridge the knowledge and application gaps. We report a knowledge-based approach based on a framework called KB-builder that facilitates data mining by enabling fast development and deployment of web-accessible immunological data knowledge warehouses. Immunological knowledge discovery relies heavily on both the availability of accurate, up-to-date, and well-organized data and the proper analytics tools. We propose the use of knowledge-based approaches by developing knowledgebases that combine well-annotated data with specialized analytical tools and integrating them into analytical workflows. A set of well-defined workflow types with rich summarization and visualization capacity facilitates the transformation from data to critical information and knowledge. By using KB-builder, we streamlined the normally time-consuming process of database development. The knowledgebases built using KB-builder will speed up rational vaccine design by providing accurate and well-annotated data coupled with tailored computational analysis tools and workflows.

Analyzing Big Data with the Hybrid Interval Regression Methods

Huang, Chia-Hui; Yang, Keng-Chieh; Kao, Han-Ying
Source/Publisher: Hindawi Publishing Corporation
Type: Scientific journal article
Language: Portuguese
Search relevance: 480.2018%
Big data is a current trend with significant impacts on information technologies. In big data applications, one of the most pressing issues is dealing with large-scale data sets, which often require computation resources provided by public cloud services. Analyzing big data efficiently thus becomes a major challenge. In this paper, we combine interval regression with the smooth support vector machine (SSVM) to analyze big data. The SSVM was recently proposed as an alternative to the standard SVM and has been shown to be more efficient than the traditional SVM in processing large-scale data. In addition, a soft-margin method is proposed to adjust the excursion of the separation margin and to remain effective in the gray zone, where the data distribution is hard to describe and the separation margin between classes is unclear.
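
As a concrete illustration of the smoothing idea behind the SSVM, the short sketch below (not taken from the paper; the alpha values and the comparison loop are illustrative assumptions) implements the standard smooth plus-function approximation p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha * x)), which replaces the non-differentiable plus function max(x, 0) in the SVM objective and converges to it as alpha grows.

    import numpy as np

    def plus(x):
        # Non-smooth plus function (x)_+ = max(x, 0) from the standard SVM objective.
        return np.maximum(x, 0.0)

    def smooth_plus(x, alpha=5.0):
        # Smooth approximation p(x, alpha) = x + (1/alpha) * log(1 + exp(-alpha * x)).
        # It is differentiable everywhere and approaches (x)_+ as alpha increases,
        # which is what lets SSVM use fast Newton-type solvers on large data sets.
        return x + np.log1p(np.exp(-alpha * x)) / alpha

    xs = np.linspace(-2.0, 2.0, 9)
    for alpha in (1.0, 5.0, 25.0):
        gap = np.max(np.abs(smooth_plus(xs, alpha) - plus(xs)))
        print(f"alpha={alpha:5.1f}  max |p - (x)_+| = {gap:.4f}")  # gap shrinks as alpha grows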

parallelMCMCcombine: An R Package for Bayesian Methods for Big Data and Analytics

Miroshnikov, Alexey; Conlon, Erin M.
Source/Publisher: Public Library of Science
Type: Scientific journal article
Published on 26/09/2014. Language: Portuguese
Search relevance: 479.3437%
Recent advances in big data and analytics research have provided a wealth of large data sets that are too big to be analyzed in their entirety, due to restrictions on computer memory or storage size. New Bayesian methods have been developed for data sets that are large only due to large sample sizes. These methods partition big data sets into subsets and perform independent Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then combine the independent subset posterior samples to estimate a posterior density given the full data set. These approaches were shown to be effective for Bayesian models including logistic regression models, Gaussian mixture models and hierarchical models. Here, we introduce the R package parallelMCMCcombine which carries out four of these techniques for combining independent subset posterior samples. We illustrate each of the methods using a Bayesian logistic regression model for simulation data and a Bayesian Gamma model for real data; we also demonstrate features and capabilities of the R package. The package assumes the user has carried out the Bayesian analysis and has produced the independent subposterior samples outside of the package. The methods are primarily suited to models with unknown parameters of fixed dimension that exist in continuous parameter spaces. We envision this tool will allow researchers to explore the various methods for their specific applications and will assist future progress in this rapidly developing field.
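
To make the partition-and-combine idea concrete, here is a minimal Python sketch of one generic combination strategy, a precision-weighted (consensus-style) average of subposterior draws. It illustrates the general approach only, not the R package's own algorithms, and the simulated subposterior samples are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for independent subposterior samples: M data subsets, each
    # contributing S MCMC draws of a d-dimensional parameter. In practice these
    # come from separate Bayesian analyses run on disjoint partitions of the data.
    M, S, d = 4, 1000, 2
    subposteriors = [rng.normal(loc=rng.normal(0.0, 0.1, d), scale=1.0, size=(S, d))
                     for _ in range(M)]

    def consensus_combine(samples):
        # Precision-weighted average of draws across subsets (consensus-style):
        # each subset is weighted by the inverse of its subposterior sample covariance.
        weights = [np.linalg.inv(np.cov(s, rowvar=False)) for s in samples]
        total_inv = np.linalg.inv(sum(weights))
        return np.stack([total_inv @ sum(w @ s[i] for w, s in zip(weights, samples))
                         for i in range(samples[0].shape[0])])

    full_posterior = consensus_combine(subposteriors)
    print(full_posterior.mean(axis=0), full_posterior.std(axis=0))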

pvsR: An Open Source Interface to Big Data on the American Political Sphere

Matter, Ulrich; Stutzer, Alois
Source/Publisher: Public Library of Science
Type: Scientific journal article
Published on 01/07/2015. Language: Portuguese
Search relevance: 483.3858%
Digital data from the political sphere is abundant, omnipresent, and more and more directly accessible through the Internet. Project Vote Smart (PVS) is a prominent example of this big public data and covers various aspects of U.S. politics in astonishing detail. Despite the vast potential of PVS’ data for political science, economics, and sociology, it is hardly used in empirical research. The systematic compilation of semi-structured data can be complicated and time consuming as the data format is not designed for conventional scientific research. This paper presents a new tool that makes the data easily accessible to a broad scientific community. We provide the software called pvsR as an add-on to the R programming environment for statistical computing. This open source interface (OSI) serves as a direct link between a statistical analysis and the large PVS database. The free and open code is expected to substantially reduce the cost of research with PVS’ new big public data in a vast variety of possible applications. We discuss its advantages vis-à-vis traditional methods of data generation as well as already existing interfaces. The validity of the library is documented based on an illustration involving female representation in local politics. In addition...

Big Data, Big Red II, Data Capacitor II, Wrangler, Jetstream, and Globus Online: Jetstream – a national science & engineering cloud

Stewart, Craig A.
Source/Publisher: Indiana University
Type: Conference or conference object
Language: Portuguese
Search relevance: 480.10836%
The presentation describes Jetstream and its relevance to Big Data. This research was supported in part by the National Science Foundation through Award ACI-1445604 and in part by the Indiana University Pervasive Technology Institute, which was established with the assistance of a major award from the Lilly Endowment, Inc.

Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data

Gao, Xiaoming
Source/Publisher: [Bloomington, Ind.]: Indiana University
Type: Doctoral dissertation
Language: Portuguese
Search relevance: 479.50395%
Thesis (Ph.D.), Indiana University, Computer Sciences, 2015. As Big Data processing problems evolve, many modern applications demonstrate special characteristics. Data exists in the form of both large historical datasets and high-speed real-time streams, and many analysis pipelines require integrated parallel batch processing and stream processing. Despite the large size of the whole dataset, most analyses focus on specific subsets according to certain criteria. Correspondingly, integrated support for efficient queries and post-query analysis is required. To address the system-level requirements brought by such characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness using a representative application domain - social media data analysis - and tackle related research challenges emerging from each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems. In the storage layer, we reveal that existing text indexing techniques do not work well for the unique queries of social data, which put constraints on both textual content and social context. To address this issue...

Big Data, Big Red II, Data Capacitor II, Wrangler, Jetstream, and Globus Online

Stewart, Craig A.
Source/Publisher: Indiana University
Type: Conference or conference object
Language: Portuguese
Search relevance: 480.10836%
The presentation describes Jetstream and its role in Big Data. This research was supported in part by the National Science Foundation through Award ACI-1445604 and in part by the Indiana University Pervasive Technology Institute, which was established with the assistance of a major award from the Lilly Endowment, Inc.

Simulation Experiments: Better data, not just big data

Sanchez, Susan M.
Source/Publisher: Naval Postgraduate School
Type: Scientific journal article
Language: Portuguese
Search relevance: 477.89152%
Data mining tools have been around for several decades, but the term “big data” has only recently captured widespread attention. Numerous success stories have been promulgated as organizations have sifted through massive volumes of data to find interesting patterns that are, in turn, transformed into actionable information. Yet a key drawback to the big data paradigm is that it relies on observational data—limiting the types of insights that can be gained. The simulation world is different. A “data farming” metaphor captures the notion of purposeful data generation from simulation models. Large-scale designed experiments let us grow the simulation output efficiently and effectively. We can explore massive input spaces, uncover interesting features of complex simulation response surfaces, and explicitly identify cause-and-effect relationships. With this new mindset, we can achieve quantum leaps in the breadth, depth, and timeliness of the insights yielded by simulation models. This work was supported in part by the Naval Postgraduate School’s Acquisition Research Program and U.S. Marine Corps Expeditionary Energy Office.
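
As a small illustration of purposeful data generation with a designed experiment (the inputs, ranges, and stand-in simulation below are assumptions, not the article's examples), the sketch builds a space-filling Latin hypercube design over three hypothetical simulation inputs and evaluates a toy simulation at each design point, producing "farmed" data that can then be analyzed for cause-and-effect structure.

    import numpy as np

    rng = np.random.default_rng(42)

    def latin_hypercube(n_points, bounds):
        # Simple Latin hypercube design: one point per stratum in every dimension,
        # with the strata randomly paired across dimensions.
        d = len(bounds)
        u = (rng.random((n_points, d)) + np.arange(n_points)[:, None]) / n_points
        for j in range(d):
            u[:, j] = rng.permutation(u[:, j])   # decouple the dimensions
        lo = np.array([b[0] for b in bounds])
        hi = np.array([b[1] for b in bounds])
        return lo + u * (hi - lo)

    # Hypothetical inputs: arrival rate, service rate, number of servers.
    design = latin_hypercube(200, bounds=[(0.5, 5.0), (1.0, 10.0), (1.0, 8.0)])

    def simulation(x):
        # Stand-in for an expensive simulation model; returns a noisy response.
        arrival, service, servers = x
        return arrival / (service * np.round(servers)) + rng.normal(0.0, 0.01)

    responses = np.array([simulation(x) for x in design])
    # (design, responses) is the farmed data set: a designed input space plus the
    # corresponding outputs, ready for regression, trees, or other analysis.
    print(design.shape, responses.mean())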

A Scalable Machine Learning Online Service for Big Data Real-Time Analysis

Baldominos Gómez, Alejandro; Albacete García, Esperanza; Saez Achaerandio, Yago; Isasi, Pedro
Source/Publisher: IEEE - The Institute of Electrical and Electronics Engineers, Inc
Type: info:eu-repo/semantics/acceptedVersion; info:eu-repo/semantics/bookPart; info:eu-repo/semantics/conferenceObject
Published in 12/2014. Language: Portuguese
Search relevance: 472.51957%
This work describes a proposal for developing and testing a scalable machine learning architecture able to provide real-time predictions or analytics as a service over domain-independent big data, working on top of the Hadoop ecosystem and providing real-time analytics as a service through a RESTful API. Systems implementing this architecture could provide companies with on-demand tools facilitating the tasks of storing, analyzing, understanding and reacting to their data, either in batch or stream fashion; and could turn into a valuable asset for improving the business performance and be a key market differentiator in this fast pace environment. In order to validate the proposed architecture, two systems are developed, each one providing classical machine-learning services in different domains: the first one involves a recommender system for web advertising, while the second consists in a prediction system which learns from gamers' behavior and tries to predict future events such as purchases or churning. An evaluation is carried out on these systems, and results show how both services are able to provide fast responses even when a number of concurrent requests are made, and in the particular case of the second system, results clearly prove that computed predictions significantly outperform those obtained if random guess was used.; This research work is part of Memento Data Analysis project...
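
A minimal sketch of the "analytics as a service" interface, assuming a Flask-style HTTP endpoint and a placeholder linear scorer; the route name, payload shape, and feature weights are illustrative assumptions and not the paper's actual API or models.

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    # Placeholder "model": in the architecture above this would be trained over
    # data stored in the Hadoop ecosystem and refreshed from the stream; here it
    # is a fixed linear scorer so the service layer stays self-contained.
    WEIGHTS = {"clicks": 0.4, "session_minutes": 0.1, "purchases": 1.5}

    def score(features):
        return sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())

    @app.route("/predict", methods=["POST"])
    def predict():
        # Accepts a JSON feature map and returns a prediction as a service.
        features = request.get_json(force=True)
        return jsonify({"prediction": score(features)})

    if __name__ == "__main__":
        # Example call:
        #   curl -X POST localhost:8080/predict -H "Content-Type: application/json" \
        #        -d '{"clicks": 3, "purchases": 1}'
        app.run(port=8080)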

Big Data and HPC: Exploring Role of Research Data Alliance (RDA), a Report On Supercomputing 2013 Birds of a Feather

Plale, Beth
Source/Publisher: Indiana University
Type: Report
Language: Portuguese
Search relevance: 474.0501%
The ubiquity of today's data is not just transforming what is; it is transforming what will be, laying the groundwork to drive new innovation. Today, research questions are addressed by complex models, by large data analysis tasks, and by sophisticated data visualization techniques, all requiring data. To address the growing global need for data infrastructure, the Research Data Alliance (RDA) was launched in FY13 as an international community-driven organization. We propose to bring together members of RDA with the HPC community to create a shared conversation around the utility of RDA for data-driven challenges in HPC.

A Big Data Guide to Understanding Climate Change: The Case for Theory-Guided Data Science

Faghmous, James H.; Kumar, Vipin
Source/Publisher: Mary Ann Liebert, Inc.
Type: Scientific journal article
Published on 01/09/2014. Language: Portuguese
Search relevance: 484.6296%
Global climate change and its impact on human life have become one of our era's greatest challenges. Despite the urgency and the abundance of climate data, data science has had little impact on furthering our understanding of our planet. This is a stark contrast with other fields such as advertising or electronic commerce, where big data has been a great success story. This discrepancy stems from the complex nature of climate data as well as the scientific questions climate science brings forth. This article introduces a data science audience to the challenges and opportunities in mining large climate datasets, with an emphasis on the nuanced differences between mining climate data and traditional big data approaches. We focus on data, methods, and application challenges that must be addressed in order for big data to fulfill its promise with regard to climate science applications. More importantly, we highlight research showing that relying solely on traditional big data techniques results in dubious findings, and we instead propose a theory-guided data science paradigm that uses scientific theory to constrain both the big data techniques and the results-interpretation process to extract accurate insight from large climate data.

Distilling Big Data: Refining Quality Information in the Era of Yottabytes

Ramachandramurthy, Sivaraman; Subramaniam, Srinivasan; Ramasamy, Chandrasekeran
Source/Publisher: Hindawi Publishing Corporation
Type: Scientific journal article
Language: Portuguese
Search relevance: 482.05754%
Big Data is the buzzword of the modern century. With the invasion of pervasive computing, we live in a data-centric environment, where we always leave a trail of data related to our day-to-day activities. Be it a visit to a shopping mall or a hospital, or surfing the Internet, we create voluminous data related to credit card transactions, user details, location information, and so on. These trails of data simply define an individual and form the backbone for user profiling. With mobile phones and their easy access to online social networks on the go, sensor data such as geo-taggings and the events and sentiments around them contribute to the already overwhelming data containers. With reductions in the cost of storage and computational devices and with the increasing proliferation of the Cloud, we never felt any constraints in storing or processing such data. Eventually we end up having several exabytes of data, and analysing them for their usefulness has introduced new frontiers of research. Effective distillation of these data is the need of the hour to improve the veracity of Big Data. This research targets the utilization of the Fuzzy Bayesian process model to improve the quality of information in Big Data.

Development of Self-Compressing BLSOM for Comprehensive Analysis of Big Sequence Data

Kikuchi, Akihito; Ikemura, Toshimichi; Abe, Takashi
Source/Publisher: Hindawi Publishing Corporation
Type: Scientific journal article
Language: Portuguese
Search relevance: 479.11355%
With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which can cluster genomic fragment sequences according to phylotype based solely on oligonucleotide composition, and applied it to genome and metagenomic studies. BLSOM is suitable for high-performance parallel computing and can analyze big data simultaneously, but a large-scale BLSOM needs large computational resources. We have developed the Self-Compressing BLSOM (SC-BLSOM) to reduce computation time, which allows us to carry out comprehensive analysis of big sequence data without the use of high-performance supercomputers. The strategy of SC-BLSOM is to construct BLSOMs hierarchically according to data class, such as phylotype. The first-layer BLSOMs were constructed with each of the divided input data pieces representing a data subclass, such as a phylotype division, compressing the number of data pieces. The second BLSOM was then constructed from the full set of weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and clustered the sequences according to phylotype with high accuracy...
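
The hierarchical construction can be sketched as follows, as an assumption-laden toy: the data, the tiny online SOM, and the layer sizes are illustrative, and BLSOM itself uses batch learning with PCA-based initialization rather than the simple update below. The idea shown is to train one small map per subclass, then train a second-layer map on the pooled first-layer weight vectors, which are far fewer than the original data points.

    import numpy as np

    rng = np.random.default_rng(1)

    def train_som(data, n_units=16, epochs=20, lr=0.5):
        # Tiny 1-D self-organizing map with plain online updates (illustration only).
        weights = data[rng.choice(len(data), n_units)].astype(float)
        for epoch in range(epochs):
            radius = max(1.0, (n_units / 2) * (1 - epoch / epochs))
            for x in data[rng.permutation(len(data))]:
                bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
                influence = np.exp(-((np.arange(n_units) - bmu) ** 2) / (2 * radius ** 2))
                weights += lr * influence[:, None] * (x - weights)
        return weights

    # Hypothetical oligonucleotide-composition vectors already split by subclass
    # (e.g. phylotype). Each subclass gets its own first-layer map.
    subclasses = [rng.normal(loc=c, scale=0.3, size=(500, 8)) for c in (0.0, 1.0, 2.0)]
    first_layer = [train_som(d, n_units=16) for d in subclasses]

    # Second layer: a map trained on the pooled first-layer weight vectors,
    # i.e. 3 * 16 = 48 vectors instead of the 1,500 original data points.
    second_layer = train_som(np.vstack(first_layer), n_units=16)
    print(second_layer.shape)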

A DNA-Based Semantic Fusion Model for Remote Sensing Data

Sun, Heng; Weng, Jian; Yu, Guangchuang; Massawe, Richard H.
Source/Publisher: Public Library of Science
Type: Scientific journal article
Published on 08/10/2013. Language: Portuguese
Search relevance: 572.1092%
Semantic technology plays a key role in various domains, from conversation understanding to algorithm analysis. As the most efficient semantic tool, ontology can represent, process and manage widespread knowledge. Nowadays, many researchers use ontology to collect and organize data's semantic information in order to maximize research productivity. In this paper, we first describe our work on the development of a remote sensing data ontology, with a primary focus on semantic fusion-driven research for big data. Our ontology is made up of 1,264 concepts and 2,030 semantic relationships. However, the growth of big data is straining the capacities of current semantic fusion and reasoning practices. Considering the massive parallelism of DNA strands, we propose a novel DNA-based semantic fusion model. In this model, a parallel strategy is developed to encode the semantic information in DNA for a large volume of remote sensing data. The semantic information is read in a parallel and bit-wise manner, and each individual bit is converted to a base. By doing so, a considerable amount of conversion time can be saved; for example, the cluster-based multi-process program reduces the conversion time from 81,536 seconds to 4,937 seconds for 4.34 GB of source data files. Moreover...
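
A hypothetical sketch of the bit-to-base conversion described above; the record does not give the actual encoding table or parallelization scheme, so the 0 -> A, 1 -> T mapping and the multiprocessing chunking below are assumptions made only to illustrate the parallel, bit-wise reading of semantic information.

    from multiprocessing import Pool

    # Hypothetical encoding table: the abstract states that individual bits are
    # converted to bases; the actual table used by the fusion model is not given.
    BIT_TO_BASE = {"0": "A", "1": "T"}

    def encode_chunk(bits):
        # Convert one chunk of semantic information (a bit string) to DNA bases.
        return "".join(BIT_TO_BASE[b] for b in bits)

    def encode_parallel(bitstring, n_workers=4):
        # Split the bit string into chunks and encode them in parallel, mirroring
        # the cluster-based, multi-process conversion described in the abstract.
        size = max(1, len(bitstring) // n_workers)
        chunks = [bitstring[i:i + size] for i in range(0, len(bitstring), size)]
        with Pool(n_workers) as pool:
            return "".join(pool.map(encode_chunk, chunks))

    if __name__ == "__main__":
        print(encode_parallel("0110100111"))  # -> ATTATAATTT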

A tomada de decisão no contexto do Big Data: estudo de caso único; Decision-making in the context of Big Data: a single case study

Canary, Vivian Passos
Source/Publisher: Universidade Federal do Rio Grande do Sul
Type: Undergraduate final project (Trabalho de Conclusão de Curso). Format: application/pdf
Language: Portuguese
Search relevance: 482.73094%
Competition among brands is increasingly fierce, requiring companies to make quick decisions in order to create a competitive edge over their rivals (BARTON and COURT, 2012). To minimize the risks of an inadequate decision, managers must ground it in relevant and reliable information. The exponential growth in the volume of data generated by technological advances and by changing consumer behavior will quickly provide organizations with sufficient information for this. This phenomenon is called Big Data. Managers, however, will be responsible for collecting, filtering, processing, and analyzing the information that is useful to them, leveraging it to generate competitive advantage for their businesses. The objective of this research is to verify the effect of Big Data's "5 V's" (volume, variety, velocity, value, and veracity) on the decision-making process of executives at different hierarchical levels in a Cooperative Credit System. To achieve this, the single case study method was used. The contributions of this research are to explore the topic of Big Data theoretically and to link it to the decision-making process practiced in an organization. The competition among brands is continuously increasing...

IoT Big-Data Centred Knowledge Granule Analytic and Cluster Framework for BI Applications: A Case Base Analysis

Chang, Hsien-Tsung; Mishra, Nilamadhab; Lin, Chung-Chih
Source/Publisher: Public Library of Science
Type: Scientific journal article
Published on 24/11/2015. Language: Portuguese
Search relevance: 477.68223%
The current rapid growth of Internet of Things (IoT) in various commercial and non-commercial sectors has led to the deposition of large-scale IoT data, of which the time-critical analytic and clustering of knowledge granules represent highly thought-provoking application possibilities. The objective of the present work is to inspect the structural analysis and clustering of complex knowledge granules in an IoT big-data environment. In this work, we propose a knowledge granule analytic and clustering (KGAC) framework that explores and assembles knowledge granules from IoT big-data arrays for a business intelligence (BI) application. Our work implements neuro-fuzzy analytic architecture rather than a standard fuzzified approach to discover the complex knowledge granules. Furthermore, we implement an enhanced knowledge granule clustering (e-KGC) mechanism that is more elastic than previous techniques when assembling the tactical and explicit complex knowledge granules from IoT big-data arrays. The analysis and discussion presented here show that the proposed framework and mechanism can be implemented to extract knowledge granules from an IoT big-data array in such a way as to present knowledge of strategic value to executives and enable knowledge users to perform further BI actions.

Geospatial Big Data Handling Theory and Methods: A Review and Research Challenges

Li, S.; Dragicevic, S.; Anton, F.; Sester, M.; Winter, S.; Coltekin, A.; Pettit, C.; Jiang, B.; Haworth, J.; Stein, A.; Cheng, T.
Source/Publisher: Cornell University
Type: Scientific journal article
Published on 10/11/2015. Language: Portuguese
Search relevance: 482.39855%
Big data has now become a strong focus of global interest, increasingly attracting the attention of academia, industry, government and other organizations. Big data can be situated in the disciplinary area of traditional geospatial data handling theory and methods. The increasing volume and varying format of collected geospatial big data present challenges in storing, managing, processing, analyzing, visualizing and verifying the quality of data. This has implications for the quality of decisions made with big data. Consequently, this position paper of the International Society for Photogrammetry and Remote Sensing (ISPRS) Technical Commission II (TC II) revisits existing geospatial data handling methods and theories to determine whether they are still capable of handling emerging geospatial big data. Further, the paper synthesises problems, major issues and challenges with current developments, and recommends what needs to be developed further in the near future.
Keywords: Big data, Geospatial, Data handling, Analytics, Spatial Modeling, Review
Comment: 25 pages, 3 figures

IU Research Technologies and the Research Data Alliance

Quick, Rob
Source/Publisher: Indiana University
Type: Conference or conference object
Language: Portuguese
Search relevance: 473.77055%
The Research Data Alliance (RDA) builds the social and technical bridges that enable open sharing of data. Now in its third year, the Research Data Alliance has grown to over 3000 members from more than 100 countries worldwide. The RDA envisions open sharing of data across borders, technologies, and disciplines. With IU officially joining as a member organization in July 2015, it is a good time to review the RDA's activity with the researchers and IT professionals at IU who might benefit from its results and outputs. The presentation discusses the RDA organization structure, IU's role as a member organization, and the current activities and outputs from the RDA working groups. This allows the technical and professional staff at IU to offer feedback on the important Big Data issues they face and how these issues might lead to discussion and action within RDA.