Page 1 of results: 57 digital items found in 0.003 seconds

Hardware-assisted visibility sorting for unstructured volume rendering

Callahan, Steven Paul; Ikits, Milan; Comba, Joao Luiz Dihl; Silva, Cláudio Teixeira
Source: Universidade Federal do Rio Grande do Sul Publisher: Universidade Federal do Rio Grande do Sul
Type: Journal Article Format: application/pdf
Portuguese
Search Relevance
26.03%
Harvesting the power of modern graphics hardware to solve the complex problem of real-time rendering of large unstructured meshes is a major research goal in the volume visualization community. While, for regular grids, texture-based techniques are well-suited for current GPUs, the steps necessary for rendering unstructured meshes are not so easily mapped to current hardware. We propose a novel volume rendering technique that simplifies the CPU-based processing and shifts much of the sorting burden to the GPU, where it can be performed more efficiently. Our hardware-assisted visibility sorting algorithm is a hybrid technique that operates in both object-space and image-space. In object-space, the algorithm performs a partial sort of the 3D primitives in preparation for rasterization. The goal of the partial sort is to create a list of primitives that generate fragments in nearly sorted order. In image-space, the fragment stream is incrementally sorted using a fixed-depth sorting network. In our algorithm, the object-space work is performed by the CPU and the fragment-level sorting is done completely on the GPU. A prototype implementation of the algorithm demonstrates that the fragment-level sorting achieves rendering rates of between one and six million tetrahedral cells per second on an ATI Radeon 9800.
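
The image-space step described in this abstract, incrementally sorting a nearly sorted fragment stream with a fixed amount of storage, can be illustrated on the CPU. The following is only a minimal sketch under stated assumptions: the name `k_buffer_sort`, the buffer size `k`, and the sample depths are hypothetical, and a small heap stands in for the fixed-depth sorting network that the authors run on the GPU.

```python
import heapq

def k_buffer_sort(fragments, k):
    """Incrementally sort a nearly sorted stream of depths with a fixed-size buffer.

    Once the buffer holds more than k entries, the nearest one is emitted.
    The output is fully sorted only if no fragment arrives more than
    k positions later than its place in sorted order.
    """
    buffer = []
    for depth in fragments:
        heapq.heappush(buffer, depth)
        if len(buffer) > k:
            yield heapq.heappop(buffer)
    # Flush whatever is left once the stream ends.
    while buffer:
        yield heapq.heappop(buffer)

# Hypothetical fragment depths: each value is at most one slot out of place.
nearly_sorted = [0.10, 0.30, 0.20, 0.40, 0.60, 0.50, 0.70]
result = list(k_buffer_sort(nearly_sorted, k=2))

# A stream that is too disordered for the buffer size is not fully repaired.
too_disordered = list(k_buffer_sort([0.5, 0.4, 0.3, 0.2, 0.1], k=1))
```

The second call shows why the object-space partial sort matters: the fixed-size buffer can only repair small, local disorder in the stream.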

Clock reference correction for MPEG-2 transport streams on FPGA [Correção de referência de relógio para fluxo de transporte MPEG-2 em FPGA]

Farias, Bruno Carvalho de
Source: Universidade Federal de Santa Catarina Publisher: Universidade Federal de Santa Catarina
Type: Master's Dissertation Format: 133 p.; ill., graphs, tables
Portuguese
Search Relevance
26.35%
Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Elétrica, Florianópolis, 2014; The MPEG-2 Transport Stream (TS) is a format widely used in digital TV systems to transmit audio, video, and program-related information. Among other information, a transport stream carries a time reference known as the Program Clock Reference (PCR), which is a snapshot of the system's 27 MHz clock. This information allows receivers to recover the clock, which ensures correct presentation of the content and even drives output interfaces. However, if the arrival time of the transport packets varies during transmission or processing, the result can be errors in the system clock, known as jitter. Traditional methods for correcting the program clock information are usually based on 27 MHz floating-point counters/accumulators, but they do not fully mitigate PCR jitter. More recent methods use a semaphore-controlled counter/accumulator, and some even propose a rate-adaptation scheme integrated with the clock reference correction...
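
To make the PCR and jitter terms concrete, here is a minimal sketch, not the dissertation's FPGA design. It relies on the MPEG-2 Systems layout of the PCR as a 33-bit base counted at 90 kHz plus a 9-bit extension counting 0 to 299 at 27 MHz. The function names, the sample values, and the simplified jitter measure (arrival interval minus PCR interval) are assumptions for illustration, and counter wraparound is ignored.

```python
SYSTEM_CLOCK_HZ = 27_000_000  # 27 MHz system clock named in the abstract

def pcr_ticks(base, extension):
    """Combine the 90 kHz PCR base and the 0..299 extension into 27 MHz ticks."""
    return base * 300 + extension

def pcr_jitter_seconds(pcr_prev, pcr_curr, arrival_prev_s, arrival_curr_s):
    """Simplified jitter: how much the arrival interval deviates from the
    interval encoded by two consecutive PCR values (wraparound ignored)."""
    pcr_interval_s = (pcr_curr - pcr_prev) / SYSTEM_CLOCK_HZ
    arrival_interval_s = arrival_curr_s - arrival_prev_s
    return arrival_interval_s - pcr_interval_s

# Hypothetical example: the PCRs are 40 ms apart, the packets arrive 40.5 ms apart.
first_pcr = pcr_ticks(base=0, extension=0)
second_pcr = pcr_ticks(base=3600, extension=0)  # 3600 / 90 kHz = 40 ms
jitter = pcr_jitter_seconds(first_pcr, second_pcr, 0.0, 0.0405)
```

In this example the second packet arrives about 0.5 ms later than its PCR implies, which is the kind of deviation a PCR correction stage would have to compensate by restamping.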

Novel Hybrid GPU–CPU Implementation of Parallelized Monte Carlo Parametric Expectation Maximization Estimation Method for Population Pharmacokinetic Data Analysis

Ng, C. M.
Source: Springer US Publisher: Springer US
Type: Journal Article
Published on 04/09/2013 Portuguese
Search Relevance
26.03%
The development of a population PK/PD model, an essential component for model-based drug development, is both time- and labor-intensive. Graphics processing unit (GPU) computing technology has been proposed and used to accelerate many scientific computations. The objective of this study was to develop a hybrid GPU–CPU implementation of the parallelized Monte Carlo parametric expectation maximization (MCPEM) estimation algorithm for population PK data analysis. A hybrid GPU–CPU implementation of the MCPEM algorithm (MCPEMGPU) and an identical algorithm designed for a single CPU (MCPEMCPU) were developed using MATLAB on a single computer equipped with dual Xeon 6-Core E5690 CPUs and an NVIDIA Tesla C2070 GPU parallel computing card that contained 448 stream processors. Two different PK models with rich/sparse sampling design schemes were used to simulate population data in assessing the performance of MCPEMCPU and MCPEMGPU. Results were analyzed by comparing the parameter estimation and model computation times. Speedup factor was used to assess the relative benefit of parallelized MCPEMGPU over MCPEMCPU in shortening model computation time. The MCPEMGPU consistently achieved shorter computation time than the MCPEMCPU and can offer more than 48-fold speedup using a single GPU card. The novel hybrid GPU–CPU implementation of the parallelized MCPEM algorithm developed in this study holds great promise in serving as the core for the next generation of modeling software for population PK/PD analysis.

A Stream Algorithm for the SVD

Strumpen, Volker; Hoffmann, Henry; Agarwal, Anant
Source: MIT - Massachusetts Institute of Technology Publisher: MIT - Massachusetts Institute of Technology
Format: 31 p.; 30567456 bytes; 1124918 bytes; application/postscript; application/pdf
Portuguese
Search Relevance
26.29%
We present a stream algorithm for the Singular-Value Decomposition (SVD) of an M × N matrix A. Our algorithm trades speed of numerical convergence for parallelism, and derives from a one-sided, cyclic-by-rows Hestenes SVD. Experimental results show that we can create O(M) parallelism, at the expense of increasing the computational work by less than a factor of about 2. Our algorithm qualifies as a stream algorithm in that it requires no more than a small, bounded amount of local storage per processor and its compute efficiency approaches an optimal 100% asymptotically for large numbers of processors and appropriate problem sizes.
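
For readers unfamiliar with the method this abstract builds on, the following is a minimal sequential sketch of a one-sided Hestenes (Jacobi) SVD that visits column pairs in cyclic-by-rows order. It is not the authors' stream algorithm; the function name, tolerance, sweep limit, and test matrix are assumptions chosen for illustration.

```python
import math

def hestenes_singular_values(matrix, tol=1e-12, max_sweeps=50):
    """Singular values via one-sided Jacobi rotations (Hestenes method).

    Column pairs (i, j) are visited in cyclic-by-rows order:
    (0,1), (0,2), ..., (0,n-1), (1,2), ...  Each rotation makes one pair of
    columns orthogonal; at convergence the column norms are the singular values.
    """
    a = [list(row) for row in matrix]  # work on a copy
    rows, cols = len(a), len(a[0])
    for _ in range(max_sweeps):
        rotated = False
        for i in range(cols - 1):
            for j in range(i + 1, cols):
                alpha = sum(a[k][i] * a[k][i] for k in range(rows))
                beta = sum(a[k][j] * a[k][j] for k in range(rows))
                gamma = sum(a[k][i] * a[k][j] for k in range(rows))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns already orthogonal enough
                rotated = True
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for k in range(rows):
                    u, v = a[k][i], a[k][j]
                    a[k][i] = c * u - s * v
                    a[k][j] = s * u + c * v
        if not rotated:
            break  # a full sweep without rotations means convergence
    norms = [math.sqrt(sum(a[k][j] ** 2 for k in range(rows))) for j in range(cols)]
    return sorted(norms, reverse=True)

# Hypothetical 2 x 2 test matrix; its singular values are sqrt(45) and sqrt(5).
values = hestenes_singular_values([[3.0, 0.0], [4.0, 5.0]])
```

Rotations on disjoint column pairs touch different data, which is the property that makes this family of methods attractive for parallel execution.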

Programmable Stream Processors

Kapasi, Ujval J.; Rixner, Scott; Dally, William J.; Khailany, Brucek; Ahn, Jung Ho; Mattson, Peter; Owens, John D.
Source: Rice University Publisher: Rice University
Type: Journal Article
Portuguese
Search Relevance
46.29%
Journal Paper; Stream processing promises to bridge the gap between inflexible special-purpose solutions and current programmable architectures that cannot meet the computational demands of media-processing applications.

Imagine: Media Processing with Streams

Khailany, Brucek; Dally, William J.; Kapasi, Ujval J.; Mattson, Peter; Namkoong, Jinyung; Owens, John D.; Towles, Brian; Chang, Andrew; Rixner, Scott
Source: Rice University Publisher: Rice University
Type: Journal Article
Portuguese
Search Relevance
26.2%
Journal Paper; The power-efficient Imagine stream processor achieves performance densities comparable to those of special-purpose embedded processors. Executing programs mapped to streams and kernels, a single Imagine processor is expected to have a peak performance of 20 Gflops and sustain 18.3 GOPS on MPEG-2 encoding.

Reconfigurable stream processors for wireless base-stations

Rajagopal, Sridhar; Rixner, Scott; Cavallaro, Joseph R.
Source: Rice University Publisher: Rice University
Type: Report
Portuguese
Search Relevance
66.89%
Tech Report; This paper presents the design and use of reconfigurable stream processors for the physical layer processing in wireless base-stations. Stream processors, traditionally used for high performance media processing, use clusters of functional units to provide support for hundreds of functional units in a programmable architecture. We provide hardware support for reconfiguration in stream processors, enabling them to be power-efficient by adapting to the compute requirements of the application. We demonstrate the real-time implementation of a 32-user wireless base-station, employing multiuser channel estimation, multiuser detection and Viterbi decoding physical layer algorithms, supporting a data rate of 128 Kbps/user. The reconfigurable stream processor runs at 1.2 GHz and has an estimated power consumption of 12.38 W at full workload. However, basestations rarely operate at full capacity. When the base-station workload decreases, the reconfigurable stream processor adapts the number of clusters, functional units, voltage and frequency dynamically for power efficiency. When the application workload changes to 4 users, the reconfiguration support reduces the power to 300 mW at 433 MHz, providing a 41.27X decrease in power consumption. The cluster reconfiguration yields an additional 15-85% power savings over a stream processor with dynamic voltage and frequency scaling.
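
The 41.27X figure in this abstract follows directly from the two power numbers it quotes. The short check below is only a worked calculation; the variable names are hypothetical and the input values are taken from the abstract.

```python
# Values quoted in the abstract.
full_load_power_w = 12.38   # 32 users, 1.2 GHz
reduced_power_w = 0.300     # 4 users, 433 MHz (300 mW)

power_ratio = full_load_power_w / reduced_power_w   # about 41.27
frequency_ratio = 1200 / 433                        # about 2.77
```

The clock frequency drops by a factor of only about 2.77, so most of the reduction must come from the other adaptations the abstract lists: fewer active clusters and functional units, and a lower supply voltage.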

Improving power efficiency in stream processors through dynamic cluster reconfiguration

Rajagopal, Sridhar; Rixner, Scott; Cavallaro, Joseph R.
Source: Rice University Publisher: Rice University
Type: Conference paper
Portuguese
Search Relevance
66.95%
Conference Paper; Stream processors support hundreds of functional units in a programmable architecture by clustering functional units and utilizing a bandwidth hierarchy. Clusters are the dominant source of power consumption in stream processors. When the data parallelism falls below the number of clusters, unutilized clusters can be turned off to save power. This paper improves power efficiency in stream processors by dynamically reconfiguring the number of clusters in a stream processor to match the time varying data parallelism of an application. We explore 3 mechanisms for dynamic reconfiguration: using memory, conditional streams and a multiplexer network. A 32-user wireless basestation is a prime example of a workload that benefits from such reconfiguration. When the number of users supported by the basestation dynamically changes from 32 to 4, the reconfiguration from a 32-cluster stream processor to a 4-cluster stream processor yields 15-85% power savings over and above a stream processor that uses conventional power saving techniques such as dynamic voltage and frequency scaling. The dynamic reconfiguration support extends stream processors from traditional high performance applications to power-sensitive applications in which the data parallelism varies dynamically and falls below the number of clusters.

Design space exploration for real-time embedded stream processors

Rajagopal, Sridhar; Cavallaro, Joseph R.; Rixner, Scott
Source: Rice University Publisher: Rice University
Type: Journal Article
Portuguese
Search Relevance
66.78%
Journal Paper; We present a design framework for rapidly exploring the design space for stream processors in real-time embedded systems. Stream processors enable hundreds of arithmetic units in programmable processors by using clusters of functional units. However, to meet a certain real-time requirement for an embedded system, there is a trade-off between the number of arithmetic units in a cluster, number of clusters and the clock frequency as each solution meets real-time with a different power consumption. We have developed a design exploration tool that explores this trade-off and presents a heuristic that minimizes the power consumption in the (functional units, clusters, frequency) design space. Our design methodology relates the instruction level parallelism, subword parallelism and data parallelism to the organization of the functional units in an embedded stream processor. We show that the power minimization methodology also provides insights into the functional unit utilization of the processor. The design exploration tool exploits the static nature of signal processing workloads, providing an extremely fast design space exploration and provides an initial lower bound estimate of the real-time performance of the embedded processor. A sensitivity analysis of the design tool results to the technology and modeling also enables the designer to check the robustness of the design exploration.

Data-parallel digital signal processors: Algorithm mapping, architecture scaling and workload adaptation

Rajagopal, Sridhar
Source: Rice University Publisher: Rice University
Type: Thesis; Text Format: 174 p.; application/pdf
Portuguese
Search Relevance
36.97%
Emerging applications such as high definition television (HDTV), streaming video, image processing in embedded applications and signal processing in high-speed wireless communications are driving a need for high performance digital signal processors (DSPs) with real-time processing. This class of applications demonstrates significant data parallelism, finite precision, need for power-efficiency and the need for 100's of arithmetic units in the DSP to meet real-time requirements. Data-parallel DSPs meet these requirements by employing clusters of functional units, enabling 100's of computations every clock cycle. These DSPs exploit instruction level parallelism and subword parallelism within clusters, similar to a traditional VLIW (Very Long Instruction Word) DSP, and exploit data parallelism across clusters, similar to vector processors. Stream processors are data-parallel DSPs that use a bandwidth hierarchy to support dataflow to 100's of arithmetic units and are used for evaluating the contributions of this thesis. Different software realizations of the dataflow in the algorithms can affect the performance of stream processors by greater than an order-of-magnitude. The thesis first presents the design of signal processing algorithms that map efficiently on stream processors by parallelizing the algorithms and by re-ordering the flow of data. The design space for stream processors also exhibits trade-offs between arithmetic units per cluster...

Data-parallel Digital Signal Processors: Algorithm Mapping, Architecture Scaling and Workload Adaptation

Rajagopal, Sridhar
Source: Rice University Publisher: Rice University
Type: Thesis; Text
Portuguese
Search Relevance
57.03%
PhD Thesis; Emerging applications such as high definition television (HDTV), streaming video, image processing in embedded applications and signal processing in high-speed wireless communications are driving a need for high performance digital signal processors (DSPs) with real-time processing. This class of applications demonstrates significant data parallelism, finite precision, need for power-efficiency and the need for 100's of arithmetic units in the DSP to meet real-time requirements. Data-parallel DSPs meet these requirements by employing clusters of functional units, enabling 100's of computations every clock cycle. These DSPs exploit instruction level parallelism and subword parallelism within clusters, similar to a traditional VLIW (Very Long Instruction Word) DSP, and exploit data parallelism across clusters, similar to vector processors. Stream processors are data-parallel DSPs that use a bandwidth hierarchy to support dataflow to 100's of arithmetic units and are used for evaluating the contributions of this thesis. Different software realizations of the dataflow in the algorithms can affect the performance of stream processors by greater than an order-of-magnitude. The thesis first presents the design of signal processing algorithms that map efficiently on stream processors by parallelizing the algorithms and by re-ordering the flow of data. The design space for stream processors also exhibits trade-offs between arithmetic units per cluster...

Implementing lean material management in an extended value stream

Harper, Justin A., 1975-
Source: Massachusetts Institute of Technology Publisher: Massachusetts Institute of Technology
Type: Doctoral Thesis Format: 101 p.
Portuguese
Search Relevance
26.29%
American Axle & Manufacturing, Inc. (AAM) is still in the process of transitioning to a culture of "lean manufacturing" as opposed to the current culture of "mass production". This thesis involved working with AAM employees and suppliers at various locations to understand how material flows between and within AAM's plants, the reasons for and analysis of the current state of material management, and opportunities for improvement. Attention is also given to the cultural and business context in which this work takes place, and the issues relating to efforts to implement change in large industrial organizations. This work produced two strategic-level products and one tactical-level product to improve lean material management at AAM described herein. Cultural observations are also provided. At the strategic level, one project focused upon making extended value stream maps of material flow between AAM plants and suppliers/processors. This information allows all decision-makers at AAM to objectively examine a common set of information, information which was previously unavailable to any one individual. Extended value stream mapping allowed supply chain inventory and lead time-reduction opportunities to be identified. The focus upon extended value streams increased awareness of the need to more fully account for costs in making part procurement decisions. Therefore...

Nephele Streaming: Stream Processing Under QoS Constraints At Scale

Lohrmann, Björn; Warneke, Daniel; Kao, Odej
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 05/08/2013 Portuguese
Search Relevance
36.35%
The ability to process large numbers of continuous data streams in a near-real-time fashion has become a crucial prerequisite for many scientific and industrial use cases in recent years. While the individual data streams are usually trivial to process, their aggregated data volumes easily exceed the scalability of traditional stream processing systems. At the same time, massively-parallel data processing systems like MapReduce or Dryad currently enjoy a tremendous popularity for data-intensive applications and have proven to scale to large numbers of nodes. Many of these systems also provide streaming capabilities. However, unlike traditional stream processors, these systems have disregarded QoS requirements of prospective stream processing applications so far. In this paper we address this gap. First, we analyze common design principles of today's parallel data processing frameworks and identify those principles that provide degrees of freedom in trading off the QoS goals latency and throughput. Second, we propose a highly distributed scheme which allows these frameworks to detect violations of user-defined QoS constraints and optimize the job execution without manual interaction. As a proof of concept, we implemented our approach for our massively-parallel data processing framework Nephele and evaluated its effectiveness through a comparison with Hadoop Online. For an example streaming application from the multimedia domain running on a cluster of 200 nodes...

Efficient pseudo-random number generators for biomolecular simulations on graphics processors

Zhmurov, A.; Rybnikov, K.; Kholodov, Y.; Barsegov, V.
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 04/03/2010 Portuguese
Search Relevance
26.03%
Langevin Dynamics, Monte Carlo, and all-atom Molecular Dynamics simulations in implicit solvent, widely used to access the microscopic transitions in biomolecules, require a reliable source of random numbers. Here we present the two main approaches for implementation of random number generators (RNGs) on a GPU, which enable one to generate random numbers on the fly. In the one-RNG-per-thread approach, inherent in CPU-based calculations, one RNG produces a stream of random numbers in each thread of execution, whereas the one-RNG-for-all-threads approach builds on the ability of different threads to communicate, thus, sharing random seeds across the entire GPU device. We exemplify the use of these approaches through the development of Ran2, Hybrid Taus, and Lagged Fibonacci algorithms fully implemented on the GPU. As an application-based test of randomness, we carry out LD simulations of N independent harmonic oscillators coupled to a stochastic thermostat. This model allows us to assess statistical quality of random numbers by comparing the simulation output with the exact results that would be obtained with truly random numbers. We also profile the performance of these generators in terms of the computational time, memory usage, and the speedup factor (CPU/GPU time).; Comment: 32 pages...
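
The one-RNG-per-thread idea from this abstract can be sketched on the CPU with a plain additive lagged Fibonacci generator. This is a minimal illustration under stated assumptions, not the paper's GPU implementation: the class name, the two-tap recurrence with the classic lags (24, 55), the LCG used to fill the initial state, and the per-thread seeding scheme are all choices made here for the example.

```python
class LaggedFibonacci:
    """Additive lagged Fibonacci generator: x[n] = (x[n-24] + x[n-55]) mod 2**32."""

    MASK = 0xFFFFFFFF

    def __init__(self, seed, short_lag=24, long_lag=55):
        self.short_lag = short_lag
        self.long_lag = long_lag
        # Fill the initial state with a simple linear congruential generator
        # (an assumption for this sketch; any well-mixed fill would do).
        x = seed & self.MASK
        self.state = []
        for _ in range(long_lag):
            x = (1664525 * x + 1013904223) & self.MASK
            self.state.append(x)
        self.index = 0  # position of the oldest value, x[n - long_lag]

    def next_uint32(self):
        i = self.index
        j = (i + self.long_lag - self.short_lag) % self.long_lag  # x[n - short_lag]
        value = (self.state[i] + self.state[j]) & self.MASK
        self.state[i] = value  # overwrite the oldest value with the newest
        self.index = (i + 1) % self.long_lag
        return value

    def next_float(self):
        """Uniform float in [0, 1)."""
        return self.next_uint32() / 2**32

# One generator per (simulated) thread, each with its own private state.
generators = [LaggedFibonacci(seed=thread_id + 1) for thread_id in range(4)]
samples = [g.next_float() for g in generators]
```

Each generator keeps 55 words of private state, which gives a rough sense of the per-thread memory cost that the abstract's memory-usage profiling is concerned with. Seeding from consecutive integers is a simplification; a production scheme would need more care to avoid correlated streams.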

Resource Allocation for Multiple Concurrent In-Network Stream-Processing Applications

Benoit, Anne; Casanova, Henri; Rehn-Sonigo, Veronika; Robert, Yves
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 04/03/2009 Portuguese
Search Relevance
26.29%
This paper investigates the operator mapping problem for in-network stream-processing applications. In-network stream-processing amounts to applying one or more trees of operators in steady-state, to multiple data objects that are continuously updated at different locations in the network. The goal is to compute some final data at some desired rate. Different operator trees may share common subtrees. Therefore, it may be possible to reuse some intermediate results in different application trees. The first contribution of this work is to provide complexity results for different instances of the basic problem, as well as integer linear program formulations of various problem instances. The second contribution is the design of several polynomial-time heuristics. One of the primary objectives of these heuristics is to reuse intermediate results shared by multiple applications. Our quantitative comparisons of these heuristics in simulation demonstrate the importance of choosing appropriate processors for operator mapping. They also allow us to identify a heuristic that achieves good results in practice.

Representations of Stream Processors Using Nested Fixed Points

Ghani, Neil; Hancock, Peter; Pattinson, Dirk
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Portuguese
Search Relevance
36.29%
We define representations of continuous functions on infinite streams of discrete values, both in the case of discrete-valued functions, and in the case of stream-valued functions. We define also an operation on the representations of two continuous functions between streams that yields a representation of their composite. In the case of discrete-valued functions, the representatives are well-founded (finite-path) trees of a certain kind. The underlying idea can be traced back to Brouwer's justification of bar-induction, or to Kreisel and Troelstra's elimination of choice-sequences. In the case of stream-valued functions, the representatives are non-wellfounded trees pieced together in a coinductive fashion from well-founded trees. The definition requires an alternating fixpoint construction of some ubiquity.

Spreadsheets for Stream Partitions and Windows

Hirzel, Martin; Rabbah, Rodric; Suter, Philippe; Tardieu, Olivier; Vaziri, Mandana
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 13/03/2015 Portuguese
Search Relevance
26.2%
We discuss the suitability of spreadsheet processors as tools for programming streaming systems. We argue that, while spreadsheets can function as powerful models for stream operators, their fundamental boundedness limits their scope of application. We propose two extensions to the spreadsheet model and argue their utility in the context of programming streaming systems.; Comment: In Proceedings of the 2nd Workshop on Software Engineering Methods in Spreadsheets (http://spreadsheetlab.org/sems15/)

Stream Processor Generator for HPC to Embedded Applications on FPGA-based System Platform

Sano, Kentaro; Suzuki, Hayato; Ito, Ryo; Ueno, Tomohiro; Yamamoto, Satoru
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 21/08/2014 Portuguese
Search Relevance
36.6%
This paper presents a stream processor generator, called SPGen, for FPGA-based system-on-chip platforms. In our research project, we use an FPGA as a common platform for applications ranging from HPC to embedded/robotics computing. Pipelining in application-specific stream processors brings power-efficient, high-performance computing to FPGAs. However, poor productivity in developing custom pipelines prevents the reconfigurable platform from being widely and easily used. SPGen aims to help developers design and implement high-throughput stream processors by generating their HDL code from our domain-specific high-level stream processing description, called SPD. With an example of fluid dynamics computation, we validate SPD for describing a real application and verify SPGen for synthesis with a pipelined data-flow graph. We also demonstrate that SPGen allows us to easily explore a design space to find a better implementation than a hand-designed one.; Comment: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423)

Stream Computing

Kak, Subhash
Source: Cornell University Publisher: Cornell University
Type: Journal Article
Published on 09/01/2008 Portuguese
Search Relevance
26.2%
Stream computing is the use of multiple autonomic and parallel modules together with integrative processors at a higher level of abstraction to embody "intelligent" processing. The biological basis of this computing is sketched and the matter of learning is examined.; Comment: 7 pages, 4 figures

Control-based Scheduling in a Distributed Stream Processing System

Khorlin, Andrey; Chandy, K. Mani
Source: IEEE Publisher: IEEE
Type: Book Section; PeerReviewed Format: application/pdf
Published on //2006 Portuguese
Search Relevance
26.29%
Stream processing systems receive continuous streams of messages with raw information and produce streams of messages with processed information. The utility of a stream-processing system depends, in part, on the accuracy and timeliness of the output. Streams in complex event processing systems are processed on distributed systems; several steps are taken on different processors to process each incoming message, and messages may be enqueued between steps. This paper deals with the problems of distributed dynamic control of streams to optimize the total utility provided by the system. A challenge of distributed control is that timeliness of output depends only on the total end-to-end time and is otherwise independent of the delays at each separate processor, whereas the controller for each processor takes action to control only the steps on that processor and cannot directly control the entire network. This paper identifies key problems in distributed control and analyzes two scheduling algorithms that help in an initial analysis of a difficult problem.