Digital electronic systems are increasingly used in telecommunications, voice processing, instrumentation, biomedicine, and multimedia applications. Most of these applications require some kind of signal processing, and this function is usually carried out largely by a digital block. Moreover, given the variety of circuits found in a system, such as RAM (Random Access Memory) and ROM (Read Only Memory) memories, datapaths, and complex control parts, testing these complex systems is an increasingly important concern. The growing complexity of the circuits under test also demands more complex tester circuits (external test), which makes the testers very expensive. A viable alternative is to integrate some or all of the test functions into the chip under test. On the other hand, this strategy can lead to a prohibitive cost in silicon area. It is worth noting, however, that if the tests and the signal processing function do not need to run in parallel, a single reconfigurable area can be used to perform these functions sequentially. Therefore...
In recent years there has been a great deal of research on parallel computer architectures, because performance gains in sequential architectures are not keeping pace with the growing demand for processing capacity. Among parallel architectures, neural networks have received special attention from researchers. A neural network is an architecture based on massive parallelism: numerous simple processing elements interconnected according to a given topology and governed by a learning rule. Neural networks have been very important in pattern recognition, and many applications in character, image, and speech recognition have been developed. Another application area for neural networks is signal processing. Their adaptability makes neural networks well suited to applications in which the characteristics of the signal or of the medium vary or are not fully known, such as adaptive filters. The goal of this work is to show the applications of neural networks in this area. In the first part of the work, neural networks were applied to filtering using several topologies and neuron models. The implemented models are presented here together with the simulation results. The second part of the work consists of applying a neural network model to a very specific problem...
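To make the adaptive-filter idea concrete, the sketch below trains a single linear neuron (an ADALINE) with the LMS rule to identify an unknown two-tap filter. It is only an illustration of the general technique: the abstract does not say which topologies, neuron models, or learning rules were used, and the function name `lms_identify`, the step size, and the "unknown" taps are all hypothetical.

```python
import random

def lms_identify(inputs, desired, num_taps, mu):
    """Train one linear neuron (ADALINE) with the LMS rule; return its weights."""
    weights = [0.0] * num_taps
    window = [0.0] * num_taps  # most recent input sample first
    for x, d in zip(inputs, desired):
        window = [x] + window[:-1]                      # shift in the new sample
        y = sum(w * u for w, u in zip(weights, window))  # neuron output
        err = d - y                                      # error against the desired signal
        weights = [w + mu * err * u for w, u in zip(weights, window)]
    return weights

# Hypothetical system to identify: d[n] = 0.5*x[n] - 0.3*x[n-1], noise free.
rng = random.Random(0)
unknown = [0.5, -0.3]
xs = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
ds = [unknown[0] * x + unknown[1] * xp for x, xp in zip(xs, [0.0] + xs[:-1])]

learned = lms_identify(xs, ds, num_taps=2, mu=0.1)
print("learned weights:", learned)
```

Because the weights are updated from the error at every sample, the same loop keeps tracking the system if its taps drift over time, which is the property the abstract refers to as adaptability.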
Reducing program size has been an important factor in the design of modern embedded systems aimed at large-scale production. This problem has driven considerable effort toward processors that use reduced-size instruction formats (e.g., ARM Thumb and MIPS16) or that can execute compressed code (e.g., CCRP, CodePack, etc.). Much of the published work has targeted RISC architectures. This work proposes a program compression algorithm and a decompression engine for RISC and DSP architectures. The algorithm uses the program's expression trees as the symbols for compression. Experimental results, based on SPECInt95 programs running on a MIPS R4000 processor, showed an average compression ratio of 27.2% for the programs and a compression ratio of 60.7% when the area occupied by the decompression engine is taken into account. Experimental results for typical DSP application programs, running on a TMS320C25 processor, showed an average compression ratio of 28% for the programs and of 75% when the area of the decompression engine is taken into account. The decompression engines were synthesized using AMS standard cell libraries...
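The two kinds of figure quoted above can be related with a short calculation. The sketch assumes the common convention that the compression ratio is compressed size divided by original size (lower is better), and that the decompression engine's area has been converted to an equivalent number of program-memory bytes; the abstract states neither explicitly. The function name and all sizes are hypothetical and are not taken from the paper.

```python
def compression_ratio(original_bytes, compressed_bytes, decompressor_bytes=0):
    """Return (compressed + decompressor overhead) / original, as a percentage."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return 100.0 * (compressed_bytes + decompressor_bytes) / original_bytes

# Hypothetical sizes chosen only for illustration.
original = 100_000       # uncompressed program
compressed = 30_000      # compressed program
decompressor = 25_000    # decompression engine, as equivalent memory bytes (assumption)

code_only = compression_ratio(original, compressed)
with_engine = compression_ratio(original, compressed, decompressor)
print(f"code only: {code_only:.1f}%")
print(f"with decompression engine: {with_engine:.1f}%")
```

Under this convention the overhead of the engine is fixed, so its relative weight shrinks as the program grows.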
Core DSP operations include filtering, averaging, modulating, and correlating signals in digital form to estimate the characteristic parameters of a signal or to transform it into a desired form. This paper briefly discusses the impact of low-power datapath design on Digital Signal Processing (DSP) based biomedical applications. A systolic-array-based digital filter for electrocardiogram signal analysis is presented, with datapath architectural innovations aimed at low power consumption. The implementation followed an ASIC design methodology using a TSMC 65 nm technology library. The proposed systolic array filter reduces leakage power by up to 8.5% compared with existing filter architectures.
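As a software illustration of a filter built from a chain of identical multiply-accumulate cells, the sketch below implements a transposed-form FIR filter in which each cell adds its product to the partial sum held by its neighbour. This is a generic structure in the spirit of an array datapath, not the architecture proposed in the paper; the function name and the moving-average taps are assumptions made for the example.

```python
def fir_mac_chain(samples, taps):
    """Filter samples with a chain of multiply-accumulate cells (transposed-form FIR)."""
    n = len(taps)
    regs = [0.0] * (n - 1)  # partial sums held between neighbouring cells
    out = []
    for x in samples:
        # Cell 0 emits the output: its product plus the partial sum from cell 1.
        y = taps[0] * x + (regs[0] if regs else 0.0)
        # Each remaining cell refreshes its partial sum from its right-hand neighbour.
        # Updating left to right means regs[i + 1] still holds the previous value.
        for i in range(n - 1):
            upstream = regs[i + 1] if i + 1 < n - 1 else 0.0
            regs[i] = taps[i + 1] * x + upstream
        out.append(y)
    return out

# Four-point moving average, a simple smoothing filter (hypothetical input data).
smoothed = fir_mac_chain([4, 8, 12, 16, 0, 0], [0.25] * 4)
print(smoothed)
```

Each cell only talks to its immediate neighbour for partial sums, which is what makes this kind of structure regular and easy to replicate in hardware.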
Copper access networks currently use frequency bands up to 30 MHz, as specified in the VDSL2 standard. As hybrid fiber and copper architectures become more prominent in industry and academia, it becomes possible to use shorter metallic cables (i.e., up to 250 meters) connecting the last distribution point to the users, so that higher frequencies can be exploited to reach data rates of 500 Mbps or more, as is the case for the G.fast standard currently under development at ITU-T. In this work, a time-domain simulator was developed to evaluate the capacity of the G.fast system with different cyclic extension lengths and different network topologies specified by ITU-T. The simulation results show that G.fast systems are robust to bridged taps and can achieve high data rates for all simulated topologies, supporting the next generation of broadband services. In addition, this work describes the progress of implementing a prototype modem based on the G.fast standard in a hybrid multicore DSP and FPGA environment, using evaluation kits acquired by UFPA. Architectures, communication protocols, and benchmarks are presented and evaluated, leading to the conclusion that such a prototype is feasible and provides flexible support for several lines of research in next-generation broadband.; ABSTRACT: The evolving broadband access systems using copper networks are currently deployed in a frequency band that goes up to 30 MHz...
Conference Paper; Next-generation computing systems will be highly integrated using wireless networking. The Rice Everywhere NEtwork (RENÉ) project is exploring the integration of WCDMA cellular systems, high speed wireless LANs, and home wireless networks to produce a seamless multitier network interface. We are currently developing a simulation acceleration testbed and a multitier network interface card (mNIC) consisting of DSP processors, custom VLSI ASICs, and FPGAs for baseband signal processing to interact with the various RF units and the host processor. This testbed will also allow us to explore high performance algorithm alternatives through computer aided design tools for rapid prototyping and hardware/software co-design of embedded systems.
Journal Paper; Next-generation wireless computing platforms will contain flexible communications capabilities. At Rice University, the Rice Everywhere NEtwork (RENE) project is investigating a multi-standard, multi-tier integration of W-CDMA cellular systems, high speed wireless LANs, and home wireless networks. There are many challenges in mapping these advanced communication algorithms to real-time hardware computing platforms. In this paper, we present current work on the development of a reconfigurable baseband physical layer containing DSP processors and FPGA accelerators. Our goal is the design of a multi-tier network interface card (mNIC) which is capable of exploiting efficient, low-power reconfiguration.
Conference Paper; This paper investigates detector architectures for wireless handsets employing DS-CDMA. The code-matched filter (MF) and minimum output energy (MOE) detectors are analyzed with respect to fixed-point arithmetic behavior. Architectures employing fixed-point arithmetic are then proposed for these detectors. The maximum throughput of these architectures and the associated costs in terms of area usage and power consumption are evaluated. Results of the fixed-point analysis indicate that the MOE detector is more susceptible to quantization than the MF detector. Results of implementation indicate that the superior performance of the MOE detector is achieved at a considerably higher cost in terms of area usage and power consumption. Finally, comparison of hardware implementation with software-based DSP implementation indicates that software approaches result in considerably lower throughputs.
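The effect of fixed-point arithmetic on a code-matched filter can be illustrated with a small sketch: received chips are rounded to a signed fixed-point grid and then correlated with a +/-1 spreading code using integer additions only. This is a generic illustration, not the architecture proposed in the paper; the word length, number of fractional bits, spreading code, and disturbance values are all hypothetical.

```python
def quantize(value, frac_bits, word_bits):
    """Round to a signed fixed-point grid with saturation; returns an integer."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, round(value * scale)))

def matched_filter_fixed(received, code, frac_bits=4, word_bits=8):
    """Correlate quantized chips with a +/-1 code; result is scaled by 2**frac_bits."""
    acc = 0
    for r, c in zip(received, code):
        q = quantize(r, frac_bits, word_bits)
        acc += q if c > 0 else -q  # multiplying by +/-1 needs only add or subtract
    return acc

# Hypothetical spreading code, transmitted bit, and fixed disturbance values.
code = [1, -1, 1, 1, -1, 1, -1, -1]
bit = -1
noise = [0.30, -0.20, 0.10, -0.40, 0.25, -0.15, 0.05, 0.35]
received = [bit * c + n for c, n in zip(code, noise)]

float_stat = sum(r * c for r, c in zip(received, code))   # floating-point reference
fixed_stat = matched_filter_fixed(received, code)          # integer, scaled by 16
print("float:", float_stat, "fixed:", fixed_stat / 16)
print("decided bit:", 1 if fixed_stat > 0 else -1)
```

With rounding to the nearest grid point, each chip contributes at most half a quantization step of error, so the gap between the two statistics is bounded by the code length times half a step, provided no sample saturates.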
Journal Paper; This paper presents algorithms and architecture designs that can meet real-time requirements of multiuser channel estimation and detection in future wireless base-station receivers. Sophisticated algorithms proposed to implement multiuser channel estimation and detection make their real-time implementation difficult on current Digital Signal Processor (DSP)-based receivers. A maximum-likelihood based multiuser channel estimation scheme requiring matrix inversions is redesigned from an implementation perspective as a reduced-complexity, iterative scheme with a simple fixed-point VLSI architecture. A reduced-complexity, bit-streaming multiuser detection algorithm that avoids the need for multishot detection is also developed for a simple, pipelined VLSI architecture. Thus, we show that real-time solutions can be achieved for third generation wireless systems by (1) designing the algorithms from a fixed-point implementation perspective, without significant loss in error rate performance, (2) task partitioning and (3) designing bit-streaming fixed-point VLSI architectures that exploit pipelining, parallelism and bit-level computations to achieve real-time operation with minimum area overhead.
Conference Paper; A real-time VLSI architecture is designed for multiuser channel estimation, one of the core base-band processing operations in wireless base-station receivers. Future wireless basestation receivers will need to use sophisticated algorithms to support extremely high data rates and multimedia. Current DSP architectures are unable to fully exploit the parallelism and bit level arithmetic present in these algorithms. These features can be revealed and efficiently implemented by task partitioning the algorithms for a VLSI solution. We modify the channel estimation algorithm for a reduced complexity fixed-point hardware implementation. We show the complexity and hardware required for three different area-time tradeoffs: an area-constrained, a time-constrained and an area-time efficient architecture. The area-constrained architecture achieves low data rates with minimum hardware, which may be used in picocell base-stations. The time-constrained solution exploits the entire available parallelism and determines the maximum theoretical data rates. The area-time efficient architecture meets real-time requirements with minimum area overhead. The orders-of-magnitude difference between area and time constrained solutions reveals significant inherent parallelism in the algorithm. All proposed VLSI solutions exhibit better time performance than a previous DSP implementation.
Masters Thesis; This thesis demonstrates designing efficient algorithms and architectures to meet the real-time requirements of future wireless base-station receivers. Next generation receivers require orders-of-magnitude performance improvements in order to provide support for features such as Multimedia, Quality-Of-Service and extremely high data rates. The sophisticated, compute-intensive algorithms proposed to integrate these features make their real-time implementation difficult on current DSP-based receivers. A real-time implementation can be achieved by (1.) making the algorithms computationally efficient, without significant loss in error rate performance, (2.) task partitioning, and (3.) designing hardware to exploit available pipelining, parallelism and bit-level computations. Multiuser Channel Estimation and Detection, two of the most compute-intensive baseband tasks in the receiver, are studied on DSPs for performance evaluation. A reduced complexity iterative channel estimation scheme for slow fading channels is proposed for a fixed point, area-time efficient and real-time VLSI architecture. The multiuser detection algorithm is modified for a simple, pipelined structure. A GPP or DSP based architecture with reconfigurable support suited for wireless communications is proposed and extensions are developed to accelerate the implementation of wireless communication algorithms.
Journal Paper; This paper presents a reduced-complexity, fixed-point algorithm and efficient real-time VLSI architectures for multiuser channel estimation, one of the core baseband processing operations in wireless base-station receivers for CDMA. Future wireless base-station receivers will need to use sophisticated algorithms to support extremely high data rates and multimedia. Current DSP implementations of these algorithms are unable to meet real-time requirements. However, these algorithms contain massive parallelism and bit-level arithmetic that can be revealed and efficiently implemented in a VLSI architecture. We re-design an existing channel estimation algorithm from an implementation perspective for a reduced complexity, fixed-point hardware implementation. Fixed point simulations are presented to evaluate the precision requirements of the algorithm. A dependence graph of the algorithm is presented and area-time trade-offs are developed. An area-constrained architecture achieves low data rates with minimum hardware, which may be used in pico-cell base-stations. A time-constrained solution exploits the entire available parallelism and determines the maximum theoretical data processing rates. An area-time efficient architecture meets real-time requirements with minimum area overhead.
Telecommunications and multimedia form a vast segment of the embedded systems market. Variations in standards coupled with the desire for software programmability often result in software based implementations executing on DSP cores. With the advent of data intensive media and communications workloads, computational demands on the DSP are ever increasing. Despite increases in clock rates, the computational demands of many wireless and multimedia video kernels far exceed the available pipeline arithmetic and logic unit (ALU) resources of today's DSP devices.
This thesis presents a hardware/software co-design methodology for partitioning real-time embedded multimedia applications between software programmable DSPs and hardware based FPGA coprocessors. Using a strict set of guidelines, input applications are partitioned between software executing on a programmable DSP and hardware based FPGA implementation. This methodology is applied to channel estimation firmware in 3.5G wireless receivers, as well as software based H.263 video decoders. These heterogeneous systems are prototyped using a custom simulation environment created for these studies, which models bit true cycle accurate heterogeneous embedded architectures. By partitioning performance critical kernels from software on the DSP to FPGA based loosely coupled coprocessors...
The rapid evolution of wireless access is creating an ever-changing variety of standards for indoor and outdoor environments. The real-time processing demands of wireless data rates in excess of 100 Mbps pose a challenging problem for architecture design and verification. In this paper, we consider current trends in VLSI architecture and in rapid prototyping testbeds to evaluate these systems. The key phases in multi-standard system design and prototyping include: Algorithm Mapping to Parallel Architectures, based on the real-time data and sampling rate and the resulting area, time and power complexity; Configurable Mappings and Design Exploration, based on heterogeneous architectures consisting of DSPs, programmable application-specific instruction-set processors (ASIPs), and co-processors; and Verification and Testbed Integration, based on prototype implementation on programmable devices and integration with RF units.
Wireless communications and video kernels contain vast instruction and data level parallelism that can far outstrip programmable high performance DSPs. Hardware acceleration of these bottlenecks is commonly done at the cost of software flexibility. Many vendors, however, view software as intellectual property and prefer a software solution that is a proprietary implementation. The paper uses a research compiler for architectural design space exploration to compare compiler generated, scalable, software programmable DSP architectures with hardware acceleration implementations. It shows that scaled up compiler generated software programmable DSP architectures can be attractive alternatives to non-programmable hardware acceleration.
This paper presents a hardware/software co-design methodology for partitioning real-time embedded multimedia applications between software programmable DSPs and hardware based FPGA coprocessors. By following a strict set of guidelines, the input application is partitioned between software executing on a programmable DSP and hardware based FPGA implementation to alleviate computational bottlenecks in modern VLIW style DSP architectures used in embedded systems. This methodology is applied to channel estimation firmware in 3.5G wireless receivers, as well as software based H.263 video decoders. As much as an 11x improvement in runtime performance can be achieved by partitioning performance critical software kernels in these workloads into a hardware based FPGA implementation executing in tandem with the existing host DSP.
This paper presents a DSP/FPGA hardware/software partitioning methodology for signal processing workloads. The example workload is channel equalization and user detection in the HSDPA wireless standard for 3.5G mobile handsets. Channel equalization and user detection are a major component of receiver baseband processing and require strict adherence to real-time deadlines. By intelligently exploring the embedded design space, this paper presents a hardware/software system-on-chip partitioning that uses both DSP and FPGA based coprocessors to meet and exceed the real-time data rates set by the HSDPA standard. Hardware and software partitioning strategies are discussed with respect to real-time processing deadlines, while an SOC simulation toolset is presented as a vehicle for prototyping embedded...
In this paper we present system-on-a-chip extensions to the Spinach simulation environment for rapidly prototyping heterogeneous DSP/FPGA based architectures, specifically in the embedded domain. This infrastructure has been successfully used to model systems varying from multiprocessor Gigabit Ethernet controllers to Texas Instruments C6x series DSP based systems with tightly coupled FPGA based coprocessors for computational offloading. As an illustrative example of this toolset's functionality, we investigate workload partitioning in heterogeneous DSP/FPGA based embedded environments. Specifically, we focus on computational offloading of matrix multiplication kernels across DSP/FPGA based embedded architectures.
PhD Thesis; Emerging applications such as high definition television (HDTV), streaming video, image processing in embedded applications and signal processing in high-speed wireless communications are driving a need for high performance digital signal processors (DSPs) with real-time processing. This class of applications demonstrates significant data parallelism, finite precision, need for power-efficiency and the need for 100's of arithmetic units in the DSP to meet real-time requirements. Data-parallel DSPs meet these requirements by employing clusters of functional units, enabling 100's of computations every clock cycle. These DSPs exploit instruction level parallelism and subword parallelism within clusters, similar to a traditional VLIW (Very Long Instruction Word) DSP, and exploit data parallelism across clusters, similar to vector processors. Stream processors are data-parallel DSPs that use a bandwidth hierarchy to support dataflow to 100's of arithmetic units and are used for evaluating the contributions of this thesis. Different software realizations of the dataflow in the algorithms can affect the performance of stream processors by greater than an order-of-magnitude. The thesis first presents the design of signal processing algorithms that map efficiently on stream processors by parallelizing the algorithms and by re-ordering the flow of data. The design space for stream processors also exhibits trade-offs between arithmetic units per cluster...