An Energy-Efficient Near-Data Processing Accelerator for DNNs that Optimizes Data Accesses

0upvotes

By: Bahareh Khabbazan, Marc Riera, Antonio González

The constant growth of DNNs makes them challenging to implement and run efficiently on traditional compute-centric architectures. Some accelerators have attempted to add more compute units and on-chip buffers to solve the memory wall problem without much success, and sometimes even worsening the issue since more compute units also require higher memory bandwidth. Prior works have proposed the design of memory-centric architectures based on ... more

Hardware ArchitectureOctober 30, 2023 12:18pm

Comments (0)
Views (349)

Design Space Exploration of Sparsity-Aware Application-Specific Spiking Neural Network Accelerators

0upvotes

By: Ilkin Aliyev. Kama Svoboda, Tosiron Adegbija

Spiking Neural Networks (SNNs) offer a promising alternative to Artificial Neural Networks (ANNs) for deep learning applications, particularly in resource-constrained systems. This is largely due to their inherent sparsity, influenced by factors such as the input dataset, the length of the spike train, and the network topology. While a few prior works have demonstrated the advantages of incorporating sparsity into the hardware design, espec... more

Hardware ArchitectureOctober 26, 2023 5:52am

Comments (0)
Views (331)

Analytical Die-to-Die 3D Placement with Bistratal Wirelength Model and GPU Acceleration

0upvotes

By: Peiyu Liao, Yuxuan Zhao, Dawei Guo, Yibo Lin, Bei Yu

In this paper, we present a new analytical 3D placement framework with a bistratal wirelength model for F2F-bonded 3D ICs with heterogeneous technology nodes based on the electrostatic-based density model. The proposed framework, enabled GPU-acceleration, is capable of efficiently determining node partitioning and locations simultaneously, leveraging the dedicated 3D wirelength model and density model. The experimental results on ICCAD 2022... more

Hardware ArchitectureOctober 12, 2023 5:13am

Comments (0)
Views (353)

MEDUSA: Scalable Biometric Sensing in the Wild through Distributed MIMO Radars

0upvotes

By: Yilong Li, Ramanujan K Sheshadri, Karthik Sundaresan, Eugene Chai, Suman Banerjee

Radar-based techniques for detecting vital signs have shown promise for continuous contactless vital sign sensing and healthcare applications. However, real-world indoor environments face significant challenges for existing vital sign monitoring systems. These include signal blockage in non-line-of-sight (NLOS) situations, movement of human subjects, and alterations in location and orientation. Additionally, these existing systems failed to... more

Hardware ArchitectureOctober 10, 2023 4:07pm

Comments (0)
Views (366)

Victima: Drastically Increasing Address Translation Reach by Leveraging Underutilized Cache Resources

0upvotes

By: Konstantinos Kanellopoulos, Hong Chul Nam, F. Nisa Bostanci, Rahul Bera, Mohammad Sadrosadati, Rakesh Kumar, Davide-Basilio Bartolini, Onur Mutlu

Address translation is a performance bottleneck in data-intensive workloads due to large datasets and irregular access patterns that lead to frequent high-latency page table walks (PTWs). PTWs can be reduced by using (i) large hardware TLBs or (ii) large software-managed TLBs. Unfortunately, both solutions have significant drawbacks: increased access latency, power and area (for hardware TLBs), and costly memory accesses, the need for large... more

Hardware ArchitectureOctober 9, 2023 9:27am

Comments (0)
Views (335)

Swordfish: A Framework for Evaluating Deep Neural Network-based Basecalling using Computation-In-Memory with Non-Ideal Memristors

0upvotes

By: Taha Shahroodi, Gagandeep Singh, Mahdi Zahedi, Haiyu Mao, Joel Lindegger, Can Firtina, Stephan Wong, Onur Mutlu, Said Hamdioui

Basecalling, an essential step in many genome analysis studies, relies on large Deep Neural Networks (DNNs) to achieve high accuracy. Unfortunately, these DNNs are computationally slow and inefficient, leading to considerable delays and resource constraints in the sequence analysis process. A Computation-In-Memory (CIM) architecture using memristors can significantly accelerate the performance of DNNs. However, inherent device non-idealitie... more

Hardware ArchitectureOctober 9, 2023 8:38am

Comments (0)
Views (372)

Co-Optimizing Cache Partitioning and Multi-Core Task Scheduling: Exploit Cache Sensitivity or Not?

0upvotes

By: Binqi Sun, Debayan Roy, Tomasz Kloda, Andrea Bastoni, Rodolfo Pellizzoni, Marco Caccamo

Cache partitioning techniques have been successfully adopted to mitigate interference among concurrently executing real-time tasks on multi-core processors. Considering that the execution time of a cache-sensitive task strongly depends on the cache available for it to use, co-optimizing cache partitioning and task allocation improves the system's schedulability. In this paper, we propose a hybrid multi-layer design space exploration techniq... more

Hardware ArchitectureOctober 5, 2023 8:42am

Comments (0)
Views (344)

SimplePIM: A Software Framework for Productive and Efficient Processing-in-Memory

0upvotes

By: Jinfan Chen, Juan Gómez-Luna, Izzat El Hajj, Yuxin Guo, Onur Mutlu

Data movement between memory and processors is a major bottleneck in modern computing systems. The processing-in-memory (PIM) paradigm aims to alleviate this bottleneck by performing computation inside memory chips. Real PIM hardware (e.g., the UPMEM system) is now available and has demonstrated potential in many applications. However, programming such real PIM hardware remains a challenge for many programmers. This paper presents a new s... more

Hardware ArchitectureOctober 4, 2023 5:27am

Comments (0)
Views (318)

Trikarenos: A Fault-Tolerant RISC-V-based Microcontroller for CubeSats in 28nm

0upvotes

By: Michael Rogenmoser, Luca Benini

One of the key challenges when operating microcontrollers in harsh environments such as space is radiation-induced Single Event Upsets (SEUs), which can lead to errors in computation. Common countermeasures rely on proprietary radiation-hardened technologies, low density technologies, or extensive replication, leading to high costs and low performance and efficiency. To combat this, we present Trikarenos, a fault-tolerant 32-bit RISC-V micr... more

Hardware ArchitectureOctober 4, 2023 5:25am

Comments (0)
Views (325)

GateSeeder: Near-memory CPU-FPGA Acceleration of Short and Long Read Mapping

0upvotes

By: Julien Eudine, Mohammed Alser, Gagandeep Singh, Can Alkan, Onur Mutlu

Motivation: Read mapping is a computationally expensive process and a major bottleneck in genomics analyses. The performance of read mapping is mainly limited by the performance of three key computational steps: Index Querying, Seed Chaining, and Sequence Alignment. The first step is dominated by how fast and frequent it accesses the main memory (i.e., memory-bound), while the latter two steps are dominated by how fast the CPU can compute t... more

Hardware ArchitectureOctober 3, 2023 2:33pm