Parallel_AWT

Parallel 2D Multi-level Adaptive Wavelet Transformation

Maggie Gong & Youheng Yang

URL:

https://hakuz-y.github.io/Parallel_AWT/proposal/

Summary:

This project aims to implement a parallel version of the Adaptive Wavelet Transformation (AWT), inspired by the Adaptive Wavelet Transformation paper listed under Resources, using both a shared-memory model (OpenMP) and a distributed-memory model (MPI). AWT dynamically selects wavelet bases tailored to local image features, aiming to improve compression and rendering performance over traditional fixed-basis wavelet transforms. The primary objective is to improve computational performance while preserving scalability and output quality. This requires addressing challenges specific to parallelizing adaptive algorithms, such as load imbalance, data dependencies, and irregular memory access patterns.
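The following minimal sketch makes the data dependencies concrete. It is a plain, non-adaptive multi-level 2D Haar transform. It assumes an N x N row-major image with N a power of two and uses the orthonormal Haar filter. `haar1d` and `haar2d_multilevel` are hypothetical names, and the adaptive basis selection is deliberately left out. Within one level, every row (and then every column) can be transformed independently, which is where OpenMP's `parallel for` fits. Across levels, however, level k+1 reads only the low-pass quadrant produced by level k, so the levels must run in order.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: one level of the orthonormal 1D Haar transform on
// `n` samples spaced `stride` apart, using `tmp` (size >= n) as scratch.
static void haar1d(double* x, std::size_t n, std::size_t stride,
                   std::vector<double>& tmp) {
    const double s = 1.0 / std::sqrt(2.0);
    for (std::size_t i = 0; i < n / 2; ++i) {
        double a = x[(2 * i) * stride], b = x[(2 * i + 1) * stride];
        tmp[i] = (a + b) * s;          // approximation (low-pass)
        tmp[n / 2 + i] = (a - b) * s;  // detail (high-pass)
    }
    for (std::size_t i = 0; i < n; ++i) x[i * stride] = tmp[i];
}

// Multi-level 2D Haar on an N x N row-major image (N a power of two).
void haar2d_multilevel(std::vector<double>& img, std::size_t N, int levels) {
    std::size_t n = N;  // side length of the current low-pass (LL) quadrant
    for (int lev = 0; lev < levels && n >= 2; ++lev) {  // levels run sequentially
        // Rows of the LL quadrant do not depend on each other.
        #pragma omp parallel for
        for (std::size_t r = 0; r < n; ++r) {
            std::vector<double> tmp(n);
            haar1d(&img[r * N], n, 1, tmp);
        }
        // Columns are independent too, but must wait for the full row pass.
        #pragma omp parallel for
        for (std::size_t c = 0; c < n; ++c) {
            std::vector<double> tmp(n);
            haar1d(&img[c], n, N, tmp);
        }
        n /= 2;  // the next level only touches the new LL quadrant
    }
}
```

When compiled without `-fopenmp`, the pragmas are ignored and the code runs sequentially, which gives a convenient correctness baseline for the parallel versions.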

Background:

In signal and image processing, wavelet transforms are used to analyze data whose features vary across scales, for tasks such as compression, denoising, and multi-resolution analysis. Real-world signals often contain smooth regions interrupted by abrupt changes. Wavelets, which are rapidly decaying wave-like oscillations, suit such data because they are localized in both space (or time) and frequency. Smooth regions produce near-zero detail coefficients, while abrupt changes concentrate into a few large ones.
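The sketch below illustrates this behavior. It applies one level of the unnormalized Haar transform (pairwise averages and half-differences) to a piecewise-constant signal. Haar is chosen here only because it is the simplest wavelet, and `haar_level` is a hypothetical name. The detail coefficients vanish inside the smooth stretches and are nonzero only for the pair that straddles the jump.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// One level of the unnormalized Haar transform:
//   approx[i] = (x[2i] + x[2i+1]) / 2,  detail[i] = (x[2i] - x[2i+1]) / 2.
// Assumes x.size() is even; returns {approx, detail}.
std::pair<std::vector<double>, std::vector<double>>
haar_level(const std::vector<double>& x) {
    std::vector<double> approx, detail;
    for (std::size_t i = 0; i + 1 < x.size(); i += 2) {
        approx.push_back((x[i] + x[i + 1]) / 2.0);
        detail.push_back((x[i] - x[i + 1]) / 2.0);
    }
    return {approx, detail};
}
```

For the input {5, 5, 5, 5, 5, 9, 9, 9}, this yields approximations {5, 5, 7, 9} and details {0, 0, -2, 0}. Only one detail coefficient is nonzero, and that sparsity is what compression exploits.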

What makes the Adaptive Wavelet Transform (AWT) interesting to parallelize is that it dynamically selects wavelet basis functions based on local image characteristics such as texture, edges, or smoothness. This adaptivity improves representation efficiency but introduces significant workload imbalance and data dependencies.
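The following sketch shows how adaptivity turns into load imbalance. The selection rule is hypothetical and not taken from the paper. Each tile's mean absolute horizontal gradient decides whether it gets a short Haar filter (edge tiles) or a longer smooth filter (smooth tiles). The tap counts 2 and 9 are placeholders used only as a per-sample cost model, and all function names are illustrative. `static_loads` estimates how much transform work each of P workers would receive if tiles were split into contiguous blocks.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical rule, not from the paper: short Haar filter on "edge" tiles
// (limits ringing), a longer smooth filter elsewhere. Tap counts are placeholders.
enum class Basis { Haar, Smooth };
constexpr int taps(Basis b) { return b == Basis::Haar ? 2 : 9; }

// Mean absolute horizontal gradient of tile (ty, tx), size T x T, in a W-wide image.
double tile_activity(const std::vector<double>& img, std::size_t W,
                     std::size_t ty, std::size_t tx, std::size_t T) {
    double sum = 0.0;
    for (std::size_t y = ty * T; y < (ty + 1) * T; ++y)
        for (std::size_t x = tx * T + 1; x < (tx + 1) * T; ++x)
            sum += std::fabs(img[y * W + x] - img[y * W + x - 1]);
    return sum / static_cast<double>(T * (T - 1));
}

// Choose a basis for every tile (row-major tile order). Requires T >= 2.
std::vector<Basis> select_bases(const std::vector<double>& img, std::size_t W,
                                std::size_t H, std::size_t T, double threshold) {
    const std::size_t tilesX = W / T, tilesY = H / T, nt = tilesX * tilesY;
    std::vector<Basis> choice(nt);
    #pragma omp parallel for  // selection cost is uniform, so a static split is fine
    for (std::size_t t = 0; t < nt; ++t) {
        double a = tile_activity(img, W, t / tilesX, t % tilesX, T);
        choice[t] = a > threshold ? Basis::Haar : Basis::Smooth;
    }
    return choice;
}

// Modeled transform work per worker when tiles are split into P contiguous
// blocks (a static schedule). Cost of one tile = taps * T * T. Requires P >= 1.
std::vector<long> static_loads(const std::vector<Basis>& choice,
                               std::size_t T, std::size_t P) {
    std::vector<long> load(P, 0);
    const std::size_t chunk = (choice.size() + P - 1) / P;
    for (std::size_t t = 0; t < choice.size(); ++t)
        load[t / chunk] += static_cast<long>(taps(choice[t]) * T * T);
    return load;
}
```

Because per-tile work now depends on image content, the totals returned by `static_loads` can differ sharply between workers. OpenMP's `schedule(dynamic)` or a cost-aware partition of tiles are the usual remedies to evaluate.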

Adaptive Wavelet Transform Overview:

In addition to parallelism, we plan to explore memory-access strategies, comparing in-place and out-of-place implementations. The in-place version has lower memory overhead but is more complex to parallelize. The out-of-place version parallelizes more straightforwardly at the cost of extra memory.
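Two minimal sketches of one 1D Haar level illustrate the trade-off. Both assume the form approx = (a + b) / 2, detail = b - a, and the function names are hypothetical.

- The out-of-place version writes into a second buffer in the usual [approx | detail] layout, so loop iterations never conflict.
- The in-place version uses the lifting scheme (predict, then update) and overwrites its input. It saves the extra buffer but leaves the coefficients interleaved.

```cpp
#include <cstddef>
#include <vector>

// Out-of-place: reads `in`, writes [approx | detail] to a new buffer.
// Each iteration writes disjoint outputs, so the loop parallelizes trivially,
// at the cost of a second buffer the size of the input. Assumes even length.
std::vector<double> haar_out_of_place(const std::vector<double>& in) {
    const std::size_t h = in.size() / 2;
    std::vector<double> out(in.size());
    #pragma omp parallel for
    for (std::size_t i = 0; i < h; ++i) {
        out[i] = (in[2 * i] + in[2 * i + 1]) / 2.0;  // approximation
        out[h + i] = in[2 * i + 1] - in[2 * i];      // detail
    }
    return out;
}

// In-place lifting: predict, then update, overwriting the input.
// Afterwards even slots hold approximations and odd slots hold details,
// so no extra buffer is needed, but the next level reads with stride 2.
void haar_in_place(std::vector<double>& x) {
    const std::size_t pairs = x.size() / 2;
    #pragma omp parallel for
    for (std::size_t k = 0; k < pairs; ++k) {
        const std::size_t i = 2 * k;
        x[i + 1] -= x[i];        // predict: detail = odd - even
        x[i] += x[i + 1] / 2.0;  // update: approx = even + detail / 2
    }
}
```

The interleaved layout left by the in-place version is one source of the extra complexity: every later level works on strided data, which affects both cache behavior and how the work is divided among threads.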

The Challenge:

Workload Imbalance:

Data Dependencies:

Memory Access Characteristics:

Data Movements:

Resources:

Adaptive Wavelet Transformation
Adaptive Wavelet Rendering

It would be very helpful to have access to PSC machines so that we can test beyond 8 threads.

Goals and Deliverables:

Plan to achieve:

  1. Develop a fully functional sequential implementation of an AWT application (image compression/rendering).
  2. Parallelize the application in C++ using a shared-memory model (OpenMP) and a distributed-memory model (MPI).
  3. Profile and analyze the application's performance, targeting a 4 to 5x speedup with 8 threads.
  4. Establish benchmarks to evaluate speedup and scalability across different architectures and input sizes.
  5. Document our approach, design decisions, performance findings, and overall progress in detail.
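Amdahl's law gives a rough sanity check on the 4 to 5x target in item 3. It is an idealized model that ignores memory-bandwidth limits and synchronization overhead. It relates the speedup $S$ on $p$ threads to the serial fraction $f$ of the runtime, and solving for $f$ gives the largest serial fraction compatible with a target speedup:

$$
S(p) = \frac{1}{f + \frac{1 - f}{p}} \quad\Longrightarrow\quad f = \frac{p/S - 1}{p - 1},
\qquad
p = 8:\; S = 4 \Rightarrow f = \tfrac{1}{7} \approx 14\%, \quad S = 5 \Rightarrow f = \tfrac{3}{35} \approx 8.6\%.
$$

Reaching 7x on 8 threads would require $f \le \tfrac{1}{49} \approx 2\%$. Reaching 8x would require no serial work at all.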

Hope to achieve:

  1. Achieve near-ideal speedup (7 to 8x with 8 threads).
  2. If time permits and progress goes smoothly, we want to explore transactional memory and lock-free techniques for synchronization, comparing their impact on performance and scalability with that of traditional locking-based implementations.
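The following minimal sketch shows the kind of comparison item 2 describes. It assumes, hypothetically, that worker threads claim tiles from a shared counter. One version guards the counter with a `std::mutex`. The other claims tiles with `std::atomic<int>::fetch_add`, which is typically lock-free on mainstream platforms. Transactional memory is not shown. The per-tile work is a placeholder that only records each visit, and the function names are illustrative.

```cpp
#include <atomic>
#include <mutex>
#include <vector>

// Lock-based dispenser: a mutex serializes access to the shared tile counter.
void run_locked(std::vector<int>& visits) {
    int next = 0;
    std::mutex m;
    const int n = static_cast<int>(visits.size());
    #pragma omp parallel
    {
        for (;;) {
            int t;
            {
                std::lock_guard<std::mutex> guard(m);
                t = next++;  // claim the next tile under the lock
            }
            if (t >= n) break;
            ++visits[t];  // placeholder for transforming tile t
        }
    }
}

// Lock-free dispenser: an atomic fetch_add claims a tile without taking a lock.
void run_lock_free(std::vector<int>& visits) {
    std::atomic<int> next{0};
    const int n = static_cast<int>(visits.size());
    #pragma omp parallel
    {
        for (int t = next.fetch_add(1); t < n; t = next.fetch_add(1))
            ++visits[t];  // placeholder for transforming tile t
    }
}
```

In both versions, every tile is claimed exactly once. The quantity worth measuring is how contention on the shared counter grows with thread count and tile granularity.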

By the end of this project, we hope to develop a deeper understanding of the challenges in parallelizing a complex sequential algorithm, as well as the key role that memory access patterns play in the performance of a parallel program.

Platform Choice:

Schedule: