Hosseiny Fatemi, Mohammad Reza (2012) Algorithm optimization and low cost bit-serial architecture design for integer-pixel and sub-pixel motion estimation in H.264/AVC / Mohammad Reza Hosseiny Fatemi. PhD thesis, University of Malaya.
Abstract
H.264/AVC employs variable block-size motion estimation (VBSME) with quarter-pixel accuracy, which significantly improves its coding performance. However, this higher coding performance comes at the price of very high computational complexity and memory bandwidth. Accelerating motion estimation (ME) in H.264/AVC with efficient algorithms and architectures is therefore essential for real-time applications. This thesis is concerned with algorithm optimization and efficient low-cost architecture design for integer motion estimation (IME) and sub-pixel motion estimation (SME) in H.264/AVC. For the IME of H.264/AVC, we introduce two low-cost bit-serial architectures based on the full search (FS) algorithm, chosen for its regularity and coding performance. Both architectures use sum of absolute differences (SAD) and data reuse techniques to reduce their memory bandwidth. The first design has a two-dimensional (2-D) structure that broadcasts reference pixel data and propagates partial sums and SAD results. The second design uses a 2-D bit-serial adder tree connected to a reconfigurable reference buffer, which makes it suitable for hardware parallelism. To improve the overall performance of our designs, we propose several optimization techniques. By using a pixel truncation method and presenting a word length reduction technique, we save 68.75% of the power consumption and of the time required to process each search point, and we also decrease the latency, silicon area, and memory bandwidth. In addition, we employ 1/2-subsampling and mode reduction techniques to reduce the hardware cost further, and we propose a power-saving method that decreases the power consumption of the proposed bit-serial reconfigurable reference buffer.
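As an illustration of the full-search SAD matching and pixel truncation mentioned above, the following is a minimal software sketch, not the bit-serial hardware of the thesis. The function names, the 16×16 block size, and the choice of keeping 4 of the 8 pixel bits are assumptions made for the example; the abstract does not state how many bits are kept.

```python
import numpy as np

def truncated_sad(cur, ref, kept_bits=4):
    """SAD between two equal-size 8-bit blocks after dropping low-order bits.

    kept_bits is an assumed parameter; the abstract does not give its value.
    """
    shift = 8 - kept_bits
    a = cur.astype(np.int32) >> shift
    b = ref.astype(np.int32) >> shift
    return int(np.abs(a - b).sum())

def full_search(cur_block, ref_frame, top, left, lo=-16, hi=15, kept_bits=4):
    """Exhaustively test every displacement in [lo, hi] x [lo, hi].

    Returns (best_sad, dy, dx). Candidates that fall outside the
    reference frame are skipped.
    """
    n, m = cur_block.shape
    height, width = ref_frame.shape
    best = None
    for dy in range(lo, hi + 1):
        for dx in range(lo, hi + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > height or x + m > width:
                continue
            sad = truncated_sad(cur_block, ref_frame[y:y + n, x:x + m], kept_bits)
            if best is None or sad < best[0]:
                best = (sad, dy, dx)
    return best
```

Truncating pixels shortens the operands of every subtraction and accumulation, which is why it matters for a bit-serial datapath whose cycle count grows with word length. The sketch only models the arithmetic, not that timing.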
Both IME designs support VBSME at 720×480 resolution and 30 frames per second (fps), with two reference frames and a [-16, 15] search range, at a clock frequency of 414 MHz with 29.28 K and 31.5 K gates, respectively. To address the computational complexity and memory bandwidth requirements of the interpolate-and-search method in the SME of H.264/AVC, we introduce a low-complexity algorithm and its hardware architecture for SME with quarter-pixel accuracy, based on parabolic interpolation-free algorithms. According to our analysis, the proposed algorithm reduces the computational budget by 94.35% and the memory access requirement by 98.5% in comparison to the standard interpolate-and-search method, while maintaining acceptable video quality. In addition, we present a fast version of the proposed algorithm that reduces the computational budget by a further 46.28% while maintaining the video quality. For the hardware architecture, we choose a bit-serial structure to implement our algorithm and benefit from its advantages. Moreover, we use SAD truncation, reusability, source sharing, and power-saving techniques in our architecture, which lead to area savings and reduced power consumption. Furthermore, by using the mode reduction technique, we save 39% of the time required to process each macroblock (MB). Compared with previous designs, our architecture performs better in terms of silicon area, throughput, latency, and memory bandwidth. Implementation results show that our design can support the real-time HD1080 format with 20.3 K gates at an operating frequency of 88.3 MHz.
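The general idea behind parabolic, interpolation-free sub-pixel estimation can be sketched as follows. This is the textbook separable parabola fit through three SAD values, given here only as an assumed illustration; it is not the specific algorithm proposed in the thesis, and the function names and the 3×3 SAD grid layout are hypothetical.

```python
def parabolic_offset(s_minus, s_zero, s_plus):
    """Location of the minimum of the parabola through the SAD values
    at positions -1, 0 and +1, clamped to [-0.5, 0.5]."""
    denom = 2 * (s_minus - 2 * s_zero + s_plus)
    if denom <= 0:
        # Flat or non-convex error surface: stay at the integer position.
        return 0.0
    offset = (s_minus - s_plus) / denom
    return max(-0.5, min(0.5, offset))

def quarter_pel_refine(sad):
    """Refine an integer motion vector from a 3x3 grid of SAD values.

    sad[1][1] is the SAD at the best integer position; rows index the
    vertical displacement and columns the horizontal one (assumed layout).
    Returns (dy, dx) rounded to quarter-pixel steps.
    """
    dy = parabolic_offset(sad[0][1], sad[1][1], sad[2][1])
    dx = parabolic_offset(sad[1][0], sad[1][1], sad[1][2])
    return (round(dy * 4) / 4, round(dx * 4) / 4)
```

Because the refinement uses only SAD values already computed at integer positions, no sub-pixel reference samples need to be interpolated or fetched, which is the source of the savings that interpolation-free methods aim for.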