CAPLab

GPGPU

General-Purpose Graphics Processing Unit

About

The graphics processing unit (GPU) has become an essential component in computing systems. As GPU computing power and programmability have increased rapidly, the GPU is now used as an accelerator for applications outside the domain of traditional computer graphics. Currently, there are two typical parallel computing frameworks for GPUs: CUDA and OpenCL. Using these frameworks, we can write GPU programs that describe the parallelism in an application.

A GPU contains a variety of memories with different characteristics, so memory optimization should be considered in GPU programming to obtain a high performance gain; otherwise, the speedup may be lower than expected even if the application has abundant parallelism. Since the CPU and GPU can execute in parallel, we can also exploit task parallelism in the application, and job scheduling between the CPU and GPU becomes an important issue. In many GPU systems, the GPU has a separate memory, and data must be transferred between CPU and GPU memory. Since this communication overhead is not negligible, it should be hidden by asynchronous communication.

We have performed GPGPU research considering these optimization issues. In particular, we developed GPU applications for the transcoder called "Umile Encoder" with the support of the Seoul R&BD Program (JP090955).
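
The code below is a minimal sketch of the asynchronous communication described above, not code from the Umile Encoder; the kernel, buffer names, and sizes are hypothetical. It splits the input into chunks and issues the host-to-device copy, kernel launch, and device-to-host copy for each chunk into alternating CUDA streams, so the transfer of one chunk can overlap with computation on another.

    // Sketch: overlapping CPU-GPU transfers with kernel execution using CUDA streams.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void scale(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;               // placeholder per-element work
    }

    int main() {
        const int N = 1 << 22;                    // total elements
        const int CHUNK = 1 << 20;                // elements processed per chunk
        const int NSTREAMS = 2;

        float *h_data;                            // pinned host memory enables async copies
        cudaMallocHost(&h_data, N * sizeof(float));
        for (int i = 0; i < N; ++i) h_data[i] = 1.0f;

        float *d_buf[NSTREAMS];
        cudaStream_t streams[NSTREAMS];
        for (int s = 0; s < NSTREAMS; ++s) {
            cudaMalloc(&d_buf[s], CHUNK * sizeof(float));
            cudaStreamCreate(&streams[s]);
        }

        // Work issued to different streams may overlap: while one chunk is being
        // copied, another chunk's kernel can run, hiding the transfer overhead.
        for (int off = 0, s = 0; off < N; off += CHUNK, s = (s + 1) % NSTREAMS) {
            cudaMemcpyAsync(d_buf[s], h_data + off, CHUNK * sizeof(float),
                            cudaMemcpyHostToDevice, streams[s]);
            scale<<<(CHUNK + 255) / 256, 256, 0, streams[s]>>>(d_buf[s], CHUNK);
            cudaMemcpyAsync(h_data + off, d_buf[s], CHUNK * sizeof(float),
                            cudaMemcpyDeviceToHost, streams[s]);
        }
        // The CPU is free to do other work here before synchronizing,
        // which is where CPU/GPU task parallelism comes in.
        cudaDeviceSynchronize();

        printf("h_data[0] = %f\n", h_data[0]);    // expected: 2.0

        for (int s = 0; s < NSTREAMS; ++s) {
            cudaFree(d_buf[s]);
            cudaStreamDestroy(streams[s]);
        }
        cudaFreeHost(h_data);
        return 0;
    }

Pinned (page-locked) host memory is required for cudaMemcpyAsync to proceed asynchronously, and the chunk size trades off transfer/compute overlap against per-launch overhead.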

Download

Coming soon

Publication

  • JinTaek Kang, Kwanghyun Chung, Youngmin Yi, Soonhoi Ha, "NNSim: Fast Performance Estimation based on Sampled Simulation of GPGPU Kernels for Neural Networks", Proceedings of the 55th Annual Design Automation Conference, Jun, 2018.
  • Youngsub Ko, "GPU-in-the-loop simulation for CPU/GPU Heterogeneous Platform", Seoul National University, Feb, 2016.
  • Youngsub Ko, Youngmin Yi, Soonhoi Ha, "An efficient parallelization technique for x264 encoder on heterogeneous platforms consisting of CPUs and GPUs", Journal of Real-Time Image Processing, Vol. 9, Issue 1, pp. 5–18, Mar, 2014.
  • Youngsub Ko, Youngmin Yi, Soonhoi Ha, "An Efficient Parallel Motion Estimation Algorithm and X264 Parallelization In CUDA", 2011 Conference on Design and Architectures for Signal and Image Processing, pp. 1-8, Nov, 2011.
  • Hanwoong Jung, Youngmin Yi, Soonhoi Ha, "Automatic CUDA Code Synthesis Framework for Multicore CPU and GPU architectures", Parallel Processing and Applied Mathematics 2011, Sep, 2011.
  • JaeGeun Han, Soonhoi Ha, Youngmin Yi, "CUDA Acceleration and Integration of a Scaling Filter for Improving Transcoder Performance", 23rd Workshop on Image Processing and Image Understanding (IPIU 2011), Feb, 2011.
  • JaeGeun Han, Youngsub Ko, Sunghan Suh, Soonhoi Ha, "Performance Improvement of a Scaling Filter and Transcoder Using CUDA", Journal of KIISE: Computing Practices and Letters, Vol. 16, pp. 507-511, Apr, 2010.
  • JaeGeun Han, Youngsub Ko, Sunghan Suh, Soonhoi Ha, "Performance Improvement of a Scaling Filter and Transcoder Using CUDA", Proceedings of the KIISE Fall Conference, Vol. 36, pp. 258-259, Nov, 2009.

Contributor

  • 하순회(Soonhoi Ha)
  • 이영민(Youngmin Yi)
  • 서성한(Sunghan Suh)
  • 고영섭(Youngsub Ko)
  • 정한웅(Hanwoong Jung)
  • 한재근(JaeGeun Han)
  • 이보영(Boyung Lee)