▲ (From left) Professor Jaejun Yoo, researcher Yujin Jang, and researcher Sangyeop Yeo of the AI lightweight algorithm research team
UNIST Professor Jaejun Yoo's team unveils innovations in AI compression, video generation, and design automation
The team has developed technology that maintains high performance even when a generative AI model is compressed 323-fold, pointing to the possibility of running AI efficiently on edge devices and low-power computers.
UNIST (President Jong-Rae Park) announced that Professor Jaejun Yoo's team from the Graduate School of Artificial Intelligence presented three papers at the European Conference on Computer Vision (ECCV) 2024, a world-renowned computer vision conference that opened on the 4th, charting the future of AI technology from model compression to design automation.
Professor Yoo's team succeeded in shrinking a GAN (Generative Adversarial Network), an image-generating AI, by up to 323 times without performance degradation.
Using knowledge distillation, the team showed that such AI can run efficiently even on edge devices or low-power computers that lack high-performance hardware.
The research team introduced the DiME and NICKEL techniques, which improve training stability by comparing distributions of images rather than individual images. For example, if the teacher model generates an image of Kim Tae-hee, the student model can still learn even when it generates an image of Song Hye-kyo or Jun Ji-hyun.
The NICKEL technique optimizes the interaction between the generator and the classifier, helping lightweight models maintain high performance. Combining the two techniques, even a 323-fold compressed GAN was able to generate images of the same high quality as the original.
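The distribution-level comparison described above can be illustrated with a distillation loss that matches statistics of teacher and student outputs in a shared feature space. The sketch below is a minimal, hypothetical PyTorch example using a maximum mean discrepancy (MMD) style loss; it is not the team's actual DiME or NICKEL implementation, and the feature extractor and kernel choice are assumptions made only for illustration.

```python
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Squared maximum mean discrepancy between two batches of feature vectors.

    Compares the *distributions* of teacher and student samples with an RBF
    kernel, so individual samples never need to match one-to-one.
    """
    def pdist2(a, b):
        return (a.unsqueeze(1) - b.unsqueeze(0)).pow(2).sum(-1)

    k_xx = torch.exp(-pdist2(x, x) / (2 * sigma ** 2)).mean()
    k_yy = torch.exp(-pdist2(y, y) / (2 * sigma ** 2)).mean()
    k_xy = torch.exp(-pdist2(x, y) / (2 * sigma ** 2)).mean()
    return k_xx + k_yy - 2 * k_xy

def distribution_distillation_loss(teacher_gen, student_gen, feature_net, z):
    """Distribution-matching distillation: push the student generator's output
    distribution toward the teacher's, measured in a shared feature space."""
    with torch.no_grad():
        t_feat = feature_net(teacher_gen(z))   # teacher samples (frozen)
    s_feat = feature_net(student_gen(z))       # student samples (trainable)
    return rbf_mmd2(t_feat.flatten(1), s_feat.flatten(1))
```

In such a setup the student is rewarded for producing the same kind of images as the teacher rather than pixel-identical copies, which is why the celebrity example above still counts as successful learning.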
“We have proven that a 323-fold compressed GAN can produce images of the same high quality as the original,” said Professor Yoo. “This opens the way for high-performance AI to be used in edge computing and on low-power devices.”
“It will significantly expand the scope of AI utilization by opening up the possibility of implementing high-performance AI even with limited resources,” explained first author Sangyeop Yeo.
In addition, Professor Yoo's team developed a hybrid video diffusion model (HVDM) that can efficiently generate high-resolution video even in environments lacking high-performance computing resources. HVDM combines a 2D triplane representation with a 3D wavelet transform to process the global context and fine details of a video simultaneously.
While existing video generation models rely on high-performance computing resources to produce high-resolution video, HVDM delivers natural, high-quality video even with limited resources, overcoming the limitations of CNN-based autoencoder approaches.
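As a rough illustration of this hybrid idea, the sketch below projects a video tensor onto three 2D planes by averaging along each axis and applies a cascaded single-level Haar wavelet split over the three spatio-temporal axes. This is a simplified assumption of how such a combined representation might be built, not the published HVDM architecture.

```python
import torch

def triplane_project(video):
    """Project a video tensor (B, C, T, H, W) onto three 2D planes by
    averaging along each spatio-temporal axis, giving a compact summary
    of global appearance and motion."""
    plane_hw = video.mean(dim=2)   # (B, C, H, W): time-averaged appearance
    plane_tw = video.mean(dim=3)   # (B, C, T, W): height-averaged motion
    plane_th = video.mean(dim=4)   # (B, C, T, H): width-averaged motion
    return plane_hw, plane_tw, plane_th

def haar_split_1d(x, dim):
    """Single-level 1D Haar split along `dim`: returns low- and high-pass bands."""
    even = x.index_select(dim, torch.arange(0, x.size(dim), 2, device=x.device))
    odd = x.index_select(dim, torch.arange(1, x.size(dim), 2, device=x.device))
    return (even + odd) / 2 ** 0.5, (even - odd) / 2 ** 0.5

def haar_3d_lowpass(video):
    """Cascade 1D Haar splits over T, H, W and keep the low-frequency band.
    The high-frequency bands are discarded here, but in a full transform they
    carry the fine local detail at each scale."""
    low = video
    for dim in (2, 3, 4):          # T, H, W axes of (B, C, T, H, W)
        low, _ = haar_split_1d(low, dim)
    return low

# Example: a 16-frame, 64x64 RGB clip
video = torch.randn(1, 3, 16, 64, 64)
planes = triplane_project(video)     # three global-context planes
coarse = haar_3d_lowpass(video)      # (1, 3, 8, 32, 32) low-pass volume
```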
The research team demonstrated HVDM's superiority on video benchmark datasets such as UCF-101, SkyTimelapse, and TaiChi. HVDM delivered higher video quality than existing methods, with notably natural motion and realistic detail.
“HVDM is a groundbreaking model that can efficiently produce high-resolution video even when high-performance computing resources are insufficient,” Professor Yoo said, adding, “It can be widely used in industrial fields such as video production and simulation.”
Finally, the research team developed a multimodal layout generation model that can automatically generate advertising banners and web-UI designs even from a small amount of data.
This model can process images and text simultaneously, automatically generating an appropriate layout based solely on user input.
Existing models could not adequately process text and image information because of a lack of data. The newly developed model solves this problem and greatly improves usability for advertising design and web UI: by maximizing the interaction between text and images, it automatically generates an optimized design that reflects visual elements and text at once.
The research team converted the layout information into HTML code format and built an automatic generation pipeline that achieves strong performance with a small amount of data by making full use of the language model's pre-training data. Benchmark tests showed a performance improvement of up to 2,800%.
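Representing a layout as HTML lets a pre-trained language model reuse what it already knows about markup. The snippet below is a minimal, hypothetical illustration of serializing bounding-box layout elements into HTML; the element roles, attributes, and canvas size are assumptions for illustration, not the team's actual format.

```python
from dataclasses import dataclass

@dataclass
class LayoutElement:
    role: str        # e.g. "logo", "headline", "button"
    text: str        # text content, empty for purely visual elements
    x: int           # left position in pixels
    y: int           # top position in pixels
    w: int           # width in pixels
    h: int           # height in pixels

def layout_to_html(elements, canvas_w=1200, canvas_h=628):
    """Serialize layout elements into an HTML string that a pre-trained
    language model can read or be fine-tuned to generate."""
    rows = [f'<div class="canvas" style="width:{canvas_w}px;height:{canvas_h}px">']
    for e in elements:
        style = f"left:{e.x}px;top:{e.y}px;width:{e.w}px;height:{e.h}px"
        rows.append(f'  <div class="{e.role}" style="position:absolute;{style}">{e.text}</div>')
    rows.append("</div>")
    return "\n".join(rows)

# Example: a simple banner layout
banner = [
    LayoutElement("logo", "", 40, 40, 120, 120),
    LayoutElement("headline", "Summer Sale", 200, 60, 600, 80),
    LayoutElement("button", "Shop now", 200, 480, 220, 64),
]
print(layout_to_html(banner))
```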
The team used an image-caption dataset during pre-training and combined depth-map and ControlNet techniques for data augmentation to maximize performance. The quality of layout generation improved significantly, and distortions that can occur during data preprocessing were reduced, yielding more natural designs.
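One plausible way to realize such depth-guided augmentation is sketched below using the Hugging Face transformers and diffusers libraries: a depth map is estimated from a source image, then a depth-conditioned ControlNet re-renders the appearance while preserving spatial structure. The specific checkpoints, prompt, and file names are assumptions for illustration, not the team's actual augmentation pipeline.

```python
import torch
from PIL import Image
from transformers import pipeline
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Depth estimator produces a per-pixel depth map from the source image.
depth_estimator = pipeline("depth-estimation")

# A depth-conditioned ControlNet keeps the spatial layout of the original
# while the diffusion model re-renders appearance, yielding augmented variants.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

source = Image.open("banner_background.png").convert("RGB")  # hypothetical input
depth = depth_estimator(source)["depth"]

augmented = pipe(
    prompt="clean product photo, studio lighting",
    image=depth,
    num_inference_steps=30,
).images[0]
augmented.save("banner_background_aug.png")
```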
“Even with as few as 5,000 data samples, it outperformed the existing model that required over 60,000,” Professor Yoo emphasized. “Because it can be used easily not only by experts but also by general users, it will bring a major innovation to the automation of advertising banner and web UI design.”
The research was conducted with the support of the National Research Foundation of Korea (NRF), the Ministry of Science and ICT (MSIT), the Institute of Information and Communications Technology Planning and Evaluation (IITP), and UNIST. The research results are expected to further expand the potential of AI utilization in various industrial fields and maximize performance and efficiency.