Five AI experts share their visions at the Korean Computer Vision Society panel discussion
“Vision AGI, Verifiable on Sensorimotor Robotics Platform”
Presenting embodied multitasking, multi-sensor, and multi-modal directions
“To become an AGI (Artificial General Intelligence), we must reach a stage where sensing and action are combined. We must solve real problems and perform tasks through sensorimotor intelligence.” Lee Kyung-moo, Professor, Department of Electrical and Computer Engineering, Seoul National University (Computer Vision Lab)
“General AI in computer vision is not easy to evaluate, and implementing a channel to express it is much more difficult than in the language field.” Joo Han-byeol, Professor, Department of Computer Science and Engineering, Seoul National University (Visual Computing Lab)

▲A panel discussion is underway at the Korean Computer Vision Society's KCCV 2023 on the role and development direction of computer vision in moving from the era of large models to AGI.
Artificial general intelligence (AGI) has recently been attracting attention from both academia and industry, with discussions taking place from various angles. AI that can reason, learn, and solve problems on its own, without human intervention, is regarded as the ultimate form of artificial intelligence.
At the KCCV 2023 event recently hosted by the Korean Computer Vision Society, a panel discussion was held on 'The Role and Development Direction of Computer Vision in Moving from the Era of Large Models to General-Purpose Artificial Intelligence.'
The discussion was moderated by KCCV 2023 program chair Choi Jong-hyun (professor at Yonsei University), with in-depth contributions from Seoul National University professor Lee Kyung-moo, KAIST professor Choi Yoon-jae, Seoul National University professor Joo Han-byeol, OpenAI technical staff member Kim Jong-wook, and KIST Artificial Intelligence Research Group Director Lim Hwa-seop.
■ Vision AGI, Difficult to Evaluate and Verify… “Possible with Sensorimotor Robotics”
▲Professor Joo Han-byeol, Department of Computer Science and Engineering, Seoul National University
In natural language processing (NLP), concrete applications such as ChatGPT have emerged, and some offer positive assessments such as "it has opened up the possibility of AGI" and "it is the starting point of AGI." Strictly speaking, however, ChatGPT and other language models remain artificial narrow intelligence (ANI), and in the field of computer vision it is even harder to define what AGI would be.
Professor Joo Han-byeol explained, “AGI is understood as an intelligence that can solve a continuous series of tasks in situations where the task is not defined and cannot be framed as a single problem,” adding that, in a narrow sense, NLP can also be seen as a sample of AGI.
Noting that it is not easy even to imagine what form AGI would take in computer vision, let alone implement it, Professor Joo predicted that verification and evaluation will be especially difficult compared to language models. “In vision, there is no way to check whether an AGI accurately understands a video or not, and no proper means for it to express that understanding,” he pointed out.
Accordingly, he argued that vision AGI must be fed information similar to the data humans sense, and that we can approach AGI only when there is connectivity between the environment and the data, rather than by learning from the fragmented data collected today.
From this perspective, vision researchers are pursuing a range of approaches. The AI Habitat simulator platform, cited as an example, is a tool for Embodied AI research that reinforces the paradigm shift from 3D simulation to Embodied AI, encompassing active perception, interaction-based learning, and environment-grounded conversation.
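For a sense of what working on such a platform looks like, a minimal interaction loop in the habitat-lab Python API is sketched below. The config path and task assets are illustrative, vary by release, and require separately downloaded test scenes.

```python
import habitat

# Minimal embodied-AI episode loop on the AI Habitat platform (sketch).
# The config path follows older habitat-lab examples and may differ
# in current releases.
env = habitat.Env(config=habitat.get_config("configs/tasks/pointnav.yaml"))

observations = env.reset()  # dict of sensor readings (RGB, depth, GPS, ...)

# Step through one episode with random actions: the agent perceives,
# acts, and perceives again -- the sensing/action coupling discussed above.
while not env.episode_over:
    observations = env.step(env.action_space.sample())
```

In practice the random policy would be replaced by a learned agent, but the loop itself is the environment-data connectivity Professor Joo describes.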
Professor Joo concluded, “Vision must go hand in hand with robotics,” adding, “If we create AGI, the platform on which it can be verified will likely be robotics with sensory and motor capabilities similar to humans.”
■ Sensorimotor AI, Opportunities Abound for Computer Vision
▲Lee Kyung-moo, Professor, Department of Electrical and Computer Engineering, Seoul National University
Professor Lee Kyung-moo, a leading Korean computer vision expert and internationally renowned scholar, expressed the view that AGI will advance to a stage where ‘sensing’ and ‘action’ are combined.
He then introduced the concept of sensorimotor intelligence.
He emphasized, “In real life, we must be able to carry out the services we want through actions,” and “Ultimately, the move toward embodied multitasking, multi-sensor, and multi-modal intelligence is inevitable, and vision will play a very key role in this process.”
Many groups and scholars studying artificial intelligence have already begun researching sensorimotor AI grounded in physical, real-world problems.
Last March, Professor Jitendra Malik, a computer vision researcher at the University of California, Berkeley, gave a lecture on sensorimotor learning and artificial intelligence. It described a legged robot that uses its own sensing to estimate the depth of the terrain under its legs as it navigates obstacles, and then learns to predict that depth about 1.5 seconds ahead from images supplied by its vision system.

▲Professor Jitendra Malik's keynote speech (Capture: UC Berkeley Events YouTube channel)
“The first day you see it moving awkwardly, the second day you see it climbing more stairs than before, and by the end you see the robot climbing all the way up,” Professor Malik said.
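To make the idea concrete, the following is a hypothetical sketch of such a terrain-lookahead model, not Professor Malik's actual system: a network fuses a camera frame with recent proprioceptive history and regresses the terrain depth the legs will meet roughly 1.5 seconds later, with what proprioception eventually measures serving as the training target. All module names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class TerrainLookahead(nn.Module):
    """Hypothetical sketch: predict terrain depth ~1.5 s ahead from vision.

    Supervision comes for free: once the robot reaches the terrain, its
    proprioception measures the true depths, which label the earlier frame.
    """

    def __init__(self, proprio_dim: int = 48, horizon_points: int = 16):
        super().__init__()
        # Small CNN encoder for the egocentric camera image.
        self.vision = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # MLP encoder for a short window of joint angles/velocities.
        self.proprio = nn.Sequential(nn.Linear(proprio_dim, 64), nn.ReLU())
        # Head regresses terrain depth at points along the upcoming path.
        self.head = nn.Linear(32 + 64, horizon_points)

    def forward(self, image, proprio_history):
        z = torch.cat([self.vision(image), self.proprio(proprio_history)], dim=-1)
        return self.head(z)

model = TerrainLookahead()
pred = model(torch.randn(1, 1, 64, 64), torch.randn(1, 48))  # (1, 16) depths
```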
With active research underway on applying sensorimotor AI to robotics, Professor Lee Kyung-moo said, “I think there will be ample opportunities and roles for vision researchers to move beyond the existing fragmented vision problems and develop the kind of highly complex intelligence that solves problems in real life.”
Additionally, Dr. Kim Jong-wook (OpenAI technical staff), who joined the panel, predicted, “Ultimately, if self-supervised learning is added to computer vision, there will be many more things that can be extracted from visual data, much as children learn by watching the world.”
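As one illustration of what self-supervised computer vision can mean, the sketch below implements a SimCLR-style contrastive loss, where two augmented views of the same image are pulled together in embedding space without any labels. This is a generic example, not a description of OpenAI's work; the encoder and augmentations that would produce the embeddings are left out.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5):
    """SimCLR-style NT-Xent loss. z1, z2: (N, D) embeddings of two views."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                       # cosine similarities
    n = z1.size(0)
    sim.fill_diagonal_(float("-inf"))                   # exclude self-pairs
    # The positive for sample i is its other view: index i+n, or i-n.
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

# Toy usage with random tensors standing in for encoder outputs.
loss = nt_xent_loss(torch.randn(8, 128), torch.randn(8, 128))
```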