
2024 | Book

3D Computer Vision

Foundations and Advanced Methodologies


About this Book

This book offers a comprehensive and unbiased introduction to 3D Computer Vision, ranging from its foundations and essential principles to advanced methodologies and technologies. Divided into 11 chapters, it covers the main workflow of 3D computer vision as follows: camera imaging and calibration models; various modes and means of 3D image acquisition; binocular, trinocular, and multi-ocular stereo vision matching techniques; monocular single-image and multi-image scene restoration methods; point cloud data processing and modeling; simultaneous localization and mapping; generalized image and scene matching; and understanding spatial-temporal behavior.

Each topic is addressed in a uniform manner: the dedicated chapter first covers the essential concepts and basic principles before presenting a selection of typical, specific methods and practical techniques. In turn, it introduces readers to the most important recent developments, especially from the last three years. This approach allows them to quickly familiarize themselves with the subject, implement the techniques discussed, and design or improve their own methods for specific applications. The book can be used as a textbook for graduate courses in computer science, computer engineering, electrical engineering, data science, and related subjects. It also offers a valuable reference guide for researchers and practitioners alike.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
Vision is an important function and means by which human beings observe and recognize the world. Computer vision, the discipline that uses computers to realize human visual functions, has not only received great attention and in-depth research but has also been widely applied [1].
Yu-Jin Zhang
Chapter 2. Camera Imaging and Calibration
Abstract
Image acquisition is an important means of obtaining information about the objective world, and it is also the basis of computer vision: images are the objects on which the various computer vision technologies operate, and image acquisition refers to the technology and process of acquiring images (imaging).
Yu-Jin Zhang
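As a pointer to what the chapter builds on, here is a minimal sketch of the pinhole projection at the heart of camera imaging models; the intrinsic matrix K uses made-up illustrative values, not parameters from the book.

```python
import numpy as np

# Illustrative (assumed) intrinsic matrix: fx, fy are focal lengths in
# pixels; (cx, cy) is the principal point.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project(K, point_cam):
    """Project a 3D point, given in camera coordinates, to pixel (u, v)."""
    p = K @ point_cam        # homogeneous image coordinates
    return p[:2] / p[2]      # perspective division

# A point 2 m in front of the camera, slightly off-axis.
print(project(K, np.array([0.1, -0.05, 2.0])))  # -> [360. 220.]
```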
Chapter 3. Depth Image Acquisition
Abstract
General imaging methods obtain a 2D image from the 3D physical space: information on the plane perpendicular to the camera's optical axis is preserved in this 2D image, but the depth (distance) information along the optical axis is lost. To complete visual tasks, computer vision often needs to obtain 3D information about the objective world; that is, it needs to collect images with depth information.
Yu-Jin Zhang
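Conversely, once a depth image has been acquired, the lost third coordinate is restored and every pixel can be lifted back into 3D. A minimal sketch of this back-projection, reusing the illustrative intrinsics assumed in the Chapter 2 sketch (not parameters from the book):

```python
import numpy as np

def back_project(u, v, depth, fx=800.0, fy=800.0, cx=320.0, cy=240.0):
    """Recover the 3D camera-frame point for pixel (u, v) at a given depth.

    fx, fy, cx, cy are the same assumed pinhole intrinsics as above.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])

# Inverts the projection example above: pixel (360, 220) at depth 2 m.
print(back_project(360.0, 220.0, 2.0))  # -> [ 0.1  -0.05  2.  ]
```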
Chapter 4. 3D Point Cloud Data and Processing
Abstract
3D point cloud data can be obtained by laser scanning or photogrammetry and can be seen as a 3D digitization of the physical world. Point cloud data is a kind of temporal and spatial data: its data structure is relatively simple, its storage is relatively compact, and its representation of the local details of complex surfaces is relatively complete. It has been widely used in many fields [1]. However, the points in 3D point cloud data often lack explicit correlation with one another, and the amount of data is very large, which poses many challenges for its processing [2].
Yu-Jin Zhang
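To make the scale issue concrete, here is a toy voxel-grid downsampling routine of the kind commonly used to thin point clouds before further processing; it is a self-contained NumPy sketch, not code from the book.

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same cubic voxel by their mean.

    points: (N, 3) array of xyz coordinates; voxel_size: voxel edge length.
    A toy reduction step; real pipelines also handle normals, colors, and
    outlier removal.
    """
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.ravel()                  # robust across NumPy versions
    counts = np.bincount(inverse).astype(float)
    out = np.empty((counts.size, 3))
    for dim in range(3):                       # average per voxel, per axis
        out[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return out

cloud = np.random.rand(100_000, 3)             # synthetic cloud in a unit cube
print(voxel_downsample(cloud, 0.05).shape)     # far fewer than 100,000 points
```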
Chapter 5. Binocular Stereovision
Abstract
The human visual system is a natural stereovision system. The human eyes (each equivalent to a camera) observe the same scene from two viewpoints, and the information obtained is combined in the human brain to yield a 3D view of the objective world. In computer vision, a set of two (or more) images of the same scene is collected from different viewing angles, and the parallax (disparity) between corresponding pixels in the different images is measured; that is, the parallax is the difference between the positions at which a 3D space point projects onto these 2D images. Depth information, and hence a reconstruction of the 3D scene, can then be obtained from the parallax by the principle of triangulation.
Yu-Jin Zhang
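The triangulation relation described above takes a particularly simple form for a rectified binocular rig: depth Z = f·B/d, with focal length f (pixels), baseline B, and disparity d (pixels). A minimal sketch with assumed illustrative values:

```python
import numpy as np

# Rectified binocular case: depth Z = f * B / d. The focal length f and
# baseline B below are assumed values, not parameters from the book.
f, B = 800.0, 0.12

disparity = np.array([40.0, 20.0, 8.0])   # corresponding-pixel disparities (px)
depth = f * B / disparity                 # larger disparity -> nearer point
print(depth)                              # -> [ 2.4  4.8 12. ] metres
```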
Chapter 6. Multi-ocular Stereovision
Abstract
The binocular stereovision technology introduced in Chap. 5 is modeled directly on the structure of the human visual system. When using cameras for image acquisition, systems with more than two cameras (or one camera placed successively at more than two locations) can also be used to acquire different images of the same scene and thereby obtain depth information. This technology is called multi-ocular (multi-eye) stereovision. The multi-ocular method is more complex than the binocular method but has certain advantages: it reduces the uncertainty of image matching in binocular stereovision, eliminates mismatches caused by smooth grayscale regions on the scene surface, and reduces mismatches caused by periodic patterns on the scene surface.
Yu-Jin Zhang
Chapter 7. Monocular Multi-image Scene Restoration
Abstract
The stereo vision methods introduced in the preceding two chapters restore the depth of the scene from two or more images obtained by cameras at different positions. Here, the depth (distance) information can be regarded as redundant information across the multiple images. Multiple images with redundant information can also be obtained by capturing changes of illumination and/or of the scene from the same location. Such images can be acquired with only one (fixed) camera, so these approaches are collectively referred to as monocular methods (stereo vision methods are all based on multiple cameras and multiple images; although one camera can be used to shoot from multiple positions, this is still equivalent to multiple cameras because the viewing angles differ). From the (monocular) multiple images obtained in this way, the surface orientation of the scene can be determined, and the relative depth between parts of the scene follows directly from that surface orientation. In practice, it is often possible to further calculate the absolute depth of the scene [1].
Yu-Jin Zhang
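One classical instance of the "fixed camera, varying illumination" idea is photometric stereo, which recovers surface orientation per pixel from images taken under different known light directions. A minimal least-squares sketch for a single Lambertian pixel, with assumed illustrative values:

```python
import numpy as np

# Each row of L is a light direction; I holds the intensity observed at one
# pixel under the corresponding light (Lambertian model: I = L @ (albedo * n)).
L = np.array([[0.0, 0.0, 1.0],
              [0.7, 0.0, 0.714],
              [0.0, 0.7, 0.714]])           # three assumed light directions
n_true = np.array([0.2, 0.1, 0.9747])       # unknown (unit) surface normal
I = L @ (0.8 * n_true)                      # synthesized observations, albedo 0.8

g, *_ = np.linalg.lstsq(L, I, rcond=None)   # solve I = L g for g = albedo * n
albedo = np.linalg.norm(g)
normal = g / albedo
print(albedo)                               # ~0.8
print(normal)                               # ~n_true: surface orientation recovered
```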
Chapter 8. Monocular Single-Image Scene Restoration
Abstract
As pointed out in Sect. 7.1, this chapter introduces methods of scene restoration based on a single monocular image. According to the introduction and discussion in Sect. 2.2.2, using only a single monocular image for scene restoration is actually an ill-posed problem, because the depth information is lost when the 3D scene is projected onto the 2D image. However, as the human visual system demonstrates, particularly through its capacity for spatial perception (see [1]), many depth cues are in many cases still retained in the image, so it is possible to recover the scene from it under certain constraints or with prior knowledge [2–4].
Yu-Jin Zhang
Chapter 9. Generalized Matching
Abstract
Matching can be understood as a technique or process of combining various representations and knowledge to interpret a scene. This chapter focuses on some generalized matching methods and techniques for objects and scenes, which are more abstract than common image matching. It begins with an overview of matching, including matching strategies, the classification of matching methods, and the evaluation of matching. The principles and metrics for general object matching are then described, and a dynamic pattern matching technique, characterized by a dynamic pattern representation established during the matching process, is introduced. The relationship between matching and registration is presented, and some basic registration techniques, a heterogeneous image registration technique, and an inference-based image matching technique are described. In addition, the matching of various inter-relationships between objects, the use of graph isomorphism for matching, and the matching of 3D scenes to corresponding models with line-drawing labels are introduced. Finally, some recent approaches to multimodal image matching, both region-based and feature-based, are presented.
Yu-Jin Zhang
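As a small illustration of the graph-isomorphism style of matching mentioned above, the sketch below checks whether two relational descriptions of objects share the same structure; the part names are hypothetical, and NetworkX's built-in isomorphism test stands in for the book's own formulation.

```python
import networkx as nx

# Two hypothetical object descriptions: nodes are parts, edges are
# adjacency relations between parts.
g1 = nx.Graph([("head", "torso"), ("torso", "arm"), ("torso", "leg")])
g2 = nx.Graph([("a", "b"), ("b", "c"), ("b", "d")])

# True: both graphs have the same relational structure (a star with a
# degree-3 center), so the descriptions match at the structural level.
print(nx.is_isomorphic(g1, g2))
```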
Chapter 10. Simultaneous Localization and Mapping
Abstract
Simultaneous localization and mapping (SLAM), also known as real-time localization and map construction, refers to a subject equipped with sensors that simultaneously estimates its own motion and builds a model of the environment, without prior information about that environment. It is a vision capability mainly used by mobile robots: it allows a robot to gradually build and update a geometric model as it explores an unknown environment and, based on the part of the model built so far, to determine its own position relative to that model (self-positioning). Equivalently, the robot starts moving from an unknown position in an unknown environment; while moving, it localizes itself from the existing map and its position estimates, and it extends the map incrementally on the basis of that localization, thereby realizing autonomous positioning and navigation [1].
Yu-Jin Zhang
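The incremental flavor of this process can be seen in toy form by composing odometry increments into a pose estimate; the sketch below covers only the dead-reckoning (localization) half in SE(2), with no map building or loop-closure correction, and all values are assumed for illustration.

```python
import numpy as np

def compose(pose, delta):
    """Compose an SE(2) pose (x, y, theta) with a relative motion
    (dx, dy, dtheta) expressed in the robot's current frame."""
    x, y, th = pose
    dx, dy, dth = delta
    return (x + dx * np.cos(th) - dy * np.sin(th),
            y + dx * np.sin(th) + dy * np.cos(th),
            th + dth)

pose = (0.0, 0.0, 0.0)                       # start at an unknown origin
for step in [(1.0, 0.0, np.pi / 2)] * 4:     # drive a 1 m square
    pose = compose(pose, step)
print(pose)                                  # back near the starting point
```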
Chapter 11. Spatial-Temporal Behavior Understanding
Abstract
From the perspective of achieving human visual functions with computer vision, an important task is to interpret the scene, make decisions, and guide actions by processing the images obtained from the scene. To do this, it is necessary to judge which objects are in the scene and how their position, pose, speed, inter-relationships, etc., change in space over time, as well as the trends of those changes. In short, it is necessary to grasp the actions and activities in the scene in time and space, determine their purpose, and then understand the semantic information they convey.
Yu-Jin Zhang
Backmatter
Metadata
Title
3D Computer Vision
Author
Yu-Jin Zhang
Copyright Year
2024
Publisher
Springer Nature Singapore
Electronic ISBN
978-981-19-7603-2
Print ISBN
978-981-19-7602-5
DOI
https://doi.org/10.1007/978-981-19-7603-2
