How a Camera Sees in 3D: The Technology Behind Vision-Guided Robots
You have probably seen the phrase "3D camera" and wondered what it actually means. A regular camera takes a flat picture. A 3D camera does something different: it measures depth, producing a spatial map of everything in its field of view.
For a robot arm, that spatial map is everything. Without depth data, a robot can only move to fixed coordinates programmed in advance. Add a 3D camera, and the robot can see where an object actually is, figure out how it is oriented, and pick it reliably, even if nothing is in quite the same position twice.
This post explains how cameras see in 3D, why it matters for industrial automation, and how to pair 3D vision with an affordable cobot to build a system that works.
Why Cameras Cannot See Depth on Their Own
A standard camera sensor captures light that lands on a flat grid of pixels. The result is a 2D image: height and width, but no distance. The camera has no idea whether an object is six inches away or six feet away. Everything appears flat.
Human eyes solve this through binocular vision. Because each eye sits at a slightly different position, they see the world from two marginally different angles. The brain measures the difference between those two views, called disparity, and uses it to calculate depth. This is why closing one eye makes it harder to judge distance.
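To make that concrete, here is the basic triangulation in a few lines of Python. The numbers are illustrative only; in a real system the focal length and baseline come from camera calibration.

```python
# Classic stereo triangulation: depth is inversely proportional to disparity.
# Illustrative numbers only; real values come from camera calibration.
focal_length_px = 700.0   # focal length, in pixels
baseline_m = 0.06         # distance between the two viewpoints, in meters
disparity_px = 14.0       # how far a matched point shifts between the views

depth_m = focal_length_px * baseline_m / disparity_px
print(f"{depth_m:.2f} m")  # 3.00 m; a closer object would show a larger shift
```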
3D cameras use variations of this same principle, plus a few others, to recover the depth information that a single camera lens cannot capture on its own.
How a Camera Actually Sees in 3D
There are three main approaches used in industrial robotics today.
Stereo vision mimics human binocular vision most directly. Two cameras, mounted a fixed distance apart, capture the same scene from slightly different angles. Software compares the two images, finds matching points, and calculates depth from the disparity between them. The result is a dense point cloud: a three-dimensional map of the scene expressed as millions of individual X, Y, Z coordinates. Stereo vision works well in good ambient light and over longer working distances, and the hardware cost is relatively low.
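As a rough sketch of how that pipeline looks in code, here is a minimal stereo-matching example using OpenCV. The image files and calibration values are placeholders; a production system would rectify the images first and load a reprojection matrix from calibration.

```python
import cv2
import numpy as np

# Hypothetical rectified stereo pair; real pipelines rectify using calibration data.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching finds, for each pixel, how far its match shifted.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM is fixed-point

# Reprojection matrix Q normally comes from stereo calibration; placeholder values here.
f, cx, cy, baseline = 700.0, 640.0, 360.0, 0.06
Q = np.float32([[1, 0, 0, -cx],
                [0, 1, 0, -cy],
                [0, 0, 0,   f],
                [0, 0, -1.0 / baseline, 0]])

# Each valid pixel becomes an (X, Y, Z) point: the dense point cloud.
points = cv2.reprojectImageTo3D(disparity, Q)
cloud = points[disparity > 0]  # drop pixels with no reliable match
```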
Structured light takes a more active approach. The camera projects a known pattern (a grid, a series of dots, or shifting stripes) onto the scene, then captures how that pattern deforms as it lands on object surfaces. Because the original pattern is known, the distortions can be decoded mathematically into precise depth measurements. Structured light produces highly accurate point clouds and works well on surfaces that lack texture, where stereo vision would struggle. It is the preferred technology for precision pick-and-place, inspection, and assembly tasks.
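One common structured-light variant is three-step phase shifting, in which three stripe patterns offset by 120 degrees are projected in sequence. The sketch below shows the standard phase-recovery step; the capture files are hypothetical, and phase unwrapping and depth conversion are left out.

```python
import numpy as np

# Three captures of the same scene under sinusoidal stripe patterns,
# each shifted by 120 degrees (hypothetical arrays loaded elsewhere).
i1, i2, i3 = (np.load(f"pattern_{k}.npy").astype(np.float64) for k in (1, 2, 3))

# Standard three-step phase-shift formula: recover the wrapped phase of the
# projected pattern at every pixel from how the stripes deform on the surface.
wrapped_phase = np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)

# A full system then unwraps the phase and converts it to depth using the
# calibrated geometry between projector and camera (omitted here).
```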
Time-of-flight (ToF) does not rely on pattern matching at all. The sensor emits pulses of near-infrared light and measures how long each pulse takes to bounce back from the scene. Distance is calculated directly from that travel time, frame by frame, in real time. ToF cameras are fast, compact, and work reliably in dim or variable lighting because they supply their own illumination. They are a common choice for conveyor-based pick-and-place, autonomous mobile robot navigation, and any application where speed matters more than extreme depth precision.
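The underlying arithmetic is simple: distance is the round-trip travel time multiplied by the speed of light, divided by two. A toy calculation:

```python
# Speed of light, in meters per second.
C = 299_792_458.0

def tof_distance_m(round_trip_s: float) -> float:
    """Distance from pulse round-trip time: the light travels out and back."""
    return C * round_trip_s / 2.0

# A 10-nanosecond round trip puts the surface about 1.5 meters away.
print(f"{tof_distance_m(10e-9):.2f} m")  # 1.50 m
```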
Each technology produces the same fundamental output: a point cloud that gives a robot arm a complete spatial picture of its environment.
What a Robot Does with 3D Data
Once the robot's controller receives a point cloud from the camera, the vision software gets to work. It identifies objects in the cloud, calculates their positions and orientations, and determines the best grasp strategy. The robot arm then moves to that calculated position and picks the part.
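As a simplified illustration of that step, here is one way to estimate a pick pose from a segmented point cloud using a principal-axis fit. This assumes the object's points have already been isolated from the rest of the scene; real vision software does considerably more.

```python
import numpy as np

def grasp_pose(cloud: np.ndarray):
    """Estimate a pick point and in-plane orientation for one segmented part.

    cloud: (N, 3) array of X, Y, Z points belonging to a single object,
    with the table plane already removed (segmentation is assumed upstream).
    """
    centroid = cloud.mean(axis=0)                    # where to place the gripper
    centered = cloud - centroid
    cov = centered.T @ centered / len(cloud)
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    major_axis = eigvecs[:, -1]                      # the part's longest direction
    yaw = np.arctan2(major_axis[1], major_axis[0])   # rotate the gripper to match
    return centroid, yaw
```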
This happens in fractions of a second, and it adapts automatically to variation. Parts can be in different positions, at different angles, at different heights. The robot handles it. This is what makes 3D vision-guided automation useful in the real world, where parts do not arrive in perfect, identical positions every time.
Adding 3D Vision to an Affordable Cobot
The good news for smaller manufacturers is that 3D cameras and cobot arms have both dropped dramatically in cost. An entry-level depth camera suitable for pick-and-place or inspection tasks costs between $300 and $1,500. Mid-range structured light systems run $3,000 to $8,000. Either way, the camera costs a fraction of what it once did.
Pair that with the right robot arm and you have a complete vision-guided automation cell at a price that makes financial sense:
The UFactory Lite 6 ($3,500) is the entry point: a compact 6-axis tabletop cobot that integrates with Intel RealSense and similar cameras for desktop inspection and light pick-and-place tasks.
The Fairino FR5 ($6,999) and Fairino FR10 ($10,199) step up payload and repeatability for inspection cells and heavier bin picking applications.
The Fairino FR16 ($11,699) and Fairino FR20 ($15,499) handle the larger-scale work: depalletizing, material handling, and high-throughput pick-and-place lines where both reach and payload are non-negotiable.
Every robot in the Blue Sky Robotics lineup supports 3D vision integration via ROS2, Python SDK, and open APIs. Blue Sky Robotics' automation software includes computer vision capabilities built for these exact applications.
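As an example of what that integration can look like, here is a minimal ROS 2 sketch that publishes a computed grasp pose for a robot driver to consume. The topic name and coordinate frame are assumptions; the actual names depend on the driver you run.

```python
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PoseStamped

class GraspPublisher(Node):
    def __init__(self):
        super().__init__("grasp_publisher")
        # Topic name is hypothetical; the actual robot driver defines it.
        self.pub = self.create_publisher(PoseStamped, "/target_pose", 10)

    def send(self, x, y, z):
        msg = PoseStamped()
        msg.header.frame_id = "base_link"   # pose expressed in the robot's base frame
        msg.pose.position.x, msg.pose.position.y, msg.pose.position.z = x, y, z
        msg.pose.orientation.w = 1.0        # identity orientation, for simplicity
        self.pub.publish(msg)

rclpy.init()
node = GraspPublisher()
node.send(0.4, 0.1, 0.05)  # meters, in the base frame
rclpy.shutdown()
```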
Not sure which setup fits your process? The Cobot Selector is a quick way to narrow it down, or you can book a live demo and see a vision-guided system running in real time. To learn more about computer vision software, visit Blue Argus.