Understand 3D image quality and learn how to use it.
Find all the resources listed at the end of the page.
Hello, and welcome to today's webcast.
It gives me great pleasure to introduce VP of Sales Americas, Raman Sharma.
Good morning.
The concepts that we will share are applicable for Zivid 3D cameras as well as for 3D vision in general.
We will cover the things you need to know about 3D image quality including why 3D image quality is relevant, what it actually is, and finally, how you can use it to get the best results for your automation tasks.
If you stay until the end, we'll show a demo.
But before getting started, I'd like to take a moment to address a very important safety concern.
As you may know, some 3D vision systems use class 3R lasers, which can be dangerous to human eyes when reflected off of shiny objects.
I'm relieved to announce that with Zivid, no eyes were or will be harmed with our structured light technology.
Let's dig into why we're all here in the first place.
From a macro level, Industry 4.0 has been in the news for the last several years with the promise of driving increased productivity and economic growth.
In order to achieve this promised growth, it has become clear that robotics and 3D vision are two key factors.
The fact is close to half a million new robots were deployed in 2018 and a majority were blind or at least visually impaired.
So the way I see it, there is a symbiotic relationship and a synergistic relationship, for that matter, between Industry 4.0 and 3D vision.
If we believe that robots and vision are closely related, which I do and the folks at Zivid do, then as we zoom into the relationship, we can see that bigger robots handle bigger objects and bigger objects require less accurate vision.
Generally speaking, this is where I think the market has been hovering until just recently with the rise of collaborative robots.
We're seeing the converse.
Cobots, smaller robots by definition, are being used for smaller objects and with smaller objects more accurate vision is needed.
Beyond accuracy, we see the need for vision systems to do a couple of things.
(1) handle a wide variety of parts and
(2) be flexible to address many applications.
So this brings us to image quality and why it matters.
Not all point clouds are created equal.
If you think about it, 3D sensors or cameras are your robot's eyes.
The better robots can see, the more productive they can be.
So let's take two basic concepts of image quality to illustrate why it matters - resolution and color.
High resolution allows robots to see all the details.
And this becomes more and more important as the size of objects becomes smaller and smaller.
For example, with low-quality, low-resolution output, the object can still be detected, though with some difficulty, while recognizing it is almost impossible.
Compare that to the high-quality, high-definition output at the bottom of the slide, and it's clear which output is more desirable for vision algorithms such as object detection and edge detection.
Similarly, color plays a crucial role when it comes to image quality.
With color and a little AI, algorithms can advance from being able to detect an object to being able to recognize an object.
To illustrate my point, let's focus on the middle of the screen.
In the cluttered bin of random objects, you can see a rectangle in the middle.
Without the benefit of color, the object detection algorithm can certainly be successful in identifying that an object exists.
However, the object could be anything.
A rectangular box or a pack of Eclipse gum.
But with the added dimension of color, it's easy to see that the object is, in fact, a playing card.
That's why color is important in my eyes.
Now let's shift gears from why 3D image quality matters to defining what it is.
The way I see it, defining image quality is important regardless of which sensor or camera you use.
Since 3D image quality is so important regardless of application, I think it's worthwhile to review some of the key components that make up what I call 3DIQ.
We'll dig into resolution and accuracy next.
These are the topics most discussed during my various customer meetings.
Then during the demo, Jesse will describe what we mean by dynamic range.
Here is a textbook definition of resolution.
The total number of pixels in an imaging sensor; in our world, this translates to the number of possible 3D sampling points of the scene for each frame.
Taking a Zivid One+ camera as an example, we can see that the resolution of the imaging sensor is 1920 by 1200 pixels or 2.3 megapixels.
Each pixel includes a depth measurement represented by X, Y, and Z and it also includes a color measure represented by RGB, and lastly, a contrast measure which we'll get into during the next section.
Let's take a look at the same two examples of resolution that we saw earlier.
These images are from two different 3D vision technologies.
At the top, you see the resolution of a typical time of flight camera and at the bottom, you see the resolution of the Zivid One+ camera.
More relevant than the resolution of the imaging sensor is spatial resolution.
Spatial resolution describes the individual spatial pixel size.
I like to think of this as a measure of quantization area at a given distance from the camera.
What this tells me is the size of the smallest possible feature that can be detected at a given distance.
For example, for a Zivid One+ Medium at a working distance of 1 meter, the smallest detectable feature is 0.375 mm².
For the example feature shown on the slide, you get 25 individual measurements, and with a Zivid One+ Medium each measurement includes XYZ, RGB, and contrast.
If you cut the distance between the camera and the object in half, the number of individual measurements quadruples from 25 to 100 as shown on the slide.
Here's an example to illustrate the point.
On the left, you see a point cloud at 2 m and on the right at 50 cm.
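As a rough illustration of this scaling, here's a minimal Python sketch (not Zivid SDK code). The 0.375 mm figure is treated as a linear per-pixel footprint at 1 m, which is a simplifying assumption, and the feature size is chosen so the numbers match the slide.

```python
# Illustrative sketch only: how per-pixel footprint and the number of 3D
# sample points on a fixed-size feature scale with working distance.

PIXEL_FOOTPRINT_AT_1M_MM = 0.375  # assumed linear footprint at 1 m (simplification)

def pixel_footprint_mm(distance_m: float) -> float:
    """For a fixed lens, the per-pixel footprint grows linearly with distance."""
    return PIXEL_FOOTPRINT_AT_1M_MM * distance_m

def measurements_on_feature(feature_mm: float, distance_m: float) -> int:
    """3D sample points landing on a square feature of the given width."""
    per_side = feature_mm / pixel_footprint_mm(distance_m)
    return int(per_side) ** 2

feature_mm = 1.875  # hypothetical feature width, chosen to match the slide's numbers
for d in (1.0, 0.5):
    print(f"{d:.1f} m -> {measurements_on_feature(feature_mm, d)} measurements")
# 1.0 m -> 25 measurements; 0.5 m -> 100 (halving the distance quadruples them)
```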
To quantify camera accuracy we've decided to use the definitions from ISO 5725.
Precision has to do with the random noise.
Think of this as the distance between measurements or points.
So as precision increases the measurement points get closer and closer together.
Trueness describes how close each measurement is to the actual reality, or the underlying reference.
I think of this as how close the measurements are to the bullseye, or the reference point.
As trueness increases the measurement points get closer and closer to the bullseye.
When considering precision and trueness together, you get accuracy.
Here's another way to look at the same concepts.
The black line represents the target.
On the left-hand side of the slide, we can see a lower end image capture where the distance between measurements, or precision, is quite poor.
If we draw a best-fit curve to the measurement points, we get the red line.
The distance from the red line to the target scene, or true reference, is what ISO 5725 calls trueness.
On the right-hand side we see a higher accuracy image capture.
As you can see, both precision and trueness are a lot better than what's shown on the left-hand side of this slide.
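To make these definitions concrete, here's a minimal sketch of how precision and trueness could be computed from repeated depth measurements of a known reference; the reference distance and sample values are made up for illustration.

```python
from statistics import mean, stdev

reference_mm = 1000.0  # assumed true distance to a reference plane
samples_mm = [1000.9, 1001.1, 1000.8, 1001.2, 1001.0]  # repeated measurements

precision_mm = stdev(samples_mm)               # random spread between measurements
trueness_mm = mean(samples_mm) - reference_mm  # systematic offset from the bullseye
# Accuracy considers both: you want a small spread AND a small offset.
print(f"precision (1-sigma): {precision_mm:.3f} mm, trueness (bias): {trueness_mm:.3f} mm")
```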
At Zivid, we decided to take the image quality very seriously.
And before doing anything further, we wanted to understand which types of objects our customers cared about when measuring accuracy.
These turned out to be, as shown on the slide, points, planes, dimensions (a checkerboard, for example), and spheres.
We started with points and planes.
We specified the spatial resolution, precision, trueness, and accuracy under known conditions in a datasheet that you can read on our website (a link is available at the end of this presentation).
For example, in the datasheet, you can see that the spatial resolution of a Zivid One+ Medium camera at a working distance of 30 cm is 0.23 mm, and point precision is 60 µm at 60 cm.
The idea of the previous few slides was to illustrate the importance of image quality and the effort that we made to specify ISO 5725 metrics in a datasheet.
Let's zoom out now and summarize the concepts of 3D image quality.
The "quality" component of 3DIQ, as I call it, is measured by the degree of uncertainty in each measurement.
This quality component is made available for every pixel in the imaging sensor, in addition to the 3D measurement (XYZ) and the color measurement (RGB).
What you get is a true-to-reality representation of the object in the camera's view.
Zivid defines the gold standard for 3DIQ by providing high absolute accuracy across the entire imaging sensor.
Accurate representation of shapes and forms, surface details, colors, and texture; a short baseline, which minimizes shadows and occlusions; per-pixel processing, which means there's no need for analyzing spatial neighborhoods; high-resolution data on any kind of surface; and smart pattern and software filters that eliminate noise and false data points.
That's a summary of what 3DIQ means to us.
In the next part of the webinar, Jesse will talk about how to obtain the optimal 3DIQ.
Now that we've introduced what 3D image quality is, we can talk about the correct methodology to achieve and use it to obtain amazing point clouds.
After we cover some concepts, I'll demonstrate this methodology on a live example scene using a Zivid One+ camera.
There are three important components to taking an optimum capture with a Zivid camera.
First, understanding the positioning and working distance for your application.
Second, addressing artifacts and disturbances in your scene with post-processing or smart exposure controls.
Lastly and most importantly, getting the right exposure is critical to achieving good 3D image quality.
We'll spend the most time exploring how to achieve this with camera concepts and tools that apply to 3D cameras, but also to any 2D camera, like your DSLR or your phone camera.
Working distance is the distance between your camera and the objects in your scene.
And it's one of the first specifications you should consider in developing a vision system.
In general, there are two key questions to answer in determining your working distance.
The first question to ask is: what kind of accuracy do I need, both spatially and in terms of depth resolution?
The types of graphs shown are critical in development, especially in the early stages, and can save weeks of testing and evaluation.
The second question is: what kind of working distance does my application require? In robotic applications with statically mounted vision systems, typically located above the scene, robot arms often need a sizable region of space to maneuver and complete tasks, necessitating a longer working distance.
In robotic applications with robot mounted vision systems, the working distance becomes much smaller since the mounted camera can usually be much closer to the scene.
In summary, these two requirements, accuracy and application working distance, often bound the maximum and minimum allowed working distance, respectively.
You should select a vision system that can achieve both for your given application.
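As a sketch of how those two bounds interact (all numbers hypothetical, and assuming for simplicity that accuracy degrades linearly with distance):

```python
def max_distance_for_accuracy(required_mm: float, error_per_m_mm: float) -> float:
    """Farthest distance at which the (assumed linear) error stays acceptable."""
    return required_mm / error_per_m_mm

robot_clearance_m = 0.8      # application: the arm needs this much room below the camera
required_accuracy_mm = 0.5   # application: worst acceptable error
error_per_m_mm = 0.4         # hypothetical camera spec: error grows 0.4 mm per meter

wd_max_m = max_distance_for_accuracy(required_accuracy_mm, error_per_m_mm)
assert robot_clearance_m <= wd_max_m, "no working distance satisfies both constraints"
print(f"usable working distance: {robot_clearance_m:.2f} m to {wd_max_m:.2f} m")
```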
When using an active light source camera, it's important to think about reflections, especially the ones you can control in your scene.
To illustrate this, here's an activity you can do yourself.
Next time you're in front of a mirror, take a picture of yourself with the flash on.
The flash and the camera will oversaturate your picture and wash out part of your image with white, similar to the black spot on the image to the left under the "LABsnacks".
What's the takeaway here? Active-light cameras, 2D or 3D, do not like selfies! With powerful light projectors, you can get this over-saturation with most non-absorptive materials.
If your background plane is perpendicular to your camera lens, thankfully the solution is simple.
By introducing a slight angle on the mount, active-light cameras can avoid major light reflections and this blooming effect.
Overall, it's helpful to consider every object in your scene and how it relates to your application, as well as external factors like ambient lighting.
For example, changing the background of your scene from gray to black, or vice versa, can affect how your foreground objects contrast with the background, which can affect your image quality in bin-picking applications.
Picking a material that is dark and absorptive relative to your objects can make a big difference.
Removing unnecessary objects or overall simplifying your scene can provide benefits to both accuracy and capture time.
For scenes with unavoidable strong ambient lighting, choosing exposure times at multiples of the mains flicker period can avoid artifacts caused by the mains ripple.
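As a small illustration (assuming mains-powered lights flicker at twice the grid frequency, so the relevant period is 10 ms on a 50 Hz grid and about 8.3 ms on a 60 Hz grid):

```python
def flicker_safe_exposures_us(mains_hz: float, count: int = 4) -> list[int]:
    """Exposure times that are whole multiples of the ambient flicker period."""
    flicker_period_us = 1_000_000 / (2 * mains_hz)  # lights flicker at 2x mains
    return [round(n * flicker_period_us) for n in range(1, count + 1)]

print(flicker_safe_exposures_us(50))  # [10000, 20000, 30000, 40000] microseconds
print(flicker_safe_exposures_us(60))  # [8333, 16667, 25000, 33333] microseconds
```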
Finally, you should mitigate known artifacts with native post-processing to address issues like reflections.
These are some of the adjectives used to describe both the surface and material characteristics of objects.
Diffuse and specular speak to the smoothness of the surface.
Material descriptors like absorptive, reflective, and transmissive describe how much light passes through or bounces off of materials.
These concepts are important because of how your 3D camera will receive the light when it's reflected by the object.
For example, diffuse surfaces like brushed metal have softer transitions from light to dark regions, while specular surfaces like chrome plating transition between light and dark much more drastically.
Different combinations of surface and material can further complicate the ability to take good captures.
For instance, your cell phone screen is both specular, because it's made out of glass, and absorptive, because the material under it is dark; that's typically not a good target object unless you have a highly capable instrument.
Diffuse and reflective objects typically get the best results, since the signal can be reflected back evenly.
Now let's talk about what exposure is and what it means for our camera.
Exposure is defined as the amount of light reaching the imaging sensor over a given time frame.
It is the integral of light over time.
By controlling the way light enters our camera sensor, we can analyze and affect exposure in a very controlled and precise manner, which means we can not only achieve the highest 3D image quality, we can do it faster and perhaps even automatically.
As we prepare to affect the exposure in these different ways, how do we measure these changes in exposure?
We use stops, a base-2 logarithmic scale that indicates an interval on the exposure scale.
Moving a capture's exposure one stop up represents allowing twice the amount of light into the camera sensor, while a stop down means halving the amount of light reaching the sensor.
One last thing to note is that stops are a relative measurement, meaning that stops are taken relative to a reference exposure or image.
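In code, that relative, base-2 definition is just a log ratio; a minimal sketch:

```python
import math

def stops_relative(exposure: float, reference_exposure: float) -> float:
    """How many stops an exposure sits above (+) or below (-) a reference."""
    return math.log2(exposure / reference_exposure)

print(stops_relative(2.0, 1.0))  # +1.0: twice the light is one stop up
print(stops_relative(0.5, 1.0))  # -1.0: half the light is one stop down
print(stops_relative(8.0, 1.0))  # +3.0: 2^3 = 8x the light is three stops up
```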
Now that we know what exposure is, how do we control it?
We use these four parameters: exposure time, aperture, gain, and brightness.
These 4 knobs on the Zivid camera are the same parameters used in DSLRs and the camera phones in our pockets.
Let's explore what each one is and how it relates to exposure, as well as the trade-offs each parameter presents while changing the exposure.
Exposure time, also known as shutter speed with 2D cameras, is the amount of time that light is allowed to enter the imager.
As illustrated on the right, doubling the exposure time means doubling the amount of light entering the sensor, so doubling the exposure time increases the exposure by one stop.
Increasing exposure time has a very straightforward trade-off: time.
If you want to increase your exposure with shutter speed, your capture time doubles with each added stop.
As cycle time is critical for many pick-and-place applications, increasing exposure time should be carefully considered.
Aperture, also known as iris or f-number, describes the size of the hole allowing light into the imager.
A larger iris means more light entering, and vice versa: a smaller aperture, or larger f-number, means less light entering, which means lower exposure.
Here's a chart relating f-number, iris, and stops.
F-number and iris are inversely related.
Note that iris steps and exposure stops are not linearly related: at lower settings, a smaller change in the iris setting constitutes a full exposure stop.
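Since the light collected scales with the iris area, which goes as one over the f-number squared, the stop count between two f-numbers can be sketched like this:

```python
import math

def stops_from_f_numbers(f_ref: float, f_new: float) -> float:
    """Stops gained going from f_ref to f_new (smaller f-number = more light)."""
    return 2 * math.log2(f_ref / f_new)  # exposure ~ 1 / f_number^2

print(stops_from_f_numbers(2.8, 2.0))   # ~+1 stop for a 0.8 change in f-number
print(stops_from_f_numbers(11.0, 8.0))  # ~+1 stop, but now a 3.0 change in f-number
```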
Increasing the iris size does not affect your cycle time, but it does affect your depth of field performance.
Take a look at the graph on the right.
You can see that as your f-number decreases your depth of field falls off.
If your scene does not span a large amount in the z-direction relative to your camera, iris settings may be the best tool for you to control exposure.
Conversely, if you have a scene with relevant data in the foreground and background, you should keep your iris settings lower.
Gain, also known as ISO for 2D cameras, simply controls a gain amplifier between the sensor and the analog-to-digital converter (ADC).
A higher gain means a higher light signal intensity read by the ADC, and vice versa.
Changing gain affects exposure much like exposure time: doubling the gain means one stop up, and halving the gain means one stop down.
Gain's trade-off is characteristic of most amplifiers: you are boosting the level of your desired light signal, but you're also boosting the level of your noise.
Just as higher-ISO photos usually include more noise, increasing the gain can add noise and potentially hurt your SNR, but it may still be necessary when a large number of exposure stops is required.
Projector brightness is exactly what it sounds like, and has the effect you would imagine: higher projector brightness means more light hitting your sensor.
Doubling the brightness equates to one exposure stop up, and vice versa.
The trade-off that brightness presents is power consumption.
At the highest brightness setting and with continuous captures, the projector may duty-cycle the power, which can affect cycle times in continuous-capture applications.
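Putting the four knobs together, here's a toy model of total relative exposure in stops, using the doubling rules stated above (each rule, such as projector brightness contributing linearly to collected light, is an assumption of this simple model):

```python
import math

def total_stops(time_ratio: float, f_ref_over_f_new: float,
                gain_ratio: float, brightness_ratio: float) -> float:
    """Stops gained relative to a reference capture (toy model)."""
    return (math.log2(time_ratio)              # 2x exposure time = +1 stop
            + 2 * math.log2(f_ref_over_f_new)  # iris area goes as 1/f-number^2
            + math.log2(gain_ratio)            # 2x gain = +1 stop (adds noise)
            + math.log2(brightness_ratio))     # 2x brightness = +1 stop

# Double the time and the gain while stopping the iris down one stop:
print(total_stops(2.0, 1 / math.sqrt(2), 2.0, 1.0))  # 1 - 1 + 1 + 0 = +1.0
```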
Now that we know how to control exposure, how do we observe or measure these changes? We use a histogram of the light intensity to understand where our light information resides along the dynamic range of our camera.
And this is what it looks like.
This is the histogram of a capture and it tells you quite a lot about what's going on in your image.
The gray curve represents the frequency of light intensity at a given tonal range.
In this capture there's a lot of information in the shadows.
You can also see vertical bars at either end where clipping is occurring.
This is where light information was not able to be captured due to the limited dynamic range of the imager.
Note that the tonal range is represented here on a linear scale.
Because we want to work with exposure in terms of stops, which are logarithmic base 2, we need to translate the x-axis to a logarithmic scale, like so.
Take a look at the scale now.
The information has not changed, but the x-axis now reflects an important feature of the digital image: each increment on the x-axis reflects how many quantization levels are available to digitize the analog light signal.
That is, on the left part of the histogram you may have only one or two quantization levels available, while regions farther to the right may have 64 or 128.
Since we know that more bits means higher SNR, we can conclude that the farther right you go on this histogram, the more quantization levels are available to you, which ultimately translates to better image quality.
And that's exactly what we see with this chart.
As you increase your light intensity within your dynamic range, the quality of your exposure increases and you get the best 3D image quality.
Since we can move our exposure curve left and right using the exposure tools introduced earlier, we should aim to keep as much of our exposure in this sweet spot as possible.
By using HDR techniques, we can use different exposure settings to put different parts of the scene into this sweet spot and create a single high-quality capture of different objects at different exposures.
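Here's a small sketch of why the right side is the sweet spot, assuming a linear sensor with a 12-bit ADC (the bit depth is an assumption for illustration):

```python
ADC_BITS = 12
full_scale = 2 ** ADC_BITS  # 4096 quantization levels in total

# Each stop below saturation spans half the intensity range of the one above,
# so it gets half as many quantization levels.
for stop_below_max in range(ADC_BITS):
    hi = full_scale // (2 ** stop_below_max)
    lo = hi // 2
    print(f"stop -{stop_below_max}: intensities [{lo}, {hi}) -> {hi - lo} levels")
# The top stop gets 2048 levels; the deepest shadows get only one or two,
# which is why data captured there quantizes, and reconstructs, poorly.
```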
Let's take a look at a theoretical example. Imagine a capture histogram with little data in the middle and clipping at either edge.
I can infer that I have objects in my scene that are both too bright and too dark for this given exposure.
My data probably looks bad for both of them.
What should I do?
The first step: underexpose my capture by one stop by reducing one of: exposure time, aperture, gain, or brightness.
After doing that, my exposure moves to the left by one stop, which puts the right curve within my dynamic range.
And now I should have good point cloud quality on that object.
Now what about the other data? With HDR, I can create another frame and pick new settings that push the exposure up 3 stops using the techniques covered earlier.
When I combine the two frames in an HDR capture I can get both objects residing in my dynamic range in an awesome capture.
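As a toy sketch of that merging logic (Zivid's actual HDR pipeline is more sophisticated; the clip thresholds and values here are made up), per pixel we keep the brightest sample that didn't clip, since it carries the most quantization levels:

```python
CLIP_LO, CLIP_HI = 8, 4080  # assumed usable range on a 12-bit intensity scale

def merge_hdr(frames: list[list[int]]) -> list[int]:
    """frames: same-length per-pixel intensity lists from different exposures."""
    merged = []
    for pixel_values in zip(*frames):
        usable = [v for v in pixel_values if CLIP_LO < v < CLIP_HI]
        # Prefer the brightest unclipped sample; if everything clipped,
        # fall back to whichever value is least extreme.
        merged.append(max(usable) if usable else
                      min(pixel_values, key=lambda v: abs(v - 2048)))
    return merged

under_exposed = [12, 400, 4000]   # bright object lands nicely in range
over_exposed = [900, 4095, 4095]  # +3 stops: dark object in range, rest clipped
print(merge_hdr([under_exposed, over_exposed]))  # [900, 400, 4000]
```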
Here are some examples of typical histogram curves and the types of scenes that cause them.
A U shape implies both highlights and lowlights, often requiring multiple exposures, like our previous example.
An N shape means that your light information resides mostly within your dynamic range already.
An L or a J shape means that you need to increase or decrease your exposure, respectively.
Now that we've covered the basic concepts of achieving optimum 3D image quality and exposure, let's put it all together and apply these concepts in a real demo.
So this is Zivid Studio.
It's a GUI based tool to help evaluate and develop with a Zivid camera.
Very quickly: right here you've got your point cloud.
You've got the distribution of light intensity in your scene.
In our scene, we've got lots of different objects, including some exposure training blocks of different colors.
Some are specular, some are matte wooden; the coffee cup top is highly reflective.
This is a sort of brushed-metal industrial piece, and this is a very dark, absorptive mouse cover.
So we're going to use the techniques that we've talked about today in conjunction with the histogram to build an HDR image that maximizes the point cloud quality across this entire scene.
So let's get started.
The first thing we want to understand is how the light intensity is translated from this scene into this histogram.
So, for example, this white dongle is basically one of the brightest objects in the scene.
I would guess that if I were to remove it, a lot of the information in the highlight region of my histogram would go away.
So I've taken it out of the scene, and if I take another capture: hurrah! I can see that light information disappear.
So that validates what we're looking at.
If I put that object right back in the scene, I see that light information jump up again.
The other important thing to note is that because this white dongle is sitting in the higher end of my dynamic range, where I have lots of bits for quantizing the light, I actually get a very high-quality point cloud over this white object.
You can see the holes for HDMI, VGA, and USB-C are very clear, and it's a very flat object.
That validates the idea that, again, exposure information that sits in this region is going to be the best information.
This also tells me that, as I build multiple frames at different exposures, if I were to increase the exposure on this single frame, I would push this information out of range and lose it.
So we can go ahead and demonstrate that.
So I'm going to increase the iris by quite a few steps.
That's going to cause this entire curve to move to the right.
And it's also going to cause my white dongle's exposure to clip and saturate, which means I'm going to lose the point cloud there.
So that's exactly what's happened.
I've clipped.
I've increased the point cloud quality on some other objects, but I've really lost this object, as far as a good, clean point cloud goes.
So I'm going to go ahead and move this back to about 20.
And again, I've got really good coverage on this single item.
So I'm going to go ahead and save that frame and not change it anymore, so that as I build my HDR image I can focus on other objects, knowing that this first frame covers that one.
And just examining the scene, even with this setting I'm getting pretty good quality on most of the blocks, as well as on this piece.
So as we add additional frames, the only thing we need to worry about is increasing the point cloud quality in these regions.
So I went ahead and added a frame, which duplicates the frame before it.
And I've also turned off frame one so that when I do my histogram analysis my data doesn't get conflated with that first frame.
So now that I'm on frame 2, I'm going to focus on manipulating the exposure controls and moving this histogram, so that I can maximize point cloud quality on these other objects.
The way we do that is by increasing any one of these controls.
I'm going to start with exposure time, setting it to roughly 16,000.
Based on the concepts that we went over before, I would expect my curve to move by one stop.
And as I've done that, you can see that the curve did indeed move by roughly one stop.
I'm going to increase it a little bit more using the iris, so that I can get even closer to that right edge.
There we can see we've added a lot more information on the industrial piece, and we're starting to get more information on this dark absorptive piece as well, filling in some gaps here.
Because a lot of this information is, again, sitting in the correct spot of my dynamic range, I'm going to save this frame, and then use one more frame to try to fill in the rest of the information.
So let's quickly look at what we have.
As soon as I enable more than one frame at a time, this button switches to HDR, because I'm combining multiple frames.
So here we can see that, so far, we've got most of the field of view colored in with valid points.
Again, we just need to focus on this region, and some more on this one.
You can tell that with a very shiny piece, a lot of the light from the projector is being reflected straight away, so the camera can't read it.
And for this dark absorptive piece, the light is simply being absorbed by the material and not reflected.
So we're going to need a higher exposure to draw out that data.
So let's switch those frames off, enable a new frame, and increase the exposure.
I'm going to try increasing the brightness.
That added some more points here and here.
Let's try the iris a few more steps, and there, you can see I'm really starting to fill in a lot of information on both of these objects.
Again, the point cloud on some other objects is much, much worse.
But that's OK because the other frames will fill those in.
So let's go ahead and turn these three all on and see what our three frame HDR gets us.
There you have it.
Really good point cloud coverage across all of these objects.
You can see some reflection artifacts, so I'm going to go ahead and enable the reflection filter, which should clean those up.
And there: a really good, high-quality point cloud across this quite challenging scene.
So that's how you manually build an HDR image using the controls here, as well as the histogram.
But since we have those tools and they're somewhat understood programmatically, we can use AI and some deep learning concepts to try to do this process automatically.
By switching to this assisted mode, all of the parameter controls are removed, and all you're given is an option to control the max capture time.
Let's set this pretty high, because we have quite a tough scene.
The camera takes some diagnostic images and uses an algorithm to determine some great frame settings.
You can see this does roughly the same thing that the manual capture does.
We recommend switching back to manual mode and playing with each of these frames to understand what each frame is attempting to do, tweaking them as necessary.
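For reference, here's roughly what that assisted capture looks like through the SDK in Python. This is a hedged sketch based on the zivid-python samples as I recall them, so treat the exact module and parameter names as assumptions and check the current API reference.

```python
import datetime
import zivid

app = zivid.Application()
camera = app.connect_camera()

# Give the capture assistant a time budget; it analyzes the scene and
# suggests a set of frames, much like the assisted mode in Zivid Studio.
params = zivid.capture_assistant.SuggestSettingsParameters(
    max_capture_time=datetime.timedelta(milliseconds=1200)
)
settings = zivid.capture_assistant.suggest_settings(camera, params)

frame = camera.capture(settings)  # capture with the suggested settings
frame.save("result.zdf")          # open in Zivid Studio to inspect or tweak
```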
Thanks, Jesse, for that demo.
I thought I'd share a few details of the 3D camera that Jesse used, a Zivid One+ Small.
The One+ family has three key differentiators:
- (1) accuracy
- (2) fast acquisition time, and
- (3) the ability to see everything due to a wide dynamic range enabled by 3D HDR.
We use time coded structured light as the basis for our 3D technology.
This is something the founders of Zivid have researched for a total of 30 years.
The basic idea is this - light patterns are projected onto an object.
The displacement is then used to calculate depth information.
With this you get high accuracy and fast acquisition times.
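To make the displacement-to-depth step concrete, here's a simplified triangulation sketch (this is the generic pinhole-plus-baseline model, not Zivid's actual algorithm, and the intrinsics are hypothetical):

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Pinhole model: a pattern feature displaced by d pixels sits at z = f*b/d."""
    return focal_px * baseline_m / disparity_px

focal_px, baseline_m = 2300.0, 0.15  # hypothetical focal length and projector-camera baseline
for disparity_px in (345.0, 690.0):
    z = depth_from_disparity(focal_px, baseline_m, disparity_px)
    print(f"disparity {disparity_px:.0f} px -> depth {z:.2f} m")
# Larger displacement means the surface is closer to the camera.
```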
We provide three things at Zivid: (1) the camera; (2) a GUI tool to capture and visualize point clouds, Zivid Studio, which is what Jesse was using in the demo; and (3) a software development kit (SDK), which includes an API.
The One+ is our latest family of 3D cameras.
Compared to the original One family, the One+ is faster by 30%, brighter by almost 2X, smarter with new filters and the ability to be upgraded in the field, and more accurate.
The One+ family includes three variants.
The One+ Small is ideal for small objects that you need to inspect and do feature verification on.
Often times you're looking at a tray or a box.
The maximum working distance for the Small is 1m.
The Zivid One+ Medium is ideal for small and medium objects in standard totes and bins, where the application is picking, assembly, and control.
The maximum distance is 2 meters for a Medium, and the accuracy you can expect is 0.07 mm to 1 mm.
Lastly, the One+ Large is designed for US and Euro pallets, with a maximum distance of 3 meters and accuracy in the range of 0.3 mm to 2 mm.
This variant is ideal for picking and handling and de-palletization.
Before we wrap up for today, I want to touch upon the topic of collaborative safety.
Earlier I joked about no human eyes being harmed during the demo.
Well, let's get a little more serious now and quickly hit upon some considerations when it comes to safety.
Because Zivid's 3D cameras are not laser based, they are inherently safe for collaborative use.
This is due to the white light that we chose to use.
What you get is color, great point clouds and a flexible solution for a variety of applications.
What you avoid with white light is dangerous reflections from shiny objects and the need for a laser safety officer.
When you look at the ROI for evaluating a 3D camera from Zivid, you should certainly consider the investment cost, which is acquiring a camera and dedicating time for evaluation.
The result will be worthwhile in terms of flexibility and reusability which allows you to address broader applications.
It would be worthwhile in terms of safety and the ability to embrace collaborative environments.
And lastly image quality, which leads to better data and a lower cost of automation.
We hope you have been inspired to take the next steps.
First, you can meet Zivid in person to learn more.
On our website at zivid.com/events you can see all the locations where we or our partners will be.
Upcoming events include a tech tour in Europe and developer boot camps in the US and Canada.
Second, please use our website to download the One+ datasheet, view point cloud examples, request an online demo, and order a developer kit (for which we have a limited-time offer).
Thank you for attending.
Fill in your email, and we'll send you all the files and resources from the webinar.