2/12/2025

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

Hi, everyone. I'm George Genericus. I'm the Sustainability and Autonomous Driving Analyst here at Kinecore Genuity. And we're very excited to have with us today Sterling Anderson from Aurora. He is the co-founder and chief product officer at the company. And the topic of today's call is kind of trying to resolve the technical debate that's currently rampant in autonomous driving circles. Sterling, thank you so much for joining us.

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah, thanks for having me, George.

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

So maybe to start off a little bit, can you just describe to us why you chose a career in autonomy, where you went to school, and the exciting things that you did before you came to Aurora?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah, yeah. I've always been an engineer and innovator by disposition. My intro to robotics came as early as I can remember, when I was a fairly young boy, and I spent a lot of time in it. A personal experience when I was a teenager caused me to focus my career a little bit: my little brother was hit by a car and broke his neck. That led me to hone in on where I wanted to apply my robotics work. So I went to MIT on some fellowships that allowed me to focus specifically on transforming how we control machines. And I've been doing this for the last 16-plus years.

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

And where did you start your career after MIT?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

I went to McKinsey for a minute. So while at MIT, I co-founded a company called Gimlet Systems with Karl Iagnemma. He and Emilio Frazzoli were both working with me there at MIT. They started a company called nuTonomy that effectively absorbed the work we were doing at Gimlet; nuTonomy was later acquired by Aptiv and subsequently became Motional. So that was my earliest start in the corporate application of the technology. I went to McKinsey briefly. Then Elon reached out and asked if I would come to Tesla, where initially the idea was to run the Autopilot program. Shortly after I joined, the Model X program was struggling, so he asked me to run that. I ran that to its launch in 2015, then ran the Autopilot program through the first several generations of its launch, and then left to join Chris and Drew in forming Aurora.

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

Cool. So maybe you could just help us understand, very broadly, what the different approaches are to solving the autonomous driving problem. I mean, Elon Musk, who you just mentioned, called it the hardest problem ever to solve, or something to that effect. How do different companies go about solving that issue? Hybrid versus end-to-end, versus anything else you can tell us about.

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah, yeah. Maybe just to bridge from your last question to this one: when we started Aurora, Chris Urmson had been leading Google's self-driving group, Drew Bagnell had been leading autonomy and perception for Uber, and I had been leading Autopilot at Tesla. We had the benefit of a great deal of hindsight from what we had built at each of those companies. We also had the benefit of Drew's foresight, as a professor at Carnegie Mellon and one of the top three in the world, probably, in machine learning and artificial intelligence, and the opportunity of a fresh start at Aurora. And so when we started the company in 2017, we made clear, publicly in fact, that the Aurora stack would be designed for learnability. From day one, we set aside the more expedient but ultimately cumbersome approach of brute-force, handcrafted procedural solutions, which were kind of the thing at the time. And really, the industry has been evolving through a number of different phases, from the more procedural phases to the learned phases.

Years ago, we began to look at end-to-end, brute-force, black-box-style learning approaches as well. The current conversation about LLMs has caused a lot of folks to think, well, it seems like it's just a big black box: you throw more training data at it and it just gets better, and shouldn't that ultimately be more generalizable than anything else? We found, when we began to do this a couple of years ago, that it runs into a number of challenges on introspectability and predictability. You simply don't know what it's going to do, nor can you specifically target the training, the invariants, or the controls for different outputs of that system. And so you're left in what I think I may have mentioned to you before, what I call a train-and-pray development approach, where you've got this effectively massive statistical regression. Use foundation models, use transformers as you will, but at the end of the day, the best you've got is the opportunity to throw more data at the system, cross your fingers, and hope that the next surprising output of that model fixes the issue you're trying to fix with the training and doesn't introduce other regressions. Now, all of this works just fine in language models, where the result of the computation is not safety-critical. It's a problem when you've got an 80,000-pound truck or a 5,000-pound passenger car operating in and around other humans.

And so really, the industry has evolved past that. We've gone from the procedural, which is kind of the 1.0 approach that we each found in our respective worlds before we formed Aurora, to the more end-to-end, black-box-type approaches that I see a few folks naively talking about today as though they're the thing. They were the thing for us a couple of years ago, when transformers and foundation models were really starting to come into their own. We evolved past that into a more compound AI approach, which uses a combination of different models and allows for introspectability, predictability, and control. Quite frankly, if you look at what's happening in the large language model space, the same thing is occurring; it's just being talked about a little less. If you look at what a ChatGPT will do, OpenAI or anybody else, if they've got a nuanced model, typically your prompt doesn't just go into a black box and pump out an answer.
Typically, it'll decompose it internally into a set of models, or even engineered code where that's more efficient, to run the computations it needs to produce the result. And that's the architecture and the approach that Aurora uses today. So it's an AI-first system, but it's effectively a brain of brains. There's a set of AI modules throughout that system that allows for the kind of introspectability, intelligibility, and predictability that you simply couldn't get with a macro, end-to-end, black-box-type approach. So those are the three major buckets, technologically and architecturally.

Developmentally, there are also different approaches being taken. When I was at my last place, when Chris was at Google and Drew was at Uber, the typical approach in the industry was a kind of on-road, empirically heavy development. Which is to say, if you're a software engineer at one of these companies, you would write some code, you would compile it, you would deploy it to a set of vehicles, and you would run those vehicles for a certain amount of time on the road. And you would hope to encounter interesting situations, most of which will have been flagged by a human having to intervene or take over. You'll offload the log, look at each of those interventions, see what failed or what mistakes were made, and try to fix those. And then the best opportunity you've got is to redeploy it to the vehicle, cross your fingers, and hope that it encounters the same situations again, and hope that it handles all the other situations where you might have just regressed. Maybe you fixed one issue, but the preponderance of a thousand other permutations of what could have happened at that intersection would have caused you to do exactly the wrong thing. And so it's this constant one step forward, two steps back (oftentimes two steps forward, one step back), regressive, slow approach.

When we started Aurora, I think we raised some eyebrows on this. In fact, the Washington Post wrote a whole article about it at the time, back in 2018. We came out and said, look, that's the wrong way to develop. We've all led the world's preeminent programs. We've done this. We've seen its problems. We've got a clean sheet, and we're using it. And the way we're going to use it is we're going to deploy a very small empirical on-road fleet, and instead of that fleet being directly in the loop of development with our engineers, if you're a software engineer at Aurora, you're going to sit down and write the code, and we are going to build the most performant simulation engine, the most robust, representative, and highly faithful to the real world. So when you press compile and deploy, that's not going to go out to a few vehicles on the road. That's going to go out to the cloud equivalent of tens of thousands of vehicles. And within minutes, you're going to get the results back across a set of curated, specific, highly challenging situations, so that you know exactly how the vehicle would have performed had it encountered an intersection where one thing happened or another thing happened, situations you just won't get from the road. And so developmentally, that was the approach Aurora took. And the rest of the industry in recent years has started to come along.
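A minimal sketch of the simulation-first regression loop described above. All names, the scenario schema, and the 0.5 m clearance bound are hypothetical illustrations, not Aurora's actual tooling:

```python
# Hypothetical sketch: every build is scored against a curated scenario
# suite in cloud simulation before any on-road exposure.
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    name: str       # e.g. "occluded_pedestrian_at_unprotected_left"
    tags: tuple     # curation labels: weather, actors, road type

@dataclass
class Result:
    collision: bool
    min_clearance_m: float

def run_in_cloud_sim(build_id: str, scenario: Scenario) -> Result:
    # Stand-in for dispatching one scenario to a cloud sim worker; a real
    # engine would replay the scenario against the compiled stack, in
    # parallel across thousands of workers.
    random.seed(hash((build_id, scenario.name)))
    return Result(collision=random.random() < 0.01,
                  min_clearance_m=random.uniform(0.0, 3.0))

def evaluate_build(build_id: str, suite: list) -> list:
    """Return the scenarios this build fails, within minutes, pre-road."""
    failures = []
    for scenario in suite:
        result = run_in_cloud_sim(build_id, scenario)
        # A failure is any safety-envelope violation, not just a collision.
        if result.collision or result.min_clearance_m < 0.5:
            failures.append(scenario)
    return failures

suite = [Scenario(f"curated_edge_case_{i}", ("highway",)) for i in range(1000)]
print(len(evaluate_build("build_42", suite)), "regressions to fix")
```

The point of the pattern is that a build's failures come back from curated edge cases in minutes, rather than from weeks of hoping to re-encounter them on the road.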
You hear a number of players starting to talk a lot more about simulation and how they're using it. But at the heart of the developmental question is, A, to what degree have you developed that simulation capability such that it's a faithful representation of what happens in the real world? And B, do you have the pipelines and the training data with which to develop your system using it, such that the on-road work you do is simply a verification and validation of the system and not a developmental necessity?

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

Sounds like the initial approach before 2018 was almost like whack-a-mole, a game of whack-a-mole that never ends, basically, because you're always...

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Right, and that doesn't scale well. It does build quickly, and it gets you to a demonstration very rapidly. And so when the large crop of self-driving companies popped up in 2015, 2016, you saw a lot more of this, because it's an expedient way to get to something that looks compelling and ultimately fails to solve the generalizable problem. The non-introspectable, black-box, end-to-end approaches, meanwhile, struggle with all the things I just mentioned. Ultimately, they will struggle in unpredictable ways, and it will be very difficult to convince regulators or the public that they're sufficiently safe for the road. Meanwhile, the most evolved approach that exists today is that compound set of trained AI systems that allows for introspectability and predictability.

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

You've already kind of alluded to this, but I'm curious: when you started the company, you had a decision tree. Okay, we can do it this way, we can do it that way. And clearly all three of you had a lot of experience at the companies you previously worked for. But I was pretty impressed when I reread your 2018 blog that sort of foreshadowed all the issues that are coming up again today. So can you just talk about the decision tree you had when you started Aurora? If you started it again today, would you make the same decisions around the kind of system you would deploy?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah, we had a set of decisions to make, right? We had the benefit of a clean sheet, as I said. So we got to choose our development approach. We got to choose a hardware approach: what sensors we were going to use, and how. We got to choose our software architecture, effectively how we were going to go about the development of the self-driving system. We even got to choose the off-board services, the market approach, and a number of other things. So, on each of these in turn. I talked a little bit about development approach: we opted for a small fleet of on-road vehicles and a much more rapid development cycle, and we heavily invested in simulation early. In fact, our first acquisition was of a simulation company. That allowed us to build simulations that we never had access to at our previous companies. The ability to faithfully represent how radar returns look in a simulated environment, that's not a trivial task. The ability to faithfully represent even camera, right? Gaming engines will give you something that looks fine to the human eye, but it doesn't accurately represent how light moves through the world, how it reflects off objects, and how obscurants in the air affect visibility on the road. So we put a lot of emphasis on that, which was a key foundational enabler for us developmentally.

Technologically, when we looked at the hardware, we did have explicit conversations about: do we want to go with just cameras and a very low-cost system, cross our fingers, and hope that computer vision will get there to make this work? Do we want to go with just cameras and radar? Just radars and LiDAR? How do we want to approach this? At the end of the day, we realized two things. First, the state of the art in computer vision wasn't then, and still isn't now, able to solve the problem to the level of reliability that's required. And it's fairly public what the current state of the art in computer vision is, for a variety of reasons. It's not currently good enough for anybody. So, secondly, we asked: okay, can we get it good enough? Can we get there faster by burning the ships, so to speak, jettisoning every other sensor and just going camera-only? And the answer to that is no. And it's a definitive no that everyone, including those who would use just cameras, has come to, inasmuch as they co-train computer vision with other sensors. Which is to say: put LiDAR on the vehicle, give it ground truth, and you'll get there faster. You'll get a more performant computer vision system faster, because you have the benefit of the other sensors with which to co-train and provide ground truth. And so we looked at that and said, okay, even computer vision is going to develop faster with the other sensors; let's go and find the most performant sensors we can. And we kicked off a search, for a LiDAR in particular, that would give us the benefits of long range, that would allow for highway-speed applications, and that would give us the benefit of instantaneous Doppler, which allows for segmentation of scenes. As an example of what I'm talking about here: if you're in a dense urban setting, you've got a massive number of points coming back from a LiDAR, in which maybe a pedestrian, a bicyclist, and a car are all clustered together. Your ability to predict the future motion of that clump of points is substantially hampered if all you have is the points. If, however, you have instantaneous Doppler, you know how every single one of them is moving at that moment.
And so you see these light up. You colorize that point cloud by Doppler, and you now see clearly that there's a pedestrian in that cloud, there's a bicyclist in that cloud. You see how they're moving. You even see, from some of the micro-Doppler, how the tires are moving on the bicycle. And now you can predict how that intersection is going to evolve in ways that we could never do before. And so when we found Blackmore in 2017, 2018, we saw an extraordinary opportunity. This was a company that for 15 years had been pioneering frequency-modulated continuous-wave (FMCW) LiDAR. We won't get into all the details of it unless you're interested, but for a variety of reasons it had interference immunity, instantaneous Doppler, and ranges that were double what you could get under eye-safety requirements with a traditional LiDAR on whatever wavelength you chose, be it a Luminar at 1550 or a Velodyne at 905. You could see further, you could see faster, you could see instantaneous speed, and you could do things like this magic of segmentation that I just described in an urban setting. So we bought Blackmore, and from there we developed the FirstLight LiDAR that sits on all of our trucks today. So technologically, on the hardware side (you asked about development decisions, but to pull back for a minute), that's what got us into the hardware configuration we have. And then we developed our own networking system, which allowed for much better networking than was available on a vehicle at the time. We developed our own computer, which uses a combination of off-the-shelf chips from some of the bigger players you're all familiar with, letting us ride the S-curve they're all pushing while at the same time having a custom-tailored solution for the vehicle use case, which allowed us to solve the problems we needed to solve here. And then on the software side, as I mentioned, we developed from the outset a compound AI stack that was AI-first, that led with training, that led with learning, and that allowed for a generalizability that simply wasn't possible otherwise. And that's how we got to where we are.
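A toy numerical illustration of the Doppler-aided segmentation described above. The synthetic points and the hand-set velocity gap are my own assumptions, not Aurora's algorithm:

```python
# Why per-point Doppler helps segmentation: a pedestrian and a cyclist
# occupying the same clump of points are indistinguishable by position
# alone, but separate cleanly once each point carries a radial velocity.
import numpy as np

rng = np.random.default_rng(0)

# Two actors in nearly the same space: a pedestrian (~1.2 m/s) and a
# passing cyclist (~5.5 m/s). Columns: x, y, doppler (m/s).
pedestrian = np.column_stack([rng.normal(10.0, 0.3, 50),
                              rng.normal(2.0, 0.3, 50),
                              rng.normal(1.2, 0.1, 50)])
cyclist = np.column_stack([rng.normal(10.4, 0.3, 50),
                           rng.normal(2.2, 0.3, 50),
                           rng.normal(5.5, 0.2, 50)])
cloud = np.vstack([pedestrian, cyclist])

def split_by_doppler(points, gap=1.0):
    """Partition a spatial clump wherever sorted Doppler jumps by > gap."""
    order = np.argsort(points[:, 2])
    velocities = points[order, 2]
    breaks = np.where(np.diff(velocities) > gap)[0] + 1
    return np.split(points[order], breaks)

for i, c in enumerate(split_by_doppler(cloud)):
    print(f"cluster {i}: {len(c)} points, mean speed {c[:, 2].mean():.1f} m/s")
# Position-only clustering would have returned a single 100-point blob.
```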

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

Can I just focus in on the different segments of that? So first, on cameras: you mentioned that you toyed with, or thought about, maybe having a camera-only approach, and that the technology wasn't there yet and still isn't today. I know it's impossible to predict, but could it get to a point where you could use only cameras? Can you get enough sensory data from just a camera, in all weather, particularly when you need to deploy it on a Class 8 truck that needs different levels of perception than a robotaxi?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

I expect it will, right? If your aim is human-level performance, you have a compelling existence proof in the human eye and brain, right? So human-level performance, I expect, will ultimately be achievable with just cameras. But there are two things. First, we aim for superhuman performance in a variety of ways. We operate through fog and rain at speeds a human would be stupid to drive at, because we can just see through those conditions with other sensors. Second, as I mentioned, you will get to camera-only levels of performance faster by developing with a system that has the other sensors, right? And this is distinct from, look, I'm a burn-the-ships guy. You know, maybe by virtue of my legacy and the boss I once worked with, we found great success in a variety of ways by just stripping things out, removing all crutches and saying: go, the ships are burned, get to shore before you sink. In a couple of areas, and this is one of them, that is exactly the wrong thing to do. Because by developing only with cameras, you don't have the benefit of the ground-truth data, you don't have the benefit of the co-training, and by consequence, the development of your computer vision stack will move more slowly than it otherwise might have. And so in my view, and this was our view when we started Aurora, we will ultimately get to that place, but I don't know that we'll ever depopulate things like radar. Radars see on a different modality. They have complementary failure modes. They can see through fog that cameras simply cannot. They can see through other precipitation very well. And frankly, they're inexpensive and getting even less expensive. LiDAR, as we scale up our trucking product, will get extraordinarily inexpensive as well. We're going to get to numbers that will lead us, I think, to make commercial decisions to keep these other sensors even once cameras are getting good enough that we might remove them. I wouldn't be surprised if we come to the conclusion: look, we're paying a few hundred bucks for these other sensors; it's worth it for the superhuman performance they allow in conditions the cameras simply can't achieve, because cameras use visible light, and we need sensors that can see through obscurants.
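A minimal sketch of the co-training idea described above: sparse LiDAR returns projected into the image act as ground-truth labels for a monocular depth network, so the camera stack improves faster than cameras alone would allow. The toy network, masking scheme, and numbers are hypothetical, not Aurora's pipeline:

```python
import torch
import torch.nn as nn

depth_net = nn.Sequential(            # stand-in for a real depth backbone
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
optimizer = torch.optim.Adam(depth_net.parameters(), lr=1e-4)

def co_training_step(image, lidar_depth, lidar_mask):
    """One supervised step: LiDAR provides labels only where it has returns."""
    pred = depth_net(image)
    # Supervise only pixels with a projected LiDAR return (sparse mask).
    loss = ((pred - lidar_depth).abs() * lidar_mask).sum() / lidar_mask.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Fake batch: one 64x64 RGB frame, sparse LiDAR depth on ~5% of pixels.
image = torch.rand(1, 3, 64, 64)
lidar_depth = torch.rand(1, 1, 64, 64) * 100.0
lidar_mask = (torch.rand(1, 1, 64, 64) < 0.05).float()
print(f"loss: {co_training_step(image, lidar_depth, lidar_mask):.2f}")
```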

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

Yeah, some of the companies that use cameras only suggest that when you use too many sensory inputs, it confuses the system and becomes too complex. Any thoughts on that pushback?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah, I think that is the explanation of someone naive about how to do good fusion. Good fusion of different sensor modalities has to be architected from the outset. We'll get into whatever elements of our architectural decisions are interesting, but in this particular area, we decided early on that the right approach is to leverage early fusion of the raw sensor returns. That is to say, the pixels from the camera, the points from the LiDAR, and the returns from the radar all run into a single AI that processes all of those returns and produces a better prediction from that combined information than you otherwise could have made. Now, to the point of how a black box works: you don't actually have to decide within the perception AI which sensor you're going to trust and which you're not, because it learns that, when you have artifacts in certain situations on one sensor modality or another. Let's say you're driving through tunnels and you're getting multipath returns from the radar. Your system, trained adequately, learns that the radar, when it exhibits a certain characteristic, is unreliable, and it learns to trust the camera and the LiDAR more. The other thing you do is, on the output of the system, leverage sensor fusion of the objects that are classified by pairwise combinations of these sensors. So you take radar and LiDAR, you take LiDAR and camera, you take radar and camera, and each one of those pairs yields a set of predictions that is better than any single sensor would have produced. And you're able to cross-check that way. So on the question of, well, which can you trust: if you do it this way, with early fusion and late fusion, you end up with a more trustworthy prediction. And make no mistake, ultimately it is always a prediction, right? Perception is a game of statistics. At the end of the day, you are never 100% certain that's what you see, but you will get much higher probability, much more accurate information about what's happening in the world when you do multi-sensor fusion this way than if you were to do it the naive way and say: I've got a radar object classifier and I've got a computer vision classifier, and now I'm going to juxtapose them and decide which one I trust. I think those who have bemoaned the trust challenge are thinking of fusion that way, and that's simply not how we approach it.
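A hedged sketch of the early-plus-late fusion pattern described above. The feature dimensions, toy network, and averaging cross-check are illustrative assumptions, not Aurora's architecture:

```python
import torch
import torch.nn as nn

class EarlyFusionPerception(nn.Module):
    def __init__(self, cam_dim=256, lidar_dim=128, radar_dim=64, classes=10):
        super().__init__()
        # One network consumes all modalities jointly, so it can learn
        # which sensor to discount in which conditions (e.g. radar
        # multipath in tunnels) instead of following hand-set trust rules.
        self.head = nn.Sequential(
            nn.Linear(cam_dim + lidar_dim + radar_dim, 256),
            nn.ReLU(),
            nn.Linear(256, classes),
        )

    def forward(self, cam, lidar, radar):
        return self.head(torch.cat([cam, lidar, radar], dim=-1))

def late_fusion_cross_check(pairwise_logits):
    # Each entry is logits from one sensor pair (radar+lidar, lidar+camera,
    # radar+camera). Averaging class probabilities is the simplest form of
    # the output-side cross-check.
    probs = torch.stack([l.softmax(-1) for l in pairwise_logits]).mean(0)
    return probs

model = EarlyFusionPerception()
cam, lidar, radar = torch.rand(1, 256), torch.rand(1, 128), torch.rand(1, 64)
early = model(cam, lidar, radar).softmax(-1)
late = late_fusion_cross_check([torch.rand(1, 10) for _ in range(3)])
print(early.argmax().item(), late.argmax().item())
```

Averaging pairwise probabilities is the simplest possible cross-check; the point is only that agreement across sensor pairs is checkable at the output, independently of the early-fusion model.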

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

You don't have a level two system, at least that I'm aware of. Many of your competitors do. What's the advantage of not having one, in your opinion?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah, focus, primarily. The commercial opportunities for a level two system and a level four system are not even comparable, as I'm sure you and all the listeners here appreciate. If you can remove the human from the truck, it goes beyond even the value proposition of a robotaxi. It's not just a cost-abatement play for a $15-an-hour gig economy worker; it's cost abatement for a $40-an-hour truck driver. It's an unlock of hours of service that would have limited that same truck driver to 11 hours; now you're doing 24. Now you're moving goods at closer to the speed of air, at the cost of the truck, which has massive implications for the entire logistics network. And so from a commercial lens, the decision to... Sorry, I've lost the thread.

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

Your question was about... Why not have a level two system as well?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Oh, that's right. That's right. So from a commercial lens, a level four system is far more compelling. Its value proposition is extraordinary. Whereas with a level two system, you can't extract much value out of those; they're driver-assistance systems at the end of the day. Meanwhile, developmentally, if you are developing a level two system first, you are making different choices. You're making different architectural decisions. You're making different bias decisions. In the industry, you may have heard reference made to precision and recall. Effectively, precision relates to how often you encounter false positives, and recall relates to how often you encounter false negatives, which is to say you miss the garbage bag in the road, you don't see the kid, whatever. When developing an ADAS system, your primary focus is on maximizing precision. You cannot afford a situation where your vehicle is driving down the highway and false-brakes for no reason, causing an accident behind you, which we've seen plenty of. And so your focus is: get rid of those. Now, in getting rid of those, you inevitably incur the cost of lower recall, which is to say the more likely event of a false negative. You're going to ignore more things. And as you ignore more things in the interest of not false-braking for them, suddenly you are reliant on the human to take care of the situation. The human's got to take over, right? You may not slam on the brakes for the kid, but hey, we've got a clause in the agreement with the human saying they should have seen that and taken care of it. That's a very different mindset, and it leads to a very different set of architectural decisions than the idea that precision and recall both have to be at extraordinary levels because you're not going to have a human to fall back on. You can't false-brake for items in the road, but neither can you ignore them, or miss them, or expect that there's a human there to rescue you should you miss them.

And then finally, from a human-factors perspective: the idea that one can evolutionarily migrate from a level two system to a level four system is one that I was at the heart of for many, many years. The valley between those two things is just uncanny. You pass through a phase where the driver-assistance system gets better, and humans become complacent at a speed you cannot outrun in terms of how much better you make the system. Which is to say, you see humans running into the backs of fire trucks because a naive sensor fusion system can't differentiate: the cameras didn't notice them, and the radar ignored static objects, which was historically often the case. With an ADAS system, you get to a place where humans increasingly trust a system that's getting better and better, and therefore they're not ready to take over when they have to, when you miss something. And so you end up with cars running into fire trucks and other things, and it gets worse faster than you can make the system better. That's a really, really challenging chasm to cross. And so when we started Aurora, we said, look, let's go the other way. Let's start with a highly profitable, highly productive self-driving system that solves the precision and recall problem together, architecturally.
And then if someday there is a need for large-scale driver-assistance systems at the point at which we have a generalizable self-driving one, and the jury's still out on whether there will be, we'll depopulate it. We'll remove some sensors. We'll thin out the code. We'll get it to a place where it is super performant, but it's going to share control with a human in a way that's, frankly, maybe more similar to what I did at MIT than what we did at Tesla.
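To pin down the precision/recall tradeoff numerically, a small worked example with illustrative counts, not data from any real system:

```python
# How tuning a detector's threshold trades precision (few false alarms,
# i.e. little phantom braking) against recall (few missed objects). An
# ADAS design can lean on the human to cover low recall; a driverless
# L4 design has to hold both high at once, architecturally.

def precision(tp, fp):
    return tp / (tp + fp)          # share of alarms that were real

def recall(tp, fn):
    return tp / (tp + fn)          # share of real objects we caught

# Same detector, two operating points on its precision/recall curve.
adas = dict(tp=900, fp=10, fn=100)   # strict threshold: almost no phantom
                                     # braking, but misses 10% of objects
l4 = dict(tp=995, fp=30, fn=5)       # permissive threshold: misses 0.5%,
                                     # at the cost of more false alarms

for name, c in [("ADAS-biased", adas), ("L4-biased", l4)]:
    print(f"{name}: precision={precision(c['tp'], c['fp']):.3f} "
          f"recall={recall(c['tp'], c['fn']):.3f}")
```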

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

What you said there in the middle is that it's very hard, and these are your words, not mine, to have people remain inattentively attentive. It's impossible.

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

That's exactly right. That's exactly right. And the better the system gets, the more inattentive they become, and it's really hard to control for that.

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

You also said, I think, that the miles, the data that you gather in a level two system (to use another firm's words for it, all the FSD miles, all the level two miles) are not useful, or not very useful, when developing a driver-out system. Is that fair? Is that why you're so reliant on simulation?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah, I wouldn't make a blanket statement like that. Let me maybe describe what is useful and what's not, just to nuance it a little. To be clear, having access to a massive amount of highly curated, high-quality data is extraordinarily important, especially if you're going to go AI-first, as we have. You need that data to train with. Now, the elements of that data that we've found to be the single most important factors in producing AI models that perform well are these. A, it needs to be highly representative. And by highly representative, I mean it needs to be curated to represent the situations in the world that aren't just the basic commute to and from work, right? If all I've got is a fleet of vehicles driving the same roads every day, encountering effectively the same things, I can't indiscriminately pump all that data through my learner. Well, I can, but it's going to be crazy expensive and not particularly useful. What I care about is highly curated, interesting events that are at the edges of what my system currently performs well on, and I want full-rate data from those. Which is to say, I don't want a few megabytes that were uploaded on a trigger because a particular event happened, because I've got such a large fleet that I can't afford to offload gigabytes per hour from each of these cars. I care about the fidelity of that data. I care about it being robust and providing all of the information the car had at its disposal at the time of the event. I care about having it for a curated set of events that are representative of the fringe of what my AI is currently capable of doing. And the combination of those things typically results in better learning. So when we get into, okay, how are you going to acquire that data? Well, I could go out and put our system on a massive fleet, hundreds of thousands, millions of cars. I cannot afford to offload much data from those cars. At best, I can set a set of triggers: hey, if you observe high lateral acceleration, or these other things that might indicate something weird just happened, I want you to record. And then when that vehicle is on Wi-Fi, or maybe over the air, I'll trickle some of that up to the cloud and use it for some of my training. That is something of a drip approach to the acquisition of data, and it's very difficult for me to target exactly what I want. In contrast, what we've done at Aurora is say: look, we care about high-quality data, high-resolution data, and highly representative data. The best way to get that is with a highly performant simulation engine that we can feed with interesting things from the road if we ever find something we didn't think of. What we focus our engineering team on is: build us a simulation engine that represents what would otherwise have required millions of vehicles in the world, for an extended duration of time, to encounter these situations. We want you to stress the AI at the boundaries of what it could reasonably expect to encounter in the world. And we're able to generate those simulation environments and those tests in a way that gives us a high-fidelity, highly representative, and highly curated set of events that we care about. So we can run those through our training: both what we encounter on the road and what we've developed in simulation.
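A sketch of the trigger-based "drip" acquisition pattern described above; the thresholds and field names are hypothetical:

```python
# A large consumer fleet can only afford to upload snippets flagged by
# onboard triggers, which makes it hard to target exactly the full-rate,
# curated edge cases a learner needs.
from dataclasses import dataclass

@dataclass
class FrameStats:
    lateral_accel_g: float   # proxy for a hard swerve
    brake_decel_g: float     # proxy for a hard stop
    takeover: bool           # human intervened

def should_upload(f: FrameStats) -> bool:
    """Trigger heuristic: record only when something 'weird' happened."""
    return f.takeover or f.lateral_accel_g > 0.4 or f.brake_decel_g > 0.5

log = [
    FrameStats(0.1, 0.1, False),   # boring commute mile: dropped
    FrameStats(0.5, 0.2, False),   # hard swerve: snippet uploaded
    FrameStats(0.1, 0.6, True),    # takeover event: snippet uploaded
]
print([should_upload(f) for f in log])   # [False, True, True]
```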

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

Maybe to focus on just the brain of brains that you have, which is different from others'. How do you optimize? How do you know that you've struck the right balance of engineered code versus leveraging the AI infrastructure you've built?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah. The way we think about that is: don't learn what you don't have to. If there are certain invariants or rules of the road, put a control in place and stop thinking about it. Because the language of large language models is permeating the conversation these days, I'll put it in those terms for a moment. If I were to ask an LLM to compute two plus two, it would use an extraordinary amount of computation if it just brute-forced it through the LLM, and it might even get it wrong, depending on what it's trained with. If I instead develop a brain of brains in my LLM, I can say: hey, if someone asks a basic arithmetic question, break that down and route it to a set of modules, one of which is just straight-up Python code that runs the computation efficiently. We do the same thing on the road. Which is to say, there are a variety of invariants: don't hit other objects; if a traffic light is red, don't drive through the intersection, right? There are a variety of things we can look through in the rule books and say, hey, bake those in. So there's a set of invariants the system simply won't violate. But then, from there, let the system learn what it will. Get it to a place where each of these AI modules is learning things we never would have thought to teach it, and it's got an emergent behavior that will generalize to situations it has never encountered before. So that's really the balance: inject what you can think of as nuclear control rods, effectively, to arrest and address, in a much more computationally efficient way, basic questions that don't need a large AI to solve. And then, for everything else, pump the data through it, so as to leverage the benefit of learning performed across a highly representative and very large-scale data set.
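A minimal sketch of the "control rods" idea: a learned module proposes, and engineered invariants hold a veto. This is a generic guarded-policy pattern under assumed names, not Aurora's actual stack:

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    light_is_red: bool
    clearance_ahead_m: float

@dataclass
class Plan:
    enters_intersection: bool
    min_predicted_clearance_m: float

def learned_planner(state: WorldState) -> Plan:
    """Stand-in for a trained model: free to propose anything."""
    return Plan(enters_intersection=True, min_predicted_clearance_m=2.5)

INVARIANTS = [
    # Each rule is engineered code: cheap, inspectable, never learned away.
    lambda s, p: not (s.light_is_red and p.enters_intersection),
    lambda s, p: p.min_predicted_clearance_m > 1.0,
]

def select_plan(state: WorldState) -> Plan:
    plan = learned_planner(state)
    if all(rule(state, plan) for rule in INVARIANTS):
        return plan
    # Violation: fall back to a conservative engineered behavior.
    return Plan(enters_intersection=False,
                min_predicted_clearance_m=state.clearance_ahead_m)

print(select_plan(WorldState(light_is_red=True, clearance_ahead_m=30.0)))
```

The design point is that the invariants stay small, auditable, and computationally cheap, while the learned module carries everything that can't be written down as a rule.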

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

Maybe just to focus in on your LiDAR solution, because it seems to be an area where you feel you're very differentiated relative to others, and you've stuck with your own solution. Maybe just briefly talk about why you feel you're so differentiated, and how you see the development of that LiDAR coming to market over the next couple of years.

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah, yeah. So, I'll maybe explain FMCW, which I didn't before. Frequency-modulated continuous-wave LiDAR is a method for modulating the light that you're sending out of the laser. You can think of traditional LiDAR as more akin to amplitude-modulated radio or any other amplitude-modulated signal, which is to say you're flashing the light and measuring the time it takes to return. Now, when you do that, you're effectively looking for the return to exceed some amplitude threshold. The problem is that at different ranges, say I shoot my pulsed laser at a faraway black car with low reflectivity, the light that returns, depending on how reflective that object is and how far away it is, may not cross the detection threshold. The other thing is that the threshold gets higher, because there is a lot of contamination in the spectrum around that light. Let's say I use 905 nanometers, or 1550, whatever it is: the sun, it turns out, emits across a range of the spectrum, including this. And so I get interference. I get interference from xenon headlights that have spectral content at that wavelength. I get interference from other things, all of which compound into a problem where I can't see very far with a pulse-modulated LiDAR. With frequency-modulated LiDAR, instead of measuring the amplitude of a return pulse, we modulate the frequency, which is akin to FM radio. Which is to say, when we send that out, we effectively leave the light on, and when it returns, we can difference it against the emitted signal. In that differencing, we get an effective amplification on the order of 20 times, which is to say we can see much further. But there are a couple of bonuses. We are no longer subject to interference from the sun, xenon headlights, or other DC sources of interference, because it turns out they don't encode their signal with a frequency, so we can explicitly separate them out. The other thing we can do, through that differencing, is actually see how the frequency has shifted based on the relative velocity of the target off which the light reflected. So we get instantaneous velocity. We know, in a moment, based on what the LiDAR is telling us, how fast something is moving. We even know how fast its tires are moving, which gives us a very strong perspective on how the world is about to evolve. This is FMCW versus traditional time-of-flight LiDAR.

Blackmore was the single best developer of FMCW LiDAR, and remains the single best; it's part of Aurora now. And when we talked with Blackmore versus everyone else, including the one company who at the time was starting to do FMCW as well, we saw, in Blackmore's description of the path they'd taken to get where they were, that many of these other companies' claims to credibility and performance were based on approaches Blackmore had found, years before, to be dead ends for one reason or another. So we saw that many of these other companies were chasing down dead-end roads, and the decision to purchase Blackmore became, through all of these conversations, fairly obvious for us. They were head and shoulders, leagues ahead of what others had done. Bringing them in-house allowed us to focus the development of this FMCW LiDAR specifically on the needs of the self-driving use case.
And that matters because what we see in others who develop LiDAR for a broad swath of industries is that, out of necessity, they need to tell a story of large market potential, et cetera, et cetera, so they have to design to the lowest common denominator across all of those industries. And they're incentivized to rush something to market and bake it into silicon photonics, so they can tell really impressive stories: hey, this is really low cost, it's got very little fiber, we've got the silicon photonics on the chip. In doing so, they effectively lock themselves onto their current S-curve, performance-wise. And we've gone several generations beyond that performance curve. So even those who purport to have, not purport, who do have FMCW LiDAR, but who rushed into a generalizable product they could bake onto chips, failed to get anywhere near the level of performance that self-driving vehicles, in particular high-speed, high-weight self-driving trucks, need for operation on the road. Meanwhile, what we did is push the frontier of performance before optimizing, right? It's the equivalent, if you're familiar with FPGAs versus ASICs: you want to get to where you want to be before you bake it in, because once it's baked, it's baked, and you're locked into that architecture. We did the same thing with this. We got FirstLight to a place where no one can match its performance for this use case specifically. And now, with the acquisition of OURS Technology and a lot of progress we're making, we're very excited to harden that into silicon photonics, getting to a low-cost, high-performance system that does all the things others have been focused on, but does so on a fundamentally new performance S-curve.
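A worked toy calculation of how an FMCW LiDAR with a triangular up/down chirp recovers both range and instantaneous velocity from beat frequencies. The wavelength, chirp slope, and target are illustrative parameters, not Aurora's specs:

```python
C = 3.0e8              # speed of light, m/s
WAVELENGTH = 1.55e-6   # 1550 nm carrier
CHIRP_SLOPE = 1.0e14   # Hz/s (e.g. 1 GHz of sweep over 10 microseconds)

def range_and_velocity(f_beat_up: float, f_beat_down: float):
    """Separate the range and Doppler components of the beat frequency.

    On the up-chirp the Doppler shift subtracts from the range beat; on
    the down-chirp it adds. Half-sum and half-difference disentangle them.
    """
    f_range = (f_beat_up + f_beat_down) / 2.0
    f_doppler = (f_beat_down - f_beat_up) / 2.0
    distance_m = C * f_range / (2.0 * CHIRP_SLOPE)
    velocity_mps = WAVELENGTH * f_doppler / 2.0   # positive = approaching
    return distance_m, velocity_mps

# A target at 300 m closing at 25 m/s produces ~200 MHz of range beat
# and ~32 MHz of Doppler shift:
d, v = range_and_velocity(200e6 - 32.3e6, 200e6 + 32.3e6)
print(f"range = {d:.0f} m, closing speed = {v:.1f} m/s")
```

This is the mechanism behind the instantaneous per-point velocity he describes: one measurement yields both quantities, with no frame-to-frame tracking needed.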

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

Great. Thank you for that. I'd like to talk.

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Is that too much? Not enough?

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

No, that was fantastic, actually. I was going to ask another question on it, but we have so many more. I'd like to focus on a topic. Sure. Your solution, you have one launch lane to start with and another one soon thereafter. I'm curious as to how scalable you think your solution is. In other words, how much more work has to be done to get to the third or the fourth or the fifth launch lane after the first two?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah, that's a good question. So, yeah, you're right. We are currently operating Dallas to Houston, and we're operating Fort Worth to El Paso as well, which is over 10 hours. As we shared at our recent analyst and investor day, our next plan is to extend to Phoenix or Atlanta, at which point our product will have exceeded the hours of service. And that's a material unlock, obviously, for the value proposition of a self-driving truck. So we're excited about that.

In terms of the additional technical development required for any new lane, that's small and diminishing by the day. What I mean by that is, as distinct from robotaxi use cases, where any new market may present a fundamentally new approach to how vehicles navigate the city, the US highway network and the industrial settings around it (frontage roads, industrial parks, the fringes of metro areas that we typically drive through to get to and from terminals and warehouse-type applications) are very self-similar. Which is to say, if you can drive one, you can drive most. And I think any of our listeners who have driven on highway routes would probably declare a highway in Oklahoma not particularly different from a highway in California. They look and feel very similar. There are small nuances. For instance, in the southwest, we pass through an inland border crossing. That's something you don't typically see in a lot of other areas. We worked with the authorities to develop a solution that gets through there just fine, and once that's done, we can handle a variety of other things. Turns out toll booths look a hell of a lot like inland border crossings. So a lot of the capability developed to do that generalizes to other things, including a robotaxi. In our work with Toyota, we found that we were navigating toll booths to the airport fairly straightforwardly, because we'd already done this work on the trucks. There are other minor things: you deal with hills in some places, you deal with weather in other places. So there is incremental work to be done, particularly as you move regionally. Weather in particular is one area where you'll need to train and validate the system for operation in sub-freezing environments, and a variety of other cases like that. Otherwise, we're finding that it's very, very similar, and when we bring up our truck on a new lane, we find that it's very quick to do.

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

Any rough number as to how many miles you need to drive on a particular lane before you feel like the system is going to be ready to go relative to the initial ones?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah, that's a good question. A rough number: typically we need a pass or two, and the map is largely done. There's a trope in the industry here, I think. My read (I've tried to figure out where this originated) is that years ago we had some players who were pure-play map providers, maybe; I won't name names. At the time, it was in their interest to declare maps heavyweight, high-resolution things: look how technically complex they are, look how much work it takes for us to build these things and deploy them and operationalize them; you all really shouldn't get into maps, let us handle that, we're the mapping experts. And that seems to have stuck with a lot of people, this idea of, okay, well, how many miles do you guys have to drive before you've got a map? Well, effectively it's one pass, maybe two passes. I'll give you two. But effectively, the same trucks or cars, with the same sensors they're using for operation in the world, are the things gathering the data that effectively automatically generates the Atlas. And it has been designed such that subsequent autonomous driving from any vehicle, anywhere, sends back a signal should the static world change. Let's say a road is repainted. Let's say construction changes something fundamentally. It'll send back a signal, and the map will update. The maps are effectively self-healing, with relatively little human oversight. So they're really inexpensive to make. They take a drive or two with a human in the truck before we've got the data with which to make them, and from then on, they're effectively self-maintaining.

So you've got to map it. I'm just walking you through the steps you take to bring a new lane online. You've got to map it, which is drive it once or twice. You've got to validate that the system is performing on it as you expect it to. Typically for that, it is sufficient to simply look at the lane and ask: what new situations is this going to cause the self-driving system to encounter that it has never encountered before? The really tough stuff was trained already, right? Like humans lying halfway under a vehicle, changing a tire on the side of the highway, or something weird happening on a surface street as you're going to a terminal. With those things, you don't have the luxury of saying, well, it will happen on one lane and won't happen on another. If it's going to happen, it's going to happen effectively unpredictably, anywhere. And so the hurdle you get over to deploy on that first lane is many orders of magnitude higher than for any subsequent one. The reason is that for the first lane you are addressing every single one of these cases that could show up anywhere, and you have to have solved them. Once you've solved those, when you move the system to a new lane, you don't worry about the guy halfway under his car changing his tire, because that was already handled for the first lane. You don't worry about what the hell this stroller is doing crossing the highway. You don't worry about the frogger cases, the other things I suspect some of you may have seen from us, because you've already developed the system to deal with those. All you're worried about, frankly, is which elements of the static world, or of the actors we expect to encounter on it, are fundamentally different. And typically, as I said, that's very limited: things like an inland border crossing. If you haven't done train tracks yet, make sure you've got train tracks in your AI. So yeah, that's how I think about it.
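A sketch of the self-healing map loop described above, with a hypothetical tile structure; a real system would compare far richer static-world features than lane-marking fingerprints:

```python
# Every autonomous drive compares live perception of the static world
# against the stored map tile and flags divergences for an update.
from dataclasses import dataclass

@dataclass(frozen=True)
class LaneMarking:
    polyline_hash: str   # compact fingerprint of the painted geometry

@dataclass
class MapTile:
    tile_id: str
    markings: frozenset  # of LaneMarking

def detect_static_world_change(tile: MapTile, observed: frozenset) -> bool:
    """True if the drive saw a static world that disagrees with the map."""
    return observed != tile.markings

def on_drive_completed(tile: MapTile, observed: frozenset, publish):
    if detect_static_world_change(tile, observed):
        # Send the divergence back; the map updates with little human
        # oversight, which is what makes it cheap to maintain.
        publish({"tile": tile.tile_id,
                 "observed": sorted(m.polyline_hash for m in observed)})

tile = MapTile("I45_mile_120", frozenset({LaneMarking("a1"), LaneMarking("b2")}))
repainted = frozenset({LaneMarking("a1"), LaneMarking("c9")})  # road repainted
on_drive_completed(tile, repainted, publish=print)
```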

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

We've spoken about this. One of the very heated debates around autonomy is the fact that there are other companies who are spending tens of billions of dollars building data centers, buying NVIDIA chips to train these models. Obviously, you're not doing that. Why do you think you don't need to? And to the extent you could try to understand someone else's thinking, why do others feel the need that they have to do that?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah, that's a good question. It's tough for me to put myself in the heads of others, but I'll give you my best shot. As it relates to why we're not: because that is an extraordinarily inefficient and ineffective way to go about things. Like I said, if you build an unintrospectable, unpredictable black-box model, the best you can do is create a massive data center and throw data at it. And this is the train-and-pray approach. This is the, hey, let's cross our fingers and hope it doesn't fail in some way. And let's take a couple of examples of the failures you might encounter. Let's say you've gone with the train-and-pray approach. You've built up your data center. You've impressed a lot of people with how much you've spent on this, and that pattern-matches to the creation of large LLMs. And folks haven't gotten their heads around the fact that industry has actually moved past those; in many cases, the most performant LLMs are models of models, sets of models, much more akin to what Aurora does. But let's say you're still there, right? And you're a company that is very tempted to throw a Hail Mary at a black-box model, because you, maybe more than anyone else, have access to a set of empirical on-road data that is effectively free from the standpoint of its acquisition. Users are buying the cars, and, you know, you've got to pay for the cloud costs, you've got to pay for the offload, everything else, but otherwise you've got access to a lot more empirical data than everybody else. That is a reasonable thing: to say, look, I'm going to play my book. My book is, I've got a lot of assets out there capturing data. I'm going to use that. I'm going to train this model. Now, when you do that, let's take a couple of examples of something that might happen. You train that model, and now you've got that system running in the field. Suddenly one of your vehicles runs into a black car. Okay, weird. Why did that happen? Well, we don't know. And we don't know because we can't introspect this. We can't see whether it was a problem with the perception system or a problem with the motion planning. Did we not see it at all? What happened? We don't know. Okay, well, I guess go collect a bunch of images of black cars, pump them through the model, and train it not to hit those things. Well, it turns out that somewhere deep in the model, there are a bunch of weights doing uninspectable things. Maybe they're keying on the fact that a lot of drivers of your vehicles drive over shadows, and that a camera-only solution sees these black-car-looking objects that everybody seems to be okay with driving over. So you keep throwing training data with black cars at it, while meanwhile you've got a massive corpus of data of people driving over shadows. That sounds trivial. It's not. Extrapolate it to a tumbleweed. Your system is braking for tumbleweeds in the middle of the Texas desert. You want to train it not to do that, so you throw a bunch of tumbleweed training data at it. And then one day, out of the blue, you hit a child, God forbid, who's a similar color to the tumbleweed. Maybe they're just a similar size to the tumbleweed. You just don't know why it hit them. And you have zero guarantee: no matter how much training data you throw at it, you cannot guarantee to a regulator or to anyone else that it's not going to happen again.
So you see where I'm going with this. The lack of predictability, the inability to introspect and find out exactly why what happened just happened, is a fundamental flaw that I expect will ultimately lead to a massive deficit of public trust, and, frankly, a huge challenge with regulators, who will look at this and say: you guys can't even tell me why you just hit that pedestrian; I don't want your cars on the road, I don't want this system out there. And so predictability is key. We've got really positive relationships with the regulators, and every time they have a question about why the system did something it did, we can get in and say, look, there is an AI module that came up with this. We can't tell you what happened deeper within that module. What we can tell you, though, is that there is no world in which that module could have output a decision like "hit this pedestrian-looking thing" without engineered code interstitially intercepting it and ensuring the model is doing reasonable things. It also allows us to look at what the model is outputting for different training data and specifically target what new training data is needed, so that at the very least the perception system is not making a mistake, the motion planning system is not making a mistake, and the different elements of the code are not the sources of the error.

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

One last question for me, and then we have a question from the audience. I think it's pretty good, so I'll end with that.

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Sure.

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

You have a highly anticipated launch at the end of the year. Can you just talk about any commercial or technical hurdles that are still in the way of getting there?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Sure. Yeah, we don't have any major technical hurdles. Commercial hurdles: we've got multiple customers with multi-year commercial contracts in place at this point, so we're in a good place commercially. In terms of the technical work to be done, I say work to be done rather than hurdles because it's fairly well understood. As you may know, we're marching toward closure of the safety case, which is effectively the complete closure of a set of claims and evidence showing that we don't pose an unreasonable risk to road users with our vehicles. As of mid-April, we were at 95%. This is something that years ago we had said we'd be at about 95% on before we closed the final validation in conjunction with the vehicle platform. We're expecting final closure and validation of the remaining safety case claims to be completed this year, in preparation for launching without a driver on the road in a full commercial setting.

I think the other metric that we've historically shared is what we call the Autonomy Performance Indicator. That's a key metric for assessing performance beyond safety. The safety case is really the safety bit, right? Is it safe enough? When that's closed, it is; it's validated, and we've confirmed it relative to our expectations. The Autonomy Performance Indicator is not about safety but about performance and cost-effectiveness. The most expensive support we can provide to these otherwise safe vehicles on the road is when they encounter situations they haven't yet learned how to deal with, have to come to a stop, and have to be rescued by somebody who goes out to pick them up, or even a remote person who has to support them and deal with issues. Physical rescue is very, very expensive. And so we started tracking, I think a year and a half ago maybe, what we call API, the Autonomy Performance Indicator. We reached an aggregate API of 99% at the end of last year, which is to say that in aggregate, over 99% of the miles driven by our trucks required zero on-site support or rescue. And keep in mind, there will always be some; there's always a flat tire, there are always other issues that need it. And so, because the signal from the aggregate API was no longer particularly useful, we started publishing not just that but 100% API loads, effectively perfect API: what fraction of loads and trips needed no on-site support at all? During the first quarter of this year, I think we were at about 75%. We expect to be at about 90% at commercial launch. We were up 13 points on the quarter, I think, in Q1 when we got to 75%. So the team's making good progress there. As we close out the validation, that final robustness or hardening of system performance is a fairly straightforward endeavor. So I think we're in relatively good shape.
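To make the difference between the two metrics concrete, a small illustration with made-up numbers. The exact definitions are assumptions (the call doesn't spell them out): here, aggregate API discounts the miles around each on-site support event, while 100% API loads counts whole trips with zero such events:

```python
# Why the aggregate metric saturates near 99%+ while the per-load metric
# still has headroom: rescues are rare per mile, but one rescue taints
# an entire load.
loads = [(600, 0)] * 75 + [(600, 1)] * 25   # 100 loads, 25 needed one rescue

total_miles = sum(miles for miles, _ in loads)
support_events = sum(events for _, events in loads)

aggregate_api = 1 - support_events / total_miles                      # ~99.96%
perfect_load_share = sum(1 for _, e in loads if e == 0) / len(loads)  # 75%

print(f"aggregate API:  {aggregate_api:.2%}")
print(f"100% API loads: {perfect_load_share:.0%}")
```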

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

So, a question from the audience, I think maybe in five parts. First, I think, just basically: what do you see as Aurora's competitive advantage long term? Is it OEM deals? Is it proprietary technology? Is it FirstLight LiDAR? What would you point us towards?

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

Yeah, I think maybe to tie this to your question early on, George, that I didn't fully answer: you asked about decisions we made early on, and I talked primarily about technical decisions, hardware, software, development approach. I didn't talk about commercial decisions, and I think that bears on this question. When we started Aurora, this was early 2017, we saw a preponderance of go-it-alone solutions through various forms of vertical integration, whether that was GM buying Cruise, whether that was Google deciding they were going to do the network and the software system, or whether that was someone saying, we're going to build the vehicle and the network and the self-driving system. Effectively, there were a lot of folks, wide-eyed, vertically integrating and saying, we're going to do all of this ourselves. Our commercial approach from day one was very different. We looked at the OEMs that exist in both the passenger car and trucking spaces, and we said, why would we ever compete with companies that have proven, over nearly a century in many cases, the ability to pump out highly reliable assets that last 15-plus years and hundreds of thousands of miles? Why would we compete with them? That's not our core expertise; it's theirs. Same thing with ride-hailing networks, same thing with logistics companies. There was no reason to focus there when our core was elsewhere. You know, I think we were referred to as the Traveling Wilburys of self-driving, right? It was the combination of me, Chris, and Drew, and the cadre of really exceptional people we built around us. They were all exceptional at one thing, and that was self-driving. Not at manufacturing vehicles, not at operating logistics networks. So from day one, we built this company on the idea that the strongest ecosystem in the world around a single self-driving play would beat every other pure-play team that went after the problem. And I think our results have borne that out. I think today, bar none, the ecosystem of partners that has consolidated around Aurora is unmatched. Two of the top three major trucking companies. The top global passenger car company. The largest ride-hailing network in the world. One of the largest tier-one suppliers, exclusively developing our self-driving kit for launch. Fleet service partnerships. It's an extraordinary ecosystem where, enabled by the self-driving stack, each participant can effectively lift where they stand. They can do the work they're best at, aided and enabled by this ecosystem working concurrently together. So commercially, that's why we've approached it the way we have.

And then I talked earlier about all the reasons that trucking-first makes sense, but I don't think I wove in how we get into passenger cars from trucking. So just to be clear, and briefly: trucking profitably drives scale. Trucking has a value proposition that can cover the higher early costs of the self-driving hardware. That scale increases the experience of the self-driving system, and it decreases costs as we leverage economies of scale through Continental and other partnerships. Once we're at a place where we've got thousands of trucks on the road, we've got a low-cost system and a highly performant driver that can operate very well at high speeds. We then launch into ride-hailing with Toyota, with Uber, on the most profitable ride-hailing segment, which turns out, maybe not surprisingly, to be the high-speed segment. It's airport-type trips. It's long trips. It's the 45% of trips that exceed 55 miles an hour. When we launch there, that's a segment of the ride-hailing market that is difficult for others who haven't mastered the art of high-speed driving with long-range sensors like FirstLight. And then we migrate down-market into the denser urban environment. So that's the sequencing of the passenger car market, and why we've chosen it as we have.

speaker
George Genericus
Sustainability and Autonomous Driving Analyst at Kinecore Genuity

It's a great place to end. Sterling, thank you so much for your time. That was awesome. I learned a lot during that conversation. So good luck. We hope to see your trucks without people at the end of the year.

speaker
Sterling Anderson
Co-founder and Chief Product Officer at Aurora

We're excited. We're excited. Thanks for hosting, George, and good to talk with y'all again.

Disclaimer

This conference call transcript was computer generated and almost certainly contains errors. This transcript is provided for information purposes only. EarningsCall, LLC makes no representation about the accuracy of the aforementioned transcript, and you are cautioned not to place undue reliance on the information provided by the transcript.
