The Data Cloud Podcast

Deciphering Your Data with Peter Bailis, Founder and CEO of Sisu Data

Episode Summary

This episode features an interview with Peter Bailis, Founder and CEO of Sisu Data and Assistant Professor of Computer Science at Stanford University. Peter’s research focuses on the design and implementation of post-database, data-intensive systems. In this episode, Peter talks about how Machine Learning can inform reopening strategies amidst the COVID crisis, cutting-edge Machine Learning research taking place at Stanford, the future integration of AI in our day-to-day lives, and much more.

Episode Notes

This episode features an interview with Peter Bailis, Founder and CEO of Sisu Data and Assistant Professor of Computer Science at Stanford University. Peter’s research focuses on the design and implementation of post-database data-intensive systems.

In this episode, Peter talks about how Machine Learning can inform reopening strategies amidst the COVID crisis, cutting-edge Machine Learning research taking place at Stanford, the future integration of AI in our day-to-day lives, and much more.

--------

How you approach data will define what’s possible for your organization. Data engineers, data scientists, application developers, and a host of other data professionals who depend on the Snowflake Data Cloud continue to thrive thanks to a decade of technology breakthroughs. But that journey is only the beginning.

Attend Snowflake Summit 2023 in Las Vegas June 26-29 to learn how to access, build, and monetize data, tools, models, and applications in ways that were previously unimaginable. Enable seamless alignment and collaboration across these crucial functions in the Data Cloud to transform nearly every aspect of your organization.
Learn more and register at www.snowflake.com/summit

Episode Transcription

Steve Hamm: [00:00:00] Well, it's very nice to meet you. I wanted to start off with some questions about the company itself and, you know, how and when you started it up. As a journalist, I've been following startups in Silicon Valley for more than 35 years, so it's been quite a while, and I'm still fascinated with the stories. So if you could tell your founding story, that'd be fantastic.

Peter Bailis: [00:00:23] Yeah. So, Sisu started as an offshoot from my research group at Stanford, where I'm on the faculty. I had shown up at Stanford in 2016 as an assistant professor of computer science and was really interested in this trend we were noticing with some of our industrial sponsors, and then also looking more broadly at FAANG, the Googles, Facebooks, and Netflixes of the world, where even the top 1% of companies, the ones that were just so far ahead of so many others in analytics, didn't have enough analyst resources to answer all the questions about the data they were collecting. Where, you know, sometimes the ads team, for example, would get a ton of analysis resources, but as soon as you went to the mobile engagement team, you know, you'd be looking at a substantially lower head count.

And if you went down even to an individual ad campaign or a product manager, there was no way they were going to get analyst resources. They'd just get a login to some kind of portal that would give them access to huge amounts of data, but wouldn't really help them make use of that data on a daily basis.

And, you know, the broader trend is that data has gotten easier to collect. In my opinion, we can afford, for the first time in history, to track not just every business metric, which is almost table stakes, but to record these metrics at really fine granularity. I don't just have, say, transactions at a retail branch or transactions on mobile; I have tons of information: demographics, attribution, user activity, and a ton of context associated with each event, which is essentially hard to make use of in a meaningful way using legacy BI and analytics environments. And so we started Sisu in 2018, after having done a bunch of research into scalable means of analyzing, contextualizing, and operationalizing the kinds of large amounts of structured data that we'd seen at FAANG scale, and that we're seeing increasingly in the market, especially in Snowflake. So it was really this combination of a huge market trend, a bunch of hard research, and tons of people who had done the work to gather this data but weren't really sure what to do next.

Steve Hamm: [00:02:43] Are you talking about being able to have data about the data, metadata, and being able to track that? Or some kind of governance around the data?

Peter Bailis: [00:02:53] Yeah. So I'll give a concrete example. Many of our users, you know, start with something like Tableau, where they've got a business metric they're trying to optimize. It's plotted in Tableau or Looker or any number of BI tools, and they know what's going on day in and day out in terms of, say, engagement or retention or margin or volume.

So it starts with a business user that has a metric they want to optimize. And in some sense, if I am, say, a marketing operations manager, my job is to help make recommendations to the rest of my team in terms of getting the best ROI for my marketing spend, to drive activation or engagement or a certain type of user.

And where Sisu comes in is that there's already a bunch of tools for telling you the what behind each of these metrics. What Sisu is about is helping understand the why. So when activations increase by 1%, why did they increase? Is it something about a new campaign? Is it something about a given demographic? Is it something about a given product? Is it a combination of these variables?

What we're really doing is using all of what I call context, which from a data perspective you can think of as more columns associated with, say, every conversion. And we basically surface the context that's most relevant, to say, you know, teenagers using the referral program are driving a half-point increase in conversion rate this week compared to last week, while the promo code that we released two months ago actually saw a decline of a quarter point. So when you see that overall quarter-point increase, it's actually a combination of something getting better and something getting worse. And the whole value prop here is that you would never take the time to look at all these different combinations.

There can literally be hundreds of thousands to millions of these possible factors inside of a modern warehouse, but no one's got the time to go and look at all of them to really get to the why.
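To make the combinatorics concrete, here is a minimal Python sketch of the kind of enumeration being described: treat every column=value combination as a candidate factor and measure how much each one contributes to a week-over-week change in a metric. This is an illustration, not Sisu's actual algorithm; the column names and the contribution decomposition are hypothetical. With a few dozen context columns of even modest cardinality, the order-two combinations alone run into the hundreds of thousands, which is why no analyst can check them by hand.

    import pandas as pd
    from itertools import combinations

    def candidate_factors(df, dims, max_order=2):
        """Yield every column=value combination up to max_order as a candidate factor."""
        for r in range(1, max_order + 1):
            for cols in combinations(dims, r):
                for vals, _ in df.groupby(list(cols)):
                    vals = vals if isinstance(vals, tuple) else (vals,)
                    yield dict(zip(cols, vals))

    def matching(df, factor):
        """Rows of df that match every column=value pair in the factor."""
        mask = pd.Series(True, index=df.index)
        for col, val in factor.items():
            mask &= df[col] == val
        return df[mask]

    def impact(last_week, this_week, factor, metric="converted"):
        """Change in this subgroup's contribution to the overall rate,
        decomposed as (share of rows) * (subgroup rate) in each period."""
        def contribution(df):
            sub = matching(df, factor)
            return (len(sub) / len(df)) * sub[metric].mean() if len(sub) else 0.0
        return contribution(this_week) - contribution(last_week)

    # Hypothetical usage: events tables with columns age_group, channel, promo, converted.
    # dims = ["age_group", "channel", "promo"]
    # ranked = sorted(candidate_factors(this_week, dims),
    #                 key=lambda f: abs(impact(last_week, this_week, f)), reverse=True)

Ranking candidate factors by the absolute size of their contribution is what lets a tool surface "teenagers via the referral program drove half the lift" without anyone writing that query by hand.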

Steve Hamm: [00:04:52] It seems like part of this is about context, understanding context. And part of it is separating the correlations from the causations, because often there are these apparent patterns that someone might think are causative, but in fact they're not; they're related, but they didn't cause something to happen.

So it sounds like what you're saying is that your technology is able to penetrate deeper and really make the connections, you know, a firmer and more convincing connection.

Peter Bailis: [00:05:26] That's one way of putting it. I think there's no substitute for a true A/B test, right? Actually trying a new campaign out in market, or releasing a new feature, and having a control group and so on. That's a whole field of statistics. So to get true causality, you really need to do what's called an intervention, or run an experiment.

What we're really doing is helping surface the naturally occurring experiments, almost the lowest-hanging fruit that's already present in the data, to help guide the action and experimentation that someone's already going to do anyway, because there's already, say, the rest of the marketing team that's going to be releasing new copy, running new campaigns, and updating their targeting.

And so the way I think about Sisu's value prop is that, by identifying these, you know, statistically significant, highly impactful factors within the data, we can almost nudge each department in which we're deployed toward making different decisions than they would otherwise make. And they are the ones who ultimately take the action. But, you know, our job, in some sense, by delivering this why, is to arm these business teams with the information that might cause them to go left instead of right, or to target, you know, millennials instead of middle-aged folks, or whatever the combination happens to be. And in some sense, you know, all of machine learning is based on correlational statistics.

And again, you know, interventional studies are a whole other ball of wax. Anyone telling you causality without A/B testing is probably selling snake oil, or at least, you know, they'd get a lot of interest from the statistics world if that were possible. But I think of it this way: there are too many factors to keep on top of, especially when you think about data that's refreshing on a daily or hourly basis and a business that's changing faster and faster. Even just being notified about significant changes, and about factors that are having a large impact, like driving lift, or increasing order value, or decreasing margin, can be a real game changer for a business, compared to just plotting a metric and looking at what's happening with it without understanding the why.
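As a concrete illustration of what "statistically significant factor" means here, below is a sketch of the textbook two-proportion z-test an analyst might apply to one such subgroup, say, teenagers who came through the referral program, last week versus this week. The numbers are made up, and this is a standard statistical test, not necessarily the one Sisu runs internally.

    import math

    def two_proportion_z(successes_a, n_a, successes_b, n_b):
        """Two-sided z-test: did this subgroup's rate really change between periods?"""
        p_a, p_b = successes_a / n_a, successes_b / n_b
        pooled = (successes_a + successes_b) / (n_a + n_b)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        # Two-sided p-value via the standard normal CDF.
        p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
        return z, p_value

    # Made-up numbers: the subgroup's conversion went from 4.0% to 5.3% week over week.
    z, p = two_proportion_z(480, 12000, 660, 12500)
    print(f"z={z:.2f}, p={p:.4f}")  # a small p suggests a real change, not noise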

Steve Hamm: [00:07:35] Silly question: how do you pronounce the company's name, Sisu? Is it "see-soo," or...

Peter Bailis: [00:07:41] It's "see-soo," yeah. So sisu is this Finnish word, and I'm actually part Finnish; my grandparents were Finnish immigrants to Canada back in, I think, the seventies. And it's a word with no direct translation in English, but in a sense it translates to this type of stoic determination, tenacity, grit, and bravery.

And the thing that I like about sisu a lot is that it's not a goal. Like, you know, you don't achieve bravery or resilience; you don't achieve sisu. It's kind of a way of living and working. And it kind of means, you know, you're going to run through walls, you're taking these big swings, and you stay hungry.

And I think it reflects not only how we operate as a company, but also how we're kind of running through walls for our customers and taking these big bets for the analysts that use our product. You know, the great analyst never stops asking why, right? They want to look at an answer from every angle, question every assumption, and find the most comprehensive way to answer it.

So many of our users already have sisu as a personality trait, and giving them Sisu as software, you know, turns them into superheroes.

Steve Hamm: [00:08:53] Is there a scenario you can talk about of a customer using your technology on kind of a real-world problem?

Peter Bailis: [00:09:01] Yeah, totally. So one of our customers that actually uses Sisu and Snowflake together is Housecall Pro, where they use these technologies together to understand and diagnose changes in metrics like revenue, recurring revenue, and customer retention. As a business, Housecall is trusted by about 15,000 different home service companies, and it helps manage all sorts of home-service and business-related functions: you know, how do I schedule, how do I bill, and all sorts of stuff. And in a really volatile market, especially with these macroeconomic conditions, it's really critical for that team, and it's a really nimble and agile team, to understand how their customer behavior and usage is changing.

Right? So not just, how are we doing in terms of recurring revenue, but why is this changing? What types of new user patterns are we seeing? How should we better optimize the customer experience? Our core user and kind of executive sponsor at Housecall is named Vanessa. She runs a pretty lean team underneath the chief operating officer.

And, you know, before Sisu, the COO had basically almost given up on asking all the questions that he had, because she didn't have the resources, as with most heads of analytics. In fact, every head of analytics I've talked to who is intellectually honest will say: I don't have the resources to answer all of the questions coming from my COO.

These metrics are defined once a quarter, more often once a year, but the data's constantly changing. What we're essentially able to do is look at all of the types of data associated with, say, customer usage and revenue, everything from channel, to the specific products and features being utilized, to firmographic information, and, at each point in time, essentially point out, you know, what's moving the needle for Housecall, how they can do more of the stuff that's working really well, and how they can address and adjust for changes in consumer behavior in this market. And that's been really rewarding: being able to work with folks like Vanessa, and other data analysts at places like Samsung, looking at customer conversion and upgrades of handsets, all the way to store operators like Mixt, which is a salad chain, basically looking at how you optimize the time to get someone their order.

In a competitive environment, especially with all of these delivery services, how do you optimize for loyalty? And, you know, the fun part about all of this, and what we keep coming back to, is that the data exists, right? It's always Sisu plus existing enterprise data. And we use Sisu as kind of the unlock to take this existing process, which is often reactive, the business owner asks the question, the analyst team runs a fire drill and comes back, and to speed that up by at least an order of magnitude, and then just continuously deliver these results as the data keeps updating.

Steve Hamm: [00:11:55] So are the core users data analysts, data scientists, people like that? Or is it available to people who are less skilled in writing algorithms or writing queries?

Peter Bailis: [00:12:07] Yeah. So one of the reasons why I started Sisu was that when you look at the people who are using dashboards and reports today, they are not SQL experts; they may not even have heard of SQL. And they're certainly not data scientists. And for me, you know, as a founder, I just couldn't live with a future in which the best thing we'd have in another five or 10 years is a Python notebook or a SQL-driven BI tool. So with Sisu, we're going out to that user who is conversant in their data but is not a SQL expert and may not even be a data analyst. What we find is that a core constituent for us is the analyst who, you know, sets up dashboards and reports.

But increasingly it's also people embedded within the business, people with titles like, you know, FP&A director, or marketing operations analyst, or store operations analyst, right? These are people who are embedded and really looking at the data, but are not the ones setting up the warehouses or setting up the dashboards and so on.

And it's these teams who get asked why dozens of times a day, right? They're often viewed as a service function: they're creating dashboards, or they're explaining things to the business. And our goal, and our belief, is that these analysts, whether they're in a center of excellence or they're embedded, are the most critical people to navigate change in these uncertain markets, right?

So the name of the game today, especially in the current macroeconomic climate, is not predicting what will happen next. It's understanding even what's going on, right? So, understanding in the moment what's driving the change, and then informing these decisions while that window of opportunity is still open.

So no matter what you call these people, whatever their title is, they're increasingly viewed as the trusted people to inform that business strategy with more than just gut feel.

Steve Hamm: [00:14:02] Now, most of the examples you've given so far today have been kind of marketing or consumer oriented. Is your product focused on that segment, or is it more broad?

Peter Bailis: [00:14:13] So with the type of statistical analysis that we're running, like so many other types of machine learning, you need data to make this stuff work. If you're running a clinical trial and there are 30 people in the clinical trial, that's a large clinical trial. But if it turns out that, you know, eating carrots has not improved cardiovascular health, there are only so many more hypotheses you can test: maybe broccoli helps improve cardiovascular health, maybe it's potatoes, maybe it's eating grapes. With enough trials on a limited amount of data, you'll eventually find some that are false positives.

And this is why there are a lot of reproducibility crises in core science today: historically, we've been limited by the amount of data that we have, especially in datasets and tasks that involve people.
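To see why "enough hypotheses on limited data" guarantees false positives: at a 0.05 significance level, testing 100,000 candidate factors on pure noise would flag roughly 5,000 of them. A standard guardrail, and one plausible ingredient for any tool doing this at scale, is a multiple-testing correction such as Benjamini-Hochberg. The sketch below is the textbook procedure, not a description of Sisu's internals.

    def benjamini_hochberg(p_values, alpha=0.05):
        """Benjamini-Hochberg FDR control: return indices of hypotheses kept as
        discoveries, holding the expected false-discovery rate below alpha."""
        m = len(p_values)
        order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
        last_rank = 0
        for rank, i in enumerate(order, start=1):
            if p_values[i] <= alpha * rank / m:
                last_rank = rank  # largest rank passing the stepped threshold
        return sorted(order[:last_rank])

    # Raw p < 0.05 would flag ~5% of pure-noise factors; BH keeps the list honest.
    discoveries = benjamini_hochberg([0.001, 0.009, 0.04, 0.20, 0.70])  # -> [0, 1]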

What I think has happened over the last five years is that consumer-facing businesses, at least, are for the first time ever essentially flush with data, on the order of hundreds of thousands, if not hundreds of millions, of events coming in every single week about their business. And what those large data sets represent for us is a ton of training data, a ton of input from which we can start to derive signal.

And so when you think about the scales we operate at, up to hundreds of millions of records, direct-to-consumer businesses are a great fit. We also do a little bit with higher-volume B2B businesses. For example, Upwork is a customer; they have a two-sided marketplace, they look at things like match rate and margin, and it's high enough volume that we can also provide value there. Where we don't play, and what I think is a really hard statistical problem, is if you're a B2B company and you've got 700 customers; there may not be enough signal in the data to draw meaningful inferences.

And from a user perspective, we want people who are not making decisions on an annual basis, but people who are constantly making decisions that they can easily test and learn from. If you're making decisions on a weekly or daily basis, that's overwhelmingly one of marketing, where it's easy to change your campaigns; finance, where it's easy to change pricing and allocation of spend; and ops.

Steve Hamm: [00:16:16] What was the last thing?

Peter Bailis: [00:16:18] Oh, like operations. So store operations, or logistics, or, you know, any kind of automated process. What we don't want to do is inform someone's decision about how to build the next great mobile phone, when, you know, there are so many qualitative factors that go in, in terms of taste, in terms of what's been tried in the past, and in terms of human ingenuity. What we will do is help inform that product launch, where product managers and product folks should be in the room. But it's really about how you quickly test and learn to make decisions.

And at the end of the day, customers get value from Sisu not because we tell them things that are interesting, but because we tell them things that change the decisions they would otherwise be making. And the more decisions that are being made on a more regular cadence, the more opportunities we have to subtly move the business forward.

Steve Hamm: [00:17:03] You made a reference before to the fact that we're in the middle of the COVID crisis, and that a prediction isn't as important today as it was five months ago. People are really focusing on just understanding what is happening right now, because of the chaos, because it's just an unprecedented situation.

Does that challenge machine learning? Because so much of what machine learning is built on is past data and the patterns that pop out of it. So how do you adapt machine learning for a time like this?

Peter Bailis: [00:17:38] Exactly. It's a great question. I mean, in some sense, it was said at the very start of this pandemic that COVID is kind of the black swan event of 2020. And the whole point of a black swan is that it has a nonzero probability, but you're not going to model it as precisely and cleanly as you could otherwise,

because there's just less training data. And you see this in a lot of ways in the failure of things like predictive maintenance, right? When we were on campus working with some device manufacturers and industrials, they wanted to predict failures so they could go and maintain these generators and field equipment.

And when you've only had two examples of a generator failing, you know, it's very hard to model that unless you have a ton of domain expertise. The same is true when you think about modeling the economy or consumer behavior right now. We've never had a standstill on a national level, no less a global level, in so many different countries.

And so just understanding what's going on in consumers' minds, and how national policy decisions, which are also uncharted territory, impact what's going to happen next. You know, you can ask any chief revenue officer at an enterprise business today what their 2021 revenue projections are,

and anyone who tells you they're certain about them is bluffing, right? So there's just huge uncertainty in predicting the future, even if you've got domain experts to go and do this. So from our perspective, where we think the most value is, is in looking at trailing data, say one week of data, one month of data.

There's going to be, ultimately, a strategy driven by the business, right? So when we talk with retailers that are now reopening, they have a thesis in mind, and they are likely the best people to actually use human ingenuity and reasoning to understand which stores to reopen first and how to comply with local regulations, which many times are unclear, and so on.

And our goal, and I think where ML can shine, is in that near-term window: looking at the last week or the last month of data, what is working and what is not? The value of the data and the machine learning is not so much to predict what will happen next week, but to really check our intuition, right? If we believe that suburban malls are going to see a larger increase in same-store sales than stores in urban environments,

well, the data can tell us if in fact that is true. And as we decide the next week which stores to open, I can use the findings from this week to further inform my strategy. And so from a process perspective, it's keeping the human in the loop, as opposed to a black-box model that will spit out, you know, a forecast of what might happen,

which may not be explainable, may not be interpretable, and may be wrong. I mean, we could have another stimulus package coming out next week, and no one would be super surprised if that happened, but it would have huge macro influence, and you see it in the stock market as well. So, long story short, there are so many variables that

you're going to have to go based on some synthesis that's likely human-generated in order to make macro-scale decisions. And where I think the data has the opportunity to shine is in testing the reality of what's happening on a short-term basis as we're executing on those longer-term strategic bets.

Steve Hamm: [00:20:53] You mentioned the relationship between Sisu and Snowflake earlier. When did you get together? And how do the companies and their technologies work together?

Peter Bailis: [00:21:04] The reason why we spend so much time with Snowflake is that customers adopting Snowflake are on the cutting edge of cloud adoption. And we find that many of the customers adopting Snowflake are the most progressive leaders in data, right? They see the TCO, they see the value, and Snowflake also supports, in many cases, substantially more advanced functionality from an architectural perspective than many other alternatives in the market.

And so we see Snowflake as a clear leader in that warehousing space, essentially being the source of truth for these businesses. And where Sisu comes in is to say: look, you can buy a bunch of BI licenses and throw them on top of Snowflake, but think of all the investments in data engineering you're making, in terms of pulling data from APIs, doing ETL, and cleaning up this data, right?

Sisu can help you unlock that truth for more people in the business. We can help you use all that data to answer these critical questions of why. Not by telling you what your metrics should be, or building another dashboard, but by understanding what metrics you're tracking today, what's important to you, and keeping up to date as that data keeps updating.

And so it's kind of this virtuous cycle where, even from the IT perspective, they like Sisu, because Sisu lets them use all the data and justify additional spend in pulling more data streams in, as opposed to just having the data sit there and having people only look at the high-level metrics, never digging deep, because we've got businesses to run and not enough time.

Steve Hamm: [00:22:38] I want to drill down a little bit on the relationship with Snowflake and understand how the cloud data platform really enables customers to do things that they couldn't do, or couldn't do as well, previously. You know, you talked about how important the relationship is for you, but what's the differentiation from the customer's point of view?

Peter Bailis: [00:23:02] So for the customers of these cloud warehouses, they get the ability to have insane speed and scalability that, to a first approximation, isn't really cost-effective, and in some cases isn't computationally feasible, in an on-premises environment. So you can afford to run more analyses of the same data by spinning instances up and down transparently to the analyst.

And you don't have to worry about, you know, the intern nuking the report for the CEO while they're doing a bunch of exploratory analysis, for example. When you have literal on-prem hardware, you're going to be constrained by how quickly you can add servers to your data center, and that's really slow compared to clicking a few buttons and scaling up and down.

And we've seen this happen in the more general cloud market with compute, and how, you know, services like EC2 have enabled elastic compute. I see Snowflake as a natural extension of this: just the ability to go up and down allows you to capture more data at a way lower cost, and then perform more computationally intensive analyses that otherwise would have been essentially infeasible on even a very large rack of servers. And for us, this is kind of the leap we first observed at Stanford, working with some of these very large tech companies: with a ton of compute, you can run way more statistical hypothesis tests and models than you otherwise would be running on-prem.

And then couple that with the data that's available inside of a cloud warehouse: it's consolidated as it's ETLed, it's arriving more quickly and in a way that's fresh, and the data is less siloed, right? So many times we talk to people who just haven't made that leap. You know, they groan about how their data is in 30 different databases with different access controls and so on. And that world is very quickly becoming the world of yesterday, just because of cost, convenience, and scale. It doesn't make sense to run this stuff on-prem unless you've got some insane data security requirements, in which case there are options like GovCloud.
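What "spinning instances up and down transparently" looks like in practice: a sketch using the Snowflake Python connector, with hypothetical credentials, warehouse names, and table. Resizing is a one-line SQL statement, and giving exploratory work its own warehouse is how you keep the intern's queries away from the CEO's report.

    import snowflake.connector  # pip install snowflake-connector-python

    # Hypothetical credentials and object names.
    conn = snowflake.connector.connect(
        account="myorg-myaccount", user="ANALYST", password="***"
    )
    cur = conn.cursor()

    # Give exploratory analysis its own compute so it can't slow down reporting.
    cur.execute(
        "CREATE WAREHOUSE IF NOT EXISTS EXPLORE_WH "
        "WITH WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 60"
    )
    cur.execute("USE WAREHOUSE EXPLORE_WH")

    # Scale up for a heavy scan, run it, then scale back down.
    cur.execute("ALTER WAREHOUSE EXPLORE_WH SET WAREHOUSE_SIZE = 'XLARGE'")
    cur.execute("SELECT COUNT(*) FROM events")  # 'events' is a placeholder table
    print(cur.fetchone())
    cur.execute("ALTER WAREHOUSE EXPLORE_WH SET WAREHOUSE_SIZE = 'XSMALL'")
    conn.close()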

Steve Hamm: [00:25:07] You made a couple of references to your work at Stanford. You're on the faculty; you're an assistant professor. Now you've stepped back somewhat from your faculty activities to start and run Sisu. Do you still kind of balance these two roles together, or have you pretty much stepped into the entrepreneur role full time at this point?

Peter Bailis: [00:25:29] It's a great question. I started Sisu because I was having more fun shipping code and building out products from campus with some of our collaborators than writing more papers. And I think for me, fun is having that kind of impact and seeing these techniques and algorithms move the needle on real customer problems.

You know, we had done some work with the Microsoft product analytics team, looking at things like Skype call quality and call volume, and with some of the Google ads folks; we've written papers with these folks. And, you know, that's like the most advanced team in the world, and yet we were doing something that was net new for them in terms of how they were using their data.

And I couldn't pass up the opportunity to go and build a company. Really, for me, it's about the team, one that combines a skill set I wasn't seeing people bring together organically. Specifically, there's a lot of database engineering needed to be able to do this compute, right? We have to do this in a specialized engine, because it's just so heavyweight to provide at interactive speed. You need a net-new database engine, not storage; we leave the storage to the Snowflakes of the world, but the analysis on top of that is a database problem. For relevance, you've got to do ranking and relevance, which is an ML problem: what results are useful to this user, right now, in this period of time, based on what they've done in the past, the data we know, and everything we know about them?

And then there's design: how do you make this useful to a user who doesn't care about machine learning, doesn't care about data, and just wants to answer a business question? And so what drew me over to Sisu was the ability to work with people who are way better than I am in so many different categories, and to build a team around this.

And as faculty, you know, Stanford's an amazing place. You get, you know, the best students just showing up on your doorstep, asking to work with you. It's kind of an embarrassment of riches. But you ultimately can't hire designers and engineers and a field team and marketing. I tried hiring engineers and designers, but, you know, you can't compete with private industry there unless someone wants to do a sabbatical or something.

Steve Hamm: [00:27:22] At Stanford, you were the co-leader of DAWN, a research project focused on making it dramatically easier to build machine-learning-enabled applications. Could you tell us a little bit more about DAWN, and also about how DAWN relates to Sisu?

Peter Bailis: [00:27:39] Absolutely. So DAWN was a multidisciplinary collaboration with folks in computer architecture, computer systems, databases, and machine learning. We got together back in 2015, when we were first putting this together, and were kind of scratching our heads, looking at the quantum leaps being made in ML technology and benchmarks like ImageNet, and

looking at this gap between what academics were doing, in terms of more and more accurate machine learning, and what people were actually adopting in practice. There's this almost irrational exuberance around what AI and ML can do when you look at these demos, but when you actually try to take a pre-trained model and apply it to enterprise data, you don't get the same mind-blowing results you might initially expect.

What we realized, and our core thesis in the DAWN project, was not that we needed newer and better models, that you need to get on the whiteboard and come up with, you know, new network architectures or new training algorithms. What was missing from the picture was essentially systems that go end to end, all the way from the data and the domain knowledge that an expert inside of an enterprise already has, to a model that's running in production, one that can be updated as new data arrives, can be, you know, diagnosed with quality assurance, and can serve predictions to people

who've never heard of ML. And we came together to do this kind of crazy project to ask: how far could we push this in collaboration with industry sponsors? We had a bunch of great folks, you know, Facebook, Microsoft, Google, VMware, and VMware has been a really strong supporter since the start, a range of these folks, basically funding research that the NSF wouldn't fund because it's just too far out there, too speculative.

We would build a bunch of prototypes and release them to the world, and then we'd learn and iterate. And this was, in some sense, how my co-lead Matei Zaharia and I were trained as graduate students, as part of the five-year project called the AMPLab at Berkeley, where Matei started Spark and I did some of my dissertation research on databases. And we wanted to bring that model to Stanford, because we felt we could take bigger swings.

My philosophy at Stanford was: look, the value of being an academic is taking big bets that are going to be wrong some of the time, but where you learn along the way. And we felt that by building these types of end-to-end systems, we would learn a ton, do a bunch of good research, and produce a bunch of good PhDs as a result.

And where DAWN relates to Sisu is that, even before I showed up at Stanford, I had started looking at this problem of what we do for ML on top of structured data. You know, neural networks work really, really well for images and text, but most of the world's data, and I would argue the most valuable data, is in tables.

It's in Snowflake. How do we make this data more useful, especially when you've got elastic compute to put on top of it? And so the earliest prototypes of some of the projects we did in DAWN were direct inspiration for what we built in Sisu, in terms of the types of problems we went after: monitoring user metrics, explaining what's going on, making time series more interpretable to users, and really getting that human-in-the-loop data interaction.

And, you know, a bunch of my students continue to work on interesting problems in DAWN, everything from stream processing and building models over streams, to building models over compressed data, kind of all the topics related to systems for machine learning on structured data. But we really took our production experience, and our experience productizing some of the prototypes in DAWN, as inspiration for what ultimately became the Sisu platform. And by being able to work with, again, designers and professional engineers and an amazing field team, there was just this opportunity, at least from my perspective, to build something bigger than you could in a lab context, especially by the time we had completed some deployments at relatively large scale in DAWN.

Steve Hamm: [00:31:51] I know you're a champion of the democratization of data, and also, obviously, of machine learning and AI. I want to ask you to put on your visionary cap for a second here and look ahead five years or more. How do you think the business landscape and society will have changed because of these incredible technologies we've been discussing?

Peter Bailis: [00:32:12] Yeah, it's a great question, and one that I think about a lot. It keeps me up at night, just thinking about all the different directions in which all of this energy and excitement around data is going to go.

Steve Hamm: [00:32:23] It keeps you up at night out of excitement. Not out of dread.

Peter Bailis: [00:32:26] Yeah, yeah, exactly. No, I'm not a big believer in Skynet. These guys and gals going for, you know, AI-complete systems, I'm not holding my breath, and I don't recommend it on this podcast either. And, you know, the first way I'll answer the question: people hear about self-driving cars,

which, by the way, still have not made it to mainstream, despite the fact that every self-driving car is trying to do the same things: navigation, pedestrian avoidance, avoiding other cars, and so on. That's a fairly verticalized task, and it still hasn't made it into the mainstream. Then people look at their business and say, oh, I'm going to have a self-driving marketing department,

a self-driving sales department, a self-driving product org. That's a lofty aspiration, for sure. But I think it doesn't give enough credit to the intuition and organizational knowledge that so many organizations already have, and to what people do on a day-to-day basis, especially as knowledge workers.

Right. I just think that the idea that AI will automate everything is a bit naive when you look at what these models are capable of. For me, the five-year vision here is one in which you've essentially augmented the rote, routine, and repetitive tasks inside of businesses. Some of this is in the form of the startups you see doing robotic process automation.

And I think a lot of the diagnosis of businesses and what's going on is ripe for automation; that's what we're doing with Sisu, obviously. But in a nutshell, it's really about taking the boring stuff and better informing people about what's going on, so that the people can do what they're best at, which is, you know, creative thinking and strategic thinking. And where I see this going long term, especially if you think about what's happening with systems like Snowflake and the increased amount of data and context being put into these systems, is that you already kind of run businesses based on metrics:

you know, at the top of every company you've got the CEO looking at the P&L and a number of top metrics like engagement or retention, laddering down through the org chart, all the way to increasingly leading metrics within each business unit and each department. And it's not just the tech companies that are running their businesses based on KPIs.

This is happening everywhere. Fortune 100, Fortune 500, they've got dashboards. And, you know, where I think the future of this kind of augmented enterprise goes is that everyone inside of the company knows not just where they want to go, but how they're doing and how their strategies are playing out in real time.

And they're actively adapting and responding to changes in the business that are captured in their data in real time, along with the actions they're taking. Because so much of this data comes from software-as-a-service platforms, and actions are being taken digitally, you can actually track those actions, such that if I'm Jack Dorsey at Twitter and I've got my 10,000 employees below me, I could get a feed of: here's everything that happened inside of Twitter

today, here are all the changes we made, here are all the decisions we're making, here's what's really moving the needle. You're not going to replace the people; you're going to make them more powerful, and you're going to give more organizational understanding of what's working and what's not. And I think you'll see this dramatic productivity improvement, because you're going to get what you would get if you could afford to put one analyst per business decision maker, which even the largest companies, the Googles of the world, can't afford to do.

And so that's where the augmentation, I think, is so rich, and that's where I see the future of all this structured data heading. It's just knowing everything going on inside the business all the time, and having this ladder up in a way that reflects the org chart and organizational priorities.

Steve Hamm: [00:36:09] You know, since the beginning of artificial intelligence as an academic domain, people have been talking about general machine intelligence, and it's always been a long way off. Do you think, I mean, are we closer to that? And is it even desirable to pursue it?

Peter Bailis: [00:36:31] I think there are substantial downstream effects if AGI is realized, but based on current technology, I believe we're years away, in the sense that if you look at where these advances are coming from in AI and ML, there are kind of two major patterns. One is you're getting variants of neural network architectures where you wire the parameters up slightly differently,

and it gives you an incremental boost on a certain task. And people have even gotten to the point where you're delivering neural networks that are designed by neural networks. It's like machines designing machines, and that sounds scary, like, oh my God, this is how we're going to get Skynet. But the reality is, it's kind of like a million monkeys typing,

and what you end up with is something that can recognize patterns, and in some cases even memorize patterns, a little bit better. And I think to say that that approach is going to lead to AGI doesn't do justice to the complexity of thought and our ability to construct higher-order concepts and narratives and so on.

And maybe we get there, but I think if you look at, you know, the pioneers of that field, folks like Geoff Hinton, they're not saying, okay, all we need is a slightly better neural network architecture. They're saying we need a fundamentally different approach that can actually leverage these higher-order representations and concepts in order to make decisions, as opposed to just playing, you know, a billion games of Atari.

Steve Hamm: [00:37:56] My simple-minded thought is that we already have humans with brains. Why do we need to reinvent them?

Peter Bailis: [00:38:04] Well, I think that's a fair question. I think the real reason AI is going to take off is not that we will have that replacement. It's that there's just too much sensory information to take in, right? Like, you and I have relatively low-bandwidth inputs in the form of our eyes, and even in what we can process in our brains.

And for me, it's like, why is the internet useful? Why is Google useful? It's that I can look through more data than I would ever have a chance of looking through in my entire lifetime, even if I had all the time to go look at every page on the internet. When I want to go figure out, you know, who uses Snowflake as a customer today,

it's one click and I go, right? And for me, it's that ability to use massive amounts of data where I actually think you'll start to see the lift. It's like information retrieval, not just for text, which is what you think of on the internet, but for structured data. This problem of relevance and ranking on top of tables is totally understudied, in large part because most people, aside from Google, didn't have large amounts of structured data until very recently. And so Google used to claim, they don't say this anymore, but they used to claim that more data beats better algorithms.

You know, the reality today is that it's not just Google that has all that data. It's people who adopt Snowflake and have cloud data; they have more data, and they can actually get, you know, value out of it. And that doesn't lead to AGI, I just want to make that point: more data doesn't necessarily lead to AGI. You can learn more concepts, and you capture more of the long tail. But more data does mean you will need more AI or ML in order to prioritize the limited bits that we as humans can process in any period of time. We as humans are low throughput but high creativity, so let's take machines, which are high throughput but low creativity, and combine them.
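One naive way to operationalize "relevance and ranking on top of tables," sketched here with scikit-learn: score each context column by its mutual information with the metric you care about, and surface the highest-signal columns first. The dataframe and column names are hypothetical, and real systems (Sisu included) do far more than compute this single statistic.

    import pandas as pd
    from sklearn.feature_selection import mutual_info_classif

    def rank_columns(df: pd.DataFrame, target: str):
        """Rank categorical columns by mutual information with a binary target.
        Assumes no missing values; codes each column as integer categories."""
        X = df.drop(columns=[target]).apply(lambda c: c.astype("category").cat.codes)
        mi = mutual_info_classif(X, df[target], discrete_features=True, random_state=0)
        return sorted(zip(X.columns, mi), key=lambda kv: kv[1], reverse=True)

    # Hypothetical usage: which context columns best "explain" conversion?
    # rank_columns(events[["age_group", "channel", "promo", "converted"]], "converted")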

Steve Hamm: [00:39:54] You mentioned the augmented enterprise, the whole idea of augmentation of human thought, of kind of a collaboration between the machine and the humans. I think that's the model that seems attractive to me. And, you know, when you look into the future, it paints a picture of a happy future, not a machine-dominated future.

Peter Bailis: [00:40:15] On this idea of augmentation, I think it was Steve Jobs, maybe it's incorrectly attributed, but I believe it was Steve Jobs who said the computer is a bicycle for the mind. And I like that idea, in that, you know, a bicycle is not a tank, it's not a fighter jet, but it helps me move in the ways that I want to move, efficiently. It's not completely automated: I'm doing a little bit of work, I'm steering the thing, I'm keeping it balanced.

And I think that's a much more reasonable metaphor, in that it's not wholesale replacement of the mind; it's this collaboration. And even the user interface design for human-in-the-loop analytics is still in its infancy. I mean, the last 20 years of consumer internet have all been optimized around getting people to click more ads, look at more content on Facebook, and post more Instagram pictures.

But when you think about what happens on a daily basis at work, most enterprise applications have no predictive qualities. You know, I get the same view every single time I go to my inbox, every single time I look at my calendar, every single time I look at documents. And it's that kind of augmentation, where what I expect in a consumer setting, personalization, recommendation, relevance, is where I think you'll get the step up of the bicycle for the mind, like maybe the bicycle with a motor. My favorite metaphor is just thinking about, you know, funneling and summarizing all of this data that's at someone's fingertips and constantly changing, and really telling me what I need to know right now that's going to change my plans for the day.

And if a computer can do that, like, once per week for me, that's pretty astonishing.

Steve Hamm: [00:42:01] Yeah, that's making life better. Peter, it's been wonderful talking to you today. I want to thank you so much for your time. I felt like some of the things we talked about, like the augmented enterprise, were really meaningful.

I liked when you spoke about structured data and pointed out that it may not be as appreciated as it deserves to be. I think you said it might be the most valuable data. And I think that's, you know, maybe an idea that was around back in the nineties, was lost, and has been found again.

So, thank you so much for your time today. Great talking to you.