The Data Cloud Podcast

A Deep Dive into Data Science with Adrien Treuille, CEO and Co-founder of Streamlit (Acquired by Snowflake)

Episode Summary

In this episode, Adrien Treuille, Co-founder of Streamlit (acquired by Snowflake), shares how business people can use data science and machine learning, how you should incorporate data scientists into your organizations, and so much more.

Episode Notes

--------

How you approach data will define what’s possible for your organization. Data engineers, data scientists, application developers, and a host of other data professionals who depend on the Snowflake Data Cloud continue to thrive thanks to a decade of technology breakthroughs. But that journey is only the beginning.

Attend Snowflake Summit 2023 in Las Vegas June 26-29 to learn how to access, build, and monetize data, tools, models, and applications in ways that were previously unimaginable. Enable seamless alignment and collaboration across these crucial functions in the Data Cloud to transform nearly every aspect of your organization.
Learn more and register at www.snowflake.com/summit

Episode Transcription

Steve Hamm: [00:00:00] So, Adrien, it's great to have you on the podcast today.

Adrien Treuille: Thank you. I'm thrilled to be here.

Steve Hamm: Yeah. So tell us briefly about Streamlit, which was recently acquired by Snowflake. When and why did you and your co-founders decide to start the company, and what does it do for data scientists?

Adrien Treuille: So my co-founders, Amanda Kelly and Thiago Teixeira, and I were really part of the first generation of people who grew up with data science and machine learning as actual things that companies did at an industrial scale. We all met at Google. We were working on Google X, self-driving cars, and other kinds of advanced technologies.

Then Amanda and I went off to another job together, working on self-driving cars at a company called Zoox. And then, of course, the three of us came together and formed Streamlit. And the thing that [00:01:00] we noticed over the course of those careers was that here was this amazing group of people, namely the data scientists and the machine learning engineers, who were suddenly put in charge of really important data and insights by these big companies.

And it was super hard for them to communicate the work they were doing to others inside the company. It's just a property of working with that much data, with the kinds of models that machine learning engineers build, with annotating that data and being able to search through it: it's really difficult to convey that to others.

And so we saw that either they were using really impoverished tools to do so, or companies would invest tons and tons of money in building custom infrastructure to make this possible. An example would be: [00:02:00] we have millions of annotated images that we're running through our machine learning algorithms.

We need other people besides just the machine learning engineers to be able to see those images, annotate them, go through them, and so on and so forth. And there are tons and tons of these kinds of internal micro applications adjacent to the data science and machine learning workflow. And there wasn't a good way to build these things in order to communicate your findings to others inside a company.

So, seeing that, we all... well, to be perfectly honest, I convinced them both to quit their jobs (obviously an insane thing to do at the beginning of every new company's story) and start a company to do that. And that was Streamlit, and it took off like wildfire in the open source community.

And it started being used [00:03:00] not just by big tech-forward companies like Google and Uber and Apple, but also by Caterpillar and 7-Eleven and by mid-market companies and startups. So it really swept the world.

Steve Hamm: Yeah, no, that's a great explanation. Now, it might be helpful for some of our listeners if we turn this next section of the podcast into something of a primer for business people who want a better understanding of the data science phenomenon. Please explain for all of us what data science is, the variety of things that data scientists do, the main tools they use, and the challenges they face.

Adrien Treuille: Sure. So data science is distinguished from other types of data-adjacent work that you see in companies in basically two ways. In [00:04:00] some ways it's like analyzing data in an Excel spreadsheet, but unlike that, you're usually dealing with lots and lots of data: so much data that you'd need to hire a fancy company like Snowflake (you may have heard of Snowflake) to actually manage it all for you and run queries against it and so forth.

So on the one hand, you can think of it like doing analyses, let's say in Excel, but on massive, massive amounts of data. That's the first difference: the size and scope of the data that you're dealing with.

The other thing that makes data science really interesting and unique is the types and the power of the analyses that you can do. It usually goes beyond "let's make a graph of something that happened in the past." You can get into "let's build a statistical model of what's gonna happen in [00:05:00] the future."

Or, with machine learning applications: how can we actually predict, from all this data, what a customer is going to do next? Those are the kinds of things that data scientists deal with. And one of the key challenges in the field right now is that it's difficult to convey the insights that you're working on to others.

And this plays out in a couple of different ways. One is that data scientists, let's say, have a brilliant insight and say, "We can predict how much revenue our marketing spend is going to generate, based on the statistical model we built."

Okay, well, then the marketing team says, "Great. Run it with these numbers. Now run it with those numbers. Oh, awesome, that's great. Run it with these other numbers." And all of a sudden your brilliant data scientists are sort of reduced to [00:06:00] rerunning their models over and over again for other people, which is a communication problem, actually.

Steve Hamm: Right.

Adrien Treuille: So...

Steve Hamm: So you're saying that the people within the business units or functions should be able to run these models themselves?

Adrien Treuille: Totally. They should be able to run the models themselves and play with them. And ultimately, it's not just about what the answer is. It can also be, "I wanna gain insight into what the model itself is saying." That's actually a big part of machine learning: how well is this thing working?

How does it break? What if I throw this data at it? What if I throw that data at it?

Steve Hamm: That's interesting. So a model is a kind of digital twin of something that's happening in the world. It can be used to analyze the thing that's happening in the world, but you can also [00:07:00] use what's happening in the world to improve the model.

Adrien Treuille: That's exactly right.

Steve Hamm: Okay. All right. Good, good. Now, there are a lot of terms thrown around in our industry and in the press: data scientists, data engineers. My sense is those are two different things, but some people seem to use them interchangeably.

How do you distinguish between the work of data scientists and data engineers?

Adrien Treuille: Yeah, that's a great question. Let's just say you're in a company and you see a sudden drop in visits to your website. You could turn to your data engineers, who are in charge of keeping the sources of data in good working order, their provenance, et cetera, and say, "Hey, is this actually happening?

Or is there a blip in the data pipeline somewhere that's causing this problem?" That's data engineering. To the data scientists, you might say, "Hey, we noticed that there is a drop [00:08:00] in traffic to our website. Can you help us build a model of why these drops happen? Can you help us predict whether, over the next six months, we're gonna see a drop like this again?"

So they're sort of exploring the data and trying to figure out what comes next.

Steve Hamm: Yeah. Well, that's a good explanation. Another term that's thrown around quite a bit is data application, or data app. It's unclear; it seems to mean different things to different people. So what's your definition of a data app? Do both data scientists and data engineers write and use data apps?

And how do they use data apps differently?

Adrien Treuille: Yeah. So a data app is usually a small application that you go to on a webpage, an internal website, or maybe on a phone, that allows you to play with, manipulate, and understand the models and data that your company has. [00:09:00] And do both data engineers and data scientists write data apps?

Well, they should! And one of the cool things about Streamlit is that it makes this so easy that, increasingly, we are seeing people with both of those job titles building data apps really productively using Streamlit.

Steve Hamm: Yeah. Yeah. Very cool. Now, you mentioned machine learning briefly before. How does machine learning fit into data science?

Adrien Treuille: So machine learning is really the extreme end of data science, where you're dealing with either massive data sets or really complicated, let's say unstructured, data, like images or natural language text. And so the types of techniques that you use to analyze that data are even more sophisticated, and they fit in this [00:10:00] category called machine learning.

So from Streamlit's perspective, we consider that just another data workflow, and a really exciting one that really pushes the limits of technology. In fact, Streamlit was originally designed for machine learning engineers specifically. It was only after we released it into the world that we saw it used massively by data scientists and realized, oh, it's actually a sort of cross-cutting need that's being served by Streamlit.

Steve Hamm: Yeah. Yeah. Hey, I wanna ask one more question here, then we'll get back to Streamlit. It seems like the most popular programming language for data scientists is Python. Why is it so popular, and how is it used?

Adrien Treuille: The reason why Python is so popular is because it's so darn delightful. It's really very easy to write. It's very easy to [00:11:00] read. It's kind of difficult to mess up. And so for people like data scientists and machine learning engineers, who are also thinking about data and statistics and big models and so forth, it's really, really nice not to have a super complicated programming language adding to that complexity.

Steve Hamm: Yeah. And is it used both to write data apps that are then reusable and to write queries that are kind of one-offs?

Adrien Treuille: Yep. It's used for data app development, for queries, even for model building and model training. So it's really become the language of machine learning and data science around the world.

Steve Hamm: Okay. Okay. And I think that does kind of take us back to Streamlit. People talk about the modern data stack, and I know that Streamlit is a player in that. So within that data ecosystem, what's the range of problems that your product helps to solve, or the new capabilities that it enables? [00:12:00]

Adrien Treuille: So the really amazing superpower that Streamlit brings to the table is giving data scientists, machine learning engineers, all those engineers using Python that we were just talking about, the ability to build these micro apps super easily and quickly.

Steve Hamm: Mm-hmm

Adrien Treuille: Without it, it was possible to build apps, but it was a heck of a lot harder.

People thought in terms of weeks; they thought in terms of quarters. With Streamlit, they sometimes think in terms of a few hours or a weekend side project. And if you do that over and over again, it can really start to affect an organization. That's its power.

Steve Hamm: So let's drill in deeply here. Give us a step-by-step example or two of a data science team using Streamlit, maybe one that's typical and one that's [00:13:00] atypical. And remember, our podcast listeners include non-technical people.

So be merciful to them.

Adrien Treuille: I'll tell you that one of the earliest and, for me, most eye-opening examples of Streamlit use was by Delta Dental. They got super excited about Streamlit, and they started building really amazing internal applications to understand how their call centers work. It turns out that in the medical insurance business, having really good call centers is the name of the game.

And how do you do that? You analyze every call in real time, including the sentiment of both callers (this is machine learning, figuring out if you're happy or sad), the gaps in the conversation, and of course aggregate statistics about how quickly and how often these calls are happening, and so on and so forth.

They built an amazing dashboard in Streamlit that allowed [00:14:00] executives in the company to see this data and play with it in a way that simply wouldn't have been possible using more traditional business intelligence tools like, let's say, Tableau or Power BI. And that was, for me, such a cool example of how giving data scientists the power to build their own apps using their own data, and then share them with others in the company, is such a transformative idea.

Steve Hamm: No, that's great. How about an atypical example? Have you got one of those?

Adrien Treuille: Well, go to our gallery, streamlit.io/gallery, and you can see tons of wild and wonderful examples of people using Streamlit to do all kinds of things. People have used Streamlit to make an app for all of the free parking spaces in their town. You just go to the app and check, and it'll automatically detect the free parking spaces and tell you where you can park your [00:15:00] car.

People have created apps to analyze your Goodreads habits, so you can go to Amazon's Goodreads, see every book you've ever read, and it'll give you a ton of statistics on what kinds of books you like to read and so forth. So there's an amazing diversity, an ecosystem of cool, weird Streamlit apps that have been built, both in and out of companies, to solve weird corner problems.

And that's honestly one of the delights of running Streamlit.

Steve Hamm: Yeah, that's wonderful. Wonderful story. Hey, I kind of wanna step back here for a second. Snowflake is a very important player in the modern data stack. We talked about the data stack before, but in its history, it started off with the data warehouse in the cloud, the first native cloud data warehouse.

And it was used by companies to manage structured and semi-structured data, [00:16:00] using SQL to write applications and queries. Now, that's not data science, I don't think. So how has Snowflake really gotten into the game of data science? What's it doing to support data scientists?

Adrien Treuille: Well, as we talked about earlier, the key language in data science and machine learning is Python, not SQL, and Snowflake has made an amazing and huge effort around Python over the past couple of years, notably with the release of Snowpark for Python and the partnership with Anaconda, which now basically brings modern Python and the entire machine learning and data science technical ecosystem into the hands of Snowflake users.

The other big thing they've done, I'd say, is acquire Streamlit, which was one of the fastest-growing data application frameworks of all time, [00:17:00] and which we are now bringing inside Snowflake as a first-class product. We're super excited to see what Snowflake customers can do with it.

Steve Hamm: Yeah. Yeah. Hey, how did the relationship between Snowflake and Streamlit begin, emerge, and evolve?

Adrien Treuille: So originally we were talking about a partnership, back in December of last year. The idea was that we were gonna work with Snowflake to bring a product to market that was going to allow Snowflake customers to build beautiful apps in Python, share them internally, and also share them with other companies through a framework called Native Apps.

Those talks became so elaborate, and also, frankly, just so exciting (and eventually we got to meet the founders of Snowflake, and they liked us and we liked them), that we just started to realize, wow, we are actually talking about the same joint product [00:18:00] here.

And the coolest and most efficient thing both of us could do together is really embrace this product and bring it to market. And once we saw that, it basically started to transition from a partnership talk into an acquisition talk.

Steve Hamm: Yeah. Yeah. Now, I see why it's advantageous for Snowflake and Streamlit to do this together, but what advantages do customers get from the combination of the two companies?

Adrien Treuille: Well, I've seen this talking to so many Snowflake customers. There's a real thirst to bring data insights to bear throughout the company: "We want the marketing team to be able to access the data scientists' work. We want the operations team to have access to the [00:19:00] data scientists' work," and so forth.

And so with Streamlit in Snowflake, we are releasing a version of Streamlit that's going to unlock that potential across all of Snowflake's customers.

Steve Hamm: Hm. Okay. Now, you mentioned right near the top that Streamlit started off as an open source project and became a commercial project, a company. It's got these two parallel technologies. What happens to the open source technology with the acquisition?

Adrien Treuille: The open source version of Streamlit is going to remain strong. The community is incredibly valuable to Streamlit, not only because it's nice to have lots of people use your work, but also as part of the product itself. If you go online and ask a question about Streamlit, you'll see tons of code examples out there, or you might even have a friendly person answer your question.

[00:20:00] All of that is the Streamlit community. So we are committed to supporting and developing the open source project and keeping it advancing in parallel with the Snowflake version, while at the same time giving Snowflake customers access to an amazing array of features that make sense directly, and uniquely, in the Snowflake ecosystem.

Steve Hamm: Yeah, well, that's good. So we've been talking about the past and the present. Let's talk about the future for a minute. Looking ahead a year or so, what major changes are coming in your field?

Adrien Treuille: From where I stand, we are still in the earliest days of machine learning and data science inside of companies. Every department that we see now is going to 10x in the next couple of years.

So this is a major change [00:21:00] that's going through corporate America right now. And the outcome is more efficient decision making, more access to pertinent data, the ability for the organization to make decisions automatically through deployed machine learning models, and so forth.

And if that trend seems big now, it's only just getting started.

Steve Hamm: You know, I'm gonna ask you to put on your visionary cap now. Looking out five years or more, what are the major changes in technology and data analytics that you think will impact business and even society?

Adrien Treuille: The major five-year-horizon implication of machine learning is that data sources are becoming sensing sources. Let me give you an example. Today, if you just point a [00:22:00] camera out your window, basically, unless you have a lot of know-how, you end up with nothing but a video of what's happening down the street. What we're seeing with machine learning is that that's no longer just a video: it's a description of everyone, a count of everyone going down the street, their speed, their gender, and this is all happening automatically. So take that analogy to every data source that you have all across the world.

And imagine now that we are actually sensing what is happening in each one of those data sources and bringing that into our internal systems. That is a major societal, civilizational change for all of us.

Steve Hamm: So it makes it possible to understand what's going on in the world in real time, much more deeply. You can gather much more data than humans can by themselves, because of the internet of things and all the other data sources. [00:23:00] And then these events, these things that happen, can trigger applications and responses.

Adrien Treuille: That's exactly

Steve Hamm: way to see it? Okay,

Adrien Treuille: That's exactly right. Yeah.

Steve Hamm: I can see why that could change society. When we look at all the phases of automation in human history, it's kind of the next huge advance, and it's not the advance that some people worried about: "Oh, machines are gonna do all our thinking for us."

It's really the machine as a really powerful assistant.

Adrien Treuille: Yes. And the machine as a powerful sense organ. I think that, looking back on it, you're exactly right. We won't say, "Oh wow, the computers just think for us." Maybe in the long run they'll think for us too, but I think it's a little [00:24:00] bit like astronomy: we can see deeper and deeper into the cosmos, and everything around us in more and more detail.

That's similar to what's happening with machine learning today. Suddenly, in our ability to sense and see the world, it's like blinders were taken off, and we can count and see a million new things that weren't possible before.

Steve Hamm: Yeah. That is a beautiful vision, sir. Thank you. I hope it comes true.

Adrien Treuille: Thank you.

Steve Hamm: Yeah. You know, we're coming to the end of our podcast. Typically at the end, we try to end on a lighter note and ask a more personal question. A lot of people love to hear the stories of the beginning of something, of an invention, of a big idea: the aha moment.

And I know that you have a really good story about how Streamlit began. So would you please tell it?

Adrien Treuille: Yeah. My very brilliant friend [00:25:00] Lukas Biewald said, "Hey, Adrien, you know, it would be really fun if we went off into the woods and coded neural nets together." So we went off into the woods, rented an Airbnb cabin, took our laptops, and started coding neural nets together, which was really fun.

And out of that cabin in the woods came two of the biggest recent companies in the machine learning infrastructure space. One of them is Lukas's company, Weights & Biases, which does experiment management among many other things. And of course, the other is our company, Streamlit.

And so it's been an absolute honor, and breathtaking and strange and wonderful and fun, to have you and your best friend start companies at the same time, and to see them both grow in parallel to be so impactful in the world. [00:26:00]

Steve Hamm: Well, that's a fantastic story. Thank you so much for that. You know, I think this has been a great conversation. You've told some good stories, and you've shared some very practical information for people. And for me, and maybe for some of the more business-oriented people, your primer on data science and data engineering and Python and machine learning is really valuable.

I almost feel like we should bust it out and publish it, because I read a lot, and a lot of what I read in the press seems very muddy, and I think clarity would help everybody. It would help business users of technology. It would certainly help investors understand the value of companies and what's really going on in the marketplace.

So I wanna thank you so much for this conversation. I think it's really valuable.

Adrien Treuille: Thank you so much, Steve. This has been a real pleasure.