The Data Cloud Podcast

Big Data In Everyday Decision Making with Florian Douetteau, CEO of Dataiku

Episode Summary

This episode features an interview with Florian Douetteau, CEO of Dataiku. Florian is an expert data scientist. He has a degree in maths and computer science from ENS, one of the most prestigious universities in France, and he previously served as the vice president of research and development at Exalead. In this interview, Florian talks about machine learning and deep learning, how to bring big data analytics to everyday decision making, data visualization, and much more. So please enjoy this interview with Florian Douetteau, CEO of Dataiku, and your host, Steve Hamm.

Episode Notes

This episode features an interview with Florian Douetteau, CEO of Dataiku. Florian is an expert data scientist. He has a degree in maths and computer science from ENS, one of the most prestigious universities in France, and he previously served as the vice president of research and development at Exalead.

In this interview, Florian talks about machine learning and deep learning, how to bring big data analytics to everyday decision making, data visualization, and much more. So please enjoy this interview with Florian Douetteau, CEO of Dataiku, and your host, Steve Hamm.

Episode Transcription

Steve Hamm: [00:00:00] So Florian, good to talk to you today. It would be great if you could start by talking a little bit about your background. We know you're from France, as are two of the founders of Snowflake. So how did you, growing up in France, find your way to technology, and how did you find your way to data analytics?

Great.

Florian Douetteau: [00:00:21] I came to technology fairly young, starting programming as young as, I think, six or seven. And then I got very quickly into AI and chess programs and stuff like that when I was a teenager, because I was fascinated by it, fascinated by chess and, quote-unquote, AI, in the sense of chess players. Then I went into search engines and all the data you had on the web back then, at the very beginning of the century.

Back then, doing web search was like the next frontier, because every kind of interesting thing was on the web, and it was growing very fast. Gradually, I moved back, in a sense, to the corporate world, going through advertising and gaming, which was also booming around 2010.

And I realized that the next frontier, in a sense, was more about how you apply AI to the corporate world, which was growing in terms of sophistication and use of data.

Steve Hamm: [00:01:28] Yeah. And what was the idea that led you to starting Dataiku?

Florian Douetteau: [00:01:34] The main idea was that there was a big gap, in fact, in the enterprise between technologists and business in terms of how they perceive data and how easy it was for them to get value out of data. And so I wanted to bridge that gap, just because, well, I just like to bridge gaps. And I like when people talk to one another.

And so the idea was to build a platform where people from different backgrounds can collaborate to achieve things in terms of AI and analytics very quickly.

Steve Hamm: [00:02:07] Yeah. Yeah. So Florian, what's in the name of the company? It's really unusual. It sounds like a cross between data and poetry.

Florian Douetteau: [00:02:16] It's actually true, in fact, because it's a cross between data and haiku, a haiku being a Japanese poem. So it's a poem which has a very deep meaning while being very, very short, only a few syllables. And the idea was to talk about data and big data while trying to do things in a more easy, small way.

Steve Hamm: [00:02:46] So it would be really helpful if you would describe the technology and how people use it.

Florian Douetteau: [00:02:55] So the platform sits on top of the existing data platforms of an organization and enables people to visually transform data, blend data together from different data sources, and build predictive models in order to have deeper insights into the data. And so, within the platform, you can go end to end, from the data as it is, the raw data that you just got from different systems, to insights, forecasts, and predictions, pretty easily.

Steve Hamm: [00:03:37] Now is this conventional machine learning or is this deep learning?

Florian Douetteau: [00:03:43] It's both. In fact, we implemented the platform over the last six to seven years, continuously adding more capabilities and algorithms into it. So the very first version of Dataiku was conventional machine learning, back six years ago, and three years ago we added deep learning capabilities. And so, depending on whether you have, in a sense, a simpler business problem, where you really want to understand and explain things with a white-box approach, you use traditional machine learning. If you've got more complex data, especially images and text, you can leverage deep learning.

Steve Hamm: [00:04:22] Yeah. Yeah. So it might be nice to have a scenario of how you use deep learning, because my understanding is that with deep learning, you just turn the algorithms loose on the data, rather than doing a lot of training ahead of time. And the algorithms kind of find patterns or discover things or predict things that a human, or even traditional machine learning, might not necessarily have spotted. So if you could explain how the deep learning works, that would be really great.

Florian Douetteau: [00:04:53] So yeah, deep learning can actually work in lots of different ways. And in fact it's more relevant, as you pointed out, in situations where the patterns within the data are so complex that you actually need to understand the data in order to find them. What I mean here is that if you look at something like a table, a spreadsheet, in a sense it's pretty simple data.

You can find the patterns by looking at the values in the columns and trying to combine them, or looking at one and then another. Essentially, it's just, maybe, I don't know, 100 to 200 numbers each time. But if you look at an image, in order to understand the patterns, you need to understand what the structure of the image is.

Meaning, is it an object or not? In which position? Do you have several layers, and stuff like that? An image is complicated. And so deep learning is essentially a method where you build this understanding of the patterns, of what the image looks like, automatically with the computer, without having to actually describe the patterns themselves to the computer.

So it learns by itself not only what you want to get as an outcome, but also the patterns within the data in order to get there.

Steve Hamm: [00:06:18] Is your technology primarily for data scientists, or is it for a broader selection of people to use?

Florian Douetteau: [00:06:25] Yes, I would argue that the technology is actually mainly not for data scientists, or maybe just to multiply the impact of the data scientists within the organization. Most of our users are actually analysts, subject matter experts, and people who want to get more impact from the data they have, who want to get to the next step.

Five years ago, they started to do more business intelligence and analytics, and now they need to get deeper into the data, do more forecasts or more complex analytics, and get to the next steps in a self-service way. Meaning you actually want people from the business to get into the data and to be able to go as fast as possible, collaborating with data scientists in order to go faster, but you really need to free the data, to democratize the access to analytics.

And that's what we're trying to do with the platform.

Steve Hamm: [00:07:18] So there are a lot of companies in the same market segment as Dataiku. How would you contrast your company and its technology with, say, Databricks or DataRobot or Zepl?

Florian Douetteau: [00:07:32] Yeah, indeed. It's a very hot market, this market of AI and data science for the enterprise, because of the stakes of democratizing these for the enterprise. I think that, compared to others, we started very early on by having this perspective of democratization, meaning taking the angle of the data analyst who may want to do something simple with the data, or something more complex that needs to clean up the data and do some machine learning, and really seeing this analyst as someone that you need to enable and empower to work with the other players.

[inaudible] So serving these analysts and these teams of analysts is actually what Dataiku is about, compared to others that have a more, I would say, technology or infrastructure approach.

Steve Hamm: [00:08:21] Yeah, well, excuse my ignorance, but my sense is that Databricks really is for data scientists. DataRobot is really a lot about automation and making the technology available to a lot of people in a lot of different roles, like yours is. And Zepl is more like a collaboration platform for data scientists and data analysts.

If you could, you know, do a bit of a contrast of how your technology and your approach contrast with theirs, I think that would be really helpful to the listener.

Florian Douetteau: [00:09:00] Yeah, so compared to them, I would say that we are a collaboration platform, indeed, for data scientists and data analysts, for the wider enterprise, whereas Databricks actually focuses on providing infrastructure for the data engineers themselves, in terms of automated transformation of the data.

DataRobot focuses on AutoML, in the sense of automating machine learning itself, but possibly not everything that comes before and after that. And Zepl focuses on helping a data scientist by having notebooks, the right infrastructure, and a computing platform.

Steve Hamm: [00:09:35] Yeah. Yeah. So we've only mentioned a few companies, but there are many, many more. Do you foresee a consolidation in this industry, kind of a rolling up, so that people will maybe go to one or two companies for all that they need? Or do you think there's going to be a long period with a lot of different offerings out there and people using a lot of them?

Florian Douetteau: [00:09:57] I think that this market will consolidate, for two reasons. One could be external factors, like the general need for consolidation because of economic factors and so forth. And the second one, more interesting, is the fact that as an organization, when you build AI, you need resilience, you need consistency, you need some form of governance, meaning you need to be able to upskill your people, whatever their position in the company.

And you need to make sure that if you set up a new way to do something, it is actually distributed within your organization in a consistent way, because AI is something important. You don't want to have lots of different tools and inconsistencies, and a different way to compute a very important formula for your organization that is not the same from one tool to the other.

So at the end of the day, yes, I do believe that there will be only a few vendors in this market.

Steve Hamm: [00:10:48] Yeah. Hey, I wanted to back up to something we talked about before, which was your technology and who uses it and in what roles. If you could walk through a scenario of how a data analyst might use your technology, kind of step by step, to solve a problem, I think that would be really helpful.

Florian Douetteau: [00:11:07] Yes, sure. So we could, for instance, imagine a data analyst working in a marketing capacity, let's say in a big consumer goods company. And so what he would do in Dataiku would be, for instance, to gather data from various advertising platforms, from sales, from historical data. He would get data about online and offline ads, possibly social media, data from competitors.

And he would have all this data within his Dataiku project, and we are talking possibly millions of lines. He would leverage Dataiku in order to clean and refine this data, and find what's interesting in terms of correlation on the products he needs to work on and the brands he needs to focus on for his particular study.

And maybe his purpose is to understand what the impact of a media campaign is, what are the keywords that work better today as opposed to before. And he would use this in order to produce an informed opinion about how to spend five, 10, 15 million of marketing budget for a particular product category in a particular country.

And so tell the business where to spend the money, on which keywords, whether or not you should bid on the weekend versus the weekdays, and stuff like that. And the idea is that with this data analyst, we are not talking about a computer scientist. He is someone who has essentially a very good understanding of the product, the category, the language we are talking about, and the country he is focusing on.

But instead of doing that in Excel as before, essentially trying to copy-paste data from various sources in order to build a PowerPoint, now, in order to be relevant, he is doing that within the data platform, where he's manipulating millions, if not billions, of rows and gigabytes of data in order to get something way more relevant for the digital world.
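
The workflow described in this answer (blend data from several sources, then look for correlations) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, field names, and figures below are hypothetical, and the code says nothing about how Dataiku itself implements these steps.

```python
import math

# Hypothetical daily ad spend for one keyword (one source).
ad_rows = [
    {"day": "2020-05-01", "spend": 100.0},
    {"day": "2020-05-02", "spend": 200.0},
    {"day": "2020-05-03", "spend": 300.0},
    {"day": "2020-05-04", "spend": 400.0},
    {"day": "2020-05-05", "spend": 250.0},  # no matching sales row
]

# Hypothetical daily units sold (a second source).
sales_rows = [
    {"day": "2020-05-01", "units": 12},
    {"day": "2020-05-02", "units": 19},
    {"day": "2020-05-03", "units": 33},
    {"day": "2020-05-04", "units": 41},
]


def blend(ads, sales):
    """Inner-join the two sources on the 'day' field."""
    units_by_day = {row["day"]: row["units"] for row in sales}
    return [
        {**ad, "units": units_by_day[ad["day"]]}
        for ad in ads
        if ad["day"] in units_by_day
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


blended = blend(ad_rows, sales_rows)
correlation = pearson(
    [row["spend"] for row in blended],
    [row["units"] for row in blended],
)
print(f"{len(blended)} blended rows, spend/units correlation = {correlation:.3f}")
```

A strong correlation in toy data like this would only be a starting point for the analyst; it does not by itself show that the spend caused the sales.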

Steve Hamm: [00:13:06] Yeah. And the, the data analyst doesn't have to write code or write algorithms or anything like that, correct?

Florian Douetteau: [00:13:13] Yeah, correct. He is clicking in order to transform the data, do the forecast, and apply machine learning. And obviously, in some situations, clicking is not enough, and he may hit some kind of roadblock where he needs to actually ask a friend for some help, like in those TV games. And usually the friend is a data scientist who could look at the roadblock, write some piece of code, and save it somewhere in Dataiku.

And then this particular fix, this way to move forward, can then be used by other data analysts, in order to be more self-sufficient when trying to do the same kind of analytics next time.

Steve Hamm: [00:13:53] Yeah. So they can collect a bunch of tools that they've used before and use them again.

Florian Douetteau: [00:13:58] Exactly, because the main issue with AI is that it's such a complex problem that you actually want to be able to reuse solutions from the past in order to get to the next step. If you start from scratch every time, you actually won't go that far as an organization.

Steve Hamm: [00:14:13] And are the results of the work primarily presented as visualizations or in some other way?

Florian Douetteau: [00:14:20] So it can be visualization, but in fact it's almost, let's say, one third visualization, like insights you can use to better understand what's happening. One third is decision support apps, in the sense of providing to someone suggestions or a way to make a more informed decision, interacting with the data.

An example of that could be when you ask for a credit in a bank, and you provide to the operator, to the banker, information about a credit score or the decision related to that score and so forth. So it's more like in-context data. And another third could be applications where you actually integrate directly into operational systems to completely automate tasks, like automating the whole supply of some goods, or any other tasks where you directly, in a sense, use data to simplify a business process and make it more efficient.

Steve Hamm: [00:15:20] , I know you have an Alliance with snowflake. When and why did that occur?

Florian Douettea: [00:15:24] So we started working with snowflake a few years ago. at least two reasons why we had the. some customer would come in and we are seeing, an increasing interest in snowflake, in the data science and broader analytics community, obviously. And then we, well, we, we also met with the funders of snowflake.

we happened to be French, so happened to be French too, for some of them. And we really, really liked the technology and, the, the fact that orchestra miles could do. we snowflake things, you know, easier on things faster and easier, always, without having to set up a complex infrastructure from that perspective.

Steve Hamm: [00:16:06] Yeah. Yeah. So you have mutual customers, Snowflake and Dataiku. If you could walk through a scenario of how a mutual customer would use both of your technologies, that would be really helpful too.

Florian Douetteau: [00:16:21] So the scenario that I just described, where an analyst from a consumer goods company leverages data in order to get things done and, essentially, be more efficient with a marketing campaign, is actually very relevant, and actually sort of an ideal scenario for using Dataiku and Snowflake together.

And so the way it works is essentially by having Snowflake as the backbone, which is used for most of the computations that are needed in order to transform the data, refine it, aggregate it, and so forth. So with Dataiku you can visually click and navigate the data to design how you want to transform it, and the power of Snowflake makes most of the queries and the results very, very quick and efficient for the analyst.

The second factor is that we are talking about customers where the number of such analysts is in the hundreds, all at the same time trying to navigate the data. And each of them has millions of rows.

So you really need a platform that can scale seamlessly, so that you can scale this activity across the organization.

Steve Hamm: [00:17:33] Yeah. And so the cloud is really essential for that, I guess.

Florian Douetteau: [00:17:37] Exactly.

Steve Hamm: [00:17:38] Yeah. Yeah. So are you a Snowflake customer as well?

Florian Douetteau: [00:17:43] So we are not a significant user of Snowflake as of today. We are...

Steve Hamm: [00:17:50] That's too bad.

Florian Douetteau: [00:17:51] That's too bad, but I think that we have such small... well, we are B2B, so we have very small datasets, in a sense.

Steve Hamm: [00:18:04] Yeah. Yeah. You're not a target customer. I get that. Florian, we're in the middle of a global health and economic crisis.

Florian Douetteau: [00:18:13] Huh.

Steve Hamm: [00:18:14] How is Dataiku dealing with it? And how are your customers using your technology to respond to the stresses of that?

Florian Douetteau: [00:18:24] Yeah, indeed. We are living through the various phases of this crisis. So at the very beginning, phase one, for Dataiku itself, it was, I think, first and foremost to care for our people, the people from Dataiku, meaning, where could they live, where could they work? And essentially to set expectations in terms of work that were compatible with the new world, meaning the ability to work from home and the ability to have flexible hours.

It also meant being very understanding that we have parents at Dataiku who were unable to deliver during those few months as much as they could, because of this situation. So I think as an organization we had to adapt and make sure that we were really understanding the situation of everyone.

And in a sense, similarly, we went and tried to help our customers in various ways so that they could also adapt during this crisis. They themselves were shifting to remote work, and having a fully online platform like that really helps to continue collaborating, even in a full remote work situation.

And I must say that, actually, almost all of our customers continued to do analytics, and even more analytics than before, during the crisis. And the reason is that analytics is key in order to plan ahead, to do forecasts, to plan. So our customers actually implemented, in the last few weeks, lots of use cases to improve the efficiency of their manufacturing in a situation where they had to completely reschedule everything, to rethink their supply lines, to rethink their pricing, to try to sense new trends in terms of customer demand.

And that was actually super interesting, because we have customers in very different industries, including airlines, hospitality, car manufacturing, aerospace manufacturing, pharmaceuticals, consumer goods, and financial services. All were adapting differently, but all were very active in terms of adapting and actually leveraging data in order to try to plan ahead.

Steve Hamm: [00:20:44] Yeah. Yeah. It seems like airlines, retail, and restaurants are among the industries that are most seriously damaged by the crisis, by the economic slowdown. Are you seeing them respond very aggressively and creatively to the crisis using data analytics?

Florian Douetteau: [00:21:06] Yeah, we saw that happening, which is about trying to better anticipate what the behaviors of the customers would be, and trying to leverage analytics in order to further boost new business lines that could be, for instance, home delivery for [inaudible] chains and so forth.

And so, in a sense, it's kind of like a blessing, accelerating, in terms of the world moving to something more digital, to something online, through this crisis.

Steve Hamm: [00:21:36] And it's accelerating now, I think, right?

Florian Douetteau: [00:21:39] Yeah. It's still accelerating in a...

Steve Hamm: [00:21:40] Yeah. Yeah. Hey, does your software operate both in the public cloud and on premises?

Florian Douetteau: [00:21:48] Yes, it does. Yup.

Steve Hamm: [00:21:50] Are you seeing any shift in that right now? Any, I mean, I would imagine there might be an acceleration of the migration of data to the cloud, but you know, you could tell me.

Florian Douetteau: [00:22:01] Yeah, it's been a long-standing trend that things were moving from on-prem to the cloud, that's for sure. And so this crisis will possibly accelerate that further. That being said, we are talking about a migration that started five years ago, at least, if not 10 years ago.

And so the crisis itself will accelerate it, but it's definitely an opportunity for everyone that is doing business in the cloud, that is simplifying the way to leverage data in the cloud. For...

Steve Hamm: [00:22:35] yeah. Yeah. Why do you think it's accelerating it?

Florian Douetteau: [00:22:40] Essentially, people understand that it's not a matter of cost. In fact, in a sense, it's a matter of, how fast do I need value? How fast can I operate? What kind of agility do I need? And people, with this crisis, realize that the ability to have some agility in the organization, the ability to reconfigure quickly when needed, is more important than anything.

Because true resilience is not based on the fact that you've got your servers secured on premises somewhere. It's based on the fact that you can reconfigure when needed. And that's definitely a different mindset. And I think that realizing that is [inaudible] into the cloud.

Steve Hamm: [00:23:24] Yeah. Okay. Now, someday this crisis will end. And before the next crisis, there might be a period of stability or something like that, who knows. If you would look out into the future, what's your vision of how data and data analytics will affect business and society, and even individual lives, say, five or 10 years from now?

Florian Douetteau: [00:23:50] Mm. When I look at the future, I'm actually looking at how data and analytics can change in a profound way things related to supply lines, transportation systems, and healthcare, to actually make them way more efficient than today. Because all those systems are today getting more modern, well, digital and so forth, but still, in a sense, when you look at them as a whole, very, very crude.

Yes, you do have some smart [inaudible], but our transportation system is, well, in fact kind of dumb. It was very efficient [inaudible] in Paris, and I think in most cities of the world, but now we are back into traffic jams in a click. And the same for healthcare, where we would love to be able to sustain [inaudible].

And similarly for the way we transport goods, and for supply lines: those are things that we will have to rethink. And I think that data analytics will have a key role to play in the rethinking of those key elements of our society: how we transport ourselves, how we transport goods, and how we make sure that we stay healthy.

Steve Hamm: [00:25:05] Yeah. Yeah. I think one of the lessons that this crisis has taught us is that even though we have all these great new tools for predicting things more accurately, you can't anticipate everything. There are going to be surprises. And part of what you have to do is be able to respond quickly, to be more resilient.

And it seems like a lot of the technology, especially machine learning technologies and especially cloud technologies, are really things that will make a more resilient society. Do you see it that way?

Florian Douetteau: [00:25:40] Yeah, I see it that way. And I think it's a combination of several technologies, machine learning being only a part of it: machine learning, access to data, analytics in general. I think, especially, it's all about being able to mix the human intuition and the data.

Because, as you pointed out, when you've got a crisis, history doesn't really matter, because you can't use last year to predict this year. You can't use the sales from May last year to predict the sales from May this year. And so, in order to get anything done and to be able to respond to a crisis, you need the agility to be able to reconfigure.

You also need a bit of flexibility, where you are able to have lots of different scenarios and prepare for the worst even before. But you also need, when needed, to be able to mix in the human intuition, meaning a subject matter expert who understands what's happening, what weak signals you can use, and what forecast you can make, and mix that with data in order to plan ahead, to drive things, during a crisis.
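
The remark about last year's May sales corresponds to what forecasters often call a seasonal-naive forecast: predict each month as the same month one year earlier. The sketch below, with entirely hypothetical sales figures, shows how the error of such a forecast stays small in an ordinary year and grows sharply in a crisis year. It is an illustration of the idea only, not a method described in the interview.

```python
def seasonal_naive_forecast(last_year_sales):
    """Forecast each month of this year as the same month of last year."""
    return dict(last_year_sales)


def mean_absolute_percentage_error(forecast, actual):
    """Average of |forecast - actual| / actual over the months in `actual`."""
    errors = [abs(forecast[month] - actual[month]) / actual[month] for month in actual]
    return sum(errors) / len(errors)


# Hypothetical monthly sales, in arbitrary units.
last_year = {"Mar": 100.0, "Apr": 110.0, "May": 120.0}
normal_year = {"Mar": 105.0, "Apr": 112.0, "May": 126.0}  # close to last year
crisis_year = {"Mar": 80.0, "Apr": 40.0, "May": 50.0}     # sudden collapse

forecast = seasonal_naive_forecast(last_year)
normal_error = mean_absolute_percentage_error(forecast, normal_year)
crisis_error = mean_absolute_percentage_error(forecast, crisis_year)

print(f"error in a normal year: {normal_error:.1%}")
print(f"error in a crisis year: {crisis_error:.1%}")
```

With these made-up numbers the history-based forecast is off by a few percent in the normal year and by more than the actual sales themselves in the crisis year, which is the situation where expert judgment and fresh signals have to supplement the historical data.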

Steve Hamm: [00:26:48] Yeah. Yeah. It sounds like you see AI and humans kind of being collaborators in almost a continuous conversation. Is that the way you see things, or do you think it's more like we feed it into the machine and we get the results out?

Florian Douetteau: [00:27:06] Right. I really see it like a combination of human decision-making with AI. It's kind of like a duality. You need to be able to make the business brain work with the technology brain in order to get there. And then you need to make the human brain work well with this new AI brain as an organization. And you don't need to try to fight between those brains.

You don't want to oppose business and technology. You don't want to oppose AI and human, because probably you won't get there if you take that perspective.

Steve Hamm: [00:27:41] Yeah. Yeah. You know, when you look around the world today, you see kind of a big rush to get back to normal. And, you know, in a lot of ways, normal wasn't really that great for a lot of people in different countries, for different classes, and certainly in regard to sustainability and climate change. Do you see, or do you have any hope, that the getting back to normal might actually take a different path and get to a better normal?

Florian Douetteau: [00:28:15] I do think so because, well, speaking as any citizen or any human, in a sense, I think it will have an impact. And as a computer scientist or analyst, I would say that I can see that as a big A/B test, as you could call it. You know, A/B tests are tests you can do online where you test two variants of something to see which is behaving better.

And in this particular instance, we were able to see what is happening to society when something like no transportation occurs, when a crisis occurs. And it gives us different insights, like, how much worse can things go, can we live with less transportation, and so forth. So insights of a different nature, but definitely things that we can use in order to think about the future.

And so it will take time to consolidate, so that we make sure of what we learned from this very big A/B test, but I do think that it will have an impact on the future, on the way we think as a society, long term.
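
For readers unfamiliar with the term, an online A/B test as described here compares two variants on the same metric and asks whether the difference is larger than chance would explain. One common way to do that, sketched below, is a two-proportion z-test. The visitor and conversion counts are hypothetical, and this choice of test is an assumption made for illustration, not something stated in the interview.

```python
import math


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the null hypothesis that both variants behave the same.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = math.sqrt(
        pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b)
    )
    z = (rate_b - rate_a) / standard_error
    # Standard normal CDF expressed with the error function.
    normal_cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value


# Hypothetical experiment: variant A converts 200 of 5000, variant B 260 of 5000.
z_score, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z_score:.2f}, p = {p_value:.4f}")
```

The analogy in the interview is looser than a controlled experiment, since society before and during the crisis was not randomly assigned to two variants, so a test like this applies to deliberate online experiments rather than to the crisis itself.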

Steve Hamm: [00:29:15] Yeah. Okay. So Florian, thanks so much for your time today. You know, your stories and insights about what you do with data, how you do it, and what you enable your customers to do really have been fascinating. So thanks a lot for your time.

Florian Douetteau: [00:29:31] Thanks, Steve, for your questions and your time.

Steve Hamm: [00:29:34] yeah. Okay. Well, okay. Well, thanks again.

Florian Douetteau: [00:29:37] Thanks for that.