The Data Cloud Podcast

Using Your Data for Good with Tamás Földi, Co-founder and CEO of Starschema

Episode Summary

This episode features an interview with Tamás Földi, Co-founder and CEO of Starschema. Tamás has previously been a white hat hacker and professional consultant for Fortune 500 clients. In this episode, Tamás talks about distributing data, centralizing data, how the pandemic accelerated digitization, and much more.

Episode Notes


--------

How you approach data will define what’s possible for your organization. Data engineers, data scientists, application developers, and a host of other data professionals who depend on the Snowflake Data Cloud continue to thrive thanks to a decade of technology breakthroughs. But that journey is only the beginning.

Attend Snowflake Summit 2023 in Las Vegas June 26-29 to learn how to access, build, and monetize data, tools, models, and applications in ways that were previously unimaginable. Enable seamless alignment and collaboration across these crucial functions in the Data Cloud to transform nearly every aspect of your organization.
Learn more and register at www.snowflake.com/summit

Episode Transcription

[00:00:00]Narration:  Hello, and welcome to Rise of the Data Cloud. Today's episode features an interview with Tamás Földi, co-founder and CEO of Starschema. Tamás has previously been a white hat hacker and professional consultant for Fortune 500 clients. In this episode, Tamás talks about the pandemic, accelerated digitization, distributing data, centralizing data, and much more.

So please enjoy this interview between Tamás Földi and your host, Steve Hamm.

Steve Hamm: It's so great to have you on the podcast today. I'm just fascinated with your company and what you're doing with technology. Many of our listeners probably aren't familiar with Starschema, so it'd be great if you would start by describing the company. When and why did you and your partner start it, and what are the main dimensions of the business now?

Tamas Foldi: So we help organizations derive value from their data and analytics [00:01:00] technology investments. In other words, we are a technology consulting firm. I started this company about 14 years ago, and we started as a relatively boring company, doing a lot of Oracle SQL development, really boring, traditional data warehousing.

But to be honest, we realized that we were going to die doing the same stuff over and over. So we decided to focus on emerging technologies, because we wanted to do something different. We wanted to be unique and to do things that others were not focusing on. And that's where we started to focus on cloud, on open source, and on data integrations.

And that really helped us to work with the best partners and with the best clients. Given that we are niched on data and analytics, that's the only thing we are doing: just data and analytics. And within the data and analytics space, [00:02:00] we work exclusively with the cool emerging technologies that we believe are the best at this point and will be the technologies of the future.

Steve Hamm: That's great. Now, people can probably guess from your accent that you're not from the United States or the UK. You're from Hungary, and the whole company started in Hungary. Tell us a little bit about that. I mean, it must be very challenging to start a company in Hungary and make it a global company.

Tamas Foldi: Yes, indeed. Coming from Hungary has its own disadvantages. Using my Hungarian language, I don't think I could close a lot of deals or help a lot of companies. But at the same time, we have great education and extremely great people, a huge talent pool. And having ambitions and having vision is not necessarily related to where you are.

It's about focusing on what you want to achieve. In our [00:03:00] case, again, we want to help companies make the best decisions about the technologies they are using. We are helping them to be successful with Snowflake, for instance. And if that's your drive, and you put the effort into it, then it's just a question of time before you reach your customers and your goals. So it wasn't necessarily an issue for us. Actually, it's been way easier than I thought to work with these Fortune 10 companies. But again, for that, you have to find your niche.

Steve Hamm: So what's your role in the company?

I mean, you've been there for all 14 years, and you have a leadership role, but you're in the United States now, yeah?

Tamas Foldi: Yeah. So I'm the co-founder and the CEO, but if you check any of my biographies or my LinkedIn profile, I used to call myself the CTO because I just liked the title better.

I started as a data engineer and I still consider [00:04:00] myself a data engineer, even though my company has grown to more than 200 people, and we have offices on, I don't know, the East Coast and the West Coast in addition to Hungary. The goal was always to live up to our aspirations and to work with the best companies. But at the same time, being a data engineer, or being an IT programmer, I wanted to make sure that we enjoy what we are doing. I wanted to create a company that would also be good for my employees, who are also fellow data engineers.

Narration:  This episode is brought to you by Snowflake. Join 50,000 of your peers at Snowflake's annual global user conference, Snowflake Summit, this June 8th through 10th. Hear from Snowflake customers, industry thought leaders, and more about how they bring data together with the Data Cloud. [00:05:00] Learn more and register at snowflake.com/summit.

Steve Hamm:  Now, we're talking at a time when the world is starting to emerge from the COVID-19 crisis. I know that your company played an important role in that, and we'll get to that a bit later. But if you could talk more generally, what are you teed up to do as the world hopefully roars back economically and socially from this disaster?

Tamas Foldi: Yes. Data and analytics was already a hot area, and during the pandemic each and every company started to accelerate their digitalization efforts. Digitalization means more data, and more data means more work. Many companies realized that their plans and strategies were fragmented, and they are investing in getting their house in order. Doing so will enable them to better [00:06:00] compete and thrive. Many of them know where they want to go but aren't sure how best to get there. More companies are turning to third parties like us to help with the practical implementation needs of this digital transformation. And again, we have a track record of doing just this, and we already see great demand.

Steve Hamm: Let's talk about the Starschema COVID-19 data set. When and why did you decide to create it, and what is it? People need that basic knowledge to get going.

Tamas Foldi: Yes. So it started when two of my friends, members of the Tableau community, called me at the beginning of the pandemic and told me that they needed some data engineering help to set up Tableau Software's COVID data hub.

Tableau Software wanted to create easily accessible dashboards where you can track all these cases around the globe, and they asked for our help to make it [00:07:00] more robust. Since we had done a lot of work with the Tableau community and the Tableau Foundation in the past, we immediately said yes. But during that process, I realized that sharing the data and creating the data for their dashboards, for Salesforce itself, was really a painful exercise.

And I thought there should be a better way to distribute this data to customers, and to distribute it to everyone who needs to react to this crisis in the quickest way possible. That was the point where I called a couple of my friends at Snowflake and asked, "Hey guys, do you want to have a COVID-19 data hub? Do you want to have the data and share it on your Data Marketplace? Because that's the best use case I can imagine for you. We can make sure that all the data will be there in the cleanest, ready-to-use format [00:08:00] and all of your customers can access it with a single click." So that's how we started.

And I would say within a week, using open source technologies, we set it up in a GitHub repository with all the source code and all the integrations, and within a few days we were able to distribute this data to the first Snowflake customers. It was quite a ride. Currently, the Starschema COVID-19 data set collates a range of data sources, including epidemiological data like morbidity and mortality from Johns Hopkins University or the World Health Organization, and from other, smaller national health authorities. We have testing data from the CDC. We have hospitalization information, including projections, and recently we added vaccination data, also from Johns Hopkins University and from Our World in Data, which is also an open data provider. And there [00:09:00] are a lot of other data sets as well, such as mobility data sets showing how commute times and people's mobility changed over time, as well as other useful data sets for giving context, like demographic data and ICU bed capacity: everything you need to build your own COVID-related dashboards for your own COVID-related use cases inside your organization.

So that's how we started and that's what we have. We get a lot of inquiries and data requests from customers who want to add additional data sources, and we try to be as flexible as possible: if you have a need that might be useful for other clients as well, we try to integrate it if we can. And because this is an open source project, there are [00:10:00] other random people on the internet from the Snowflake community who are contributing to this marketplace listing. So it's really a collaborative effort.

Steve Hamm: Yeah. Yeah. Hey, so why is it so important to have all this data in one place and shareable, and what are some of the coolest things that customers are doing with it?

Tamas Foldi: So the reason why it's great to have everything in a single place is that it makes it extremely easy to use inside your own organization. You don't have to spend time hunting for different data sets to find out what's reliable and what's not. And again, I would also emphasize the importance of the community here.

If you do it once, it can be used by many. And if you collect this institutional, I mean community, knowledge in a single place, it will accelerate the response of these organizations, [00:11:00] because we don't each have to do the same tasks individually over and over. Another advantage of having everything in one place is that you can have a really organized, easy-to-join data set that uses the same business keys.

That makes it really easy to match the different data sources, join and blend them together, and use them with your own data sets. So it saves a lot of time and also helps you react quicker. Regarding use cases, we see three different waves of use cases. The first wave was concerned with responding to case trajectories.

And that was, uh, really related to the beginning of the pandemic. For instance, a defense contractor whose offices are just across the street from our offices used our data to prioritize [00:12:00] various facilities for the evacuation of personnel. Which means, I like to believe, I don't know if it's true or not, that our work could impact actual human lives.

So it was used to prioritize who should be evacuated and how, which again was a relatively interesting use case. In the beginning of the pandemic there was a lot of uncertainty; the only things that were certain were the mortality and the number of cases. After a couple of months, or maybe a couple of weeks, there was a second wave of use cases, which were mainly focused on joining our COVID-19 data sets with external data, such as mobility indicators derived from cell phone location data.

So several companies from the consumer and retail sector have used our data set in this way. One example: we have a relatively large music label as a customer, [00:13:00] and they realized that music streaming habits had changed drastically. People used to listen to music in urban areas while commuting, and since commuting went down to zero, practically all of their existing business models were impacted. So they used that data to see the situation in these areas: how the virus is spreading, how people are commuting, what the mobility is, whether people are moving or staying at home, and what the ratio is between the workers who are still traveling and those who stopped. These kinds of analyses were quite frequently produced in that second wave.

The third wave, which I would say is the current one, um, the main focus of the use cases is [00:14:00] predicting policy changes and the rising rate of vaccination. Again, many companies are using the Starschema COVID-19 data set from the marketplace to determine when it's safe to reopen particular facilities, when to begin the transition back to the office, and when consumer demand for in-person services is likely to rise. One example is Capita, which is using it to forecast and plan response scenarios for its workforce and its customers. Also, especially in Europe, some governments align their policy changes with the number of vaccinated people in their country, or the percentage of vaccinated people. So keeping their eyes on these numbers and percentages is extremely important for companies to plan when they can open what kind of facilities [00:15:00] in their businesses.

Steve Hamm: Yeah. Excellent explanation of the three waves of using the data set. Did the company do a lot of dataset creation before this, or did the COVID crisis give rise to a new area for you to focus on and explore and exploit?

Tamas Foldi: No, we did not. Again, we are a consulting company, so this was the first effort, but actually not the last; since then we have made additional data sets available. We found that this is the best way we can help other communities and other companies. When there is a crisis, you should think about how you can help others the most.

And that was the most straightforward and the most obvious way to try to help others. This was an extremely rewarding effort for us. So we started to add additional data sets to the marketplace [00:16:00] because of two things. First, it's extremely easy to use the marketplace; it's a pleasure to share data sources with others.

And on the other hand, we have used so many open source and free projects in the past that whenever we have a chance to make other companies' and people's lives easier, we just have to take that opportunity. And again, the Data Marketplace is just an obvious way to do that.

Steve Hamm: Yeah. Yeah. And just to make it clear for the listeners, it's Snowflake's Data Marketplace that hosts this data set.

Tamas Foldi: Correct. Yes, it is. Yes. 

Steve Hamm: Now, I understand that you created a couple of technologies along the way here, a starter dashboard and the case trajectory status visualization. That's a mouthful. Talk about those. What are they and how are they used?

Tamas Foldi: So the case [00:17:00] trajectory status visualization was just an example of how to use this Snowflake data set from a front-end tool, such as a Tableau dashboard.

It was really meant as an example, because we still believe that the biggest effect of this data comes when it's joined with the customer's own enterprise data. Looking purely at external data is not going to give you any new insights. The value of the marketplace is that you have your internal enterprise data sets from your ERP, from your CRM, from every part of your organization.

And that's the piece you need to complete with this additional information to get more precise insights. Yes, the dashboard was really just meant to show some of the capabilities, but what interests customers is how the pandemic is going to affect their businesses. So for this, [00:18:00] integrating with their own data is what's important.

Yeah.

Steve Hamm:  Just to make sure I understand correctly, you're saying the Data Marketplace made it possible for them to easily integrate external data with their own internal data and then analyze it.

Tamas Foldi:  Yes. The marketplace makes this more convenient than any other solution on the market. It brings in the entire data set with a single click, actually with two clicks. But still, yeah.

Steve Hamm: Yeah. Hey, you've made it sound pretty easy, but were there challenges, technological or business challenges, that emerged when you were building this data set that could be instructive to the listeners?

Tamas Foldi: Honestly, the biggest challenge was data cleansing and data quality issues, because most of these data sets were put together by volunteers from different projects, [00:19:00] using Google Sheets and other really manual, free-form methods to collect the data. Especially in the beginning, it was a huge mess; it wasn't designed for computers to work with. So we spent a lot of time cleaning the data, uh, I can talk about that later. But the biggest challenge itself was the change in data set availability.

So in the beginning, Johns Hopkins was the primary source of case information and everybody used their data sets, and then eventually they decided to prohibit the use of their data sets for commercial purposes. So many Snowflake customers were not able to use the Johns Hopkins University data sets, which meant we had to find alternative sources that were as reliable as Johns Hopkins University. The options were [00:20:00] things like the World Health Organization data sets, and last year the WHO published their numbers every day in PDF format. So we had to use OCR technology to parse their PDFs and get the numbers out of them, because, frankly speaking, there really was no other way to get those numbers for companies who want to use them for commercial purposes.

Another example: the biggest and most reliable data provider, the COVID Tracking Project, also stopped in early March. In that case, we had to reach out to many customers, and we had to write a couple of articles and blog posts on how they could migrate from one data set to another. That led to one conclusion on our side.

Whenever you're doing a community project, or any kind of project, being consistent and staying until the very end is extremely important.

Steve Hamm: [00:21:00] It's a tremendous project, which has had a tremendous positive impact on companies and clients and all sorts of people. What are the most important lessons that you and your company have learned about dataset development and provisioning, and how are they enriching the way you approach the world?

Tamas Foldi: Some of them I started to mention in the previous question. Number one, you have to stay consistent, support your projects, and never abandon them. For us, it was an extremely hard thing to keep up with all these other projects and make the changes. So what I learned is that we shouldn't rely on what others are doing, and if we decide to support something, we should support it until the end, the very end.

I also learned that the Data Marketplace is an extremely easy-to-use system for sharing data, and it's, as I mentioned, the best on the market. [00:22:00] It also encouraged us to invest more, to provide data sets to others, and not just limit ourselves to COVID-19. And the third thing I learned is that the whole data marketplace concept is not just applicable across customers, between data providers and customers. The same concept, publishing your data inside an organization, provides huge value as well. So we also started to encourage our own clients to use Snowflake to build an internal data exchange, an internal data marketplace, where the different functions and divisions can exchange data in a really sophisticated way.

Steve Hamm: Yeah. I want to backtrack a little bit now and talk about the relationship between Starschema and Snowflake. When did you first get involved with Snowflake, and what's [00:23:00] the nature of the involvement?

Tamas Foldi: Yeah, so we have been formal partners for a couple of years, and we truly believe in the Snowflake technology, because it's the only technology that does the separation of storage and computation in a really elegant way.

I had the pleasure to spend some time in the Snowflake New York office and have some discussions with some of the VPs, CTOs, and product managers there. I realized that we share a lot with Snowflake, and I really liked the engineering and the ease of use of this system.

And I do believe that the future is SQL, and for that, Snowflake is one of our strategic partners. We also have a few Fortune 10 clients who rely on Snowflake, and they are quite satisfied with it. They are as much fans as we are. [00:24:00] So I think, yeah, that's how it started. Yeah.

Steve Hamm: So, uh, I want to switch gears here and look into the future some more. What are the major data analysis trends that you see developing over the next year or so?

Tamas Foldi:  I see two big trends, two big changes. The first, and it's more like a hope, is that all these Java-based Hadoop and Spark computation systems should just disappear, or at least transform into something like the Data Cloud.

The challenge with these systems is that they were not designed from scratch to be cloud-native. I truly believe that you can't create a great computation engine if you are using a Java system. And I also believe that you can't create a great analytics engine [00:25:00] if it's not designed from day one for SQL.

So I'm sure that, just like 10 years ago, SQL is getting stronger and stronger day by day, and Snowflake will be one of the winners of this trend. On the other hand, what I think is more interesting is that I personally believe that in the next five years, augmented analytics will be a big thing.

Right now many companies have a lot of data. They have great reports and great dashboards, but at the end of the day, what analysts are doing is looking for anomalies, looking for trends, and trying to understand what really happened. Using machine learning, or easy statistical algorithms, I'm not even talking about AI, just basic statistical algorithms, you can easily tell not just what is in your database but what you have [00:26:00] to look at.

Or you can ask why something really happened. I truly believe that this trend, augmented analytics, explaining your data, running root cause analyses, or asking for AI-generated insights to see where you need to look in your data, is going to be the future. So if I wanted to invest my money, I would definitely buy into an augmented analytics company.

Sure.

Narration:  Fascinating: the modern age we live in, and just what the future holds.

Steve Hamm:  So now I'm going to ask you to put on your visionary cap for a minute. Look out five years or more. How do you see data management and data analytics impacting business, government, society, healthcare, whatever, the whole thing?

Tamas Foldi: In five years, what I would expect from leading companies is that they [00:27:00] would operate their companies from different cockpits and dashboards, and everything will be centralized around data. As a first step, everything needs to be in the cloud and everything needs to be centralized: all the data and data assets have to be in a single location. Second, all these companies have to get the context; they need to leverage the knowledge of others, using external data to improve or complete their own data sets.

I think that will be really important. You already see the trend: we are helping many pharmaceutical companies, for instance, to create more precise data science for the US by leveraging more precise data sets. But getting the data and getting the reports is just the first step.

What I think the future will be is that [00:28:00] these cockpits will give advice to analysts, executives, and decision makers: where they need to focus, what actions they have to take, and why. Humans will always be at the center, but more and more of this exploration will be augmented by AI.

And the most important part is, again, that when you are making decisions based on data, you have to have strong collaboration around it. So collaborative capabilities matter, especially after COVID, since we are never going back to the office as we did before. I don't believe that's going to happen in the next five years.

So it's about finding out how you can collaborate asynchronously using this huge centralized data asset, and how you can keep track of all the decisions you made and find the impact of what you are doing without leaving that cockpit. Uh, I [00:29:00] think that's where all these products should go and will be able to go in the future.

Narration:  Sure. There's a lot more to Földi. Some people really need to dig deep and get to know the real you.

Steve Hamm: I really love your accent. When we talked before, you described your accent as terrible, and there are moments when it's hard to understand, but then you called it your superpower. What did you mean by that?

Tamas Foldi: That comes from the fact that I never really learned English in a formal way, so my accent actually helps me remember where I came from. But at the same time, fortunately, my kids had the privilege to start school in the US, so at least they can make some fun of me and my accent. They always try to correct me, but you know what, I'm telling them two things. The first [00:30:00] is that you have to be the guy who, no matter how strong his accent is, people will listen to. That's number one. And the second is that even if my English is what it is, my C++, my Python, and even my SQL are better than anyone else's.

So it's cool. Also, because of my accent, when I'm talking about complex topics, I notice that people listen more carefully.

Steve Hamm:  You know, one of the things I've learned over the past year, and I think Zoom is really part of it, is that in a lot of business meetings people just talk: this one talks, that one talks, it's almost like a competition to see who gets to say something.

And I actually feel like people are listening more in meetings now, and there's a patience. People are listening deeply, not just to answer, but to understand. In [00:31:00] fact, that's something several other people on the podcast have discussed, and hopefully that'll be one of the things we come out of COVID doing and thinking better.

So it's been great talking to you. You know, I just feel like what your company did with that COVID-19 data set is just a feat. It's an example of what human beings and organizations can do in a time of crisis to make a big difference. That wasn't your business model, but you just went out and did it, and it served a lot of companies and a lot of organizations, and I think it probably saved lives. That's not something every business can say, and I think it's very inspirational that you guys did that.

Tamas Foldi: Thank you. 

Narration: This episode is brought to you by Snowflake, the Data Cloud company. Inside the Data Cloud, [00:32:00] organizations unite their siloed data, discover and securely share data, and execute diverse analytic workloads across multiple clouds.

Learn more at snowflake.com/podcast.