Episode Summary: Companies with wells of data at their disposal may find themselves asking how they can use that data in meaningful ways. Generally speaking, a clean data set is the foundation for AI applications, but business owners may not know exactly how to organize their data in a way that lets them best leverage AI. How does a business move from having data with the potential for usefulness to having data that will support an accurate, helpful machine learning tool, one that can actually help solve business problems?

In this episode of the podcast, we speak with Bryon Jacob, Co-founder and Chief Technology Officer at data.world, a company that offers products and services that help enterprises manage their data. In our conversation, Bryon walks us through the common errors companies make when creating and organizing data sets, and how these companies can transition to a more organized and meaningful data management system.

The details in this interview should give business leaders a better understanding of some of the processes involved in getting started with AI initiatives, and of how to hire for data science-related roles.


Subscribe to our AI in Industry Podcast with your favorite podcast service:

itunes-podcast
stitcher-podcast
google-podcast

Guest: Bryon Jacob, Co-founder and Chief Technology Officer of data.world

Expertise: Data collaboration, data science, analytics, data distribution, teamwork, and open data

Brief Recognition: Bryon’s 20 years of academic and professional experience include AI research at Case Western Reserve University, building enterprise configuration software at Trilogy, consumer web experience at Amazon, and overseeing platform development and the integration of HomeAway.com, an online marketplace for vacation rentals. He is also currently a mentor at Capital Factory in Austin. He holds both undergraduate and graduate degrees in computer science from Case Western Reserve University.

Big Idea

Companies often get excited about artificial intelligence, Bryon says, seeing it as an opportunity to answer tough business problems. But using advanced, sophisticated techniques to build AI or machine learning systems is expensive when straightforward, simple statistical methods will often do just as well. Building an AI system is only sustainable in the long term if a company has high-quality data and the right way to classify and access it.

In many cases (in sectors ranging from insurance to advertising to manufacturing), data flows into a company in inconsistent formats. A data warehouse may contain event-type data stored on a large distributed system, alongside individual applications that each have their own data storage format, and only the experts on each system know how to retrieve its data.

How to Manage Data Well

The first step to doing it right is acknowledging that the data that drives a business is fundamentally one cohesive set of information instead of disconnected pieces or systems, then representing that data in an accessible format.
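
As a rough illustration of what representing data as "one cohesive set of information" can look like in practice, the Python sketch below maps customer records from two systems onto a single shared format. Everything in it is an assumption made up for the example and not something discussed in the interview: the two source systems (a CRM export and a billing export), the field names, the date formats, and the rule that the CRM record wins when both systems describe the same customer.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Customer:
    """Shared, system-independent representation (hypothetical schema)."""
    customer_id: str
    name: str
    signup_date: date


def from_crm(row: dict) -> Customer:
    # Hypothetical CRM export: numeric "id", "full_name", ISO date in "created".
    return Customer(
        customer_id=str(row["id"]),
        name=row["full_name"].strip(),
        signup_date=date.fromisoformat(row["created"]),
    )


def from_billing(row: dict) -> Customer:
    # Hypothetical billing export: "cust_no", split name, US-style date in "start".
    return Customer(
        customer_id=str(row["cust_no"]),
        name=f'{row["first"]} {row["last"]}'.strip(),
        signup_date=datetime.strptime(row["start"], "%m/%d/%Y").date(),
    )


def unify(crm_rows, billing_rows):
    """Merge both sources into one dict keyed by customer_id.

    Billing records are loaded first, so a CRM record with the same
    customer_id overwrites them (an arbitrary choice for this sketch).
    """
    merged = {}
    for customer in map(from_billing, billing_rows):
        merged[customer.customer_id] = customer
    for customer in map(from_crm, crm_rows):
        merged[customer.customer_id] = customer
    return merged
```

The point of the sketch is the design choice rather than the details: each system keeps its own format, but a small, explicit translation step per system means anyone in the organization can work with the shared `Customer` form without needing that system's expert.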

Enterprises rarely look at data management issues with the intent to solve them until the issues become difficult to manage. In truth, defining data formats and organization at the outset is a strategic move because the costs of doing it wrong are huge in the aggregate, Bryon explains.

The Role of the Chief Data Officer

Each arm of the organization may define its own data well, yet that data may still not flow smoothly across the organization. Having a chief data officer (CDO) whose single most important mandate is to understand and manage the data ensures that the data has value and accomplishes its role in the business. This should also spare the company from having to implement expensive fixes to an improperly set up data architecture.

Dealing with inconsistent systems is a necessary evil. A CDO may be bound by some of the choices that were made in the past. It is not realistic to replace the master data management systems that govern the fixed data points driving the core of an enterprise. But there are still flexible elements within the data strategy that the CDO can focus on, such as the data warehouse and the way applications acquire, consume, and produce data, so that the data does not have to be reformatted, repurposed, and re-understood along the way.

Companies looking to hire CDOs should look for data scientists with experience in an engineering organization, as these professionals will have gained data management skills while building and testing product solutions. They are also accustomed to solving increasingly large problems through proven agile methodologies. Statistical, mathematical, and analytical skills are definitely important, but data management is fundamentally an engineering problem, and Bryon tells us that engineering is where many critical CDO skills will come from.

The chief information officer will be responsible for the systems. There will be other stakeholders, perhaps subject matter experts, marketing or finance. But in the real world, Bryon explains that these people don’t come together until the problem has grown out of proportion and that most companies are likely to only hire a CDO when the pressure of working with poorly organized data makes it necessary.

Interview Highlights with Bryon Jacob

The main questions Bryon answered on this topic are listed below. Listeners can use the embedded podcast player (at the top of this post) to jump ahead to sections they might be interested in:

  • (4:00) What are the common errors that companies experience when they jump into AI too quickly?
  • (6:30) In your opinion, what is the best process to organize data?
  • (10:03) Who are the stakeholders who need to make the strategic decisions about defining the data?
  • (12:55) How can we structure the data to make it useful immediately and in the future?
  • (19:25) What experience and academic credentials are required of a CDO?
  • (23:55) What points do companies from various industries have to know and understand when they start organizing their data to make their way to machine learning?


Header image credit: HL Chronicle of Data Protection