A week ago, IBM announced its plan to open source its SystemML machine learning software, making it freely available to use, share, and modify.

This follows closely in the footsteps of Google’s release of its TensorFlow software as open source, and of Facebook’s decision to do the same with its AI software in January of this year.

Why, though?

We might presume that much of the AI software being released as open source took millions upon millions of dollars (and countless man-hours) to build, and that it would seem to be some of the most important proprietary technology that big companies like Google and Facebook own. How could they release it all for free?

Some have dubbed it “democratizing machine learning,” but most experienced tech journalists have called a spade a spade: it’s a rather savvy business move by these large companies.

In an interview with WIRED magazine, CrowdFlower CEO Lukas Biewald, referring to Google, noted: “What they’re not opening up is their data, they would never do that.”

Data, frankly, is often more important than software. By keeping its data locked down while giving away some of its software, Google gets to improve its machine learning algorithms through a large number of new tests and experiments that it doesn’t have to run in-house.

Some might say: “Well, of course Google can’t open up all of its data. That data contains information about people, their buying behavior, their embarrassing photos, and more… how could Google ever open it up without violating laws around sharing information?”

Google could hypothetically take individual segments of its massive data pool (say, information about banner ads and retargeting) and share those, presumably in a way that wouldn’t violate anyone’s privacy rights.

But again, data is often more important than software, and sharing its data would be unlikely to improve Google’s own business.

IBM recently bought much of the digital presence of the Weather Channel. Why? Because data. Because lots of data, and the ability to be the one who makes that data useful and salable. Trucking companies, companies with traveling salespeople, and any organization with a heavy focus on logistics may be able to make much better business decisions if they can wield better weather predictions.

Facebook and Google have probably made acquisitions just for the sake of data, too (Google’s acquisition of Zagat comes to mind), but unlike IBM, they exist as mega-platforms where information about engagement, clicking behavior, and buying behavior is generated en masse. Arguably, this kind of data is worth more than weather data, and the more patterns their AI can recognize and the more causal connections their machines can make, the stronger their respective companies will be. They need the data. Whether it is bought, “crunched” in their own labs, or “crunched” by geeks and academics all over the world, data is good.

In addition, companies like Facebook, IBM, and Google get to pick which elements of their AI engines they share with the world. One might imagine that there is vastly more powerful and important technology under the hood at these big companies, beyond what they’ve decided to share with the public. They can give away just enough to get back much more in terms of data.

In other words, they can take the segment of AI that they’re willing to share, open it up, and have as much data (and as many interesting experiments) as possible run through it. After all, it takes a lot of data to train a neural network, so the more the merrier, especially if it doesn’t have to come directly out of Google’s research budget.

Google famously used 16,000 computer processors to train its neural network to recognize cats. One might presume that more experiments in new and novel applications will help Google develop additional capabilities.

It seems obvious, however, that there is a potentially great benefit for universities, companies, and individuals who otherwise wouldn’t have access to the strong machine learning tools these tech giants have recently made available. It’s a great example of enlightened self-interest, and we might expect other large tech companies to make similar moves if this open source strategy proves useful for Google, IBM, and Facebook.


TechEmergence conducts direct interviews and consensus analysis with leading experts in machine learning and artificial intelligence. Stay ahead of the industry with charts, figures, and insights from our unparalleled network, including executives from Facebook, Google, Baidu, Yahoo!, MIT, Stanford, and beyond.