On Twitter I came across a very interesting tweet from Andrew Trask.
At first I was very confused about what he meant by data. I thought it had something to do with ETLs or ways to gather more data quickly, or maybe with data quality or an algorithm that searches for relevant data inside your organization. I asked him if he had a blog post, and he pointed me to a whole site. It turns out there is a whole open-source organization working on this problem.
The next generation of deep learning frameworks will be about privacy. Not so much about obscuring your data as about how to work with private data (imagine the most sensitive cases: healthcare data, paychecks, or anything more private than that). It turns out that, once you spend some time thinking about it, this is a big deal.
A lot of people say data is the new oil. I don’t think that describes the future; it is already true. To power our very cool algorithms you need data, and sometimes a lot of it. But because of some new regulations (whether they should exist is a whole other discussion; I think they should), a lot of data is off-limits. It’s like oil sitting in a country you can’t enter. Andrew’s organization, OpenMined, is building tools that let us go to that country and use the oil. They already have a very cool set of techniques for handling these problems, and they are working on more. Meanwhile, some countries claim an advantage in working with private data because they lack such regulations; OpenMined is leveling the playing field.
If you are interested in this topic, I suggest you take the tutorials or join the Slack. And if you liked this blog post, feel free to find me on Twitter.