The Data Science Machine
Big data consists of data sets so large, varied, and complex that traditional analytical methods cannot handle them. Big data analysis involves searching these data sets for previously unknown patterns with predictive power. Choosing which “features” of the data to analyze, however, still requires human knowledge and judgment, and the full analytical process can take a team weeks. That may change thanks to computer scientists at MIT and their recently created “Data Science Machine.” To test the prototype, they entered it in three data science competitions, where it competed against a total of 906 teams and finished ahead of 615 of them, about two-thirds of the field. In the three competitions, its predictions were 87, 94, and 96 percent as accurate as the winning submissions.
How It Works
Big data typically stores different types of information in different tables, with the correlations between them indicated by numerical identifiers. The machine tracks these correlations and uses them as cues for engineering “features.” Feature engineering, the process of using domain knowledge about the data to design features that machine learning algorithms can work with, is one of the critical steps in solving big data problems. The MIT scientists see the Data Science Machine as “a complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the machine.
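To make the idea of following identifiers between tables concrete, here is a minimal Python sketch. It is not the Data Science Machine's actual code. The tables (`customers`, `orders`), the `customer_id` column, and the `engineer_features` helper are all hypothetical, chosen only to illustrate how a shared numerical identifier can be used to turn rows in one table into candidate features for another.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical example data: two tables linked by a numerical identifier.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cy"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 15.0},
]


def engineer_features(parents, children, key, value):
    """Build one feature row per parent by aggregating linked child rows.

    `key` is the identifier column shared by both tables; `value` is the
    numeric child column to summarize. (Illustrative helper, not a real API.)
    """
    # Follow the identifier: collect every child value under its parent's id.
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[value])

    features = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        features.append({
            key: parent[key],
            f"count_{value}": len(values),
            f"sum_{value}": sum(values),
            # Parents with no linked rows get 0.0 rather than raising an error.
            f"mean_{value}": mean(values) if values else 0.0,
            f"max_{value}": max(values) if values else 0.0,
        })
    return features


features = engineer_features(customers, orders, "customer_id", "amount")
for row in features:
    print(row)
```

Each resulting row describes one customer with simple aggregates of that customer's orders. A learning algorithm could take these as candidate features, and a person could review which ones are meaningful for the problem.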