Data science is a combination of scientific processes, methods, systems, and algorithms employed to mine and learn from the vast swathes of structured and unstructured data that surround us in our everyday lives. It is the means by which we solve increasingly complex problems whose numbers would otherwise take considerably longer (in some cases thousands of years) to crunch.
Data science looks at data across a broad selection of industries, from healthcare and business to astronomy and even home entertainment, and it is rapidly changing the way we interact with and use data to advance our culture, our businesses, and our lifestyles.
You would be forgiven for thinking that data science (sometimes loosely referred to as data mining or big data analysis) was just a buzzword, especially after the Harvard Business Review called data scientist the “sexiest job of the 21st century” back in 2012. In reality, it is delivering results behind the scenes at exponential rates; over 90% of the world’s data was gathered in just the last few years and by 2020 each human being on Earth will be generating around 1.7 megabytes of data every second. Time to get crunching!
In this article, we’ll explain what big data really is and how it is being used in the real world to solve problems and advance everything from medical science to uncovering the mysteries of the Universe.
The difference between old-school data and BIG data
We’ve all heard the term big data, but how do we actually define it? At what point does good old-fashioned, regular data get to be called big? Surprisingly, the term is not just a reference to the size or amount of data collected. It is the tools and processes by which the data is collected and analyzed that decide whether or not it can be referred to as big data. There are a few questions to be asked before we can call data big:
Are your data analytics tools highly scalable?
You may have heard of big data platforms such as Spark and Hadoop, both of which are popular because they are very scalable. They spread the work across many machines, so they can keep crunching ever larger volumes of data with little discernible degradation in performance. In the ‘old days’, we might use simple SQL queries on a single database to analyze data, but that just wouldn't cut the mustard when it comes to big data.
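To give a feel for why splitting the work up scales so well, here is a minimal sketch of the split-and-combine idea in plain Python. It is not Spark or Hadoop code, and the function names and sample text are made up for illustration. Each chunk of data is counted on its own (something separate machines could do in parallel) and the partial results are then merged.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Count the words in one chunk of the data (the 'map' step)."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Combine two partial results into one (the 'reduce' step)."""
    return left + right


def word_count(chunks):
    # Each chunk is processed independently, so in a real cluster
    # this step could run on many machines at the same time.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(merge_counts, partials, Counter())


# Hypothetical sample data, split into two chunks.
chunks = [
    ["big data is big", "data is everywhere"],
    ["spark and hadoop process big data"],
]
totals = word_count(chunks)
print(totals.most_common(3))
```

Because no chunk depends on any other, adding more data simply means adding more chunks and more workers, which is the property that makes this style of processing scale.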
How flexible is your data?
We used to rely on all our data being nicely structured to sit comfortably in recognizable databases. Big data does not conform to such familiarity and needs a flexible approach. These days big data comes in many different forms, and being able to investigate data that is often very unstructured is critically important to making practical use of it. Going forward there are only going to be more unfamiliar types of data, increasing the need for future-proof, extremely flexible strategies.
Are you getting your results in real-time?
We are increasingly used to getting our data in real time; a great example might be Google’s website analytics data, which can show you exactly how your audience is interacting with your website or app at any given moment. The days of patiently waiting for a system to crunch numbers and pump out a report after a period of time are gone. We need to see those numbers now so decisions can be made and action taken immediately. Real-time results are a critical part of big data analysis.
Are you teaching the machines?
Machine learning is something that really sets big data apart from traditional data. Systems like Amazon’s Alexa, Google Assistant (and Google’s search index in general), and Apple’s Siri all rely on big data to ‘learn’ how to answer questions, deliver results and improve themselves. We might not quite be at true AI (Artificial Intelligence) levels yet, but we are certainly using big data to help machines learn, and this is not unlike how a genuine AI might self-improve and perhaps even understand its place in the Universe.
How is big data stored?
With the kind of mega data we’re talking about these days, we employ what are known as software-defined, scale-out storage systems. These are behemoth, super-secure storage systems designed to cope with today's data storage demands. They are a world away from the cassettes and floppy disks we used to store our data on not all that long ago.
How good is your data?
Probably the most important factor when it comes to large datasets is the quality of the data itself. As the data and the systems and processes that are employed to investigate the data get increasingly complex, we are also relying on the quality of the data being extremely high to maximize the value of the analytical results.
Using big data and data science in the real world
There would be little value in advancing our data mining skills if we weren't making significant use of the results. Here’s a look at how big data is being used in some real-world examples. The demand for data and data scientists is growing at an immense rate; in fact, since 2012 big data has created more than 14 million jobs worldwide. If you fancy joining their ranks, take a look at an online course, like the ones offered at Michigan Technological University, and perhaps become a data scientist. You can become part of the growing industry of analyzing data and make a long-lasting career out of it.
1. Airbnb
Airbnb is a great example of a modern tech company generating real growth and success thanks to the use of big data and applied data science. It is an extremely popular online holiday rental service that matches up people with room to spare with people looking for accommodation pretty much anywhere in the world. They’ve used data to help build their platform right from the beginning and actually employ a team of data scientists to help them use data analysis to improve their product.
They have written machine learning software (called Aerosolve) to help their hosts to more accurately set rental prices for their properties. Analyzing previous pricing models, location, season and a host of other factors, Aerosolve predicts the optimum price for any given property. This helps their hosts to get the best return on their investment and make the most money from the platform.
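As a rough illustration of the idea of predicting a price from past data, here is a minimal sketch in Python. It is not Aerosolve and does not reflect how Airbnb's model works; it fits a simple straight line to a single factor (number of bedrooms), whereas a real pricing model would weigh location, season and many other factors. The listings and the function name are made up for the example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up past listings: number of bedrooms and nightly price.
bedrooms = [1, 2, 3, 4]
prices = [50.0, 80.0, 110.0, 140.0]

slope, intercept = fit_line(bedrooms, prices)

# Suggest a nightly price for a hypothetical five-bedroom property.
suggested_price = slope * 5 + intercept
print(f"Suggested nightly price: {suggested_price:.2f}")
```

The principle is the same however many factors are involved: learn the relationship between the factors and the prices from past bookings, then apply it to a new property.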
More recently, Airbnb data boffins have released an open-source workflow management platform called Airflow to help speed up the maintenance and management of data pipelines.
Airbnb is typical of the success stories we can expect to see in the future as more and more online services use data to improve their offerings and increase their turnover.
2. Data science in the education industry
Right across the world, educational institutes are starting to leverage big data and data science to inform their operational strategies and improve everything from student life to campus setup and actual course provision:
Personalized learning
With historical data on individual students becoming available in more meaningful amounts, courses and learning resources can be customized to significantly improve individual student learning outcomes. Courses can be re-engineered with the help of real-time data monitoring to ensure the best results for students.
More efficient grading systems
With greater data analysis, grading can become far more efficient. Currently, a lot of time, money and manpower are expended on divisive regrading; however, big data algorithms can significantly improve grading accuracy.
Next steps
Alongside personalized learning we will start to see a similar approach for career advice too. If data science can allow a better understanding of each student's strengths, weaknesses and key attributes then a far more accurate career prediction model can help students to take their next steps in the real world.
3. Big data in the home entertainment and wellbeing industries
The big news this week is Google’s $2.1 billion takeover of Fitbit. Google are no strangers to big data science, of course, but with this takeover they now have even more access to our sleep, movement, and even heart-rate data. On the one hand, this will enable them to merge their own healthcare tech with Fitbit’s existing platform; on the other, it will potentially enable them to better target users with services and advertisements based on their health and wellbeing metrics.
It doesn’t stop there, of course. Industry behemoths such as Netflix, Spotify and Amazon are continuously analyzing the viewing, buying and listening habits of their users so they can constantly predict tastes and trends and improve what they deliver. This directly relates to their bottom line as happy customers are more likely to stay loyal to a service that learns and predicts what a user likes and provides a data-driven, personalized service to match, especially in light of the plethora of competitors that are snapping at their digital heels.
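To show how viewing habits can be turned into predictions, here is a minimal sketch of a "people with similar tastes also watched" recommender in Python. It is an assumption-laden toy, not the method used by Netflix, Spotify or Amazon, and the user names, titles and function name are all hypothetical.

```python
from collections import Counter


def recommend(history, user, top_n=2):
    """Suggest titles that users with overlapping tastes have watched."""
    seen = history[user]
    scores = Counter()
    for other, items in history.items():
        if other == user:
            continue
        # The more titles two users share, the more the other
        # user's remaining titles count as suggestions.
        overlap = len(seen & items)
        if overlap == 0:
            continue
        for item in items - seen:
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]


# Made-up viewing histories.
history = {
    "ana": {"drama_a", "thriller_b"},
    "ben": {"drama_a", "thriller_b", "comedy_c"},
    "cat": {"drama_a", "documentary_d"},
    "dev": {"cartoon_e"},
}

print(recommend(history, "ana"))
```

Here "ben" shares two titles with "ana", so his remaining title is ranked above the one from "cat", who shares only one. Real services work from far richer data, but the principle of learning from the habits of similar users is the same.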
Big data and the scientists who analyze it via an ever-growing set of software tools are having a huge and increasingly important impact on life. This doesn’t just apply to businesses but also to how our healthcare systems, education institutes, entertainment and even governments are changing.