What does big data mean?
Volume
Velocity (speed)
Variety
Sources of big data: Where does the data come from?
Examples for the use of big data
Example 1: Automotive industry
Example 2: Marketing
Example 3: Health care
Why is big data important?
How big data technologies work
Distribution to multiple systems
Parallel processing
High scalability
Advanced analytics
Automation
Challenges of big data
The future of and with big data
Digital devices and the internet generate huge amounts of data. Companies can use this data to better align their products and services with the market and their customers. Big data can thus make a decisive contribution to the success of a company. But what does big data actually mean? And how exactly can the data be put to good use?
This article explains in a simple and understandable way what big data is all about, where all this data comes from and where it is used. You'll learn why so many companies are diligently collecting data and what technologies are needed to do so. We also show what challenges there are and give an outlook on the role big data will play in the future.
Big data refers to vast amounts of data that are highly complex and highly dynamic. Such data cannot be stored and evaluated using conventional data processing methods: a single computer cannot handle the masses of data, and common software like Excel cannot analyze them. Special technologies are needed for this. The term big data is also frequently used for these technologies.
Definition: The 3 Vs of big data
The 3V model is usually used to define big data. Computer scientist Doug Laney described three key dimensions of big data in the early 2000s:
Volume: Big data sets often comprise several million gigabytes. Common units are the petabyte (approx. 1 million gigabytes) and the exabyte (approx. 1 billion gigabytes). We rarely encounter such huge amounts of data in everyday life, so an analogy helps: One petabyte is equivalent to about 500 billion pages of text. A normal hard drive is clearly not sufficient for this. Because of this enormous volume, big data is also referred to as massive data.
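The volume figures can be checked with a few lines of arithmetic. This is only a rough sketch: it uses decimal units and assumes that one page of plain text takes up about 2,000 bytes, a figure that does not come from this article.

```python
# Decimal (SI) units: 1 GB = 10**9 bytes, 1 PB = 10**15 bytes, 1 EB = 10**18 bytes.
GIGABYTE = 10**9
PETABYTE = 10**15
EXABYTE = 10**18

# Assumption for illustration only: one page of plain text is about 2,000 bytes.
BYTES_PER_PAGE = 2_000

gigabytes_per_petabyte = PETABYTE // GIGABYTE
gigabytes_per_exabyte = EXABYTE // GIGABYTE
pages_per_petabyte = PETABYTE // BYTES_PER_PAGE

print(f"1 PB = {gigabytes_per_petabyte:,} GB")
print(f"1 EB = {gigabytes_per_exabyte:,} GB")
print(f"1 PB is roughly {pages_per_petabyte:,} pages of text")
```

Under that page-size assumption, the result matches the 500 billion pages mentioned above. A different assumed page size would shift the number accordingly.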
Velocity: The data sets are created at high speed. Because they are so dynamic, they quickly lose value, so they also need to be transferred and evaluated at high speed. Some digital devices can process dynamic data streams in real time or near real time.
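What evaluating a data stream as it arrives can look like is sketched below. The function `rolling_average` and the sensor readings are hypothetical and chosen only for illustration. Each new value is evaluated immediately, and only the most recent readings are kept in memory.

```python
from collections import deque


def rolling_average(stream, window_size=3):
    """Yield the average of the most recent readings each time a new one arrives."""
    window = deque(maxlen=window_size)  # older readings drop out automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)


# Hypothetical sensor readings arriving one after another.
sensor_readings = [10, 20, 30, 40]
averages = list(rolling_average(sensor_readings))
print(averages)
```

The point of the sketch is that the evaluation does not wait until all data has been collected. Real streaming systems apply the same idea to far larger and faster data flows.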
Variety: Large, fast-moving data sets contain different types of data. There are structured formats, like ordinary tables, and semi-structured and unstructured formats, like photos, videos or emails. The variety of data types requires special systems to store and analyze the data together.
Over the years, the 3V model has been extended by many other terms starting with the letter V, such as Veracity or Value. However, according to different definitions, the main characteristics of big data are always the enormous volume, velocity and variety of data.
The global volume of digital data is growing unabated. Huge amounts of new data are generated each year, and they arrive faster, in more complex forms and in greater quantities than before. Considering the continuous digitization, this comes as no surprise. Digital devices, smart systems, apps and the like are flooding the market. Billions of people use the internet and various digital media. More and more companies and administrations are undergoing digital transformation processes. And the digital infrastructure is constantly expanding through innovative technologies. This leads to numerous sources of data, for example:
smartphones
smartwatches
smart home devices
social media
search engines
streaming services
e-commerce
The Internet of Things (IoT) is a gigantic network of devices and software systems that are connected and exchange data via the internet.
In our digitized world, data is essentially available anytime and anywhere. Companies are taking advantage of this, as is research. Different industries, departments, and social sectors can gain new insights from big data. Here are some examples:
An important "fuel" for automated and autonomous driving is data, and lots of it. The more autonomously a vehicle is supposed to move in traffic, the better the algorithms of the integrated AI systems have to be. The basis for this is data from kilometers of driving in simulations, on test tracks, and finally in real road traffic. This enables artificial intelligence to test a wide variety of scenarios in road traffic. This data-based driving school for cars ensures a high level of safety for vehicle occupants.
Marketing benefits from customer data. For example, think about your favorite brand. What information do you give the company about yourself? Maybe you shop at the online store. Maybe you follow the brand on social media and interact with their posts. Maybe you fill out customer surveys, write reviews, or have a customer card. All of this generates data—data about your buying behavior, your media usage, your preferences, your brand loyalty, and so on. The company may use this information to learn more about you as a customer and to provide you with personalized information through the channels you use most often.
In medicine and healthcare, large amounts of data are generated from patients and the general population, for example via health insurance companies, health apps or search queries on symptoms. Used sensibly, these data can help, for example, to improve the individual care of patients or to design effective preventive services.
"Data is the new oil." This saying sums up the big data trend well, because data is considered the raw material of the future. The digital transformation is turning the corporate and working world upside down, and digital data is becoming a central resource. Large technology corporations build their success on huge data sets, and more and more small and medium-sized companies want to tap into the potential of big data.
The point is not to collect as much data as possible. It is much more important to use the existing data efficiently. Processing and evaluating the data reveals trends, patterns and correlations. This provides valuable insights into processes, products, markets and people. On this basis, companies can:
manage processes and resources better (e.g. save time and costs)
optimize products or develop new ones based on market trends
make business decisions based on data
Not only companies can benefit from big data. Data can also lead to more knowledge and progress in public sectors such as medicine, education or administration.
Knowledge and progress do not automatically result from big data. The data must be efficiently stored, managed and, above all, evaluated. This requires special technologies and tools. Suitable big data solutions work according to these principles:
Distribution to multiple systems: Data is not stored and processed on a single device but distributed across multiple interconnected devices. These can be computers or servers in a company's own data center. The alternative is cloud computing: Here, the data is stored online and can be accessed at any time and from anywhere with an existing internet connection.
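One common way to spread records across several machines is to derive the target machine from a hash of each record's key. The sketch below illustrates that idea only. The node names, the function names `node_for` and `distribute`, and the sample records are hypothetical, and real big data systems use more elaborate placement schemes.

```python
import hashlib

# Hypothetical machine names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]


def node_for(key, nodes=NODES):
    """Pick a node for a record key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(records, nodes=NODES):
    """Group (key, value) records by the node that should store them."""
    placement = {node: [] for node in nodes}
    for key, value in records:
        placement[node_for(key, nodes)].append((key, value))
    return placement


records = [(f"customer-{i}", {"orders": i}) for i in range(10)]
placement = distribute(records)
for node, stored in placement.items():
    print(node, len(stored))
```

Because the hash is stable, the same key always maps to the same node, so a record can be found again without searching every machine.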
Parallel processing: With data volumes in the peta- and exabyte range, it would take a very long time to process the data one by one. In order to speed up the evaluation, both the data and the partial steps of the data analysis are therefore distributed across several computers. This allows the data to be processed simultaneously. Subsequently, the partial results are combined. This is significantly faster than a sequential approach.
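The pattern of splitting the data, processing the parts at the same time and combining the partial results can be sketched on a small scale. In this illustration, threads on one machine stand in for separate computers, and counting words stands in for the analysis. The function names and sample lines are hypothetical. In standard Python, threads do not actually speed up this kind of computation, so the sketch shows the structure of the approach rather than its performance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Partial step: count the words in one chunk of lines."""
    return Counter(word for line in chunk for word in line.lower().split())


def parallel_word_count(lines, workers=3):
    # Split the data into one chunk per worker.
    chunks = [lines[i::workers] for i in range(workers)]
    # Process all chunks concurrently; each worker produces a partial result.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    # Combine the partial results into the final result.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


lines = ["big data is big", "data needs tools", "tools process data"]
totals = parallel_word_count(lines)
print(totals.most_common(3))
```

Frameworks for big data follow the same split, process and combine structure, but run the partial steps on many machines.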
High scalability: Since data streams are very dynamic, the capacities of the big data infrastructure must be constantly adjusted. This is the only way to handle peaks or dips in the data flow efficiently. A highly scalable system can accomplish exactly that: If necessary, new computing resources are added to increase its size and performance. Highly scalable storage systems for big data include data lakes and NoSQL databases, also known as non-relational databases.
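A very simple scaling rule is sketched below. It assumes, purely for illustration, that each worker machine can handle 1,000 events per second. The function name `workers_needed` and the load figures are made up, and real systems use more sophisticated rules.

```python
import math


def workers_needed(events_per_second, capacity_per_worker=1_000, min_workers=1):
    """Return how many workers are needed for the current load.

    capacity_per_worker is an assumed figure, not a measured one.
    """
    required = math.ceil(events_per_second / capacity_per_worker)
    return max(min_workers, required)


# Hypothetical load over time, with a peak in the middle.
load_over_time = [800, 2_500, 12_000, 3_000, 400]
plan = [workers_needed(load) for load in load_over_time]
print(plan)
```

The number of workers rises with the peak and falls again afterwards, so capacity follows the data flow instead of being fixed at the maximum.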
Advanced analytics: Frequency distributions and correlations are not sufficient for evaluating big data. More complex analytical methods such as data mining or artificial intelligence are required. These can be used in the area of business intelligence, where company data is systematically analyzed. Advanced analytics methods require, as the name suggests, advanced skills. Data scientists bring this know-how with them. Their task is to turn big data into smart data and to prepare the information obtained in a comprehensible way, for example by means of visualizations.
Automation: To cope with the rapidly growing flood of data, automated solutions are increasingly in demand. Already today, huge amounts of data can no longer be managed and analyzed manually, and the global volume of data is growing exponentially every year. Promising technologies to reduce the human factor in data analysis as much as possible are artificial intelligence, machine learning and neural networks.
Those who work with big data must always be up to date with the latest technology. The technical infrastructure is constantly evolving, and the methods of data processing are changing. For example, just a few years ago, the Apache Hadoop framework was the common big data ecosystem for storing and processing large amounts of data. In the meantime, there are Apache Spark and Apache Flink, which enable faster data processing.
Another challenge is data quality. Many data sets have duplicates, gaps or errors due to their complexity and rapid change. Before the data can be evaluated properly, it often has to be cleaned, prepared and checked in a time-consuming process.
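A minimal sketch of such a cleaning step is shown below. The field names `id` and `email`, the function name `clean` and the sample records are hypothetical. The sketch drops records with gaps in required fields, removes duplicates by `id` and normalizes the remaining values.

```python
def clean(records, required=("id", "email")):
    """Drop incomplete records, remove duplicates by id and normalize emails."""
    seen_ids = set()
    cleaned = []
    for record in records:
        # Gap check: skip records where a required field is missing or empty.
        if any(record.get(field) in (None, "") for field in required):
            continue
        # Duplicate check: keep only the first record for each id.
        if record["id"] in seen_ids:
            continue
        seen_ids.add(record["id"])
        # Normalization: trim whitespace and lowercase the email address.
        cleaned.append({**record, "email": record["email"].strip().lower()})
    return cleaned


raw = [
    {"id": 1, "email": "Ana@Example.com "},
    {"id": 1, "email": "ana@example.com"},  # duplicate id
    {"id": 2, "email": None},               # gap
    {"id": 3, "email": "ben@example.com"},
]
cleaned = clean(raw)
print(cleaned)
```

In practice, the rules for what counts as a duplicate or an error depend on the data set, which is one reason why this preparation is so time-consuming.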
A frequent point of criticism in the debate about big data is data protection. Companies collect a great deal of information about their customers, some of it very private. Users of online services, apps or smart devices are often unaware of what data is being used by whom and for what purpose. With the information overload from digital media and the internet growing daily, keeping track of one's own data is a major challenge for everyone.
Data will continue to be a valuable asset in our information and knowledge society. The amount of data generated is increasing rapidly each year, and the market for big data and AI technologies is growing unabated. Machine learning applications and solutions that can process data in real time are currently very popular.
Due to their high potential to generate knowledge and automate processes, data and big data analytics act as key drivers for Industry 4.0. Topics such as data protection and information security remain at the top of the agenda. Phenomena such as deepfakes or discrimination by AI are increasingly being discussed in public.
So big data and artificial intelligence are not only interesting for data experts and AI developers! Our e-learning "Big Data—Understanding the World of Data" will give you a deeper understanding.