

Vol. 4, Issue 2, January 2018

Digital Data Technology - Big Data, Data Mining & Machine Learning

The growth of the internet has been exponential, especially in the past couple of decades, and internet services and applications have evolved so dramatically that today almost the entire world is connected to the internet through various computing devices. These devices generate a large amount of digital data. Social networking platforms like Facebook, Twitter and WhatsApp, and e-commerce platforms like Amazon and Flipkart, are some of the key sources of unstructured digital data. This high-volume digital data is known as Big Data. Big Data has a tremendous, though largely invisible, influence on modern-day lifestyle. Our movements and activities are tracked, and we receive news feeds, suggestions and advertisements tailored to what we browse; traffic information through GPS is also possible only because of Big Data. These are just a few examples. Big Data is characterized by its volume, its large number of sources, the high speed and frequency at which it arrives, its complexity, and the different kinds and forms it takes. Unless useful information is extracted from the data with these characteristics in mind, this huge volume of data is useless.

Image Source - quora.com

The process of examining a large number of records from relational databases and finding useful information in them is called Data Mining. It involves collecting raw data from different unmanaged sources, processing the data, analyzing it and storing it for future use. Given that the volume and frequency of data generation have increased manyfold, Data Mining has become a challenging task. An early introductory book on Data Mining, published in 1998 by Weiss and Indurkhya [1], was based on the extraction of meaningful information from data. Activities such as classification, estimation, prediction, visualization and pattern extraction can be performed on datasets to obtain valuable information that can be used as knowledge.
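The collect, process, analyze and store steps described above can be illustrated with a small sketch. The order records, the function names and the choice of counting item pairs as the "pattern" are all assumptions made for illustration; real data mining systems work at a far larger scale and with more sophisticated algorithms.

```python
import json
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two unmanaged sources (illustrative data only).
web_orders = [" Milk, bread ", "bread,butter", ""]
app_orders = ["milk,bread,butter", "BREAD, milk"]

def collect(*sources):
    """Gather raw records from every source into one list."""
    return [record for source in sources for record in source]

def process(records):
    """Normalize each record into a sorted tuple of lowercase items; drop empty records."""
    cleaned = []
    for record in records:
        items = sorted({item.strip().lower() for item in record.split(",") if item.strip()})
        if items:
            cleaned.append(tuple(items))
    return cleaned

def analyze(baskets):
    """Count how often each pair of items is bought together (simple pattern extraction)."""
    pair_counts = Counter()
    for basket in baskets:
        pair_counts.update(combinations(basket, 2))
    return pair_counts

def store(pair_counts):
    """Serialize the result to JSON text so it can be saved for future use."""
    return json.dumps({" + ".join(pair): count for pair, count in pair_counts.items()})

baskets = process(collect(web_orders, app_orders))
pair_counts = analyze(baskets)
stored = store(pair_counts)
print(pair_counts.most_common(1))  # the most frequent item pair
```

Here the most frequent pair is bread and milk, which appear together in three of the four valid orders. Turning raw records into a small piece of knowledge like this is the kind of result that pattern extraction aims for.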

According to [2], traditional Machine Learning methods such as supervised, unsupervised and reinforcement learning are good enough only for small datasets. With advances in Machine Learning, approaches like Active learning and Parallel learning have come into existence. Hadoop, MapReduce and Jupyter are some of the tools used for Big Data analysis, alongside newer machine learning methods like Deep learning, Active learning and Kernel learning. Even before choosing a learning method, however, a few areas of concern need to be addressed: missing values, data that is difficult to categorize, noisy data, and mislabeling during data generation, data acquisition, data storage etc., any of which could result in loss of data and information.
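As a rough illustration of two of these concerns, the sketch below fills in missing values and replaces an obviously noisy reading before any learning method is applied. The sensor readings, the `clean` function and its `tolerance` threshold are hypothetical, and substituting the median is only one simple strategy among many.

```python
from statistics import median

# Hypothetical sensor readings: None marks a missing value, 980.0 is a noisy reading.
readings = [21.0, 22.5, None, 21.8, 980.0, 22.1, None, 21.4]

def clean(values, tolerance=5.0):
    """Replace missing values and readings far from the median with the median.

    `tolerance` is an assumed, hand-picked threshold, not a standard constant.
    """
    observed = [v for v in values if v is not None]
    center = median(observed)  # the median is robust to a few extreme readings
    cleaned = []
    for v in values:
        if v is None or abs(v - center) > tolerance:
            cleaned.append(center)
        else:
            cleaned.append(v)
    return cleaned

cleaned_readings = clean(readings)
print(cleaned_readings)
```

The median is used rather than the mean because a single extreme value such as 980.0 would pull the mean far away from the typical readings.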

By: Ms. Jyoti Arora, Asst. Prof., CSE, Chitkara University, H.P.

References

  1. Weiss, Sholom M., and Nitin Indurkhya. "Predictive data mining: a practical guide". Morgan Kaufmann, 1998.
  2. Fisher, Danyel, Rob DeLine, Mary Czerwinski, and Steven Drucker. "Interactions with big data analytics". interactions 19, no. 3 (2012): 50-59.



Disclaimer: The content of this newsletter is contributed by Chitkara University faculty and taken from resources that are believed to be reliable. The editorial team verifies the content to the best of its ability but accepts no responsibility for validating the sources or for the accuracy of the content. The objective of the newsletter is limited to spreading awareness of technology among faculty and students, not to impose on or influence the decisions of individuals.