All about Big Data

What is BIG DATA?


Big data, as the name suggests, refers to very large volumes of data. Beyond size, it can contain structured, semi-structured, and unstructured data, which is difficult to process using traditional databases.

The characteristics of big data can be described by a number of V’s.

  • Volume: Refers to the amount of data generated over time.
  • Velocity: Refers to the speed at which data is generated and moves from one point to another.
  • Variety: Refers to the various forms of data, such as audio, video, text, geospatial data, and images.
  • Veracity: Refers to the quality and trustworthiness of data, including uncertainty that is hard to measure.
  • Valence: Refers to the connectedness of big data, which can be represented in the form of graphs.
  • Value: Refers to the benefits gained from big data.

Where does BIG DATA come from?


Big data is generated from various sources, as the characteristics above (particularly Variety) suggest. The major categories are data generated by machines, humans, and organizations.

Machine Generated Big Data

This is the largest source of big data. It consists of data generated by sensors and machines, often in real time. Some machines collect data around the clock (24/7), at personal scale as well as industrial scale. Some examples are listed below.

  • Environmental sensors, such as the sensing systems that feed data into global climate models.
  • Sensors in transportation vehicles such as airplanes. (Example: a Boeing 787 produces 0.5 terabytes of data every time it flies.)
  • Industrial machinery connected to SCADA systems. (SCADA, short for Supervisory Control and Data Acquisition, is an industrial control system that remotely monitors physical industrial processes.)
  • Scientific equipment such as the Large Hadron Collider. (The LHC is the world’s largest and most powerful particle accelerator, which generates 40 terabytes of data every second during experiments.)
  • Data from satellite and surface-based measurement campaigns.
  • Personal health trackers. (These devices can track blood pressure, calories burned, heart rate, and so on.)

Human Generated Big Data

This is the huge amount of data generated by people's online activities. Most human-generated data is unstructured and text-heavy, and comes in formats such as text, images, video, audio, emails, internet searches, web pages, PDFs, and XML. Typical sources are as follows.

  • Email services like Gmail, Hotmail, and Yahoo Mail.
  • Blogging and commenting.
  • Internet searches.
  • Social networking sites like Facebook, Twitter, and LinkedIn.
  • Online video sharing websites like YouTube and Vimeo.
  • Online photo sharing sites like Instagram, Picasa, and Flickr.

Organization Generated Big Data

This refers to more traditional types of data, such as spreadsheets, structured data in data warehouses, transaction information in databases, commercial and credit card transactions, e-commerce records, banking or stock records, medical records, and records held by government institutions. Organizations store this data for present and future use, and especially to analyze past trends when making decisions.

Most organizations have adopted relational databases because of their ability to define relationships between tables and the fields they contain. SQL (Structured Query Language) is used to extract the required data from the tables.
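
As a rough illustration of how SQL extracts data from related tables, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names (customers, transactions), the column names, and the sample rows are all made up for this example and do not come from any real system.

```python
import sqlite3


def spending_by_customer(customers, transactions):
    """Return (name, total) pairs, highest total first.

    customers:    list of (id, name) tuples (hypothetical sample schema)
    transactions: list of (customer_id, amount) tuples
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
        # customer_id links each transaction back to a row in customers
        conn.execute(
            "CREATE TABLE transactions ("
            "id INTEGER PRIMARY KEY, "
            "customer_id INTEGER REFERENCES customers(id), "
            "amount REAL)"
        )
        conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
        conn.executemany(
            "INSERT INTO transactions (customer_id, amount) VALUES (?, ?)",
            transactions,
        )
        # Follow the relationship between the two tables with a JOIN
        cur = conn.execute(
            "SELECT c.name, SUM(t.amount) AS total "
            "FROM customers AS c "
            "JOIN transactions AS t ON t.customer_id = c.id "
            "GROUP BY c.id, c.name "
            "ORDER BY total DESC"
        )
        return cur.fetchall()
    finally:
        conn.close()


customers = [(1, "Alice"), (2, "Bob")]
transactions = [(1, 20.0), (2, 5.0), (1, 10.0)]
print(spending_by_customer(customers, transactions))
# [('Alice', 30.0), ('Bob', 5.0)]
```

The JOIN is what uses the relationship between the two tables: each transaction row only stores a customer ID, and the query looks up the matching name.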

Big data becomes especially useful when structured data is integrated with other types of data generated by other departments. Most organizations collect and store their data department by department. This has held back scalable pattern recognition and increases the risk of outdated, unsynchronized datasets. Integrating department-wise data into one system that has access to all of the organization's data is called breaking down information silos, and it is very useful for making decisions about the company's future.
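
To make the idea of combining department-wise data a little more concrete, here is a small sketch in Python. It assumes, purely for illustration, that every department keeps records as dictionaries sharing a customer_id key. The integrate function, the department names, and the fields are hypothetical, and a real integration would also have to deal with mismatched identifiers, formats, and update times.

```python
from collections import defaultdict


def integrate(departments):
    """Merge per-department records into one view keyed by customer ID.

    departments maps a department name to a list of record dicts,
    each of which must contain a "customer_id" key (an assumption
    made for this sketch).
    """
    unified = defaultdict(dict)
    for dept, records in departments.items():
        for record in records:
            for field, value in record.items():
                if field != "customer_id":
                    # Prefix each field with its department so names cannot clash
                    unified[record["customer_id"]][f"{dept}.{field}"] = value
    return dict(unified)


silos = {
    "sales": [{"customer_id": 1, "orders": 3}, {"customer_id": 2, "orders": 1}],
    "support": [{"customer_id": 1, "open_tickets": 2}],
}
view = integrate(silos)
print(view[1])  # {'sales.orders': 3, 'support.open_tickets': 2}
print(view[2])  # {'sales.orders': 1}
```

With the merged view, a question such as "which customers place many orders but also have open support tickets" can be answered from one place instead of two separate datasets.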

The benefits organizations gain from using big data are as follows.

  • Operational efficiency
  • Improved marketing outcomes
  • Higher profits
  • Improved customer satisfaction