What is BIG DATA?


Introduction:

1.      As we know, data is a collection of facts and figures. Raw data may be meaningful or meaningless; only after it undergoes processing does meaningful information emerge from it.

2.      Big Data refers to a collection of such raw data that is huge in volume and keeps growing exponentially with time. Naturally, as the volume of the data increases, the complexity associated with it increases as well.

3.      Big Data is a set of data so large and complex that no traditional data management tool can store or process it efficiently. We therefore need a proper mechanism, such as Hadoop, to process such huge data.

4.      Hadoop is an Apache open-source framework that is user friendly, reliable, and written in Java.

5.      Whenever we deal with distributed processing of large datasets across clusters of computers, Hadoop uses simple programming models to accomplish the task.

6.      Its architecture is simple yet sophisticated, and it is designed so that it can scale up from a single server to thousands of machines, each offering local computation and storage of data.

As technology advances day by day, we need to understand the importance of Hadoop and the application strategy through which it can provide solutions to the problems associated with Big Data.


What is Data?

As discussed above, data is a collection of facts and figures, and raw data may be meaningful or meaningless. Put more precisely, data consists of quantities, characters, or symbols on which operations are performed by a computer. When we process data, it may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.


What is Big Data?

In general, Big Data is a collection of raw data that is huge in volume and keeps growing exponentially with time. Naturally, as the volume of the data increases, the complexity associated with it increases as well. Big Data is a set of data so large and complex that no traditional data management tool can store or process it efficiently.

Examples of Big Data:

When we discuss Big Data, we come across several things that can serve as examples of it. A few relevant examples are discussed below.

1.      Social media applications like Facebook, Twitter, WhatsApp, etc. generate more than 1,000 terabytes of fresh data into their databases. Statistics say that Facebook alone generates more than 200 terabytes of data every day, while Twitter generates more than 170 terabytes every day.

2.      Stock exchanges and big trading companies also add huge collections of data to their databases every day. Statistics say that the New York Stock Exchange alone generates about one terabyte of new trade data per day.

3.      Another common example is airline companies, which generate roughly 20-30 terabytes of data every single day. Now imagine the amount of data produced by ten such fleets: it is surely in the range of petabytes.

 

Similarly, many more applications, such as Instagram, YouTube, Skype, and Google Videos, add huge amounts of data to their databases every day and every minute.
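As a rough sanity check on the figures quoted above, a quick back-of-envelope calculation is easy to do. The daily rates come from the examples in this article; the arithmetic itself is only illustrative.

```python
# Back-of-envelope check on the quoted daily data rates.
TB_PER_PB = 1024

facebook_tb_per_day = 200   # quoted above
nyse_tb_per_day = 1         # quoted above

# One year of Facebook data alone, expressed in petabytes:
facebook_pb_per_year = facebook_tb_per_day * 365 / TB_PER_PB
print(round(facebook_pb_per_year, 1))  # 71.3 PB per year

# NYSE trade data, expressed as an average throughput in MB/s:
nyse_mb_per_sec = nyse_tb_per_day * 1024 * 1024 / (24 * 3600)
print(round(nyse_mb_per_sec, 1))  # about 12.1 MB/s, around the clock
```

Even a single source at these rates quickly outgrows what one machine can store, which is why distributed systems like Hadoop exist.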

 

 

Types of Big Data:

 

When we analyse Big Data, the data appears in one of the following forms:

  1. Structured data,
  2. Unstructured data, and
  3. Semi-structured data.

Now let us discuss each category.


Structured Data:

1.      Structured data always has a fixed schema for its representation.

2.      Such data is stored and processed in a systematic, predefined manner.

3.      The mechanism for accessing this data is always fixed.

4.      Because the format is well known in advance, users face no difficulty in dealing with it.

5.      However, as the data size grows towards its maximum limit, processing becomes more difficult.

6.      Since Big Data holds a huge collection of varied data, a structured pattern is sometimes difficult to adopt here.


Examples:

Let us consider an 'Employee' table in a database. It is a good example of structured data, where every field is represented in a fixed format.

Employee_ID   Employee_Name     Gender   Department   Salary_In_lacs
2365          Rajesh Kulkarni   Male     Finance      650000
3398          Pratibha Joshi    Female   Admin        650000
7465          Shushil Roy       Male     Admin        500000
7500          Shubhojit Das     Male     Finance      500000
7699          Priya Sane        Female   Finance      550000
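A minimal sketch of how structured data like this Employee table lives under a fixed schema, using Python's built-in sqlite3 module. The table and column names follow the example above; the in-memory database itself is only illustrative.

```python
import sqlite3

# The fixed schema is declared up front, before any data arrives.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employee (
    Employee_ID INTEGER PRIMARY KEY,
    Employee_Name TEXT,
    Gender TEXT,
    Department TEXT,
    Salary_In_lacs INTEGER)""")

rows = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", rows)

# Because the format is known in advance, access is straightforward:
finance = conn.execute(
    "SELECT Employee_Name FROM Employee WHERE Department = 'Finance'"
).fetchall()
print(finance)  # [('Rajesh Kulkarni',)]
```

The fixed schema is exactly what makes querying easy here, and exactly what breaks down when data of arbitrary shape starts arriving, as in the unstructured case below.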

 

 










Unstructured Data:

1.      In contrast to the above, this data has no fixed format for its representation, unlike structured data.

2.      Most of the data we receive from social-media applications, and from digital video, QR-code, and geo-spatial sources, is unstructured, as it has no common representation format.

3.      Here the data may be of any category and is therefore very difficult to manage.

4.      Besides its huge size, unstructured data poses multiple challenges when we need to process it to derive value from it.

5.      Unstructured data is basically a collection of heterogeneous data sources, which may combine text files, image files, video files, etc.

6.      In the present scenario, most organizations have their data available in unstructured form and do not know how to derive value from it.


Example:

To better understand the concept of unstructured data and its relevance, take the example of a search performed in the Google search engine. Whenever we search for a particular query, Google returns many relevant results that are completely different from one another in form and format.

Typically, when structured and unstructured data are combined into a single category, the Big Data concept emerges. The cost of managing unstructured data is always much higher than that of structured data.


Semi-structured Data:

1.      As discussed earlier, semi-structured data usually contains both forms of data, i.e. structured as well as unstructured.

2.      We can see semi-structured data as structured in form, but it is not actually defined with, for example, a table definition as in a relational DBMS. Data represented in an XML file is an example of semi-structured data.

Examples:

The most common example of a semi-structured dataset is personal data stored in an XML file. Consider the following records written in XML:

<rec><name>Prashant</name><sex>Male</sex><age>35</age></rec>

<rec><name>Seema</name><sex>Female</sex><age>41</age></rec>

<rec><name>Satish</name><sex>Male</sex><age>29</age></rec>
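A minimal sketch of why such data is called semi-structured: it carries its own tags, so it can be parsed without a predeclared schema. Here Python's standard xml.etree.ElementTree parses the records above; the wrapping <people> root element is added only because XML requires a single root.

```python
import xml.etree.ElementTree as ET

# The three records from the example, wrapped in a root element.
xml_data = """<people>
<rec><name>Prashant</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish</name><sex>Male</sex><age>29</age></rec>
</people>"""

root = ET.fromstring(xml_data)

# The tags give each value a name, so we can pull fields out by tag
# even though no table schema was ever declared.
people = [
    {"name": rec.findtext("name"),
     "sex": rec.findtext("sex"),
     "age": int(rec.findtext("age"))}
    for rec in root.findall("rec")
]
print(people[0])  # {'name': 'Prashant', 'sex': 'Male', 'age': 35}
```

Note that nothing stops one record from having extra or missing tags, which is precisely what separates semi-structured data from a rigid relational table.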


Characteristics of Big Data:

Since the data capacity in Big Data is very huge and the data is mostly semi-structured, Big Data can be described by the following characteristics:

  • Volume
  • Variety
  • Velocity
  • Variability

Each of these characteristics is discussed below.


1.      Volume – The name Big Data itself indicates a size that is enormous. The size of data determines the value that can be derived from it, so whether a given dataset counts as Big Data depends on its nature and its volume.

2.      Variety – Variety refers to the heterogeneous sources from which we receive data and to the nature of the data itself, both structured and unstructured. Traditionally, spreadsheets and databases were the only data sources most applications considered. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications.

3.      Velocity – The term 'velocity' in Big Data refers to the speed at which data is generated. It describes how fast data is generated and processed so that we can meet the demands of clients, and it determines the real potential of the data.

           In Big Data, velocity mostly relates to the speed at which massive, continuous data flows in from sources such as business processes, application logs, networks, social-media sites, sensors, mobile devices, etc.

4.      Variability – This refers to the inconsistency that the data can show at times, which hampers the process of handling and managing the data effectively.

 

Benefits of Big Data Processing:

When we consider the benefits, the ability to process Big Data brings multiple advantages, such as:

1.      Businesses can utilize outside intelligence while taking decisions.

2.      Access to social data from search engines and sites like Facebook and Twitter.

3.      It enables organizations to fine-tune their business strategies and improve customer service.

4.      With the advancement of Big Data technologies, traditional customer-feedback systems are being replaced.

5.      New systems are being designed in which Big Data and natural-language-processing technologies are used to read and evaluate consumer responses.

6.      It enables the identification of risks associated with products and services at an early stage, so the risk of failure decreases to a great extent and operational efficiency improves as well.

7.      With the advancement of Big Data technologies, we can create a staging area or landing zone for new data before identifying its properties and deciding which data should be moved to the data warehouse.

8.      In addition, such integration of Big Data technologies with a data warehouse helps an organization offload infrequently accessed data.


Scope @ N9 IT Solutions:

1.      N9 IT Solutions is a leading IT development and consulting firm providing a broad array of customized solutions to clients throughout the United States.

2.      We were established primarily with the aim of providing consulting and IT services in today's dynamic environment.

3.      N9 IT also offers consulting services in Java/J2EE, Cloud Computing, Database Solutions, DevOps, ERP, Mobility, Big Data, Application Development, Infrastructure Managed Services, and Quality Assurance and Testing.

Achieving your dream goal is our motto. Our excellent team works tirelessly to help our employees hit their targets. So believe in us and our advice, and we assure you of your success.
