What is BIG DATA?
Introduction:
1. Data is a collection of facts and figures. Raw data may be meaningful or meaningless; only when it undergoes processing does meaningful information come out of it.
2. Big Data refers to collections of such raw data that are huge in volume and keep growing exponentially with time. Naturally, as the volume of data increases, the complexity associated with it increases as well.
3. Big Data is a set of data so large and complex that no traditional data management tool can store or process it efficiently. We therefore need a proper mechanism, such as Hadoop, to process such huge data.
4. Hadoop is an open-source Apache framework, written in Java, that is user friendly and reliable.
5. It allows the distributed processing of large datasets across clusters of computers using simple programming models.
6. Its architecture is simple yet sophisticated, designed so that it can scale up from a single server to thousands of machines, each offering local computation and storage of data.
As technology advances day by day, we need to understand the importance of Hadoop and the strategies by which it can provide solutions to the problems associated with Big Data.
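Hadoop itself is a Java framework, but the "simple programming model" it is built on, MapReduce, can be sketched in a few lines of plain Python. The sketch below is illustrative only, not Hadoop code: the map, shuffle, and reduce phases run sequentially on one machine here, whereas Hadoop distributes them across a cluster.

```python
from collections import defaultdict

def map_phase(documents):
    """Emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big tools", "hadoop processes big data"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["big"])  # the word "big" appears 3 times across both documents
```

Because each map call and each reduce call is independent, the same three-phase structure lets Hadoop spread the work across thousands of machines without changing the programmer's model.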
What is Data?
As discussed above, data is a collection of facts and figures, and raw data may be meaningful or meaningless. More precisely, data consists of quantities, characters, or symbols on which operations are performed by a computer. During processing, data may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
What is Big Data?
In general, Big Data is a collection of raw data that is huge in volume and keeps growing exponentially with time. Naturally, as the volume of data increases, the complexity associated with it increases as well. Big Data is a set of data so large and complex that no traditional data management tool can store or process it efficiently.
Examples of Big Data:
When discussing Big Data, we come across many sources that serve as examples. A few relevant ones are described below.
1. Social media applications such as Facebook, Twitter, and WhatsApp generate more than 1000 terabytes of fresh data for their databases every day. Statistics say that Facebook alone generates more than 200 terabytes of data per day, while Twitter generates more than 170 terabytes per day.
2. Stock exchanges and big trading firms also add huge volumes of data to their databases every day. Statistics say that the New York Stock Exchange alone generates about one terabyte of new trade data per day.
3. Another common example is airline companies, which generate roughly 20-30 terabytes of data every single day. Now imagine the amount of data produced by ten such airlines: it is surely in the range of petabytes.
Many more applications, such as Instagram, YouTube, Skype, and Google Videos, likewise add huge amounts of data to their databases every day and every minute.
Types of Big Data:
When analysing Big Data, the data will appear in the following forms:
- Structured data,
- Unstructured data, and
- Semi-structured data.
Let us discuss each category here.
Structured Data:
1. Structured data always has a fixed schema for its representation.
2. It is stored and processed in a systematic, predefined manner.
3. The mechanism for accessing it is always fixed.
4. Because the format is known well in advance, users face no difficulty in dealing with it.
5. However, as the data size grows toward its maximum limit, processing becomes more difficult.
6. Since Big Data holds huge collections of a wide variety of data, a structured pattern is sometimes a bit difficult to adopt here.
Examples:
Consider an 'Employee' table in a database. It is a classic example of structured data, where every record is represented in a fixed format.
Employee_ID | Employee_Name   | Gender | Department | Salary_In_lacs
2365        | Rajesh Kulkarni | Male   | Finance    | 650000
3398        | Pratibha Joshi  | Female | Admin      | 650000
7465        | Shushil Roy     | Male   | Admin      | 500000
7500        | Shubhojit Das   | Male   | Finance    | 500000
7699        | Priya Sane      | Female | Finance    | 550000
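The fixed schema of the Employee table above can be written directly in SQL. As a sketch, the snippet below loads the same rows into an in-memory SQLite database; SQLite stands in here for any traditional relational DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are taken from the Employee table above.
conn.execute("""
    CREATE TABLE Employee (
        Employee_ID     INTEGER PRIMARY KEY,
        Employee_Name   TEXT,
        Gender          TEXT,
        Department      TEXT,
        Salary_In_lacs  INTEGER
    )
""")
rows = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
    (7500, "Shubhojit Das", "Male", "Finance", 500000),
    (7699, "Priya Sane", "Female", "Finance", 550000),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", rows)

# The fixed access mechanism: every query can rely on the same
# column names and types being present.
cur = conn.execute(
    "SELECT Employee_Name FROM Employee "
    "WHERE Department = ? ORDER BY Employee_ID",
    ("Finance",),
)
finance_names = [name for (name,) in cur]
print(finance_names)  # ['Rajesh Kulkarni', 'Shubhojit Das', 'Priya Sane']
```

This is exactly the kind of predictable access that breaks down once the data no longer fits a single schema, which is where the unstructured category below begins.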
Unstructured Data:
1. In contrast to the above, unstructured data has no fixed format for its representation, unlike structured data.
2. Most of the data received from social media applications, along with digital video, QR codes, and geo-spatial data, is unstructured, as it has no common representation format.
3. The data may belong to any category, which makes it very difficult to manage.
4. Besides its huge size, unstructured data poses multiple challenges when we need to process it to derive value out of it.
5. Unstructured data is basically a collection of heterogeneous data sources, which may combine text files, image files, video files, and so on.
6. At present, most organizations have their data available in an unstructured form and do not know how to derive value out of it.
Example:
To better understand the concept of unstructured data and its relevance, consider a search performed in the Google search engine. Whenever we look for a solution to a particular query, the search engine returns a set of relevant results that are each in completely different formats from the others.
Typically, the Big Data concept comes into play when structured and unstructured data are combined into a single category. The cost of managing unstructured data is always much higher than that of structured data.
Semi-structured Data:
1. As discussed earlier, semi-structured data usually contains both forms of data, i.e. structured as well as unstructured.
2. Semi-structured data may look structured in form, but it is not actually defined by, for example, a table definition in a relational DBMS. A typical example of semi-structured data is data represented in an XML file.
Examples:
The most common example of a semi-structured data set is personal data stored in an XML file. Consider the following data written in an XML file, as shown below.
<rec><name>Prashant</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish</name><sex>Male</sex><age>29</age></rec>
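The records above can be parsed with Python's standard-library XML parser, which illustrates the "structured in form but not schema-defined" nature of semi-structured data. One detail to note: XML requires a single root element, so this sketch wraps the three `<rec>` fragments in one before parsing.

```python
import xml.etree.ElementTree as ET

records = """
<rec><name>Prashant</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish</name><sex>Male</sex><age>29</age></rec>
"""

# Wrap the record fragments in a single root element so they parse as one document.
root = ET.fromstring(f"<people>{records}</people>")

# The tags give the data structure, but nothing enforces a schema:
# a <rec> with a missing or extra field would still parse.
people = [
    {"name": rec.findtext("name"),
     "sex": rec.findtext("sex"),
     "age": int(rec.findtext("age"))}
    for rec in root.findall("rec")
]
print(people[0])  # {'name': 'Prashant', 'sex': 'Male', 'age': 35}
```

Unlike the relational table earlier, no table definition exists here; the structure lives entirely inside each record, which is what makes the data semi-structured.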
Characteristics of Big Data:
Since the data involved is very large in volume and mostly semi-structured, Big Data can be described by the following characteristics:
- Volume
- Variety
- Velocity
- Variability
1. Volume – The name Big Data itself indicates an enormous size. The size of the data determines its potential value, so whether a given dataset counts as Big Data depends on both the nature and the volume of the data.
2. Variety – Variety refers to the heterogeneous sources from which data is received and to the nature of the data, both structured and unstructured. Traditionally, spreadsheets and databases were the only data sources considered by most applications. Nowadays the pattern and variety of data is different: emails, photos, videos, monitoring devices, PDFs, audio, and more are also considered in analysis applications.
3. Velocity – The term 'velocity' in Big Data refers to the speed at which data is generated. It describes how fast data is generated and processed to meet the demands of clients, and it determines the real potential of the data.
In Big Data, velocity mostly relates to the speed at which massive, continuous data flows in from sources such as business processes, application logs, networks, social media sites, sensors, and mobile devices.
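Velocity is easiest to see in code as a stream that must be consumed incrementally rather than loaded whole. The toy sketch below assumes a generator standing in for a live feed (logs, sensors, social media) and aggregates events into per-second windows without ever holding the full stream in memory; the event names and timestamps are invented for illustration.

```python
from collections import Counter

def event_stream():
    """Stand-in for a live feed; each item is (timestamp_second, event_type)."""
    events = [(0, "click"), (0, "click"), (1, "view"),
              (1, "click"), (2, "view"), (2, "view")]
    yield from events

# Consume the stream one event at a time, keeping only small
# per-window aggregates rather than the raw events themselves.
windows = {}
for second, kind in event_stream():
    windows.setdefault(second, Counter())[kind] += 1

print(windows[0]["click"])  # 2 clicks arrived during the first second
```

Real streaming systems apply the same idea at scale: the raw flow is too fast to store first and analyse later, so aggregation happens as the data arrives.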
4. Variability – This refers to the inconsistency that the data can show at times, which hampers the ability to handle and manage the data effectively.
Benefits of Big Data Processing:
The ability to process Big Data brings in multiple benefits, such as:
1. Businesses can utilize outside intelligence while taking decisions.
2. It gives access to social data from search engines and sites like Facebook and Twitter.
3. It enables organizations to fine-tune their business strategies and improve customer service.
4. With the advancement of Big Data technologies, traditional customer feedback systems are being replaced.
5. New systems are being designed in which Big Data and natural language processing technologies are used to read and evaluate consumer responses.
6. It enables the identification of risks associated with products and services at early stages, so failure risk decreases to a great extent and operational efficiency improves.
7. With the advancement of Big Data technologies, we can create a staging area or landing zone for new data, especially before identifying its properties and deciding which data should be moved to the data warehouse.
8. In addition, such integration of Big Data technologies and the data warehouse helps an organization offload infrequently accessed data.
Scope @ N9 IT Solutions:
1. N9 IT Solutions is a leading IT development and consulting firm providing a broad array of customized solutions to clients throughout the United States.
2. We were established primarily with the aim of providing consulting and IT services in today's dynamic environment.
3. N9 IT also offers consulting services in Java/J2EE, Cloud Computing, Database Solutions, DevOps, ERP, Mobility, Big Data, Application Development, Infrastructure Managed Services, and Quality Assurance and Testing.
Achieving your dream goal is our motto. Our excellent team works tirelessly to help our employees hit their targets. So trust us and our advice, and we assure you of your success.