Apache Hadoop
Original author(s): Doug Cutting, Mike Cafarella
Developer(s): Apache Software Foundation
Initial release: April 1, 2006[1]
Stable release:
  • 2.10.x: 2.10.2 (May 31, 2022)[2]
  • 3.2.x: 3.2.4 (July 22, 2022)[2]
  • 3.3.x: 3.3.6 (June 23, 2023)[2]
Repository: Hadoop Repository
Written in: Java
Operating system: Cross-platform
Type: Distributed file system
License: Apache License 2.0
Website: hadoop.apache.org

Apache Hadoop (/həˈduːp/) is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving amounts of data and computation too large for a single machine. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which remains its most common use.[3] It has since also found use on clusters of higher-end hardware.[4][5] All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common and should be handled automatically by the framework.[6]
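The assumption that hardware fails routinely shows up in how work is scheduled: a task that dies along with its node is simply run again on another node, without the user's code having to handle it. The sketch below is a toy, single-process model of that retry loop, not Hadoop's actual scheduler. The `run_with_retries` helper, the `NodeFailure` exception, the node names and the simulated failure are all hypothetical; the default limit of 4 attempts is chosen to match the stock value of Hadoop MapReduce's `mapreduce.map.maxattempts` setting.

```python
from typing import Callable, Sequence


class NodeFailure(Exception):
    """Raised when the (simulated) node running a task goes down."""


def run_with_retries(task: Callable[[str], int], nodes: Sequence[str],
                     max_attempts: int = 4) -> tuple[int, list[str]]:
    """Run `task` on successive nodes until one attempt succeeds.

    Hypothetical helper: a toy model of framework-level re-execution.
    Returns the task result and the list of nodes that were tried.
    """
    tried: list[str] = []
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]  # next candidate node, wrapping around
        tried.append(node)
        try:
            return task(node), tried
        except NodeFailure:
            continue  # the framework, not the user code, absorbs the failure
    raise RuntimeError(f"task failed after {max_attempts} attempts on {tried}")


# Usage: simulate a cluster in which node-a is down.
dead = {"node-a"}


def count_records(node: str) -> int:
    if node in dead:
        raise NodeFailure(node)
    return 42  # placeholder result returned by a healthy node


result, tried = run_with_retries(count_records, ["node-a", "node-b", "node-c"])
print(result, tried)  # 42 ['node-a', 'node-b']
```

Only when every attempt fails does the error surface to the caller, which mirrors the idea that individual machine failures are routine rather than fatal.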

The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part based on the MapReduce programming model. Hadoop splits files into large blocks and distributes them across the nodes of a cluster. It then transfers packaged code to those nodes so that they can process the data in parallel. This approach takes advantage of data locality,[7] where each node processes the data it stores locally. As a result, the dataset can be processed faster and more efficiently than in a more conventional supercomputer architecture that relies on a parallel file system, where computation and data are distributed via high-speed networking.[8][9]
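The split-then-ship-code idea can be sketched in a few lines. This is a simplified model under stated assumptions: the 128 MB block size and replication factor of 3 match the defaults of `dfs.blocksize` and `dfs.replication` in Hadoop 2.x and later, but the round-robin replica placement and the `split_into_blocks` / `schedule_local` helpers are hypothetical stand-ins, not HDFS's rack-aware placement policy or YARN's scheduler.

```python
from dataclasses import dataclass

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # matches the default dfs.blocksize in Hadoop 2.x and later
REPLICATION = 3        # matches the default dfs.replication


@dataclass
class Block:
    index: int
    length: int
    replicas: list[str]  # nodes that hold a copy of this block


def split_into_blocks(file_size: int, nodes: list[str]) -> list[Block]:
    """Cut a file into fixed-size blocks and place REPLICATION copies of each.

    Placement is plain round-robin, NOT HDFS's rack-aware policy.
    """
    blocks: list[Block] = []
    offset, i = 0, 0
    copies = min(REPLICATION, len(nodes))
    while offset < file_size:
        length = min(BLOCK_SIZE, file_size - offset)  # the last block may be short
        replicas = [nodes[(i + r) % len(nodes)] for r in range(copies)]
        blocks.append(Block(i, length, replicas))
        offset += length
        i += 1
    return blocks


def schedule_local(blocks: list[Block]) -> dict[int, str]:
    """Assign each block's task to the least-loaded node already storing it."""
    load: dict[str, int] = {}
    plan: dict[int, str] = {}
    for b in blocks:
        node = min(b.replicas, key=lambda n: load.get(n, 0))
        load[node] = load.get(node, 0) + 1
        plan[b.index] = node  # the code is sent here, so the block is read locally
    return plan


nodes = ["dn1", "dn2", "dn3", "dn4"]
blocks = split_into_blocks(300 * MB, nodes)
print([b.length // MB for b in blocks])  # [128, 128, 44]
print(schedule_local(blocks))            # {0: 'dn1', 1: 'dn2', 2: 'dn3'}
```

Because every task is placed on a node that already holds a replica of its block, the only data that crosses the network in this model is the packaged code itself.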

The base Apache Hadoop framework is composed of the following modules:

  • Hadoop Common – contains libraries and utilities needed by other Hadoop modules;
  • Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
  • Hadoop YARN – (introduced in 2012) a platform responsible for managing computing resources in clusters and using them to schedule users' applications;[10][11]
  • Hadoop MapReduce – an implementation of the MapReduce programming model for large-scale data processing;
  • Hadoop Ozone – (introduced in 2020) an object store for Hadoop.
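The MapReduce programming model that Hadoop MapReduce implements can be illustrated with a single-process sketch: a user-supplied map function turns each input record into (key, value) pairs, the framework groups the values by key (the "shuffle", with keys sorted as Hadoop sorts map output), and a user-supplied reduce function collapses each group. The `run_mapreduce` driver, the `"sensor_id,reading"` input format and the sample data below are hypothetical; real Hadoop jobs are written against its Java API and run the phases across many nodes.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator


def run_mapreduce(
    records: Iterable[Any],
    mapper: Callable[[Any], Iterator[tuple[Any, Any]]],
    reducer: Callable[[Any, list[Any]], Any],
) -> dict[Any, Any]:
    """Single-process model of the map -> shuffle/sort -> reduce phases."""
    groups: dict[Any, list[Any]] = defaultdict(list)
    # Map phase: each record independently yields zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: gather all values for one key
    # Reduce phase: visit keys in sorted order and reduce each group.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


# Hypothetical job: maximum reading per sensor from "sensor_id,reading" lines.
def max_mapper(line: str) -> Iterator[tuple[str, float]]:
    sensor, reading = line.split(",")
    yield sensor, float(reading)


def max_reducer(sensor: str, readings: list[float]) -> float:
    return max(readings)


lines = ["s1,20.5", "s2,18.0", "s1,23.1", "s2,17.2"]
print(run_mapreduce(lines, max_mapper, max_reducer))  # {'s1': 23.1, 's2': 18.0}
```

Because the mapper looks at one record at a time and the reducer at one key at a time, both phases can be split across nodes without the user code changing; that independence is what lets a framework parallelize them.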

The term Hadoop is often used to refer to the base modules and sub-modules, and also to the ecosystem,[12] the collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Phoenix, Apache Spark, Apache ZooKeeper, Apache Impala, Apache Flume, Apache Sqoop, Apache Oozie, and Apache Storm.[13]

Apache Hadoop's MapReduce and HDFS components were inspired by Google's papers on MapReduce and the Google File System.[14]

The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command line utilities written as shell scripts. Though MapReduce Java code is common, any programming language can be used with Hadoop Streaming to implement the map and reduce parts of the user's program.[15] Other projects in the Hadoop ecosystem expose richer user interfaces.
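With Hadoop Streaming, the mapper and reducer are ordinary programs that read lines on standard input and write tab-separated key/value lines on standard output; the framework sorts the mapper output by key before it reaches the reducer. The word-count sketch below follows that convention. The script name `wc.py` and its `map`/`reduce` arguments are hypothetical; on a cluster such a script would be passed to the Hadoop Streaming JAR through its `-mapper` and `-reducer` options (and shipped with `-files`), with the exact JAR path depending on the installation.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, as a streaming mapper prints to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; relies on input arriving sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    # Hypothetical CLI: `python wc.py map` or `python wc.py reduce`, reading stdin.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

The same pipeline can be tried locally without a cluster, with the shell's `sort` standing in for the framework's shuffle: `cat input.txt | python wc.py map | sort | python wc.py reduce`.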

  1. "Hadoop Releases". apache.org. Apache Software Foundation. Retrieved 28 April 2019.
  2. "Apache Hadoop". Retrieved 27 September 2022.
  3. Judge, Peter (22 October 2012). "Doug Cutting: Big Data Is No Bubble". silicon.co.uk. Retrieved 11 March 2018.
  4. Woodie, Alex (12 May 2014). "Why Hadoop on IBM Power". datanami.com. Datanami. Retrieved 11 March 2018.
  5. Hemsoth, Nicole (15 October 2014). "Cray Launches Hadoop into HPC Airspace". hpcwire.com. Retrieved 11 March 2018.
  6. "Welcome to Apache Hadoop!". hadoop.apache.org. Retrieved 25 August 2016.
  7. "What is the Hadoop Distributed File System (HDFS)?". ibm.com. IBM. Retrieved 12 April 2021.
  8. Malak, Michael (19 September 2014). "Data Locality: HPC vs. Hadoop vs. Spark". datascienceassn.org. Data Science Association. Retrieved 30 October 2014.
  9. Wang, Yandong; Goldstone, Robin; Yu, Weikuan; Wang, Teng (October 2014). "Characterization and Optimization of Memory-Resident MapReduce on HPC Systems". 2014 IEEE 28th International Parallel and Distributed Processing Symposium. IEEE. pp. 799–808. doi:10.1109/IPDPS.2014.87. ISBN 978-1-4799-3800-1. S2CID 11157612.
  10. "Resource (Apache Hadoop Main 2.5.1 API)". apache.org. Apache Software Foundation. 12 September 2014. Archived from the original on 6 October 2014. Retrieved 30 September 2014.
  11. Murthy, Arun (15 August 2012). "Apache Hadoop YARN – Concepts and Applications". hortonworks.com. Hortonworks. Retrieved 30 September 2014.
  12. "Continuuity Raises $10 Million Series A Round to Ignite Big Data Application Development Within the Hadoop Ecosystem". finance.yahoo.com. Marketwired. 14 November 2012. Retrieved 30 October 2014.
  13. "Hadoop-related projects at". hadoop.apache.org. Retrieved 17 October 2013.
  14. Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. John Wiley & Sons. 19 December 2014. p. 300. ISBN 9781118876220. Retrieved 29 January 2015.
  15. "[nlpatumd] Adventures with Hadoop and Perl". Mail-archive.com. 2 May 2010. Retrieved 5 April 2013.
