Apache Bigtop 1.2.1, released in 2017, adds support for OpenJDK 8 and a sandbox feature that lets you run big data pseudo clusters on Docker.
Bigtop is an Apache Software Foundation project that you can use for packaging, testing, and configuring the well-known open source big data components that make up the Hadoop infrastructure.
Bigtop supports a wide range of components and projects, including Hadoop, HBase and Spark. The primary goal of Bigtop is to build a community around the packaging, deployment and interoperability testing of Hadoop-related projects.
This includes testing at various levels, including packaging, platform, runtime, and upgrade, and focussing on the system as a whole, rather than individual projects.
While Hadoop is generally used to refer to the central collection of tools, Bigtop looks at the wider selection that makes up the Hadoop-related projects, including HBase, Pig, Hive, MapReduce, ZooKeeper and Avro, among others. Bigtop packages Hadoop as RPMs and DEBs, so that you can manage and maintain your Hadoop cluster, and it provides an integrated smoke testing framework, alongside a suite of 50 test files.
It also helps with virtualization testing, with Vagrant recipes, raw images, and (work-in-progress) Docker recipes for deploying Hadoop from zero, and you can use Bigtop Provisioner to spin up a virtual cluster with a single command.
The release of Bigtop adds a Sandbox feature that lets you use Docker to run pseudo clusters. Creating one is a single-line command, and for HDFS the process takes around 30 seconds. You then get a local web UI to play around with. You can run HDFS and Spark standalone, or HDFS, YARN, Hive and Pig; there are simple instructions on the Bigtop site.
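As a rough illustration of the "single command, then a local web UI" workflow, the Python sketch below builds a `docker run` command and polls the web UI until it responds. The image name and tag are assumptions made for this example and are not confirmed by this article, so check the Bigtop site for the actual sandbox images. Port 50070 is the default NameNode web UI port in Hadoop 2.x. The helper names `build_run_command` and `wait_for_web_ui` are hypothetical.

```python
import http.client
import subprocess
import time
import urllib.request

# Assumption: illustrative image name and tag; see the Bigtop site for real ones.
SANDBOX_IMAGE = "bigtop/sandbox:1.2.1-ubuntu-16.04-hdfs"
# 50070 is the default NameNode web UI port in Hadoop 2.x.
NAMENODE_UI_PORT = 50070


def build_run_command(image=SANDBOX_IMAGE, port=NAMENODE_UI_PORT):
    """Return the `docker run` argument list for a detached sandbox container."""
    # -d runs the container in the background; -p publishes the web UI port.
    return ["docker", "run", "-d", "-p", f"{port}:{port}", image]


def wait_for_web_ui(url, timeout_s=120.0, interval_s=2.0):
    """Poll `url` until it answers with HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # container is probably still starting; retry after a pause
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    # Requires a local Docker daemon and network access to pull the image.
    container_id = subprocess.run(
        build_run_command(), check=True, capture_output=True, text=True
    ).stdout.strip()
    print("started container", container_id)
    ready = wait_for_web_ui(f"http://localhost:{NAMENODE_UI_PORT}/")
    print("NameNode web UI ready:", ready)
```

Keeping the command construction separate from the process launch makes it easy to swap in a different image, for example one that bundles Spark, without touching the polling logic.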
A presentation on using the Sandbox was given at DataWorks Summit 2017 by Apache Bigtop Project Committer and PMC member Evans Ye.
The new release also includes a faster Docker Provisioner, which has been rewritten to fully embrace the Docker ecosystem. The OpenJDK support in the new release means that all the components are now built on JDK 8.
The major components have been updated to recent versions, including Hadoop 2.7.3, Spark 2.1.1, HBase 1.1.9, and Zeppelin 0.7.2. Most of the ecosystem projects have also been updated, including Apex, Crunch, Flume, Ignite, Mahout, Oozie, and Phoenix, among others.