{"id":6760,"date":"2019-11-22T08:34:00","date_gmt":"2019-11-22T08:34:00","guid":{"rendered":"https:\/\/gtechbooster.com\/?p=6760"},"modified":"2022-11-30T22:01:19","modified_gmt":"2022-11-30T22:01:19","slug":"tiledb-promises-faster-discoveries-for-data-scientists","status":"publish","type":"post","link":"https:\/\/gtechbooster.com\/tiledb-promises-faster-discoveries-for-data-scientists\/","title":{"rendered":"Faster discoveries for Data Scientists"},"content":{"rendered":"\n<p>A new database has been released that aims to help data science teams make faster discoveries by giving them a more powerful way to store, update, analyze, and share large, diverse datasets.<\/p>\n\n\n\n<p>TileDB consists of three parts: a new multi-dimensional array data format; a fast, embeddable, open-source C++ storage engine with integrations for data science tools; and a cloud service for easy data management and serverless computation.<\/p>\n\n\n\n<p>The developers say traditional databases aren&#8217;t well suited to data science because they are not optimized for the cloud, while cloud object stores suffer from object immutability, eventual consistency, and IO request rate limiting. 
A second problem is that some formats lack adequate support for efficient data updates. As an example, they note that updating a Parquet file requires creating a new file, which pushes the entire update logic into the user\u2019s higher-level application; similar problems arise whenever update logic is delegated to higher-level applications instead of being built into the format and storage engine.<\/p>\n\n\n\n<p>Finally, the developers cite limited scope as a problem: most data science applications require at least two separate file formats, one for multi-dimensional arrays (used, for example, in linear algebra) and one for dataframes (used in OLAP operations).<\/p>\n\n\n\n<p>The team started with the storage layer when creating TileDB, and say it has the only format and storage engine that handles both dense and sparse multi-dimensional arrays. It supports efficient array IO on multiple storage backends, including cloud object stores such as AWS S3. It also offers fast, highly parallel, lock-free batch updates that are designed to work particularly well with the immutable objects of cloud storage. All update logic and functionality (such as time traveling, that is, reading an array as it was at an earlier point in time) is built into the format and storage engine.<\/p>\n\n\n\n<p>TileDB offers a standalone, embeddable C++ library that provides direct access to TileDB arrays and ships with APIs in C, C++, Python, R, Java and Go. The library is integrated with Spark, Dask, PrestoDB, MariaDB, Arrow and geospatial libraries such as PDAL, GDAL and Rasterio. TileDB pushes down as much computation as possible to the storage layer, such as filter conditions from the SQL engines and dataframe computations from Dask and Spark.<\/p>\n\n\n\n<p>Alongside the database is TileDB Cloud, a pay-as-you-go service that lets you share TileDB arrays on the cloud with other users and perform serverless computations on them. 
Both TileDB and TileDB Cloud are available to try now.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">More Information<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li><a href=\"https:\/\/tiledb.com\/\">TileDB<\/a><\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>A new database designed to help data science teams make faster discoveries by giving them a more powerful way to store, update, analyze, and share large sets of diverse data has been released. 
TileDB consists of a new multi-dimensional array data format, a fast, embeddable, open-source C++ storage engine with data science tooling integrations, and [&hellip;]<\/p>\n","protected":false},"author":7,"featured_media":6761,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[235,238,6],"class_list":["post-6760","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-features","tag-data-science","tag-database","tag-programming"],"blocksy_meta":{"styles_descriptor":{"styles":{"desktop":"","tablet":"","mobile":""},"google_fonts":[],"version":6}},"_links":{"self":[{"href":"https:\/\/gtechbooster.com\/api-json\/wp\/v2\/posts\/6760","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gtechbooster.com\/api-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gtechbooster.com\/api-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gtechbooster.com\/api-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/gtechbooster.com\/api-json\/wp\/v2\/comments?post=6760"}],"version-history":[{"count":0,"href":"https:\/\/gtechbooster.com\/api-json\/wp\/v2\/posts\/6760\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gtechbooster.com\/api-json\/wp\/v2\/media\/6761"}],"wp:attachment":[{"href":"https:\/\/gtechbooster.com\/api-json\/wp\/v2\/media?parent=6760"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gtechbooster.com\/api-json\/wp\/v2\/categories?post=6760"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gtechbooster.com\/api-json\/wp\/v2\/tags?post=6760"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}