apache-arrow

Publisher: apache | Downloads: 1.2m | License: Apache-2.0 | Version: 19.0.0 | TypeScript support: included

Apache Arrow columnar in-memory format

apache, arrow

readme

Apache Arrow in JS

Arrow is a set of technologies that enable big data systems to process and transfer data quickly.

Install apache-arrow from NPM

npm install apache-arrow or yarn add apache-arrow

(read about how we package apache-arrow below)

Powering Columnar In-Memory Analytics

Apache Arrow is a columnar memory layout specification for encoding vectors and table-like containers of flat and nested data. The Arrow spec aligns columnar data in memory to minimize cache misses and take advantage of the latest SIMD (Single Instruction, Multiple Data) and GPU operations on modern processors.
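
The following sketch makes the columnar layout concrete. It is hand-rolled, plain JavaScript and does not use the apache-arrow package. It shows one way a single nullable 32-bit integer column can be stored:

- a contiguous values buffer, and
- a validity bitmap with one bit per slot: 1 for a valid value, 0 for a null, least-significant bit first.

The helper names `buildInt32Column` and `getValue` are hypothetical. The sketch also ignores the buffer padding and alignment that the specification recommends.

```javascript
// Hand-rolled sketch, not the apache-arrow API: one nullable Int32 column
// stored as a values buffer plus a validity bitmap (1 = valid, 0 = null).
function buildInt32Column(values) {
    const length = values.length;
    const data = new Int32Array(length);                    // values buffer, null slots left as 0
    const validity = new Uint8Array(Math.ceil(length / 8)); // one bit per slot, LSB first
    let nullCount = 0;
    values.forEach((v, i) => {
        if (v === null || v === undefined) {
            nullCount++;                       // bit i stays 0
        } else {
            data[i] = v;
            validity[i >> 3] |= 1 << (i & 7);  // set bit i
        }
    });
    return { length, nullCount, data, validity };
}

// Read slot i, consulting the validity bitmap before the values buffer.
function getValue(column, i) {
    const isValid = (column.validity[i >> 3] >> (i & 7)) & 1;
    return isValid ? column.data[i] : null;
}

// Same values as the "foo" column in the cookbook output: 1, null, 3, 4, 5
const foo = buildInt32Column([1, null, 3, 4, 5]);
console.log(foo.validity[0]); // 29 (0b00011101: slot 1 is null)
console.log(Array.from({ length: foo.length }, (_, i) => getValue(foo, i)));
// → [1, null, 3, 4, 5]
```

Checking the bitmap first lets a reader skip null slots without inspecting their placeholder values.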

Apache Arrow is the emerging standard for large in-memory columnar data (Spark, Pandas, Drill, Graphistry, ...). By standardizing on a common binary interchange format, big data systems can reduce the costs and friction associated with cross-system communication.

Get Started

Check out our API documentation to learn more about how to use Apache Arrow's JS implementation. You can also learn by example by checking out some of the following resources:

Cookbook

Get a table from an Arrow file on disk (in IPC format)

import { readFileSync } from 'fs';
import { tableFromIPC } from 'apache-arrow';

const arrow = readFileSync('simple.arrow');
const table = tableFromIPC(arrow);

console.table(table.toArray());

/*
 foo,  bar,  baz
   1,    1,   aa
null, null, null
   3, null, null
   4,    4,  bbb
   5,    5, cccc
*/

Create a Table when the Arrow file is split across buffers

import { readFileSync } from 'fs';
import { tableFromIPC } from 'apache-arrow';

const table = tableFromIPC([
    'latlong/schema.arrow',
    'latlong/records.arrow'
].map((file) => readFileSync(file)));

console.table([...table]);

/*
        origin_lat,         origin_lon
35.393089294433594,  -97.6007308959961
35.393089294433594,  -97.6007308959961
35.393089294433594,  -97.6007308959961
29.533695220947266, -98.46977996826172
29.533695220947266, -98.46977996826172
*/

Create a Table from JavaScript arrays

import { tableFromArrays } from 'apache-arrow';

const LENGTH = 2000;

const rainAmounts = Float32Array.from(
    { length: LENGTH },
    () => Number((Math.random() * 20).toFixed(1)));

const rainDates = Array.from(
    { length: LENGTH },
    (_, i) => new Date(Date.now() - 1000 * 60 * 60 * 24 * i));

const rainfall = tableFromArrays({
    precipitation: rainAmounts,
    date: rainDates
});

console.table([...rainfall]);
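
`tableFromArrays` takes one array per column. If your data starts out as an array of row objects, a small pivot step produces that shape. The sketch below uses a hypothetical helper, `rowsToColumns`, written in plain JavaScript. It assumes every row has the same keys as the first row and fills missing values with `null`.

```javascript
// Hypothetical helper (not part of apache-arrow): pivot an array of row
// objects into one array per column. Keys are taken from the first row;
// a row missing a key contributes null for that column.
function rowsToColumns(rows) {
    const columns = {};
    if (rows.length === 0) return columns;
    for (const key of Object.keys(rows[0])) {
        columns[key] = rows.map((row) => (key in row ? row[key] : null));
    }
    return columns;
}

const rows = [
    { precipitation: 1.5, date: new Date(2021, 0, 1) },
    { precipitation: 0.0, date: new Date(2021, 0, 2) },
];

const columns = rowsToColumns(rows);
// columns.precipitation → [1.5, 0]
// columns.date          → [Date, Date]
```

The resulting object has the same column-per-key shape as the object passed to `tableFromArrays` in the example above.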

Load data with fetch

import { tableFromIPC } from "apache-arrow";

const table = await tableFromIPC(fetch("/simple.arrow"));

console.table([...table]);

Vectors look like JS Arrays

You can create a vector from JavaScript typed arrays with makeVector and from JavaScript arrays with vectorFromArray. makeVector is much faster and does not copy the data.

import { makeVector } from "apache-arrow";

const LENGTH = 2000;

const rainAmounts = Float32Array.from(
    { length: LENGTH },
    () => Number((Math.random() * 20).toFixed(1)));

const vector = makeVector(rainAmounts);

const typed = vector.toArray()

assert(typed instanceof Float32Array);

for (let i = -1, n = vector.length; ++i < n;) {
    assert(vector.get(i) === typed[i]);
}
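
The statement that makeVector does not require a copy comes down to the difference between wrapping an existing buffer and duplicating its contents. The sketch below shows that difference using plain typed arrays only. It makes no Arrow calls and is an analogy, not a description of apache-arrow internals.

```javascript
// Plain typed arrays only (no Arrow calls).
const source = Float32Array.from([0.5, 1.5, 2.5]);

// A view over the same ArrayBuffer: no bytes are copied.
const view = new Float32Array(source.buffer, source.byteOffset, source.length);

// A copy into a regular JS array: every element is duplicated.
const copy = Array.from(source);

source[0] = 9;

console.log(view[0]); // 9   (shares memory with source)
console.log(copy[0]); // 0.5 (independent copy)
console.log(view.buffer === source.buffer); // true
```

Wrapping costs the same regardless of length, while copying grows with the number of elements.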

String vectors

Strings can be encoded as plain UTF-8 or as dictionary-encoded UTF-8. Dictionary encoding stores repeated values more efficiently. You can create a dictionary-encoded string vector conveniently with vectorFromArray or efficiently with makeVector.

import { makeVector, vectorFromArray, Dictionary, Uint8, Utf8 } from "apache-arrow";

const utf8Vector = vectorFromArray(['foo', 'bar', 'baz'], new Utf8);

const dictionaryVector1 = vectorFromArray(
    ['foo', 'bar', 'baz', 'foo', 'bar']
);

const dictionaryVector2 = makeVector({
    data: [0, 1, 2, 0, 1],  // indexes into the dictionary
    dictionary: utf8Vector,
    type: new Dictionary(new Utf8, new Uint8)
});
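
The idea behind dictionary encoding can be sketched without the library:

- each distinct string is stored once in a dictionary, and
- each row stores a small integer index into it.

This mirrors the `data`/`dictionary` split passed to makeVector above. The helpers `dictionaryEncode` and `dictionaryDecode` are hypothetical. They use `Uint8Array` indices to match the `Uint8` index type in that example, so they allow at most 256 distinct values, and they do not handle nulls.

```javascript
// Hypothetical helpers (not the apache-arrow API): store each distinct
// string once and keep one Uint8 index per row.
function dictionaryEncode(values) {
    const dictionary = [];
    const lookup = new Map();                      // string -> index in dictionary
    const indices = new Uint8Array(values.length); // Uint8 indices: at most 256 distinct values
    values.forEach((value, i) => {
        let index = lookup.get(value);
        if (index === undefined) {
            index = dictionary.length;
            if (index > 255) throw new RangeError("too many distinct values for Uint8 indices");
            dictionary.push(value);
            lookup.set(value, index);
        }
        indices[i] = index;
    });
    return { dictionary, indices };
}

// Rebuild the original strings by looking up each index.
function dictionaryDecode({ dictionary, indices }) {
    return Array.from(indices, (index) => dictionary[index]);
}

const encoded = dictionaryEncode(["foo", "bar", "baz", "foo", "bar"]);
// encoded.dictionary → ["foo", "bar", "baz"]
// encoded.indices    → Uint8Array [0, 1, 2, 0, 1]
console.log(dictionaryDecode(encoded)); // → ["foo", "bar", "baz", "foo", "bar"]
```

The savings grow with repetition: a long column with few distinct strings needs only one byte per row plus the small dictionary.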

Getting involved

See DEVELOP.md

Even if you do not plan to contribute to Apache Arrow itself or Arrow integrations in other projects, we'd be happy to have you involved:

We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the github.com/apache/arrow repository.

If you are looking for some ideas on what to contribute, check out the GitHub issues for the Apache Arrow project. Comment on the issue and/or contact dev@arrow.apache.org with your questions and ideas.

If you’d like to report a bug but don’t have time to fix it, you can still post it on GitHub issues or email the mailing list at dev@arrow.apache.org.

Packaging

apache-arrow is written in TypeScript, but the project is compiled to multiple JS versions and common module formats.

The base apache-arrow package includes all the compilation targets for convenience, but if you're conscientious about your node_modules footprint, we've got you covered.

The targets are also published under the @apache-arrow namespace:

npm install apache-arrow # <-- combined es2015/CommonJS/ESModules/UMD + esnext/UMD
npm install @apache-arrow/ts # standalone TypeScript package
npm install @apache-arrow/es5-cjs # standalone es5/CommonJS package
npm install @apache-arrow/es5-esm # standalone es5/ESModules package
npm install @apache-arrow/es5-umd # standalone es5/UMD package
npm install @apache-arrow/es2015-cjs # standalone es2015/CommonJS package
npm install @apache-arrow/es2015-esm # standalone es2015/ESModules package
npm install @apache-arrow/es2015-umd # standalone es2015/UMD package
npm install @apache-arrow/esnext-cjs # standalone esNext/CommonJS package
npm install @apache-arrow/esnext-esm # standalone esNext/ESModules package
npm install @apache-arrow/esnext-umd # standalone esNext/UMD package

Why we package like this

The JS community is a diverse group with a varied list of target environments and toolchains. Publishing multiple packages accommodates projects of all stripes.

If you think we missed a compilation target and it's a blocker for adoption, please open an issue.

Supported Browsers and Platforms

The bundles we compile support modern browsers released in the last five years. This includes supported versions of Firefox, Chrome, Edge, and Safari. We do not actively support Internet Explorer. Apache Arrow also works on maintained versions of Node.js.

People

See the full list of committers for the broader Apache Arrow project.

  • Brian Hulette, committer
  • Paul Taylor, committer
  • Dominik Moritz, committer

Powered By Apache Arrow in JS

See the full list of projects and organizations powered by Apache Arrow.

Open Source Projects

  • Apache Arrow -- Parent project for Powering Columnar In-Memory Analytics, including affiliated open source projects
  • Perspective -- Perspective is an interactive analytics and data visualization component well-suited for large and/or streaming datasets. Perspective leverages Arrow C++ compiled to WebAssembly.
  • Falcon is a visualization tool for linked interactions across multiple aggregate visualizations of millions or billions of records.
  • Vega is an ecosystem of tools for interactive visualizations on the web. The Vega team implemented an Arrow loader.
  • Arquero is a library for query processing and transformation of array-backed data tables.
  • OmniSci is a GPU database. Its JavaScript connector returns Arrow dataframes.

License

Apache 2.0

changelog

Apache Arrow 6.0.1 (2021-11-18)

Bug Fixes

  • ARROW-14437 - [Python] Make CSV cancellation test more robust
  • ARROW-14492 - [JS] Fix export for browser bundles
  • ARROW-14513 - [Release][Go] Add /v6 suffix to release-6.0.0
  • ARROW-14519 - [C++] joins segfault when data contains list column
  • ARROW-14523 - [C++] Fix potential data loss in S3 multipart upload
  • ARROW-14538 - [R] Work around empty tr call on Solaris
  • ARROW-14550 - [Doc] Remove the JSON license; a non-free one.
  • ARROW-14583 - [R][C++] Crash when summarizing after filtering to no rows on partitioned data
  • ARROW-14584 - [Python][CI] Python sdist installation fails with latest setuptools 58.5
  • ARROW-14620 - [Python] Missing bindings for existing_data_behavior makes it impossible to maintain old behavior
  • ARROW-14630 - [C++] DCHECK in GroupByNode when error encountered
  • ARROW-14739 - [JS][Docs] Point to wrong source
  • ARROW-15071 - [C#] Fixed a bug in Column.cs ValidateArrayDataTypes method
  • ARROW-15072 - [R] Error: This build of the arrow package does not support Datasets

New Features and Improvements

  • ARROW-13156 - [R] bindings for str_count
  • ARROW-14181 - [C++][Compute] Hash Join support for dictionary
  • ARROW-14189 - [Docs] Add version dropdown to the sphinx docs
  • ARROW-14310 - [R] Make expect_dplyr_equal() more intuitive
  • ARROW-14365 - [R] Update README example to reflect new capabilities
  • ARROW-14390 - [Packaging][Ubuntu] Add support for Ubuntu 21.10
  • ARROW-14433 - [Release][APT] Skip arm64 Ubuntu 21.04 verification
  • ARROW-14450 - [R] Old macos build error
  • ARROW-14459 - [Doc] Update the pinned sphinx version to 4.2
  • ARROW-14480 - [R] Expose arrow::dataset::ExistingDataBehavior to R
  • ARROW-14486 - [Packaging][deb] Add missing libthrift-dev dependency
  • ARROW-14490 - [Doc] Regenerate CHANGELOG.md to include all versions
  • ARROW-14496 - [Docs] Create relative links for R / JS / C/Glib references in the sphinx toctree using stub pages
  • ARROW-14499 - [Docs] Version dropdown side-by-side with search box
  • ARROW-14514 - [C++][R] UBSAN error on round kernel
  • ARROW-14580 - [Python] update trove classifiers to include Python 3.10
  • ARROW-14623 - [Packaging][Java] Upload not only .jar but also .pom
  • ARROW-14628 - [Release][Python] Use python -m pytest
  • ARROW-15058 - [Java] Remove log4j2 dependency in performance module

Apache Arrow 6.0.0 (2021-10-26)

Bug Fixes

  • ARROW-6946 - [Go] Run tests with assert build tag enabled to ensure safety
  • ARROW-8452 - [Go] support proper nested nullable flags
  • ARROW-8453 - [Go][Integration] Support and enable recursive nested type integration tests
  • ARROW-8999 - [Python][C++] Non-deterministic segfault in "AMD64 MacOS 10.15 Python 3.7" build
  • ARROW-9948 - [C++] Fix scale handling in Decimal{128, 256}::FromString
  • ARROW-10213 - [C++] Temporal cast from timestamp to date rounds instead of extracting date component
  • ARROW-10373 - [C++] Validate null_count in Array::ValidateFull()
  • ARROW-10773 - [R] parallel as.data.frame.Table hangs indefinitely on Windows
  • ARROW-11518 - [C++][Parquet] Fix buffer allocation when reading/skipping boolean columns
  • ARROW-11579 - [R] read_feather hanging on Windows
  • ARROW-11634 - [C++][Parquet] Parquet statistics (min/max) for dictionary columns are incorrect
  • ARROW-11729 - [R] Add examples to datasets documentation
  • ARROW-12011 - [C++] Fix crashes and incorrect results when printing extreme date values
  • ARROW-12072 - [Go] Fix panics in ipc writer for sliced records
  • ARROW-12087 - [C++] Allow sorting durations, timestamps with timezones
  • ARROW-12321 - [R][C++] Arrow opens too many files at once when writing a dataset
  • ARROW-12513 - [C++][Parquet] Parquet Writer always puts null_count=0 in Parquet statistics for dictionary-encoded array with nulls
  • ARROW-12540 - [C++] Implementing casting support from date32/date64 to utf8/large_utf8
  • ARROW-12636 - [JS] ESM Tree-Shaking produces broken code
  • ARROW-12700 - [R] Read/Write_feather stuck forever after bad write, R, Win32
  • ARROW-12837 - [C++] Do not crash when printing invalid arrays
  • ARROW-13134 - [C++][CI] Unpin conda package for aws-sdk-cpp
  • ARROW-13151 - [C++][Parquet] Propagate schema changes from selection all the way up the stack
  • ARROW-13198 - [C++][Dataset] Async scanner occasionally segfaulting in CI
  • ARROW-13293 - [R] open_dataset followed by collect hangs (while compute works)
  • ARROW-13304 - [C++] Unable to install nightly on Ubuntu 21.04 due to day of week options
  • ARROW-13336 - [Doc] Make clean in docs should clean generated docs
  • ARROW-13422 - [R] Clarify README about S3 support on Windows
  • ARROW-13424 - [C++] Remove needless workaround for conda and benchmark
  • ARROW-13425 - [Archery] Avoid importing PyArrow indirectly
  • ARROW-13429 - [C++][Gandiva] Fix Gandiva codegen for if-else expression with binary type
  • ARROW-13430 - [Go] fix handling of zero value for FromBigInt
  • ARROW-13436 - [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
  • ARROW-13437 - [C++] Relax FixedSizeList validation to allow excess child values
  • ARROW-13441 - [C++][CSV] Skip empty batches in column decoder
  • ARROW-13443 - [C++] Fix the incorrect mapping from flatbuf::MetadataVersion to arrow::ipc::MetadataVersion
  • ARROW-13445 - [Java][Packaging] Fix artifact patterns for the Java jars
  • ARROW-13446 - [Release] Fix verification on amazon linux
  • ARROW-13447 - [Release] Verification script for arm64 and universal2 macOS wheels
  • ARROW-13450 - [Python][Packaging] Set deployment target to 10.13 for universal2 wheels
  • ARROW-13469 - [C++] Suppress -Wmissing-field-initializers in DayMilliseconds arrow/type.h
  • ARROW-13474 - [Python] Fix crash in take/filter of empty ExtensionArray
  • ARROW-13477 - [Release] Pass ARTIFACTORY_API_KEY to the upload script
  • ARROW-13484 - [Release] Add support for uploading Amazon Linux 2 packages
  • ARROW-13490 - [R][CI] Need to gate duckdb examples on duckdb version
  • ARROW-13492 - [R][CI] Move r tools 35 build back to per-commit/pre-PR
  • ARROW-13493 - [C++] Anonymous structs in an anonymous union are a GNU extension
  • ARROW-13495 - [C++][Compute] Fixing unaligned memory access in GrouperFastImpl
  • ARROW-13496 - [CI][R] Repair r-sanitizer job
  • ARROW-13497 - [C++][R] FunctionOptions not used by aggregation nodes
  • ARROW-13499 - [R] Aggregation on expression doesn't NSE correctly
  • ARROW-13500 - [C++] Fix using '-Wno-unknown-warning-option' with GCC
  • ARROW-13504 - [Python] Move marks from fixtures to individual tests/params
  • ARROW-13507 - [R] LTO job on CRAN fails
  • ARROW-13509 - [C++] Take kernel with empty inputs
  • ARROW-13522 - [C++] Fix regression in UTF8 trim functions
  • ARROW-13523 - [C++] Normalize test executable name
  • ARROW-13524 - [C++] Fix description for ApplicationVersion::VersionEq
  • ARROW-13529 - [Go] Fixing too many releases in IPC writer
  • ARROW-13538 - [R][CI] Don't test DuckDB in the minimal build
  • ARROW-13543 - [R] Handle summarize() with 0 arguments or no aggregate functions
  • ARROW-13556 - [C++] Add protobuf to linking for flight
  • ARROW-13559 - [CI][C++] Move the test-conda-cpp-valgrind nightly build to azure
  • ARROW-13560 - [R] Allow Scanner$create() to accept filter / project even with arrow_dplyr_querys
  • ARROW-13580 - [C++] quoted_strings_can_be_null only applied to string columns
  • ARROW-13597 - [C++][Compute] Remove AddOnLoad helper
  • ARROW-13600 - [C++] Fix maybe uninitialized warnings
  • ARROW-13602 - [C++] Fix strict aliasing warning in bit util test
  • ARROW-13603 - [GLib] Fix typos in GARROW_VERSION_CHECK()
  • ARROW-13605 - [C++] Capture node with shared_ptr to avoid TSan warning
  • ARROW-13608 - [R] vendor cpp11 to fix segfault under LTO
  • ARROW-13611 - [C++] Scanning datasets does not enforce back pressure
  • ARROW-13624 - [R] readr short type mapping has T and t backwards
  • ARROW-13628 - [Format][C++][Java] Add MONTH_DAY_NANO interval type
  • ARROW-13630 - [CI][C++][s390x] Reduce parallelism to build Arrow library
  • ARROW-13632 - [C++] Fix filtering of sliced FixedSizeList array
  • ARROW-13638 - [C++] Hold owned copy of function options in GroupByNode
  • ARROW-13639 - [C++] Fix out-of-bounds access in Concatenate with null slots and empty dictionary
  • ARROW-13654 - [C++][Parquet] Avoid infinite loop when appending a FileMetaData to itself
  • ARROW-13655 - [C++][Parquet] Disable Thrift message size protections
  • ARROW-13662 - [CI] Fix failing strftime test with older pandas
  • ARROW-13662 - [CI] Failing test test_extract_datetime_components with pandas 0.24
  • ARROW-13669 - [C++] Fix variant emplace methods (add brackets)
  • ARROW-13671 - [Dev] Fix conda recipe on Arm 64k page system
  • ARROW-13676 - [C++][Parquet] Avoid potential invalid access.
  • ARROW-13681 - [C++] Fix list_parent_indices behaviour on chunked array
  • ARROW-13685 - [C++] Cannot write dataset to S3FileSystem if bucket already exists
  • ARROW-13689 - [C#][Integration] Initial commit of C# Integration tests
  • ARROW-13694 - [R] Arrow filter crashes (R aborted session)
  • ARROW-13743 - [CI] OSX job fails due to incompatible git and libcurl
  • ARROW-13744 - [CI] c++14 and 17 nightly job fails
  • ARROW-13747 - [Python][CI] Requiring s3fs >= 2021.8
  • ARROW-13755 - [Python] Allow writing datasets using a partitioning that only specifies field_names
  • ARROW-13761 - [R] arrow::filter() crashes (aborts R session)
  • ARROW-13784 - [Python] Table.from_arrays should raise an error when array is empty but names is not
  • ARROW-13786 - [R][CI] Don't fail the RCHK build if arrow doesn't build
  • ARROW-13788 - [C++] Temporal component extraction functions don't support date32/64
  • ARROW-13792 - [Java] The toString representation is incorrect for unsigned integer vectors
  • ARROW-13799 - [R] case_when error handling is capturing strings
  • ARROW-13800 - [R] Use divide instead of divide_checked
  • ARROW-13812 - [C++] Fix Valgrind error in Grouper.BooleanKey test
  • ARROW-13814 - [CI] Fix Spark master integration tests
  • ARROW-13819 - [C++] Initialize subseconds in value_parsing.h
  • ARROW-13846 - [C++] Fix crashes on invalid IPC file
  • ARROW-13850 - [C++] Fix crashes on invalid Parquet data
  • ARROW-13860 - [R] arrow 5.0.0 write_parquet throws error writing grouped data.frame
  • ARROW-13865 - [C++][R] Writing moderate-size parquet files of nested dataframes from R slows down/process hangs
  • ARROW-13872 - [Java] ExtensionTypeVector does not work with RangeEqualsVisitor
  • ARROW-13876 - [C++] Add trivial null kernels to arithmetic, sort functions
  • ARROW-13877 - [C++] Support FixedSizeList in generic list kernels
  • ARROW-13878 - [C++] Implement fixed-size-binary support for several kernels
  • ARROW-13880 - [C++] Compute function sort_indices does not support timestamps with time zones
  • ARROW-13881 - [C++][FlightRPC][Packaging] Ensure Flight is packaged with advanced TLS options on Windows
  • ARROW-13882 - [C++] Improve min_max/hash_min_max type support
  • ARROW-13884 - [JS] Move source files into a separate directory
  • ARROW-13912 - [R] TrimOptions implementation breaks test-r-minimal-build due to dependencies
  • ARROW-13913 - [C++] Don't segfault if IndexOptions omitted
  • ARROW-13915 - [R][CI] R UCRT C++ bundles are incomplete
  • ARROW-13916 - [C++] Implement strftime on date32/64 types
  • ARROW-13921 - [Python][Packaging] Pin minimum setuptools version for the macos wheels
  • ARROW-13940 - [R] Turn on multithreading with Arrow engine queries
  • ARROW-13961 - [C++] Fix use of non-const references, declaration without initialization
  • ARROW-13976 - [C++] Add path to libjvm.so in ARM CPU
  • ARROW-13978 - [C++] Bump gtest to 1.11 to unbreak builds with recent clang
  • ARROW-13981 - [Java] VectorSchemaRootAppender doesn't work for BitVector
  • ARROW-13982 - [C++] Don't stall in async scanner if a fragment generates no batches
  • ARROW-13983 - [C++] Avoid raising error if fadvise() isn't supported
  • ARROW-13996 - [Go][Parquet] Fix file offsets in go impl
  • ARROW-13997 - [C++] restore exec node based query performance
  • ARROW-14001 - [Go] Fixing AppendBoolean function in BitmapWriter
  • ARROW-14004 - [Python][Doc] Document nullable dtypes handling and usage of types_mapper in to_pandas conversion
  • ARROW-14014 - [Java] Fix Flight parseTrailers for :status keys
  • ARROW-14017 - [C++] NULLPTR is not included in type_fwd.h
  • ARROW-14020 - [R] Writing dataframes with list columns is slow and scales poorly with nesting level
  • ARROW-14024 - [C++] Test that batch size is respected for IPC/CSV
  • ARROW-14026 - [C++] Enable batch parallelism in Parquet scanner
  • ARROW-14027 - [C++] Handle scalars in Grouper
  • ARROW-14040 - [C++] Fix result order dependence in scanner test
  • ARROW-14053 - [C++][CSV] Use atomic counter for async tests
  • ARROW-14057 - [C++] Bump aws-c-common version
  • ARROW-14063 - [R] open_dataset() does not work on CSVs without header rows
  • ARROW-14076 - Unable to use `red-arrow` gem on Heroku/Ubuntu 20.04 (focal)
  • ARROW-14090 - [C++][Parquet] rows_written_ should be int64_t instead of int
  • ARROW-14103 - [R] [C++] Allow min/max in grouped aggregation
  • ARROW-14109 - [C++] Fix segfault when parsing JSON with duplicate keys.
  • ARROW-14124 - [R] Timezone support in R <= 3.4
  • ARROW-14129 - [C++][Python] Fix unique/value_counts on empty dictionary arrays
  • ARROW-14139 - [IR][C++] Table flatbuffer object fails to compile on older GCCs
  • ARROW-14141 - [IR][C++] Join missing from RelationImpl
  • ARROW-14156 - [C++] Properly synthesize validity buffer in StructArray::Flatten
  • ARROW-14162 - [R] Simple arrange %>% head does not respect ordering
  • ARROW-14173 - [IR] Allow typed null literals to be represented
  • ARROW-14179 - [C++][C] Do not export/import null bitmap for union and null types
  • ARROW-14184 - [C++] allow joins where the keys include new columns on the left
  • ARROW-14192 - [C++][Dataset] Backpressure broken on ordered scans
  • ARROW-14195 - [R] Fix ExecPlan binding annotations
  • ARROW-14197 - [C++][Compute] Fixing wrong buffer size in GrouperFastImpl
  • ARROW-14200 - [R] strftime on a date should not use or be confused by timezones
  • ARROW-14203 - [C++] Fix description of ExecBatch.length for Scalars in aggregate kernels
  • ARROW-14204 - [C++] Fails to compile Arrow without RE2 due to missing ifdef guard in MatchLike
  • ARROW-14206 - [Go][Parquet] Clean up s390x and arm build code
  • ARROW-14206 - [Go][CI] Fix build on s390x and ARM
  • ARROW-14208 - [C++] Fix compilation on Windows
  • ARROW-14210 - [C++] Add AR and RANLIB flags to bzip2
  • ARROW-14211 - [C++][Compute] Fixing thread sanitizer problems in hash join node
  • ARROW-14214 - [Python][CI] Fix tests using OrcFileFormat for Python 3.6 + orc not built
  • ARROW-14216 - [R] Disable auto-cleaning of duckdb tables
  • ARROW-14219 - [R][CI] DuckDB valgrind failure
  • ARROW-14220 - [C++] Missing ending quote in thirdpartyversions
  • ARROW-14221 - [R][CI] DuckDB tests fail on R < 4.0
  • ARROW-14223 - [C++] add missing third-party dependency
  • ARROW-14224 - [C++] Try to reduce build time/memory usage
  • ARROW-14226 - [R] Handle n_distinct() (and others) with args != 1
  • ARROW-14237 - [R][CI] Disable altrep in R <= 3.5
  • ARROW-14240 - [C++] Fix wrong nlohmann-json header path
  • ARROW-14246 - [C++] Fix wrong find_package() usage in build_google_cloud_cpp_storage()
  • ARROW-14247 - [C++] Fix Valgrind errors in parquet-arrow-test
  • ARROW-14249 - [R] Slow down in dataframe-to-table benchmark
  • ARROW-14252 - [R] Partial matching of arguments warning
  • ARROW-14255 - [Python] Fix FlightClient.do_action
  • ARROW-14257 - [Python][Docs] Fix usage of sync scanner in dataset writing docs
  • ARROW-14260 - [C++] GTest linker error with vcpkg and Visual Studio 2019
  • ARROW-14283 - [CI][C++] Use LLVM 12 on macOS GHA builds
  • ARROW-14285 - [C++] Fix crashes when pretty-printing data from valid IPC file
  • ARROW-14299 - [Dev][CI] Avoid downloading MinIO multiple times
  • ARROW-14300 - [C++][R][CI] Work around missing include in xsimd
  • ARROW-14301 - [C++] use consistent CMAKE_CXX_STANDARD definition
  • ARROW-14302 - [C++] Valgrind errors
  • ARROW-14305 - [C++][Compute] Fixing Valgrind errors in hash join node tests
  • ARROW-14307 - [R] crashes when reading empty feather with POSIXct column
  • ARROW-14313 - [Doc] Make Archery installation docs more accurate
  • ARROW-14321 - [R] segfault converting dictionary ChunkedArray with 0 chunks
  • ARROW-14340 - [C++] Bump xsimd to fix build error on Apple M1
  • ARROW-14370 - [C++] Fix memory leak in SeqMergedGeneratorTestFixture.ErrorItem
  • ARROW-14373 - [Packaging][Java] Missing LLVM dependency in the macOS java-jars build
  • ARROW-14377 - [Packaging][Python] Python 3.9 installation fails in macOS wheel build
  • ARROW-14381 - [CI][Python] Fix Spark integration failures
  • ARROW-14382 - [C++][Compute] Remove duplicated ThreadIndexer definition
  • ARROW-14392 - [C++] Bundled gRPC misses bundled Abseil include path
  • ARROW-14393 - [C++] GTest linking errors during the source release verification
  • ARROW-14397 - [C++] Fix valgrind error in test utility
  • ARROW-14406 - [CI] Skip failing test on dask-master nightly build
  • ARROW-14411 - [Release][Integration] Go integration tests fail for 6.0.0-RC1
  • ARROW-14417 - [R] Joins ignore projection on left dataset
  • ARROW-14423 - [Python] Fix version constraints in pyproject.toml
  • ARROW-14424 - [Packaging][Python] Disable windows wheel testing for python 3.6
  • ARROW-14434 - R crashes when making an empty selection for Datasets with DateTime
  • ARROW-14439 - [Python][C++] Segfault with read_json when a field is missing
  • PARQUET-2067 - [C++][Parquet] Fix Parquet null count stats for enclosing null lists
  • PARQUET-2089 - [C++] Align RowGroup file_offset with specification

New Features and Improvements

  • ARROW-1565 - [C++] Implement TopK/BottomK
  • ARROW-1568 - [C++] Implement Drop Null Kernel for Arrays
  • ARROW-4333 - [C++] Sketch out design for kernels and "query" execution in compute layer
  • ARROW-4700 - [C++] Added support for decimal128 and decimal256 json converted
  • ARROW-5002 - [C++] Implement Hash Aggregation query execution node
  • ARROW-5244 - [C++] Remove experimental marker from some APIs
  • ARROW-6072 - [C++] Implement casting List <-> LargeList
  • ARROW-6607 - [Python] Support for set/list columns when converting from Pandas
  • ARROW-6626 - [Python] Support converting nested sets when converting to arrow
  • ARROW-6870 - [C#] Add Support for Dictionary Arrays and Dictionary Encoding
  • ARROW-7102 - [Python] Make filesystems compatible with fsspec
  • ARROW-7179 - [C++][Python][R] Consolidate coalesce/fill_null
  • ARROW-7901 - [Go][Integration] enable integration tests for null case
  • ARROW-8022 - [C++] Add static and small vector implementations
  • ARROW-8147 - [C++] add GCS library to ThirdpartyToolchain
  • ARROW-8379 - [R] Investigate/fix thread safety issues (esp. Windows)
  • ARROW-8621 - [Release] Add post release step to add tags for Go versioning
  • ARROW-8780 - [Python][Doc] Document the fsspec wrapper for pyarrow.fs filesystems
  • ARROW-8928 - [C++] Add microbenchmarks to help measure ExecBatchIterator overhead
  • ARROW-9226 - [Python] Support core-site.xml default filesystem.
  • ARROW-9434 - [C++] Store type code in UnionScalar
  • ARROW-9719 - [Python] Improve HadoopFileSystem docstring
  • ARROW-10094 - [Python][Doc] Document missing pandas to arrow conversions
  • ARROW-10415 - [R] Support for dplyr::distinct()
  • ARROW-10898 - [C++] Improve table sort performance
  • ARROW-11238 - [Python] Make SubTreeFileSystem print method more informative
  • ARROW-11243 - [C++] Recognize time types in CSV files
  • ARROW-11460 - [R] Use system libraries if present on Linux
  • ARROW-11691 - [Developer][CI] Provide a consolidated .env file for benchmark-relevant environment variables
  • ARROW-11748 - [C++] Ensure Decimal fields are in native endian order
  • ARROW-11828 - [C++] Expose CSVWriter object in api
  • ARROW-11885 - [R] Turn off some capabilities when LIBARROW_MINIMAL=true
  • ARROW-11981 - [C++] Implement Union ExecNode
  • ARROW-12063 - [C++] Add null placement option to sort functions
  • ARROW-12181 - [C++][R] The "CSV dataset" in test-dataset.R is failing on RTools 3.5
  • ARROW-12216 - [R] Proactively disable multithreading on RTools3.5 (32bit?)
  • ARROW-12359 - [C++] Deprecate FileSystem::OpenAppendStream
  • ARROW-12388 - [C++][Gandiva] Implement cast numbers from varbinary functions in gandiva
  • ARROW-12410 - [C++][Gandiva] Implement regexp_replace function on Gandiva
  • ARROW-12479 - [C++][Gandiva] Implement castBigInt, castInt, castIntervalDay and castIntervalYear extra functions
  • ARROW-12563 - [C++][Gandiva] Add add_months and datediff functions for string
  • ARROW-12615 - [C++] Add options for handling NAs to stddev and variance
  • ARROW-12650 - [Doc][Python] Improve documentation regarding dealing with memory mapped files
  • ARROW-12657 - [C++] Adding String hex to numeric conversion
  • ARROW-12669 - [C++][Python] Implement a new scalar function: list_element
  • ARROW-12673 - [C++] Add callback to handle incorrect column counts
  • ARROW-12688 - [R] Use DuckDB to query an Arrow Dataset
  • ARROW-12714 - [C++] String title case kernel
  • ARROW-12725 - [C++][Compute] Column at a time hash and comparison in group by
  • ARROW-12728 - [C++] Implement count_distinct/distinct hash aggregate kernels
  • ARROW-12744 - [C++][Compute] Add rounding kernel
  • ARROW-12759 - [C++][Compute] Add ExecNode for group by
  • ARROW-12763 - [R] Optimize dplyr queries that use head/tail after arrange
  • ARROW-12846 - [Release] Reduce download/upload bandwidth for APT/Yum repositories
  • ARROW-12866 - [C++][Gandiva] Implement STRPOS function on Gandiva
  • ARROW-12871 - [R] upgrade to testthat 3e
  • ARROW-12876 - [R] Fix build flags on Raspberry Pi
  • ARROW-12944 - [C++] String capitalize kernel
  • ARROW-12946 - [C++] String swap case kernel
  • ARROW-12953 - [C++][Compute] Refactor CheckScalar* to take Datum arguments
  • ARROW-12959 - [C++][R] Option for is_null(NaN) to evaluate to true
  • ARROW-12965 - [Java] C Data Interface implementation
  • ARROW-12980 - [C++] Kernels to extract datetime components should be timezone aware
  • ARROW-12981 - [R] Install source package from CRAN alone
  • ARROW-13033 - [C++] Kernel to localize naive timestamps to a timezone (preserving clock-time)
  • ARROW-13056 - [MATLAB] Add a matlab label for dev Pull Requests
  • ARROW-13067 - [C++][Compute] Implement integer to decimal cast
  • ARROW-13089 - [Python] Allow creating RecordBatch from Python dict
  • ARROW-13112 - [R] altrep vectors for strings and other types
  • ARROW-13132 - [C++] Add Scalar validation
  • ARROW-13138 - [C++][R] Implement extract temporal components (year, month, day, etc) from date32/64 types
  • ARROW-13141 - [Python] Update HadoopFileSystem docs to clarify setting CLASSPATH env variable is required
  • ARROW-13163 - [C++][Gandiva] Implement REPEAT function on Gandiva
  • ARROW-13164 - [R] altrep vectors from Array with nulls
  • ARROW-13172 - [Java] Make TYPE_WIDTH publicly accessible
  • ARROW-13174 - [C++][Compute] Add strftime kernel
  • ARROW-13202 - [MATLAB] Enable GitHub Actions CI for MATLAB Interface on Linux
  • ARROW-13218 - [Format] Clarify interpretation of timestamp values
  • ARROW-13220 - [C++] Implement 'choose' function
  • ARROW-13222 - [C++] Improve type support for case_when
  • ARROW-13227 - [Documentation][Compute] Document ExecNode
  • ARROW-13257 - [Java][Dataset] Allow passing empty columns for projection
  • ARROW-13268 - [C++][Compute] Add ExecNode for semi and anti-semi join
  • ARROW-13279 - [R] Use C++ DayOfWeekOptions in wday implementation instead of manually calculating via Expression
  • ARROW-13287 - [C++] [Dataset] FileSystemDataset::Write should use an async scan
  • ARROW-13295 - [C++] add hash_mean, hash_variance, hash_stddev kernels
  • ARROW-13298 - [C++] Implement any/all hash aggregate kernels
  • ARROW-13307 - [C++] Remove reflection-based enums
  • ARROW-13311 - [C++][Documentation] Document hash aggregate kernels
  • ARROW-13317 - [Python] Improve documentation on what 'use_threads' does in 'read_feather'
  • ARROW-13326 - [R][Archery] Add linting to dev CI
  • ARROW-13327 - [C++][Python] Improve consistency of explicit C++ types in PyArrow files
  • ARROW-13330 - [Go][Parquet] Add the rest of the Encoding package
  • ARROW-13344 - [R] Initial bindings for ExecPlan/ExecNode
  • ARROW-13345 - [C++] Add basic implementation for log to base b
  • ARROW-13358 - [C++] Improve type support in if_else
  • ARROW-13379 - [Dev][Docs] Improvements to archery docs
  • ARROW-13390 - [C++] Implement coalesce for remaining types
  • ARROW-13397 - [R] Update arrow.Rmd vignette
  • ARROW-13399 - [R] Update dataset.Rmd vignette
  • ARROW-13402 - [R] Update flight.Rmd vignette
  • ARROW-13403 - [R] Update developing.Rmd vignette
  • ARROW-13404 - [Doc][Python] Improve PyArrow documentation for new users
  • ARROW-13405 - [Doc] Guide users to the documentation for their own platform
  • ARROW-13416 - [C++] Implement mod compute function
  • ARROW-13420 - [JS] Update dependencies
  • ARROW-13421 - [C++][Python] Add CSV convert option to change decimal point
  • ARROW-13433 - [R] Remove CLI hack from Valgrind test
  • ARROW-13434 - [R] group_by() with an unnamed expression
  • ARROW-13435 - [R] Add function arrow_table() as alias for Table$create()
  • ARROW-13444 - [C++] Remove usage of deprecated std::result_of
  • ARROW-13448 - [R] Bindings for strftime
  • ARROW-13453 - [R] DuckDB has not yet released 0.2.8
  • ARROW-13455 - [C++][Docs] Typo in RecordBatch::SetColumn
  • ARROW-13458 - [C++][Docs] Typo in RecordBatch::schema
  • ARROW-13459 - [C++][Docs] Missing param docs for RecordBatch::SetColumn
  • ARROW-13461 - [Python][Packaging] Build M1 wheels for python 3.8
  • ARROW-13463 - [Release][Python] Verify python 3.8 macOS arm64 wheel
  • ARROW-13465 - [R] to_arrow() from duckdb
  • ARROW-13466 - [R] make installation fail if Arrow C++ dependencies cannot be installed
  • ARROW-13468 - [Release] Fix binary download/upload failures
  • ARROW-13472 - [R] Remove .engine = "duckdb" argument
  • ARROW-13475 - [Release] Don't consider rust tarballs when cleaning up old releases
  • ARROW-13476 - [Doc][Python] Switch ipc/io doc to use context managers
  • ARROW-13478 - [Release] Unnecessary rc-number argument for the version bumping post-release script
  • ARROW-13480 - [C++] Fix possible deadlock when dataset produces an error
  • ARROW-13482 - [C++][Compute] Refactoring away from hard coded ExecNode factories to a registry
  • ARROW-13485 - [Release] Replace ${PREVIOUS_RELEASE}.9000 in r/NEWS.md by post-12-bump-versions.sh
  • ARROW-13488 - [Website] Update Linux packages install information for 5.0.0
  • ARROW-13489 - [R] Bump CI jobs after 5.0.0
  • ARROW-13501 - [R] Bindings for count aggregation
  • ARROW-13502 - [R] Bindings for min/max aggregation
  • ARROW-13503 - [GLib][Ruby][Flight] Add support for DoGet
  • ARROW-13506 - [C++][Java] Upgrade ORC to 1.6.9
  • ARROW-13508 - [C++] Support custom retry strategies in S3Options
  • ARROW-13510 - [CI][R][C++] Add -Wall to fedora-clang-devel as-cran checks
  • ARROW-13511 - [CI][R] Fail in the docker build step if R deps don't install
  • ARROW-13516 - [C++] Detect --version-script flag availability
  • ARROW-13519 - [R] Make doc examples less noisy
  • ARROW-13520 - [C++] Implement hash_aggregate tdigest kernel
  • ARROW-13521 - [C++][Docs] Add note about tdigest in compute functions docs
  • ARROW-13525 - [Python] Mention alternative deprecation message for ParquetDataset.partitions
  • ARROW-13528 - [R] Bindings for mean, var, sd aggregation
  • ARROW-13532 - [C++][Compute] Add set membership type filtering to hash table interface
  • ARROW-13534 - [C++] Improve csv chunker
  • ARROW-13540 - [C++] Add order by sink node
  • ARROW-13541 - [C++][Python] Implement ExtensionScalar
  • ARROW-13542 - [C++][Compute][Dataset] Add dataset::WriteNode for writing rows from an ExecPlan to disk
  • ARROW-13544 - [Java] Remove APIs that have been deprecated for long (Changes to ArrowBuf)
  • ARROW-13544 - [Java] Remove APIs that have been deprecated for long (Changes to JDBC)
  • ARROW-13544 - [Java] Remove APIs that have been deprecated for long (Changes to Vectors)
  • ARROW-13548 - [C++] Implement temporal difference kernels
  • ARROW-13549 - [C++] Add casts from timestamp to date/time
  • ARROW-13550 - [R] Support .groups argument to dplyr::summarize()
  • ARROW-13552 - [C++] Remove deprecated APIs
  • ARROW-13557 - [Packaging][Python] Skip test_cancellation test case on M1
  • ARROW-13561 - [C++] Implement week kernel that accepts WeekOptions
  • ARROW-13562 - [R] Styler followups
  • ARROW-13565 - [Packaging][Ubuntu] Drop support for 20.10
  • ARROW-13572 - [C++][Datasets] Add ORC support to Datasets API
  • ARROW-13573 - [C++] Support dictionaries natively in case_when
  • ARROW-13574 - [C++] Add 'count all' option to count kernels
  • ARROW-13575 - [C++] Add hash_product kernel
  • ARROW-13576 - [C++] Replace ExecNode::InputReceived with ::MakeTask
  • ARROW-13577 - [Python][FlightRPC] pyarrow client do_put close method after write_table did not throw flight error
  • ARROW-13585 - [GLib] Add support for C ABI interface
  • ARROW-13587 - [R] Handle --use-LTO override
  • ARROW-13595 - [C++] Add debug mode check for compute kernel output type
  • ARROW-13604 - [Java] Remove deprecation annotations for APIs representing unsupported operations
  • ARROW-13606 - [R] Actually disable LTO
  • ARROW-13613 - [C++] Add decimal support to (hash) sum/mean/product
  • ARROW-13614 - [C++] Add decimal support to min_max/hash_min_max
  • ARROW-13618 - [R] Use Arrow engine for summarize() by default
  • ARROW-13620 - [R] Binding for n_distinct()
  • ARROW-13626 - [R] Bindings for log base b
  • ARROW-13627 - [C++] Fully support ScalarAggregateOptions in (hash) any/all/sum/product/mean
  • ARROW-13629 - [Ruby] Add support for building/converting map
  • ARROW-13633 - [Packaging][Debian] Add support for bookworm
  • ARROW-13634 - [R] Update distro() in nixlibs.R to map from "bookworm" to 12
  • ARROW-13635 - [Packaging][Python] Define --with-lg-page for jemalloc in the arm manylinux builds
  • ARROW-13637 - [Python] Fix docstrings
  • ARROW-13642 - [C++][Compute] Hash join node supporting all semi, anti, inner, outer join types
  • ARROW-13645 - [Java] Allow NullVectors to have distinct field names
  • ARROW-13646 - [Go][Parquet] adding the parquet metadata package
  • ARROW-13648 - [Dev] Use #!/usr/bin/env instead of #!/bin where possible
  • ARROW-13650 - [C++] Create dataset writer to encapsulate dataset writer logic
  • ARROW-13651 - [Ruby][Symbol] to Arrow array
  • ARROW-13652 - [Python] Expose copy_files in pyarrow.fs
  • ARROW-13660 - [C++] Remove seq_num from ExecNode::InputReceived
  • ARROW-13670 - [C++] add virtual destructors
  • ARROW-13674 - [CI] PR checks should check for JIRA components
  • ARROW-13675 - [Doc][Python] Add a recipe on how to save partitioned datasets to the Cookbook
  • ARROW-13679 - [GLib][Ruby] Add support for group aggregation
  • ARROW-13680 - [C++] Create an asynchronous nursery to simplify capture logic
  • ARROW-13682 - [C++] Add TDigest API to merge one TDigest
  • ARROW-13684 - [C++][Compute] Strftime kernel follow-up
  • ARROW-13686 - [Python] Update deprecated pytest yield_fixture functions
  • ARROW-13687 - [Ruby] Add support for loading table by Arrow Dataset
  • ARROW-13691 - [C++] Support skip_nulls/min_count in VarianceOptions
  • ARROW-13693 - [Website] arrow-site should pin down a specific Ruby version and leverage toolings like rbenv
  • ARROW-13696 - [Python] Support for MapType with Fields
  • ARROW-13699 - [Python][Docs] Improve filesystem documentation
  • ARROW-13700 - [Docs][C++] Clarify DayOfWeekOptions args
  • ARROW-13702 - [Python] Add dataset mark to test_parquet_dataset_deprecated_properties
  • ARROW-13704 - [C#] Add support for reading streaming format delta dictionaries
  • ARROW-13705 - [Website] Pin node version
  • ARROW-13721 - [Doc][Cookbook] Specifying Schemas - Python
  • ARROW-13733 - [Java] Allow JDBC adapters to reuse vector schema roots
  • ARROW-13734 - [Format] Clarify allowed values for time types
  • ARROW-13736 - [C++] Reconcile PrettyPrint and StringFormatter
  • ARROW-13737 - [C++] Support for grouped aggregation over scalar columns
  • ARROW-13739 - [R] Support dplyr::count() and tally()
  • ARROW-13740 - [R] summarize() should not eagerly evaluate
  • ARROW-13757 - [R] Fix download of C++ source for CRAN patch releases
  • ARROW-13759 - [C++] Update linting and formatting scripts to specify python3 in shebang line
  • ARROW-13760 - [C++] Bump required Protobuf when using Flight
  • ARROW-13764 - [C++] Support CountOptions in grouped count distinct
  • ARROW-13768 - [R] Allow JSON to be an optional component
  • ARROW-13772 - [R] Binding for median aggregation
  • ARROW-13776 - [C++] Offline thirdparty versions.txt is missing extensions for some files
  • ARROW-13777 - [R] mutate after group_by should be ok as long as there are only scalar functions
  • ARROW-13778 - [R] Handle complex summarize expressions
  • ARROW-13782 - [C++] Add skip_nulls/min_count to tdigest/mode/quantile
  • ARROW-13783 - [Python] Preview data when printing tables
  • ARROW-13785 - [C++] Add methods to print exec nodes/plans
  • ARROW-13787 - [C++] Verify third-party downloads
  • ARROW-13789 - [Go] Implement Scalar Values for Go
  • ARROW-13793 - [C++] Migrate ORCFileReader to Result<T>
  • ARROW-13794 - [C++] Deprecate PARQUET_VERSION_2_0
  • ARROW-13797 - [C++][Python] Column projection pushdown for ORC dataset reading + use liborc for column selection
  • ARROW-13803 - [C++] Don't read past end of buffer in BitUtil::SetBitmap
  • ARROW-13804 - [Go] Add Interval type Month, Day, Nano
  • ARROW-13806 - [C++][Python] Add support for new MonthDayNano Interval Type
  • ARROW-13809 - [C++][ABI] Add support for MonthDayNanoInterval to C ABI
  • ARROW-13810 - [C++][Compute] Predicate IsAsciiCharacter allows invalid types and values
  • ARROW-13815 - [R] Adapt to new callstack changes in rlang
  • ARROW-13816 - [Go][C] Implement Consumer APIs for C Data Interface in Go
  • ARROW-13820 - [R] Rename na.min_count to min_count and na.rm to skip_nulls
  • ARROW-13821 - [R] Handle na.rm in sd, var bindings
  • ARROW-13823 - [Java] Exclude .factorypath
  • ARROW-13824 - [C++][Compute] Make constexpr BooleanToNumber kernel
  • ARROW-13831 - [GLib][Ruby] Add support for writing by Arrow Dataset
  • ARROW-13835 - [Doc][Python] Add documentation for unify_schemas
  • ARROW-13842 - [C++] Bump vendored date library
  • ARROW-13843 - [C++][CI] Exercise ToString / PrettyPrint in fuzzing setup
  • ARROW-13845 - [C++] Reconcile RandomArrayGenerator::ArrayOf implementations
  • ARROW-13847 - [Java] Avoid unnecessary collection copies
  • ARROW-13849 - [C++] Wrap min_max with min/max functions
  • ARROW-13852 - [R] Handle Dataset schema metadata in ExecPlan
  • ARROW-13853 - [R] String to_title, to_lower, to_upper kernels
  • ARROW-13855 - [C++][Python] Implement C data interface support for extension types
  • ARROW-13857 - [R][CI] Remove checkbashisms download
  • ARROW-13859 - [Java] Add code coverage support
  • ARROW-13866 - [R] Implement Options for all compute kernels available via list_compute_functions
  • ARROW-13869 - [R] Implement options for non-bound MatchSubstringOptions kernels
  • ARROW-13871 - [C++] JSON reader can fail if a list array key is present in one chunk but not in a later chunk
  • ARROW-13874 - [R] Implement TrimOptions
  • ARROW-13883 - [Python] Allow more than numpy.array as masks when creating arrays
  • ARROW-13890 - [R] Split up test-dataset.R and test-dplyr.R
  • ARROW-13893 - [R] Make head/tail lazy on datasets and queries
  • ARROW-13897 - [Python] Correct TimestampScalar.as_py() and DurationScalar.as_py() docstrings
  • ARROW-13898 - [C++][Compute] Add support for string binary transforms
  • ARROW-13899 - [Ruby] Implement slicer by compute kernels
  • ARROW-13901 - [R] Implement IndexOptions
  • ARROW-13904 - [R] Implement ModeOptions
  • ARROW-13905 - [R] Implement ReplaceSliceOptions
  • ARROW-13906 - [R] Implement PartitionNthOptions
  • ARROW-13908 - [R] Implement ExtractRegexOptions
  • ARROW-13909 - [GLib] Add tests for GArrowVarianceOptions
  • ARROW-13909 - [GLib] Add GArrowVarianceOptions
  • ARROW-13910 - [Ruby] accepts Range and selectors
  • ARROW-13919 - [GLib] Add GArrowFunctionDoc
  • ARROW-13924 - [R] Bindings for stringr::str_starts, stringr::str_ends, base::startsWith and base::endsWith
  • ARROW-13925 - [R] Remove system installation devdocs jobs
  • ARROW-13927 - [R] Add Karl to the contributors list for the package
  • ARROW-13928 - [R] Rename the version(s) tasks so that it's clearer which is which
  • ARROW-13937 - [C++][Compute] Add explicit output values to sign function and fix unary type checks
  • ARROW-13942 - [Dev] Update cmake_format usage in autotune comment bot
  • ARROW-13944 - [C++] Bump xsimd to latest version
  • ARROW-13958 - [Python] Migrate Python ORC bindings to use new Result-based APIs
  • ARROW-13959 - [R] Update tests for extracting components from date32 objects
  • ARROW-13962 - [R] Catch up on the NEWS
  • ARROW-13963 - [Go] Minor: Add bitmap reader/writer impl from go Parquet module to Arrow Bitutil package
  • ARROW-13964 - MINOR: [Go][Parquet] remove base bitmap reader/writer from parquet module, use arrow bitutil ones
  • ARROW-13965 - [C++] dynamic_casts in parquet TypedColumnWriterImpl impacting performance
  • ARROW-13966 - [C++] Support decimals in comparisons
  • ARROW-13967 - [Go] Implement Concatenate function for array.Interface
  • ARROW-13973 - [C++] Add a SelectKSinkNode
  • ARROW-13974 - [C++] Resolve follow-up reviews for TopK/BottomK
  • ARROW-13975 - [C++] Implement decimal round
  • ARROW-13977 - [Format] clarify leap seconds for interval type
  • ARROW-13979 - [Go] Enable -race for go tests
  • ARROW-13990 - [R] Bindings for round kernels
  • ARROW-13994 - [Doc][C++] Build document misses git submodule update
  • ARROW-13995 - [R] Bindings for join node
  • ARROW-13999 - [C++] Fix bundled LZ4 build on MinGW
  • ARROW-14002 - [Python] Support tuples in unify_schemas
  • ARROW-14003 - [C++][Python] Not providing a sort_key in the "select_k_unstable" kernel crashes
  • ARROW-14005 - [R] Fix tests for PartitionNthOptions so that they can run on various platforms (fix partition_nth_indices test)
  • ARROW-14006 - [C++][Python] Support cast of naive timestamps to strings
  • ARROW-14007 - [C++] Fix compiler warnings in decimal promotion helper
  • ARROW-14008 - [R][Compute] Running an ExecPlan should yield Reader instead of Table
  • ARROW-14009 - [C++] Seed parallelism in SourceNode
  • ARROW-14012 - [Python] Update kernel categories in compute doc to match C++
  • ARROW-14013 - [C++][Docs] Add instructions for Fedora
  • ARROW-14016 - [C++] Wrong type_name used for directory partitioning
  • ARROW-14019 - [R] expect_dplyr_equal() test helper function ignores grouping
  • ARROW-14023 - [Ruby] Arrow::Table#slice accepts Hash
  • ARROW-14025 - [R][C++] PreBuffer is not enabled when scanning parquet via exec nodes
  • ARROW-14030 - [GLib] Use arrow::Result based ORC API
  • ARROW-14031 - [Ruby] Use min and max separately
  • ARROW-14033 - [Ruby] Append OpenSSL's .pc path automatically on macOS with Homebrew
  • ARROW-14033 - [Ruby][Doc] Add macOS development guide for Red Arrow
  • ARROW-14035 - [C++][Python][R] Implement count distinct kernel
  • ARROW-14036 - [R] Binding for n_distinct() with no grouping
  • ARROW-14043 - [Python] Allow unsigned integer index type in dictionary() type factory function
  • ARROW-14044 - [R] Handle group_by .drop parameter in summarize
  • ARROW-14049 - [C++][Java] Upgrade ORC to 1.7.0
  • ARROW-14050 - [C++] Make TDigest/Quantile kernels return nulls instead
  • ARROW-14052 - [C++] Add approximate_median aggregation
  • ARROW-14054 - [C++][Docs] Simplify C++ row conversion example
  • ARROW-14055 - [Docs] Add canonical url to the sphinx docs
  • ARROW-14056 - [Doc][C++] Document ArrayData
  • ARROW-14061 - [Go][C++] Add Cgo Arrow Memory Pool Allocator
  • ARROW-14062 - [Format] Initial arrow-internal specification of compute IR
  • ARROW-14064 - [CI] Use Debian 11
  • ARROW-14069 - [R] By default, filter out hash functions in list_compute_functions()
  • ARROW-14070 - [C++][CI] Remove support for VS2015
  • ARROW-14072 - [GLib][Parquet] Add gparquet_arrow_file_reader_get_n_rows()
  • ARROW-14073 - [C++] Deduplicate sort keys
  • ARROW-14084 - [GLib][Ruby][Dataset] Add support for scanning from directory
  • ARROW-14088 - [GLib][Ruby][Dataset] Add support for filter
  • ARROW-14106 - [Go][C] Implement Exporting to the C Data Interface
  • ARROW-14107 - [R][CI] Parallelize Windows CI jobs
  • ARROW-14111 - [C++] Add extraction function support for time32/time64
  • ARROW-14116 - [C++][Docs] Consistent variable names in WriteCSV example
  • ARROW-14127 - [C++][Docs] Example of using compute function and output
  • ARROW-14128 - [Go] Implement MakeArrayFromScalar for nested types
  • ARROW-14132 - [C++] Improve CSV chunker tests
  • ARROW-14135 - [Python] Missing Python tests for compute kernels
  • ARROW-14140 - [R] skip arrow_binary/arrow_large_binary class from R metadata
  • ARROW-14143 - [IR][C++] Add explicit cast node to IR
  • ARROW-14146 - [Dev] Update merge script to specify python3 in shebang line
  • ARROW-14150 - [C++] Don't check delimiter in CSV chunker if no quoting
  • ARROW-14155 - [Go] add fingerprint and hash functions for types and scalars
  • ARROW-14157 - [C++] Refactor Abseil to its own macro
  • ARROW-14165 - [C++] Improve table sort performance
  • ARROW-14178 - [C++] Boost download location has moved
  • ARROW-14180 - [Packaging] Add support for AlmaLinux 8
  • ARROW-14191 - [C++][Dataset] Dataset writes should respect backpressure
  • ARROW-14194 - [Docs] Improve vertical spacing in the sphinx C++ API docs
  • ARROW-14198 - [Java] Upgrade netty, grpc, and boringssl dependencies
  • ARROW-14207 - [C++] Add missing dependencies for bundled Boost targets
  • ARROW-14212 - [GLib][Ruby] Add GArrowTableConcatenateOptions
  • ARROW-14217 - [Python][CI] Add support for python 3.10
  • ARROW-14222 - [C++] implement GCSFileSystem skeleton
  • ARROW-14228 - [R] Allow for creation of nullable fields
  • ARROW-14230 - [C++] Deprecate ArrayBuilder::Advance
  • ARROW-14232 - [C++] update crc32c to version 1.1.2
  • ARROW-14235 - [C++][Compute] Use a node counter as the label if no label is supplied
  • ARROW-14236 - [C++] Add GCS testbench for testing
  • ARROW-14239 - [R] Don't use rlang::as_label
  • ARROW-14241 - [C++][Java][CI] Fix java-jars build
  • ARROW-14243 - [C++] Split vector_sort.cc
  • ARROW-14244 - [C++] Reduce scalar_temporal.cc compilation time
  • ARROW-14258 - [R] Warn if an SF column is made into a table
  • ARROW-14259 - [R] converting from R vector to Array when the R vector is altrep
  • ARROW-14261 - [C++] Includes should be in alphabetical order
  • ARROW-14269 - [C++] Consolidate utf8 benchmark
  • ARROW-14274 - [C++] Refine base64 api
  • ARROW-14284 - [C++][Python] Improve error message when trying to use SyncScanner when requiring async
  • ARROW-14291 - [CI][C++] Add cpp/examples/ files to lint targets
  • ARROW-14295 - [Doc] Indicate location of archery
  • ARROW-14296 - [Go] Update generated flatbuf
  • ARROW-14304 - [R] Update news for 6.0.0
  • ARROW-14309 - [Python] Extend CompressedInputStream to work with paths, strings and files
  • ARROW-14317 - [Doc] Update C data interface implementation status
  • ARROW-14326 - [Docs] Add C/GLib and Ruby to C Data/Stream interface supported libraries
  • ARROW-14327 - [Release] Remove conda-* from packaging group
  • ARROW-14335 - [GLib][Ruby] Add support for expression
  • ARROW-14337 - [C++] Arrow doesn't build on M1 when SIMD acceleration is enabled
  • ARROW-14341 - [C++] Improve decimal benchmark
  • ARROW-14343 - [Packaging][Python] Enable NEON SIMD optimization for M1 wheels
  • ARROW-14345 - [C++] Implement streaming reads
  • ARROW-14348 - [R] add group_vars.RecordBatchReader method
  • ARROW-14349 - [IR] Remove RelBase
  • ARROW-14358 - [Doc] Update CMake options in documentation
  • ARROW-14361 - [C++] Add default simd level
  • ARROW-14364 - [CI][C++] Support LLVM 13
  • ARROW-14368 - [CI] Use ubuntu-latest for Azure Pipelines
  • ARROW-14369 - [C++][Python] Use std::move() explicitly for g++ 4.8.5
  • ARROW-14386 - [Packaging][Java] Ensure using installed devtoolset version
  • ARROW-14387 - [Release][Ruby] Check Homebrew/MSYS2 package version before releasing
  • ARROW-14396 - [R][Doc] Remove relic note in write_dataset that columns cannot be renamed
  • ARROW-14400 - [Go] Equals and ApproxEquals for Tables and Chunked Arrays
  • ARROW-14401 - [C++] Fix bundled crc32c's include path
  • ARROW-14402 - [Release][Yum] Specify gpg path explicitly
  • ARROW-14404 - [Release][APT] Skip arm64 Debian GNU/Linux bookworm verification
  • ARROW-14408 - [Packaging][Crossbow] Option for skipping artifact pattern validation
  • ARROW-14410 - [Python][Packaging] Use numpy 1.21.3 to build python 3.10 wheels for macOS and windows
  • ARROW-14452 - [Release][JS] Update Javascript testing
  • ARROW-14511 - [Website][Rust] Rust 6.0.0 release blog post
  • PARQUET-490 - [C++][Parquet] Basic support for reading DELTA_BINARY_PACKED data

Apache Arrow 5.0.0 (2021-07-28)

Bug Fixes

  • ARROW-6189 - [Rust] [Parquet] Plain encoded boolean column chunks limited to 2048 values
  • ARROW-6312 - [C++] Add support for "pkg-config --static arrow"
  • ARROW-7948 - [Go] Decimal128 Integration fix
  • ARROW-9594 - [Python] Preserve null indexes in DictionaryArray.to_numpy as it's done in DictionaryArray.to_pandas
  • ARROW-10910 - [Python] Provide better error message when trying to read from None source
  • ARROW-10958 - [GLib] "Nested data conversions not implemented" through glib, but not through pyarrow
  • ARROW-11077 - [Rust] ParquetFileArrowReader panics when trying to read nested list
  • ARROW-11146 - [CI] Remove test-conda-python-3.8-jpype build
  • ARROW-11161 - [C++][Python] Add stream metadata
  • ARROW-11633 - [CI][Doc] Maven default skin not found
  • ARROW-11780 - [Python] Avoid crashing when a ChunkedArray is provided to StructArray.from_arrays()
  • ARROW-11908 - [Rust] Intermittent Flight integration test failures
  • ARROW-12007 - [C++] Loading parquet file returns "Invalid UTF8 payload" error
  • ARROW-12055 - [R] is.na() evaluates to FALSE on Arrow NaN values
  • ARROW-12096 - [C++] Allows users to define arrow timestamp unit for Parquet INT96 timestamp
  • ARROW-12122 - [Python] Cannot install via pip on M1 Mac
  • ARROW-12142 - [Python][Doc] Mention the CXX ABI flag in the docs
  • ARROW-12150 - [Python] Correctly infer type of mixed-precision Decimals
  • ARROW-12232 - [Rust][Datafusion] Error with CAST: Unsupported SQL type Time
  • ARROW-12240 - [Python] Fix invalid-offsetof warning
  • ARROW-12377 - [Doc][Java] Java doc build broken
  • ARROW-12407 - [Python][Dataset] Remove ScanTask bindings
  • ARROW-12431 - [Python] Mask is inverted when creating FixedSizeBinaryArray
  • ARROW-12472 - [Python] Properly convert paths to strings (using fspath)
  • ARROW-12482 - [Doc][C++][Python] Mention CSVStreamingReader pitfalls with type inference
  • ARROW-12491 - [Packaging][RPM] Add support for Amazon Linux 2
  • ARROW-12503 - [C++] Ensure using "lib/" for jemalloc's library directory
  • ARROW-12508 - [R] expect_as_vector implementation causes test failure on R <= 3.3 & variables defined outside of test_that break build when no arrow install
  • ARROW-12543 - [CI][Python] Fix test-conda-python-3.9 build (gdb version conflict)
  • ARROW-12568 - [C++][Compute] Fix nullptr dereference when array contains no nulls
  • ARROW-12569 - [R][CI] Run revdep in CI
  • ARROW-12570 - [JS] Fix issues that blocked the v4.0.0 release
  • ARROW-12579 - [Python] Pyarrow 4.0.0 dependency numpy 1.19.4 throws errors on Apple silicon/M1 compilation
  • ARROW-12589 - [C++] Compiling on windows doesn't work when -DARROW_WITH_BACKTRACE=OFF
  • ARROW-12601 - [R][Packaging] Fix pkg-config check in r/configure
  • ARROW-12604 - [R][Packaging] Dataset, Parquet off in autobrew and CRAN Mac builds
  • ARROW-12605 - [Documentation] Update line numbers in cpp/dataset.rst
  • ARROW-12606 - [C++][Compute] Fix Quantile and Mode on arrays with offset
  • ARROW-12610 - [C++] Skip TestS3FSGeneric TestDeleteDir and TestDeleteDirContents on Windows as they are flaky
  • ARROW-12611 - [CI][Python] Add different numpy versions to pandas nightly builds
  • ARROW-12613 - [Python] Support comparison to None in Scalar values
  • ARROW-12614 - [C++][Compute] Remove support for Tables in ExecuteScalarExpression
  • ARROW-12617 - [Python] Align orc.write_table keyword order with parquet.write_table
  • ARROW-12620 - [C++][Dataset] Fix projection during writing
  • ARROW-12622 - [Python] Fix segfault in read_csv when not on main thread
  • ARROW-12630 - [Dev][Integration] conda-integration docker build fails
  • ARROW-12639 - [CI][Archery] Archery build fails to create branch
  • ARROW-12640 - [C++] Fix errors from VS 2019 in cpp/src/parquet/types.h
  • ARROW-12642 - [R] LIBARROW_MINIMAL, LIBARROW_DOWNLOAD, NOT_CRAN env vars should not be case-sensitive
  • ARROW-12644 - [C++][Python][R][Dataset] URL-decode path segments in partitioning
  • ARROW-12646 - [C++][CI][Packaging][Python] Bump vcpkg version to its latest release
  • ARROW-12663 - [C++] Fix a cuda 11.2 compiler segfault
  • ARROW-12668 - [C++][Dataset] Fix segfault in CountRows
  • ARROW-12670 - [C++] Fix extract_regex output after non-matching values
  • ARROW-12672 - [C++] Fix fill_null kernel to set null_count + cast kernel to handle no-bitmap with unknown null_count case
  • ARROW-12679 - [Java] JDBC->Arrow for NOT NULL columns.
  • ARROW-12684 - [Go][Flight] fix nil pointer dereference, add test.
  • ARROW-12708 - [C++] Valgrind errors when calling negate_checked
  • ARROW-12729 - [R] Fix length method for Table, RecordBatch
  • ARROW-12746 - [Go][Flight] append instead of overwriting outgoing metadata
  • ARROW-12756 - [C++] MSVC build fails with latest gtest from vcpkg
  • ARROW-12757 - [Archery] Fix spurious warning when running "archery docker run"
  • ARROW-12762 - [Python] Preserve field name when pickling list types
  • ARROW-12769 - [Python] Fix slicing array with "negative" length (start > stop)
  • ARROW-12771 - [C++][Compute] Fix MaybeReserve parameter in the Consume function of GroupedCountImpl
  • ARROW-12772 - [CI] Merge script test fails due to missing dependency
  • ARROW-12773 - [Docs] Clarify Java support for ORC and Parquet via JNI bindings
  • ARROW-12774 - [C++][Compute] replace_substring_regex() creates invalid arrays => crash
  • ARROW-12776 - [Archery][Integration] Fix decimal case generation in write_js_test_json
  • ARROW-12779 - [Python][FlightRPC] Guard against DoGet handler that never sends data
  • ARROW-12780 - [CI][C++] Install necessary packages for MinGW builds
  • ARROW-12790 - [C++] Improve HadoopFileSystem conformance
  • ARROW-12793 - [Python] Fix support for pyarrow debug builds
  • ARROW-12797 - [JS] Update readme with new links and remove outdated examples
  • ARROW-12798 - [JS] Use == null Comparison
  • ARROW-12799 - [JS] Use Nullish Coalescing Operator (??) For Defaults
  • ARROW-12804 - [C++] Return expected result for IsNull and IsValid for NullArray
  • ARROW-12807 - [C++] Fix build errors in IPC reader
  • ARROW-12838 - [Java][Gandiva] Fix JNI CI test
  • ARROW-12842 - [FlightRPC][Java] Fix sending trailers using CallStatus
  • ARROW-12850 - [R] is.nan() evaluates to null on Arrow null values
  • ARROW-12854 - [Dev][Release] Windows wheel verification script fails to download artifacts
  • ARROW-12857 - [C++] Fix build of hash_aggregate_test
  • ARROW-12864 - [C++] Remove needless out argument from arrow::internal::InvertBitmap
  • ARROW-12865 - [C++][FlightRPC] Link gRPC with RE2
  • ARROW-12882 - [C++][Gandiva] Fix behavior of the convert replace function on gandiva
  • ARROW-12887 - [CI] AppVeyor SSL certificate issue
  • ARROW-12906 - [C++][Python] Fix fill_null segfault
  • ARROW-12907 - [Java] Fix memory leak on deserialization errors
  • ARROW-12911 - [Python] Export scalar aggregate options to pc.sum
  • ARROW-12917 - [C++] Fix handling of decimal types with negative scale in C data import
  • ARROW-12918 - [C++] Fill out iterator_traits<ArrayIterator>
  • ARROW-12919 - [Dev][Archery] Crossbow comment bot failing to react to comments
  • ARROW-12935 - [C++][CI] Fix compiler error on some clang versions
  • ARROW-12941 - [C++] Add rows skipped to rows seen
  • ARROW-12942 - [C++][Compute] Fix incorrect result of Arrow compute hash_min_max with a chunked array
  • ARROW-12956 - [C++] Fix crash on Parquet file (OSS-Fuzz)
  • ARROW-12969 - [C++] Fix match_substring with empty haystack
  • ARROW-12974 - [R] test-r-without-arrow build fails because of example requiring Arrow
  • ARROW-12983 - [C++][Python][R] Properly overflow to chunked array in Python-to-Arrow conversion
  • ARROW-12987 - [C++][CI] Switch to bundled utf8proc with version 2.2 in Ubuntu 18.04 images
  • ARROW-12988 - [CI][Python] Revert skip of failing test in kartothek nightly integration build
  • ARROW-12988 - [CI] Skip the failing test in kartothek nightly integration build
  • ARROW-12989 - [CI] Avoid aggressive cancellation of the "Dev PR" workflow
  • ARROW-12991 - [CI] Migrate Travis-CI ARM job to "arm64-graviton2" arch
  • ARROW-12993 - [Python] Avoid half-initialized FeatherReader object
  • ARROW-12995 - [C++] Add validation to CSV options
  • ARROW-12998 - [C++] Add dataset->toolchain dependency
  • ARROW-13001 - [Go][Parquet] fix build failure on s390x
  • ARROW-13003 - [C++] Fix key map unaligned access
  • ARROW-13008 - [C++] Avoid deprecated API in minimal example
  • ARROW-13010 - [C++][Compute] Support outputting to slices from kleene kernels
  • ARROW-13018 - [C++][Docs] Use consistent terminology for nulls (min_count) in scalar aggregate kernels
  • ARROW-13026 - [CI] Use LLVM 10 for s390x
  • ARROW-13037 - [R] Incorrect param when creating Expression crashes R
  • ARROW-13039 - [R] Fix error message handling
  • ARROW-13041 - [C++] Ensure unary kernels zero-initialize data behind null entries
  • ARROW-13046 - [Release] JS package failing test prior to publish
  • ARROW-13048 - [C++] Fix copying objects with special characters on S3FS
  • ARROW-13053 - [Python] Fix build issue with Homebrewed arrow library
  • ARROW-13069 - [Website] Add Daniël to committer list
  • ARROW-13073 - [Developer] archery benchmark list: unexpected keyword 'benchmark_filter'
  • ARROW-13080 - [Release] Generate the API docs in ubuntu 20.10
  • ARROW-13083 - [Python] Wrong SCM version detection both in setup.py and crossbow
  • ARROW-13085 - [Python] Document compatible toolchains for python bindings
  • ARROW-13090 - [Python] Fix create_dir() implementation in FSSpecHandler
  • ARROW-13104 - [C++] Fix unsafe cast in ByteStreamSplit implementation
  • ARROW-13108 - [Python] Pyarrow 4.0.0 crashes upon import on macOS 10.13.6
  • ARROW-13116 - [R] Test for RecordBatchReader to C-interface fails on arrow-r-minimal due to missing dependencies
  • ARROW-13125 - [R] Throw error when 2+ args passed to desc() in arrange()
  • ARROW-13128 - [C#] TimestampArray conversion logic for nano and micro is wrong
  • ARROW-13135 - [C++] Fix Status propagation from Parquet exception
  • ARROW-13139 - [C++] ReadaheadGenerator cannot be safely copied/moved
  • ARROW-13145 - [C++][CI] Flight test crashes on MinGW
  • ARROW-13148 - [Dev][Archery] Fix crossbow job submission
  • ARROW-13153 - [C++] parquet_dataset loses ordering of files in _metadata
  • ARROW-13154 - [C++] Remove the undocumented type_code <= 125 restriction in union types
  • ARROW-13169 - [C++][Compute] Fix array offset support in GrouperFastImpl
  • ARROW-13173 - [C++] TestAsyncUtil.ReadaheadFailed asserts occasionally
  • ARROW-13187 - [Python] Avoid creating reference cycle when reading CSV file
  • ARROW-13189 - [R] Disable row-level metadata application on datasets
  • ARROW-13203 - [R] Fix optional component checks causing failures
  • ARROW-13207 - [Python][Doc] Dataset documentation still suggests deprecated scan method as the preferred iterative approach
  • ARROW-13216 - [R] Type checks test fails with rtools35
  • ARROW-13217 - [C++][Gandiva] Correct error on convert replace function for initial invalid bytes
  • ARROW-13223 - [C++] Fix Thread Sanitizer test failures
  • ARROW-13225 - [Go][FlightRPC][Integration] Implement Flight Custom Middleware and Integration Tests for Go
  • ARROW-13229 - [Python] ascii_trim, ascii_ltrim and ascii_rtrim lack options
  • ARROW-13239 - [Python][Doc] Expose signatures in pyx modules
  • ARROW-13243 - [R] altrep function call in R 3.5
  • ARROW-13246 - [C++] Using CSV skip_rows_after_names can cause data to be discarded prematurely
  • ARROW-13249 - [Java][CI] Consistent timeout in the Java JNI build
  • ARROW-13253 - [FlightRPC][C++] Fix segfault with large messages
  • ARROW-13254 - [Python] Processes killed and semaphore objects leaked when reading pandas data
  • ARROW-13265 - [R] cli valgrind errors in nightlies
  • ARROW-13266 - [JS] Improve benchmark names & include suite name in json
  • ARROW-13281 - [C++][Gandiva] Correct error on timestampDiffMonth function
  • ARROW-13284 - [C++] Fix wrong pkg_check_modules() option name
  • ARROW-13288 - [Python] Missing default values of kernel options in PyArrow
  • ARROW-13290 - [C++] Add missing include
  • ARROW-13305 - [C++] Unable to install nightly on Ubuntu 21.04 due to CSV options
  • ARROW-13315 - [R] Wrap r_task_group includes with ARROW_R_WITH_ARROW checking
  • ARROW-13321 - [C++][Python] MakeArrayFromScalar doesn't work for FixedSizeBinaryType
  • ARROW-13324 - [R] Typo in bindings for utf8_reverse and ascii_reverse
  • ARROW-13332 - [C++] TSAN failure in TestAsyncUtil.ReadaheadFailed
  • ARROW-13341 - [C++][Compute] Fix race condition in ScalarAggregateNode
  • ARROW-13350 - [Python][CI] Fix test_extract_datetime_components for pandas 0.24
  • ARROW-13352 - [C++] Make sure scalar case_when fully initializes output
  • ARROW-13353 - [Docs] Pin breathe to avoid failure parsing template parameters
  • ARROW-13360 - [C++] Missing dependencies in cpp thirdparty offline dependencies versions.txt
  • ARROW-13363 - [R] is.nan() errors on non-floating point data
  • ARROW-13368 - [C++][Doc] Rename project to make_struct in docs
  • ARROW-13381 - [C++] ArrayFromJSON doesn't work for float value dictionary type
  • ARROW-13382 - [C++] Avoid multiple definitions of same symbol
  • ARROW-13384 - [C++] Specify minimum required zstd version in cmake
  • ARROW-13391 - [CSV] Correct row and column number to error messages with CSV streaming reader
  • ARROW-13417 - [C++] The merged generator can sometimes pull from source sync-reentrant
  • ARROW-13419 - [JS] Fix perf tests
  • ARROW-13428 - [C++][Flight] Add missing -lssl with bundled gRPC and system shared OpenSSL
  • ARROW-13431 - [Release] Bump go version to 1.15; don't verify rust source anymore
  • ARROW-13432 - [Release] Fix ssh connection to the binary uploader container

New Features and Improvements

  • ARROW-2665 - [C++][Python] Add index() kernel
  • ARROW-3014 - [C++] Minimal writer adapter for ORC file format
  • ARROW-3316 - [R] Multi-threaded conversion from R data.frame to Arrow table / record batch
  • ARROW-5385 - [Go] Implement EXTENSION datatype
  • ARROW-5640 - [Go] Implement Arrow Map Array
  • ARROW-6513 - [CI] Rename conda requirements files to have txt extension instead of yml
  • ARROW-7001 - [C++] Develop threading APIs to accommodate nested parallelism
  • ARROW-7114 - [JS][CI] Enable NodeJS tests for Windows
  • ARROW-7252 - [Rust] [Parquet] Reading UTF-8/JSON/ENUM field results in a lot of vec allocation
  • ARROW-7396 - [Format] Register media types (MIME types) for Apache Arrow formats to IANA
  • ARROW-8421 - [Rust] [Parquet] Implement parquet writer
  • ARROW-8459 - [Dev][Archery] Use a more recent cmake-format
  • ARROW-8527 - [C++][CSV] Add support for ReadOptions::skip_rows >= block_size
  • ARROW-8655 - [C++][Python] Preserve partitioning information for a discovered Dataset
  • ARROW-8676 - [Rust] Create implementation of IPC RecordBatch body buffer compression from ARROW-300
  • ARROW-9054 - [C++] Add ScalarAggregateOptions
  • ARROW-9056 - [C++] Support aggregations over scalars
  • ARROW-9140 - [R] Zero-copy Arrow to R where possible
  • ARROW-9295 - [Archery] Support rust clippy in the lint command
  • ARROW-9299 - [C++][Python] Expose ORC metadata
  • ARROW-9313 - [Rust] Use feature enum
  • ARROW-9421 - [C++][Parquet] Redundancies SchemaManifest::GetFieldIndices
  • ARROW-9430 - [C++] Implement replace_with_mask kernel
  • ARROW-9697 - [C++][Python][R][Dataset] Add CountRows for Scanner
  • ARROW-10031 - [CI][Java] Support Java benchmark in Archery
  • ARROW-10115 - [C++] Add CSV option to treat quoted strings as always non-null
  • ARROW-10316 - [Python] Improve introspection of compute function options
  • ARROW-10391 - [Rust] [Parquet] Nested Arrow reader
  • ARROW-10440 - [C++][Dataset] Visit FileWriters before Finish
  • ARROW-10550 - [Rust] [Parquet] Write nested types (struct, list)
  • ARROW-10557 - [C++] Add scalar string slicing/substring extract kernel
  • ARROW-10640 - [C++] An "if_else" ("where") kernel to combine two arrays based on a mask
  • ARROW-10658 - [Python][Packaging] Wheel builds for Apple Silicon
  • ARROW-10675 - [C++][Python] Support AWS S3 Web identity credentials
  • ARROW-10797 - [C++] Vendor and use PCG random generator library
  • ARROW-10926 - [Rust] Add parquet reader / writer for decimal types
  • ARROW-10959 - [C++] Add scalar string join kernel
  • ARROW-11061 - [Rust] Validate array properties against schema
  • ARROW-11173 - [Java] Add map type in complex reader / writer
  • ARROW-11199 - [C++][Python] Fix the unit tests for the ORC reader
  • ARROW-11206 - [C++][Compute][Python] Rename 'project' to 'make_struct'
  • ARROW-11342 - [Python][Gandiva] Expose ToString and result type information
  • ARROW-11499 - [Release] Use Artifactory instead of Bintray
  • ARROW-11514 - [R][C++] Bindings for paste(), paste0(), str_c()
  • ARROW-11515 - [R] Bindings for strsplit
  • ARROW-11565 - [C++][Gandiva] Modify upper()/lower() to work with UTF8 and add INIT_CAP function
  • ARROW-11581 - [Packaging][C++] Formalize distribution through vcpkg
  • ARROW-11608 - [CI] Fix turbodbc nightly
  • ARROW-11660 - [C++] Move RecordBatch::SelectColumns method from R to C++ library
  • ARROW-11673 - [C++] Casting dictionary type to use different index type
  • ARROW-11675 - [CI][C++] Resolve ctest failures on VS 2019 builds
  • ARROW-11705 - [R] Support scalar value recycling in RecordBatch/Table$create()
  • ARROW-11759 - [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type
  • ARROW-11769 - [R] Pull groups from grouped_df into RecordBatch or Table
  • ARROW-11772 - [C++] Provide reentrant IPC file reader
  • ARROW-11782 - [GLib][Ruby][Dataset] Remove bindings for internal classes
  • ARROW-11787 - [R] Implement write csv
  • ARROW-11843 - [C++] Provide async Parquet reader
  • ARROW-11849 - [R] Use roxygen @examplesIf
  • ARROW-11889 - [C++] Add parallelism to streaming CSV reader
  • ARROW-11909 - [C++] Remove MakeIteratorGenerator
  • ARROW-11926 - [R] Add ucrt64 binaries and fix CI
  • ARROW-11926 - [R] preparations for ucrt toolchains
  • ARROW-11928 - [C++] Execution engine API
  • ARROW-11929 - [C++][Dataset][Compute] Promote expression to the compute namespace
  • ARROW-11930 - [C++][Dataset][Compute] Use an ExecPlan for dataset scans
  • ARROW-11932 - [C++] Provide ArrayBuilder::AppendScalar
  • ARROW-11950 - [C++][Compute] Add unary negative kernel
  • ARROW-11960 - [C++][Gandiva] Support escape in LIKE
  • ARROW-11980 - [Python] Remove experimental status from Table.replace_schema_metadata
  • ARROW-11986 - [C++][Gandiva] Implement IN expressions for doubles and floats
  • ARROW-11990 - [C++][Compute] Handle errors consistently
  • ARROW-12004 - [C++] Result<detail::Empty> is annoying
  • ARROW-12010 - [C++][Compute] Improve performance of the hash table used in GroupIdentifier
  • ARROW-12016 - [C++] Implement array_sort_indices and sort_indices for BOOL type
  • ARROW-12050 - [C++][Python][FlightRPC] Make Flight operations interruptible in Python
  • ARROW-12074 - [C++][Compute] Add scalar arithmetic kernels for decimal
  • ARROW-12083 - [C++][Dataset] Use given column types when determining CSV fragment schema
  • ARROW-12092 - [R] Make expect_dplyr_equal() a bit stricter
  • ARROW-12166 - [C++][Gandiva] Implements CONVERT_TO(value, type) function
  • ARROW-12184 - [R] Bindings for na.fail, na.omit, na.exclude, na.pass
  • ARROW-12185 - [R] Bindings for any, all
  • ARROW-12198 - [R] bindings for strptime
  • ARROW-12199 - [R] bindings for stddev, variance
  • ARROW-12205 - [C++][Gandiva][number][number] seconds) function
  • ARROW-12231 - [C++][Python][Dataset] Isolate one-shot data to scanner
  • ARROW-12253 - [Rust] [Ballista] Implement scalable joins
  • ARROW-12255 - [Rust] [Ballista] Integrate scheduler with DataFusion
  • ARROW-12256 - [Rust] [Ballista] Add DataFrame support
  • ARROW-12257 - [Rust] [Ballista] Publish user guide to Arrow site
  • ARROW-12261 - [Rust] [Ballista] Ballista should not have its own DataFrame API
  • ARROW-12291 - [R] Determine the type of an unevaluated expression
  • ARROW-12310 - [Java] ValueVector#getObject should support covariance for complex types
  • ARROW-12355 - [C++] Implement efficient async CSV scanning
  • ARROW-12362 - [Rust] [DataFusion] topk_query test failure
  • ARROW-12364 - [Python][Dataset] Add metadata_collector option to ds.write_dataset()
  • ARROW-12378 - [C++][Gandiva] Implement castVARBINARY functions
  • ARROW-12386 - [C++] Support file parallelism in AsyncScanner
  • ARROW-12391 - [Rust][DataFusion] Implement date_trunc() function
  • ARROW-12392 - [C++] Restore asynchronous streaming CSV reader
  • ARROW-12393 - [JS] Use closure compiler for all UMD targets
  • ARROW-12403 - [Rust] [Ballista] Integration tests should check that query results are correct
  • ARROW-12415 - [CI][Python] Failed building wheel for pygit2 on ARM64
  • ARROW-12424 - [Go][Parquet] Adding Schema Package for Go Parquet
  • ARROW-12428 - [Python] Expose pre_buffer in pyarrow.parquet
  • ARROW-12434 - [Rust] [Ballista] Show executed plans with metrics
  • ARROW-12442 - [CI] Set job timeouts on GitHub Actions
  • ARROW-12443 - [C++][Gandiva] Implement castVARCHAR function for varbinary input
  • ARROW-12444 - [Rust] Remove rust
  • ARROW-12445 - [Rust] Design and implement packaging process to bundle Rust in signed tar
  • ARROW-12468 - [Python][R] Expose ScannerBuilder::UseAsync to Python & R
  • ARROW-12478 - [C++] Support LLVM 12
  • ARROW-12484 - [CI] Change jinja macros to not require CROSSBOW_TOKEN to upload artifacts in Github Actions
  • ARROW-12489 - [Developer] autotune is broken
  • ARROW-12490 - [Dev] Use only miniforge in verify-release-candidate.sh
  • ARROW-12492 - [Python] Helper method to decode DictionaryArray back to Array
  • ARROW-12496 - [C++][Dataset] Ensure AsyncScanner is covered by all scanner tests
  • ARROW-12499 - [C++][Compute] Add ScalarAggregateOptions to Any and All kernels
  • ARROW-12500 - [C++][Datasets] Ensure better test coverage of Dataset file formats
  • ARROW-12501 - [CI][Ruby] Remove needless workaround for MinGW build
  • ARROW-12507 - [CI] Remove duplicated cron/nightly builds
  • ARROW-12512 - [C++][Python][Dataset] Create CSV writer class and add Datasets support
  • ARROW-12514 - [Release] Don't run Gandiva related Ruby test with ARROW_GANDIVA=OFF
  • ARROW-12517 - [Go][Flight] Expose app metadata in flight client and server
  • ARROW-12518 - [Python] Expose Parquet statistics has_null_count / has_distinct_count
  • ARROW-12520 - [R] Minor docs updates
  • ARROW-12522 - [C++] Add ReadRangeCache::WaitFor
  • ARROW-12525 - [JS] Vector toJSON() returns an array
  • ARROW-12527 - [Dev] Don't try getting JIRA information for MINOR PR
  • ARROW-12528 - [JS] Support typed arrays in Table.new
  • ARROW-12530 - [C++] Remove Buffer::mutable_data_
  • ARROW-12533 - [C++] Add random real distribution function
  • ARROW-12534 - [C++][Gandiva] Implement LEFT and RIGHT functions on Gandiva for string input values
  • ARROW-12537 - [JS] Docs build should not include test sources
  • ARROW-12541 - [Docs] Improve styling/readability of tables in the new doc theme
  • ARROW-12551 - [Java][Release] Java post-release tests fail due to missing testing data
  • ARROW-12554 - [C++] Allow duplicates in SetLookupOptions::value_set
  • ARROW-12555 - [Java][Release] Java post-release script misses dataset JNI bindings
  • ARROW-12556 - [C++][Gandiva] Implement BYTESUBSTRING function on Gandiva
  • ARROW-12560 - [C++] Add scheduling option for Future callbacks
  • ARROW-12567 - [C++][Gandiva] Implement ILIKE SQL function
  • ARROW-12567 - [C++][Gandiva] Implement LPAD and RPAD functions for string input values
  • ARROW-12571 - [R][CI] Run nightly R with valgrind
  • ARROW-12575 - [R] Use unary negative kernel
  • ARROW-12577 - [Website] Use Artifactory instead of Bintray in all places
  • ARROW-12578 - [JS] Remove Buffer in favor of TextEncoder API to support bundlers such as Rollup
  • ARROW-12581 - [C++][FlightRPC] Allow benchmarking DoPut with a data file
  • ARROW-12584 - [C++][Python] Expose method for benchmarking tools to release unused memory from the allocators
  • ARROW-12591 - [Java][Gandiva] Create single Gandiva jar for MacOS and Linux
  • ARROW-12593 - [Packaging][Ubuntu] Add support for Ubuntu 21.04
  • ARROW-12597 - [C++] Enable per-row-group parallelism in async Parquet reader
  • ARROW-12598 - [C++][Dataset] Speed up CountRows for CSV
  • ARROW-12599 - [Doc][Python] Documentation missing for pyarrow.Table
  • ARROW-12600 - [CI] Push docker images from crossbow tasks
  • ARROW-12602 - [R] Add BuildInfo from C++ to arrow_info
  • ARROW-12608 - [C++][Python][R] Add split_pattern_regex kernel
  • ARROW-12612 - [C++] Add Expression to type_fwd.h
  • ARROW-12619 - [Python] pyarrow sdist should not require git
  • ARROW-12621 - [C++][Gandiva] Add alias to sha1 and sha256 functions
  • ARROW-12631 - [Python] Accept Scanner in pyarrow.dataset.write_dataset
  • ARROW-12643 - [Governance] Added experimental repos guidelines.
  • ARROW-12645 - [Python] Fix numpydoc validation
  • ARROW-12648 - [C++][FlightRPC] Enable TLS for Flight benchmark
  • ARROW-12649 - [Python/Packaging] Move conda-aarch64 to Azure with cross-compilation
  • ARROW-12653 - [Archery] allow me to add a comment to crossbow requests
  • ARROW-12658 - [C++] Bump aws-c-common to v0.5.10
  • ARROW-12660 - [R] Post-4.0 adjustments for CRAN
  • ARROW-12661 - [C++] Add ReaderOptions::skip_rows_after_names
  • ARROW-12662 - [Website] Force to use squash merge
  • ARROW-12667 - [Python] Add a more complete test for strided numpy array conversion
  • ARROW-12675 - [C++] CSV parsing report row on which error occurred
  • ARROW-12677 - [Python] Add a mask argument to pyarrow.StructArray.from_arrays
  • ARROW-12685 - [C++][Compute] Add unary absolute value kernel
  • ARROW-12686 - [C++][Python][FlightRPC] Convert Flight reader into a regular reader
  • ARROW-12687 - [C++][Python][Dataset] Convert Scanner into a RecordBatchReader
  • ARROW-12689 - [R] Implement ArrowArrayStream C interface
  • ARROW-12692 - [R] Improve tests and comments for strsplit() bindings
  • ARROW-12694 - [C++] Fix segfault under RTools35 toolchain
  • ARROW-12696 - [R] Improve testing of error messages converted to warnings
  • ARROW-12699 - [CI][Packaging][Java] Generate a jar compatible with Linux and MacOS for all Arrow components
  • ARROW-12702 - [JS] Update webpack and terser
  • ARROW-12703 - [JS] Separate Table from DataFrame
  • ARROW-12704 - [JS] Support and use optional chaining
  • ARROW-12709 - [C++] Add binary_join_element_wise
  • ARROW-12713 - [C++] String reverse kernel
  • ARROW-12715 - [C++][Python] Add SQL LIKE match kernel
  • ARROW-12716 - [C++] Add string padding kernel
  • ARROW-12717 - [C++][Python] Add find_substring kernel
  • ARROW-12719 - [C++] Allow passing S3 canned ACL as output stream metadata
  • ARROW-12721 - [CI] Fix path for uploading aarch64 conda artifacts from the nightly builds
  • ARROW-12722 - [R] Raise error when attempting to print table with duplicated naming
  • ARROW-12730 - [MATLAB] Update featherreadmex and featherwritemex to build against latest Arrow C++ APIs
  • ARROW-12731 - [R] Use InMemoryDataset for Table/RecordBatch in dplyr code
  • ARROW-12736 - [C++] Eliminate forced copy of potentially large vector<shared_ptr<>>
  • ARROW-12738 - [C++/Python/R] Update conda variant files
  • ARROW-12741 - [CI] Configure Crossbow GitHub Token for Nightly Builds
  • ARROW-12745 - [C++][Compute] Add floor, ceiling, and truncate kernels
  • ARROW-12749 - [C++] Construct RecordBatch/Table/Schema with rvalue arguments
  • ARROW-12750 - [CI][R] Actually pass parameterized docker options to the templates
  • ARROW-12751 - [C++] Implement minimum/maximum kernels
  • ARROW-12758 - [R] Add examples to more function documentation
  • ARROW-12760 - [C++][Python][R] Allow setting I/O thread pool size
  • ARROW-12761 - [R] Better error handling for write_to_raw
  • ARROW-12764 - [CI] Support wildcard expansion when uploading crossbow artifacts
  • ARROW-12777 - [R] Convert all inputs to Arrow objects in match_arrow and is_in
  • ARROW-12781 - [R] Implement is.type() functions for dplyr
  • ARROW-12785 - [CI] the r-devdocs build errors when brew installing gcc
  • ARROW-12791 - [R] Better error handling for DatasetFactory$Finish() when no format specified
  • ARROW-12796 - [JS] Support JSON output from benchmarks
  • ARROW-12800 - [JS] Remove text encoder and decoder polyfills
  • ARROW-12801 - [CI][Packaging][Java] Include all modules in script that generate Arrow jars
  • ARROW-12806 - [Python] test_write_to_dataset_filesystem missing a dataset mark
  • ARROW-12808 - [JS] Document browser support
  • ARROW-12810 - [Python] Stop AWS SDK from looking for metadata service
  • ARROW-12812 - [Packaging][Java] Improve JNI jars build
  • ARROW-12824 - [R][CI] Upgrade builds for R 4.1 release
  • ARROW-12827 - [C++] Improve error message for dataset discovery failure
  • ARROW-12829 - [GLib][Ruby] Add support for Apache Arrow Flight
  • ARROW-12831 - [CI][macOS] Remove needless Homebrew workaround
  • ARROW-12832 - [JS] Write benchmarks in TypeScript
  • ARROW-12833 - [JS] Construct perf data in JS
  • ARROW-12835 - [C++][Python][R] Implement case-insensitive match using RE2
  • ARROW-12836 - [C++] Add support for newer IBM i
  • ARROW-12841 - [R] Add examples to more function documentation - part 2
  • ARROW-12843 - [C++][R] Implement is_inf kernel
  • ARROW-12848 - [Release] Fix URLs in vote mail template
  • ARROW-12851 - [Go][Parquet] Add Golang Parquet encoding package
  • ARROW-12856 - [C++][Gandiva] Implement castBIT and castBOOLEAN functions
  • ARROW-12859 - [C++] Add ScalarFromJSON for testing
  • ARROW-12861 - [C++][Compute] Add sign function kernels
  • ARROW-12867 - [R] Bindings for abs()
  • ARROW-12868 - [R] Bindings for find_substring and find_substring_regex
  • ARROW-12869 - [R] Bindings for utf8_reverse and ascii_reverse
  • ARROW-12870 - [R] Bindings for stringr::str_like
  • ARROW-12875 - [JS] Upgrade Jest and other minor updates
  • ARROW-12883 - [R][CI] version compatibility fails on R 4.1
  • ARROW-12891 - [C++] Move subtree pruning to compute
  • ARROW-12894 - [R] Bump R version
  • ARROW-12895 - [CI] Use "concurrency" setting on Github Actions to cancel stale jobs
  • ARROW-12898 - [Release][C#] Fix package upload
  • ARROW-12900 - [Python][Doc] Add missing numpy import
  • ARROW-12901 - [R] Follow on to more examples
  • ARROW-12909 - [R][Release] Build of ubuntu-docs is failing
  • ARROW-12912 - [Website] Use .asf.yaml for publishing
  • ARROW-12915 - [Release] Build of ubuntu-docs is failing on thrift
  • ARROW-12936 - [C++][Gandiva] Implement ASCII Hive function on Gandiva
  • ARROW-12937 - [C++][Python] Allow setting default metadata for new S3 files
  • ARROW-12939 - [R] Simplify RTask stop handling
  • ARROW-12940 - [R] Expose C interface as R6 methods
  • ARROW-12948 - [C++][Python] Add slice_replace kernel
  • ARROW-12949 - [C++] Add starts_with and ends_with
  • ARROW-12950 - [C++] Add count_substring kernel
  • ARROW-12951 - [C++] Reduce generated code size for string kernels
  • ARROW-12952 - [C++] Add count_substring_regex
  • ARROW-12955 - [C++] Add additional type support for if_else kernel
  • ARROW-12957 - [R] rchk issues on cran
  • ARROW-12961 - [Python] Fix MSVC warning building PyArrow
  • ARROW-12962 - [GLib][Ruby] Add Arrow::Scalar
  • ARROW-12964 - [R] Add bindings for ifelse() and if_else()
  • ARROW-12966 - [Python] Expose element_wise_min/max and options in Python
  • ARROW-12967 - [R] Add bindings for pmin() and pmax()
  • ARROW-12968 - [R][CI] Add an rchk job to our nightlies
  • ARROW-12972 - [CI] Fix centos-8 cmake error
  • ARROW-12975 - [C++][Python] if_else kernel doesn't support upcasting
  • ARROW-12982 - [C++] Re-enable unused-variable warning
  • ARROW-12984 - [C++][Compute] Passing options parameter of Count/Index aggregation by reference
  • ARROW-12985 - [Python][Packaging] Unable to install pygit2 in the arm64 wheel builds
  • ARROW-12986 - [C++][Gandiva] Implement new cache eviction policy algorithm in Gandiva
  • ARROW-12992 - [R] bindings for substr(), substring(), str_sub()
  • ARROW-12994 - [R] Fix tests that assume UTC local tz
  • ARROW-12996 - Add bytes_read() to StreamingReader
  • ARROW-13002 - [C++] Add a check for the utf8proc's version in CMake
  • ARROW-13005 - [C++] Add support for take implementation on dense union type
  • ARROW-13006 - [C++][Gandiva] Implement BASE64 and UNBASE64 Hive functions on Gandiva
  • ARROW-13009 - [Doc][Dev] Document builds mailing-list
  • ARROW-13022 - [R] bindings for lubridate's year, isoyear, quarter, month, day, wday, yday, isoweek, hour, minute, and second functions
  • ARROW-13025 - [C++][Python] Add FunctionOptions::Equals/ToString/Serialize
  • ARROW-13027 - [C++] Fix ASAN stack traces in CI
  • ARROW-13030 - [CI][Go] Setup Arm64 golang CI
  • ARROW-13031 - [JS] Support arm in closure compiler on macOS
  • ARROW-13032 - [Java] Update guava version
  • ARROW-13034 - [Python][Docs] Update the cloud examples on the Parquet doc page
  • ARROW-13036 - [Doc] Mention recommended file extension(s) for Arrow IPC
  • ARROW-13042 - [C++] Check that kernel output is fully initialized
  • ARROW-13043 - [GLib][Ruby] Add GArrowEqualOptions
  • ARROW-13044 - [Java] Change UnionVector and DenseUnionVector to extend AbstractContainerVector
  • ARROW-13045 - [Packaging][RPM][deb] Don't install system utf8proc if it's old
  • ARROW-13047 - [Website] Add kiszk to committer list
  • ARROW-13049 - [C++][Gandiva] Implement BIN Hive function on Gandiva
  • ARROW-13050 - [C++][Gandiva] Implement SPACE Hive function on Gandiva
  • ARROW-13054 - [C++] Add option to specify the first day of the week for the "day_of_week" temporal kernel
  • ARROW-13064 - [C++] Implement select ('case when') function for fixed-width types
  • ARROW-13065 - [Packaging][RPM] Add missing required LZ4 version information
  • ARROW-13068 - [GLib][Dataset] Change prefix to gdataset_ from gad_
  • ARROW-13070 - [R] bindings for sd and var
  • ARROW-13072 - [C++] Add bit-wise arithmetic kernels
  • ARROW-13074 - [Python] Deprecate ParquetDataset custom properties (eg pieces, partitions)
  • ARROW-13075 - [Python] Expose C data interface API for pyarrow.Field
  • ARROW-13076 - [Java] Allow ExtensionTypeVector with Struct or Union vector storage
  • ARROW-13082 - [CI] Forward R argument to ubuntu-docs build
  • ARROW-13086 - [Python] De-duplicate time unit conversion code
  • ARROW-13086 - [Python] Expose Parquet ArrowReaderProperties::coerce_int96_timestamp_unit_
  • ARROW-13091 - [Python] Add compression_level argument to IpcWriteOptions constructor
  • ARROW-13092 - [C++] Return an error in CreateDir if target is a file
  • ARROW-13095 - [C++] Implement trig compute functions
  • ARROW-13096 - [C++] Implement logarithm compute functions
  • ARROW-13097 - [C++] Provide simple reflection utility
  • ARROW-13098 - [Dev][Archery] Reorganize docker submodule to its own subpackage
  • ARROW-13100 - [MATLAB] Integrate GoogleTest with MATLAB Interface C++ Code
  • ARROW-13101 - [Python][Doc] pyarrow.FixedSizeListArray does not appear in the documentation
  • ARROW-13110 - [C++] Deadlock can happen when using BackgroundGenerator without transferring callbacks
  • ARROW-13113 - [R] use RTasks to manage parallel in converting arrow to R
  • ARROW-13117 - [R] Retain schema in new Expressions
  • ARROW-13119 - [R] Set empty schema in scalar Expressions
  • ARROW-13124 - [Ruby] Add support for memory view
  • ARROW-13127 - [R] Valgrind nightly errors
  • ARROW-13136 - [C++] Add coalesce function
  • ARROW-13137 - [C++][Documentation] Make in-table references consistent
  • ARROW-13140 - [C++/Python] Upgrade libthrift pin in the nightlies
  • ARROW-13142 - [Python] Use vector append when converting from list of non-strided numpy arrays
  • ARROW-13147 - [Java] Respect the rounding policy when allocating vector buffers
  • ARROW-13157 - [C++][Python] Add find_substring_regex kernel and implement ignore_case for find_substring
  • ARROW-13158 - [Python] Fix StructScalar contains and repr with duplicate field names
  • ARROW-13162 - [C++][Gandiva] Add new alias for extract date functions in registry
  • ARROW-13171 - [R] Add binding for str_pad()
  • ARROW-13190 - [C++][Gandiva] Change behavior of INITCAP function
  • ARROW-13194 - [Java][Document] Create prose document about Java algorithms
  • ARROW-13195 - [R] Problem with rlang reverse dependency checks
  • ARROW-13199 - [R] add ubuntu 21.04 to nightly builds
  • ARROW-13200 - [R] Add binding for case_when()
  • ARROW-13201 - [R] Add binding for coalesce()
  • ARROW-13210 - [Python][CI] Fix vcpkg caching mechanism for the macOS wheels
  • ARROW-13211 - [C++][CI] Remove outdated Github Actions ARM builds
  • ARROW-13212 - [Release] Support deploying to test PyPI in the python post release script
  • ARROW-13215 - [R][CI] Add ENV TZ to docker files
  • ARROW-13218 - [Doc] Document/clarify conventions for timestamp storage
  • ARROW-13219 - [C++][GLib] Demote/deprecate CompareOptions
  • ARROW-13224 - [Python][Doc] Documentation missing for pyarrow.dataset.write_dataset
  • ARROW-13226 - [Python] Add a general purpose cython trampolining utility
  • ARROW-13228 - [C++] S3 CreateBucket fails because AWS treats us-east-1 differently than other regions
  • ARROW-13230 - [Docs][Python] Add CSV writer docs
  • ARROW-13234 - [C++] Put extra padding spaces on the right
  • ARROW-13235 - [C++][Python] Simplify mapping of function options
  • ARROW-13236 - [Python] Include options class name in repr
  • ARROW-13238 - [C++][Compute][Dataset] Use an ExecPlan for dataset scans
  • ARROW-13242 - [C++] Improve random generation of decimal arrays
  • ARROW-13244 - [C++] Add facility to get current thread id as uint64
  • ARROW-13258 - [Python] Improve the repr of ParquetFileFragment
  • ARROW-13262 - [R] transmute() fails after pulling data into R
  • ARROW-13273 - [C++] Don't use .pc only in CMake paths for Requires.private
  • ARROW-13274 - [JS] Remove Webpack
  • ARROW-13275 - [JS] Fix perf tests
  • ARROW-13276 - [GLib][Ruby][Flight] Add support for ListFlights
  • ARROW-13277 - [JS] Add declaration maps for TypeScript and refactor testing infrastructure
  • ARROW-13280 - [R] Bindings for log and trig functions
  • ARROW-13282 - [C++] Remove obsolete generated files
  • ARROW-13283 - [Archery][Dev] Support passing CPU/memory limits to Docker
  • ARROW-13286 - [CI] Require docker-compose 1.27.0 or later
  • ARROW-13289 - [C++] Accept integer args in trig/log functions via promotion to double
  • ARROW-13291 - [GLib][CI] Require gobject-introspection 3.4.5 or later
  • ARROW-13296 - [C++] Provide a reflection compatible enum replacement
  • ARROW-13299 - [JS] Upgrade ix and rxjs
  • ARROW-13303 - [JS] Revise bundles
  • ARROW-13306 - [Java][JDBC] use ResultSetMetaData.getColumnLabel instead of ResultSetMetaData.getColumnName
  • ARROW-13313 - [C++][Compute] Add scalar aggregate node
  • ARROW-13320 - [Website] Add MIME types to FAQ
  • ARROW-13323 - [Archery] Validate docker compose configuration
  • ARROW-13343 - [R] Update NEWS.md for 5.0
  • ARROW-13346 - [C++] Remove compile time parsing from EnumType
  • ARROW-13355 - [R] ensure that sf is installed in our revdep job
  • ARROW-13357 - [R] bindings for sign()
  • ARROW-13365 - [R] bindings for floor/ceiling/truncate
  • ARROW-13385 - [C++] Demonstrate registering compute functions out-of-tree
  • ARROW-13386 - [R][C++] CSV streaming changes break Rtools 35 32-bit build
  • ARROW-13418 - [R] typo in python.r
  • ARROW-13461 - [Python][Packaging] Build M1 wheels for python 3.8
  • PARQUET-1798 - [C++] Review logic around automatic assignment of field_id's
  • PARQUET-1998 - [C++] Implement LZ4_RAW compression
  • PARQUET-2056 - [C++] Add ability for retrieving dictionary and indices separately for ColumnReader

Apache Arrow 4.0.1 (2021-05-26)

Bug Fixes

  • ARROW-12568 - [C++][Compute] Fix nullptr dereference when array contains no nulls
  • ARROW-12601 - [R][Packaging] Fix pkg-config check in r/configure
  • ARROW-12603 - [C++][Dataset] Backport fix for specifying CSV column types (#10344)
  • ARROW-12604 - [R][Packaging] Dataset, Parquet off in autobrew and CRAN Mac builds
  • ARROW-12617 - [Python] Align orc.write_table keyword order with parquet.write_table
  • ARROW-12622 - [Python] Fix segfault in read_csv when not on main thread
  • ARROW-12642 - [R] LIBARROW_MINIMAL, LIBARROW_DOWNLOAD, NOT_CRAN env vars should not be case-sensitive
  • ARROW-12663 - [C++] Fix a cuda 11.2 compiler segfault
  • ARROW-12670 - [C++] Fix extract_regex output after non-matching values
  • ARROW-12746 - [Go][Flight] append instead of overwriting outgoing metadata
  • ARROW-12769 - [Python] Fix slicing array with "negative" length (start > stop)
  • ARROW-12774 - [C++][Compute] replace_substring_regex() creates invalid arrays => crash
  • ARROW-12776 - [Archery][Integration] Fix decimal case generation in write_js_test_json
  • ARROW-12855 - error: no member named 'TableReader' in namespace during compilation

New Features and Improvements

  • ARROW-11926 - [R] preparations for ucrt toolchains
  • ARROW-12520 - [R] Minor docs updates
  • ARROW-12571 - [R][CI] Run nightly R with valgrind
  • ARROW-12578 - [JS] Remove Buffer in favor of TextEncoder API to support bundlers such as Rollup
  • ARROW-12619 - [Python] pyarrow sdist should not require git
  • ARROW-12806 - [Python] test_write_to_dataset_filesystem missing a dataset mark

Apache Arrow 4.0.0 (2021-04-26)

Bug Fixes

  • ARROW-4784 - [C++][CI] Re-enable flaky mingw tests.
  • ARROW-6818 - [DOC] Remove reference to Apache Drill design docs
  • ARROW-7288 - [C++][Parquet] Don't use regular expression to parse application version
  • ARROW-7830 - [C++][Parquet] Use Arrow version number for parquet
  • ARROW-9451 - [Python] Refuse implicit cast of str to unsigned integer
  • ARROW-9634 - [C++][Python] Restore non-UTC time zones when reading Parquet file that was previously Arrow
  • ARROW-9878 - [Python] Document caveats of to_pandas(self_destruct=True)
  • ARROW-10038 - [C++] Spawn thread pool threads lazily
  • ARROW-10056 - [C++] Increase flatbuffers max_tables parameter in order to read wide tables
  • ARROW-10364 - [Dev][Archery] Add support for semver 2.13.0
  • ARROW-10370 - [Python] Clean-up filesystem handling in write_dataset
  • ARROW-10403 - [C++] Implement unique kernel for non-uniform chunked dictionary arrays
  • ARROW-10405 - [C++] IsIn kernel should be able to lookup dictionary in string
  • ARROW-10457 - [CI] Fix Spark integration tests with branch-3.0
  • ARROW-10489 - [C++] Add Intel C++ compiler options for different warning levels
  • ARROW-10514 - [C++][Parquet] Make the column name the same for both output formats of parquet reader
  • ARROW-10953 - [R] Validate when creating Table with schema
  • ARROW-11066 - [FlightRPC][Java] Make zero-copy writes a configurable option
  • ARROW-11066 - [FlightRPC][Java] Revert "fix zero-copy optimization"
  • ARROW-11066 - [Java][FlightRPC] fix zero-copy optimization
  • ARROW-11066 - Revert "ARROW-11066: [Java][FlightRPC] fix zero-copy opt…
  • ARROW-11066 - [Java][FlightRPC] fix zero-copy optimization
  • ARROW-11134 - [CI][C++] Always run tests on Travis-CI
  • ARROW-11147 - [CI][Python] Remove pandas=0.25.3 pin for dask-latest
  • ARROW-11180 - [Developer] cmake-format pre-commit hook doesn't run
  • ARROW-11192 - [Documentation] Describe opening Visual Studio so it inherits a working env
  • ARROW-11223 - [Java] Fix: BaseVariableWidthVector/BaseLargeVariableWidthVector setNull() and getBufferSizeFor() trigger offset buffer overflow
  • ARROW-11235 - [Python] Fix test failure inside non-default S3 region
  • ARROW-11239 - [Rust] Fixed equality with offsets and nulls
  • ARROW-11269 - [Rust][Parquet] Preserve timezone in int96 reader
  • ARROW-11277 - [C++] Workaround macOS 10.11: don't default construct consts
  • ARROW-11299 - [Python] Fix invalid-offsetof warnings
  • ARROW-11303 - [Release][C++] Enable mimalloc in the windows verification script
  • ARROW-11305 - Skip first argument (which is the program name) in parquet-rowcount binary
  • ARROW-11311 - [Rust] Fixed unset_bit
  • ARROW-11313 - [Rust] Fixed size_hint
  • ARROW-11315 - [Packaging][APT][arm64] Add missing gir1.2 files
  • ARROW-11320 - [C++] Try to strengthen temporary dir creation
  • ARROW-11322 - [Rust] Re-opening memory module as public
  • ARROW-11323 - [Rust][DataFusion] Allow sort queries to return no results
  • ARROW-11328 - [R] Collecting zero columns from a dataset returns entire dataset
  • ARROW-11334 - [Python][CI] Fix failing pandas nightly tests
  • ARROW-11337 - [C++] Compilation error with ThreadSanitizer
  • ARROW-11357 - [Rust] Fix out-of-bounds reads in take and other undefined behavior
  • ARROW-11376 - [C++] ThreadedTaskGroup failure with Thread Sanitizer enabled
  • ARROW-11379 - [C++][Dataset] Better formatting for timestamp scalars
  • ARROW-11387 - [Rust] fix build for conditional compilation of features 'simd + avx512'
  • ARROW-11391 - [C++] Allow writing more than 2 GB to HDFS
  • ARROW-11394 - [Rust] Tests for Slice & Concat
  • ARROW-11400 - [Python] Ensure pickling Dataset with dictionary partitions works
  • ARROW-11403 - [Developer] archery benchmark list: unexpected keyword 'benchmark_filter'
  • ARROW-11412 - [Python][Dataset] Disallow logical operators for Expression
  • ARROW-11412 - [Python] Improve Expression docs
  • ARROW-11427 - [C++] On Windows, only use AVX512 when enabled by the OS
  • ARROW-11448 - [C++] Fix tdigest build failure on Windows with Visual Studio
  • ARROW-11451 - [C++] Fix gcc-4.8 build errors
  • ARROW-11452 - [Rust] Fix issue with Parquet Arrow reader not following type path
  • ARROW-11461 - [Go][Flight] Some cleanup for flight, Fix Schema bytes
  • ARROW-11464 - [Python] Fix parquet.read_pandas to support all keywords of read_table
  • ARROW-11470 - [C++] Detect overflow on computation of tensor strides
  • ARROW-11472 - [Python][CI] Remove temporary pin of numpy in kartothek integration build
  • ARROW-11472 - [Python][CI] Temporary pin numpy on kartothek integration builds
  • ARROW-11480 - [Python] Test filtering on INT96 timestamps
  • ARROW-11483 - [C++] Write integration JSON files compatible with the Java reader
  • ARROW-11488 - [Rust] Don't leak memory in StructBuilder
  • ARROW-11490 - [C++] BM_ArrowBinaryDict/EncodeLowLevel is not deterministic
  • ARROW-11494 - [Rust] fix take bench
  • ARROW-11497 - [Python] Provide parquet enable compliant nested type flag for python binding
  • ARROW-11538 - [Python] Segfault reading Parquet dataset with Timestamp filter
  • ARROW-11547 - [Packaging][Conda][Drone] Fix undefined variable error
  • ARROW-11548 - [C++] Fix RandomArrayGenerator::List
  • ARROW-11551 - [C++][Gandiva] Fix castTimestamp(utf8) function
  • ARROW-11560 - [C++][FlightRPC] fix mutex error on SIGINT
  • ARROW-11567 - [C++][Compute] Improve variance kernel precision
  • ARROW-11577 - [Rust] Fix Array transform on strings
  • ARROW-11582 - [R] write_dataset 'format' argument default and validation could be better
  • ARROW-11586 - [Rust][Datafusion] Remove force unwrap
  • ARROW-11595 - [C++][NIGHTLY:test-conda-cpp-valgrind] Avoid branching on potentially indeterminate values in GenerateBitsUnrolled
  • ARROW-11596 - [Python][Dataset] make ScanTask.execute() eager
  • ARROW-11603 - [Rust] Fix Clippy Lints for Rust 1.50
  • ARROW-11607 - [C++][Parquet] Update values_capacity_ when resetting.
  • ARROW-11614 - Fix round() logic to return positive zero when argument is zero
  • ARROW-11617 - [C++][Gandiva] Fix nested if-else optimisation in gandiva
  • ARROW-11620 - [Rust][DataFusion] Consistently use Arc<dyn TableProvider> rather than Box and Arc
  • ARROW-11630 - [Rust] Introduce limit option for sort kernel
  • ARROW-11632 - [Rust] Make csv::Reader propagate schema metadata to generated RecordBatches
  • ARROW-11639 - [C++][Gandiva] Fix signbit compilation issue in Ubuntu nightly build
  • ARROW-11642 - [C++] Fix preprocessor directive for Windows in JVM detection
  • ARROW-11657 - [R] group_by with .drop specified errors
  • ARROW-11658 - [R] Handle mutate/rename inside group_by
  • ARROW-11663 - [Rust][DataFusion] Fixed error.
  • ARROW-11668 - [C++] Sporadic UBSAN error in FutureStessTest.TryAddCallback
  • ARROW-11672 - [R] Fix string function test failure on R 3.3
  • ARROW-11681 - [Rust] Don't unwrap in IPC writers
  • ARROW-11686 - [C++] Call ArrowLog::InstallFailureSignalHandler to show stack trace
  • ARROW-11687 - [Rust][DataFusion] RepartitionExec Hanging
  • ARROW-11694 - [C++] Fix Take() with no validity bitmap but unknown null count
  • ARROW-11695 - [C++][FlightRPC] fix option to disable TLS verification
  • ARROW-11717 - [Integration] Fix intermittent flight integration failures with rust
  • ARROW-11718 - [Rust] Don't write IPC footers on drop
  • ARROW-11741 - [C++] Fix decimal casts on big endian platforms
  • ARROW-11743 - [R] Use pkgdown's newfound ability to autolink Jiras
  • ARROW-11746 - [Developer][Archery] Fix prefer real time check
  • ARROW-11756 - [R] passing a partition as a schema leads to segfaults
  • ARROW-11758 - [C++][Compute] Improve summation kernel precision
  • ARROW-11767 - [C++] Scalar::Hash may segfault
  • ARROW-11771 - [Developer][Archery] Move benchmark tests (so CI runs them)
  • ARROW-11781 - [Python] Reading small amount of files from a partitioned dataset is unexpectedly slow
  • ARROW-11784 - [Rust][DataFusion] CoalesceBatchesStream doesn't honor Stream interface
  • ARROW-11785 - [R] Fallback when filtering Table with unsupported expression fails
  • ARROW-11786 - [C++] Remove noisy CMake message
  • ARROW-11788 - [Java] Fix appending empty delta vectors
  • ARROW-11791 - [Rust][DataFusion] Fix RepartitionExec Blocking
  • ARROW-11802 - [Rust][DataFusion] Remove use of crossbeam channels to avoid potential deadlocks
  • ARROW-11819 - [Rust] Add link to the doc
  • ARROW-11821 - [Rust] Edit Rust README
  • ARROW-11830 - [C++] Don't re-detect gRPC every time
  • ARROW-11832 - [R] Handle conversion of extra nested struct column
  • ARROW-11836 - [C++] Avoid requiring arrow_bundled_dependencies when it doesn't exist for arrow_static.
  • ARROW-11845 - [Rust] Implement to_isize() for ArrowNativeTypes
  • ARROW-11850 - [GLib] Add GARROW_VERSION_0_16
  • ARROW-11855 - [C++][Python] Memory leak in to_pandas when converting chunked struct array
  • ARROW-11857 - [Python] Resource temporarily unavailable when using the new Dataset API with Pandas
  • ARROW-11860 - [Rust][DataFusion] Add DataFusion logos
  • ARROW-11866 - [C++] Arrow Flight SetShutdownOnSignals cause potential mutex deadlock in gRPC
  • ARROW-11872 - [C++] Fix Array validation when Array contains non-CPU buffers
  • ARROW-11880 - [R] Handle empty or NULL transmute() args properly
  • ARROW-11881 - [Rust][DataFusion] Fix clippy lint
  • ARROW-11896 - [Rust] Disable Debug symbols on CI test builds
  • ARROW-11904 - [C++] Try to fix crash on test tear down
  • ARROW-11905 - [C++] Fix SIMD detection on macOS
  • ARROW-11914 - [R][CI] r-sanitizer nightly is broken
  • ARROW-11918 - [R][Documentation] Docs cleanups
  • ARROW-11923 - [CI] Update branch name for dask dev integration tests
  • ARROW-11937 - [C++] Fix GZip codec hanging if flushed twice
  • ARROW-11941 - [Dev] Don't update Jira if run "DEBUG=1 merge_arrow_pr.py"
  • ARROW-11942 - [C++] If tasks are submitted quickly the thread pool may fail to spin up new threads
  • ARROW-11945 - [R] filter doesn't accept negative numbers as valid
  • ARROW-11956 - [C++] Fix system re2 dependency detection for static library
  • ARROW-11965 - [R][Docs] Simplify install.packages command in R dev docs
  • ARROW-11970 - [C++][CI] Fix Valgrind error in arrow-csv-test
  • ARROW-11971 - [Packaging] Vcpkg patch doesn't apply on windows due to line endings
  • ARROW-11975 - [CI][GLib] Remove needless libgccjit
  • ARROW-11976 - [C++] Fix sporadic TSAN error with GatingTask
  • ARROW-11983 - [Python] Avoid ImportError calling from_pandas in threaded code
  • ARROW-11997 - [Python] concat_tables crashes python interpreter
  • ARROW-12003 - [R] Fix NOTE re undefined global function group_by_drop_default
  • ARROW-12006 - [Java] Fix checkstyle config to work on Windows
  • ARROW-12012 - [Java][JDBC] Fix BinaryConsumer reallocation
  • ARROW-12013 - [C++][FlightRPC] Fix bundled gRPC version probing
  • ARROW-12015 - [Rust][DataFusion] Integrate doc-comment crate to ensure readme examples remain valid
  • ARROW-12028 - ARROW-11940: [Rust][DataFusion] Add TimestampMillisecond support to GROUP BY/hash aggregates
  • ARROW-12029 - [R] Remove args from FeatherReader$create v2
  • ARROW-12033 - [Minor][Docs] Fix link in developers/benchmarks.html
  • ARROW-12041 - [C++][Python] Fix type property of tensor and sparse tensor IPC messages
  • ARROW-12051 - [GLib] Keep input stream reference of GArrowCSVReader
  • ARROW-12057 - [Python] Remove direct usage of pandas' Block subclasses (partly)
  • ARROW-12065 - [C++][Python] Fix segfault reading JSON file
  • ARROW-12067 - [Python][Doc] Document pyarrow_(un)wrap_scalar
  • ARROW-12073 - [R] Fix R CMD check NOTE about ‘X_____X’
  • ARROW-12076 - [Rust] Fix build
  • ARROW-12077 - [C++] Fix out-of-bounds write in ListArray::FromArrays
  • ARROW-12086 - [C++] Fix environment variables for bzip2, utf8proc URLs
  • ARROW-12088 - [Python] Fix compiler warning about offsetof
  • ARROW-12089 - [Doc] Fix Sphinx warnings
  • ARROW-12100 - [C++][IPC] Allow null children field when num children is 0
  • ARROW-12103 - [C++] Correctly handle unaligned access in bit-unpacking code
  • ARROW-12112 - [CI] Reduce footprint of conda-integration image
  • ARROW-12112 - [Rust] Create and store less debug information in CI and integration tests
  • ARROW-12113 - [R] Fix rlang deprecation warning from check_select_helpers()
  • ARROW-12130 - [C++] Don't enable Neon if -DARROW_SIMD_LEVEL=NONE
  • ARROW-12138 - [Go][IPC] Update flatbuffers definitions
  • ARROW-12140 - [C++][CI] Fix Valgrind failures in Grouper tests
  • ARROW-12145 - [Developer][Archery] Flaky: test_static_runner_from_json
  • ARROW-12149 - [Dev] Archery benchmark test case is failing
  • ARROW-12154 - [C++][Gandiva] Fix gandiva crash in certain OS/CPU combinations
  • ARROW-12155 - [R] Require Table columns to be same length
  • ARROW-12161 - [C++][Dataset] Revert async CSV reader in datasets
  • ARROW-12161 - [C++] Async streaming CSV reader deadlocking when being run synchronously from datasets
  • ARROW-12169 - [C++] Fix decompressing file with empty stream at the end
  • ARROW-12171 - [Rust] clean up clippy lints
  • ARROW-12172 - [Python][Packaging] Pass python version as setuptools pretend version in the macOS wheel builds
  • ARROW-12178 - [CI] Update setuptools in the ubuntu images
  • ARROW-12186 - [Rust][DataFusion] Fix regexp_match test
  • ARROW-12209 - [JS] Copy all src files into the TypeScript package
  • ARROW-12220 - [C++][CI] Thread sanitizer failure
  • ARROW-12226 - [C++] Fix Address Sanitizer failures
  • ARROW-12227 - [R] Fix RE2 and median nightly build failures
  • ARROW-12235 - [Rust][DataFusion] LIMIT returns incorrect results when used with several small partitions
  • ARROW-12241 - [Python] Make CSV cancellation test more robust
  • ARROW-12250 - [Rust][Parquet] Fix failing arrow_writer test
  • ARROW-12254 - [Rust][DataFusion] Stop polling limit input once limit is reached
  • ARROW-12258 - [R] Never do as.data.frame() on collect(as_data_frame = FALSE)
  • ARROW-12262 - [Doc] Enable S3 and Flight in docs build
  • ARROW-12267 - [Rust] Implement support for timestamps in JSON writer
  • ARROW-12273 - [JS][Rust] Remove coveralls
  • ARROW-12279 - [Rust][DataFusion] Add test for null handling in hash join (ARROW-12266)
  • ARROW-12294 - [Rust] Fix boolean kleene kernels with no remainder
  • ARROW-12299 - [Python] Recognize new filesystems in pq.write_to_dataset
  • ARROW-12300 - [C++] Remove linking of cuda runtime library
  • ARROW-12313 - [Rust][Ballista] Update benchmark docs for Ballista
  • ARROW-12314 - [Python] Accept columns as set in parquet read_pandas
  • ARROW-12327 - [Dev] Use pull request's head remote when submitting crossbow jobs via the comment bot
  • ARROW-12330 - [Developer] Restore values at counters column of Archery benchmark
  • ARROW-12334 - [Rust][Ballista] Aggregate queries producing incorrect results
  • ARROW-12342 - [Packaging] Fix tabulation in crossbow templates for submitting nightly builds
  • ARROW-12357 - [Archery] Bump Jinja2 version requirement
  • ARROW-12379 - [C++] Fix ThreadSanitizer failure in SerialExecutor
  • ARROW-12382 - [C++] Bundle xsimd if runtime SIMD level is set
  • ARROW-12385 - [R][CI] fix cran picking in CI
  • ARROW-12390 - [Rust] Inline from_trusted_len_iter, try_from_trusted_len_iter, extend_from_slice
  • ARROW-12401 - [R] Fix guard around dataset___Scanner__TakeRows
  • ARROW-12405 - [Packaging] Fix apt artifact patterns and artifact uploading from travis
  • ARROW-12408 - [R] Delete Scan()
  • ARROW-12421 - [Rust][DataFusion] Fix topkexec failure
  • ARROW-12421 - [Rust][DataFusion] Disable repartition rule
  • ARROW-12429 - [C++] Fix incorrectly registered test
  • ARROW-12433 - [Rust] Update nightly rust version
  • ARROW-12437 - [Rust][Ballista] Create DataFusion context without repartition
  • ARROW-12440 - [Release][Packaging] Various packaging, release script and release verification script fixes
  • ARROW-12466 - [Python] Avoid AttributeError crash when comparing with None
  • ARROW-12475 - [C++] Fix 'warn_unused_result' warning
  • ARROW-12487 - [C++][Dataset] Fix ScanBatches() hanging
  • ARROW-12495 - [C++] Fix NumPyBuffer::mutable_data()
  • ARROW-12794 - C++/R: read_parquet halts process when accessed multiple times
  • PARQUET-1655 - [C++] Fix comparison of Decimal values in statistics
  • PARQUET-2008 - [C++] Fix information written in RowGroup::total_byte_size

New Features and Improvements

  • ARROW-951 - [JS] Upgrade to typedoc 0.20.19
  • ARROW-2229 - [C++][Python] Add WriteCsv functionality.
  • ARROW-3690 - [Rust] Add Rust to the format integration testing
  • ARROW-6103 - [Release][Java] Remove mvn release plugin
  • ARROW-6248 - [Python][C++] Raise better exception on HDFS file open error
  • ARROW-6455 - [C++] Implement ExtensionType for non-UTF8 Unicode data
  • ARROW-6604 - [C++] Add support for nested types to MakeArrayFromScalar
  • ARROW-7215 - [C++][Gandiva] Implement castVARCHAR(numeric_type) functions
  • ARROW-7364 - [Rust][DataFusion] Add cast options to cast kernel and TRY_CAST to DataFusion
  • ARROW-7633 - [C++][CI] Create fuzz targets for tensors and sparse tensors
  • ARROW-7808 - [Java][Dataset] Implement Dataset Java API by JNI to C++
  • ARROW-7906 - [C++][Python] Add ORC write support
  • ARROW-8049 - [C++] Bump thrift to 0.13 and require cmake 3.10 for it
  • ARROW-8282 - [C++/Python][Dataset] Support schema evolution for integer columns
  • ARROW-8284 - [C++][Dataset] Schema evolution for timestamp columns
  • ARROW-8630 - [C++][Dataset] Pass schema including all materialized fields to catch CSV edge cases
  • ARROW-8631 - [C++][Python][Dataset] Add ReadOptions to CsvFileFormat, expose options to python
  • ARROW-8658 - [C++][Dataset] Implement subtree pruning for FileSystemDataset
  • ARROW-8672 - [Java] Implement RecordBatch IPC buffer compression from ARROW-300
  • ARROW-8732 - [C++] Add basic cancellation API
  • ARROW-8771 - [C++] Add boost/process library to build support
  • ARROW-8796 - [Rust] Allow parquet to be written directly to memory
  • ARROW-8797 - [C++] Read RecordBatch in a different endian
  • ARROW-8900 - [C++][Python] Expose Proxy Options as parameters for S3FileSystem
  • ARROW-8919 - [C++][Compute][Dataset] Add Function::DispatchBest to accommodate implicit casts
  • ARROW-9128 - [C++] Implement string space trimming kernels: trim, ltrim, and rtrim
  • ARROW-9149 - [C++] Improve configurability of RandomArrayGenerator::ArrayOf
  • ARROW-9196 - [C++][Compute] All casts accept scalar and sliced inputs
  • ARROW-9318 - [C++] Parquet encryption key management
  • ARROW-9731 - [C++][Python][R][Dataset] Implement Scanner::Head
  • ARROW-9749 - [C++][GLib][Python][R][Ruby][Dataset] Introduce FragmentScanOptions, consolidate ScanContext/ScanOptions
  • ARROW-9777 - [Rust] Implement IPC changes to catch up to 1.0.0 format
  • ARROW-9856 - [R] Add bindings for string compute functions
  • ARROW-10014 - [C++] TaskGroup::Finish should execute tasks
  • ARROW-10089 - [R] inject base class for Array, ChunkedArray and Scalar
  • ARROW-10183 - [C++] Apply composable futures to CSV
  • ARROW-10195 - [C++] Add string struct extract kernel using re2
  • ARROW-10250 - [C++][FlightRPC] Consistently use FlightClientOptions::Defaults
  • ARROW-10255 - [JS] Reorganize exports for ESM tree-shaking
  • ARROW-10297 - [Rust] Parameter for parquet-read to output data in json format, add "cli" feature to parquet crate
  • ARROW-10299 - [Rust] Use IPC Metadata V5 as default
  • ARROW-10305 - [R] Filter with regular expressions
  • ARROW-10306 - [C++] Add string replacement kernel
  • ARROW-10349 - [Python] Build and publish aarch64 wheels
  • ARROW-10354 - [Rust][DataFusion] regexp_extract function to select regex groups from strings
  • ARROW-10360 - [CI] Bump Github Actions cache version
  • ARROW-10372 - [Dataset][C++][Python][R] Support reading compressed CSV
  • ARROW-10406 - [C++] Unify dictionaries when writing IPC file in a single shot
  • ARROW-10420 - [C++] Refactor io and filesystem APIs to take an IOContext
  • ARROW-10421 - [R] Use gc_memory_pool in more places
  • ARROW-10438 - [C++][Dataset] Partitioning::Format on nulls
  • ARROW-10520 - [C++][R] Implement add/remove/replace for RecordBatch
  • ARROW-10570 - [R] Use Converter API to convert SEXP to Array/ChunkedArray
  • ARROW-10580 - [C++] Disallow non-monotonic dense union offsets
  • ARROW-10606 - [C++] Implement Decimal256 casts
  • ARROW-10655 - [C++] Add cache and memoization facility
  • ARROW-10734 - [R] Build and test on Solaris
  • ARROW-10735 - [R] Remove arrow-without-arrow wrapping
  • ARROW-10766 - [Rust][Parquet] Compute nested list definitions
  • ARROW-10816 - [Rust][DataFusion] Initial support for Interval expressions
  • ARROW-10831 - [C++][Compute] Implement quantile kernel
  • ARROW-10846 - [C++] Add async filesystem operations
  • ARROW-10880 - [Java] Support compressing RecordBatch IPC buffers by LZ4
  • ARROW-10882 - [Python] Allow writing dataset from iterator of batches
  • ARROW-10895 - [C++][Gandiva] Implement bool to varchar cast function in Gandiva
  • ARROW-10903 - [Rust] Implement FromIter<Option<Vec<u8>>> constructor for FixedSizeBinaryArray
  • ARROW-11022 - [Rust] Upgrade to Tokio 1.0
  • ARROW-11070 - [C++][Compute] Implement power kernel
  • ARROW-11074 - [Rust][DataFusion] Implement predicate push-down for parquet tables
  • ARROW-11081 - [Java] Make IPC option immutable
  • ARROW-11108 - [Rust] Fixed performance issue in mutableBuffer.
  • ARROW-11141 - [Rust] Add basic Miri checks to CI pipeline
  • ARROW-11149 - [Rust] DF Support List/LargeList/FixedSizeList in create_batch_empty
  • ARROW-11150 - [Rust] Add Arrow Rust Community section to Rust README
  • ARROW-11154 - [CI][C++] Move homebrew crossbow tests off of Travis-CI
  • ARROW-11156 - [Rust][DataFusion] Create hashes vectorized in hash join
  • ARROW-11174 - [C++][Dataset] Make expressions available to projection
  • ARROW-11179 - [Format] Make FB comments friendly to rust
  • ARROW-11183 - [Rust][Parquet] LogicalType::TIMESTAMP_NANOS missing
  • ARROW-11191 - [C++] Use FnOnce for TaskGroup's tasks instead of std::function
  • ARROW-11216 - [Rust] add doc example for StringDictionaryBuilder
  • ARROW-11220 - [Rust] Implement GROUP BY support for Boolean
  • ARROW-11222 - [Rust] Catch up with flatbuffers 0.8.1 which had some UB problems fixed
  • ARROW-11246 - [Rust] Add type to Unexpected accumulator state error
  • ARROW-11254 - [Rust][DataFusion] Add SIMD and snmalloc flags as options to benchmarks
  • ARROW-11260 - [C++][Dataset] Don't require dictionaries when specifying explicit partition schema
  • ARROW-11265 - [Rust] Made bool not ArrowNativeType
  • ARROW-11268 - [Rust][DataFusion] MemTable::load output partition support
  • ARROW-11270 - [Rust] Array slice accessors
  • ARROW-11279 - [Rust][Parquet] ArrowWriter Definition Levels Memory Usage
  • ARROW-11284 - [R] Support dplyr verb transmute()
  • ARROW-11289 - [Rust][DataFusion] Implement GROUP BY support for Dictionary Encoded columns
  • ARROW-11290 - [Rust][DataFusion] Address hash aggregate performance issue with high number of groups
  • ARROW-11291 - [Rust] Add extend to MutableBuffer (-20% for arithmetic, -97% for length)
  • ARROW-11300 - [Rust][DataFusion] Further performance improvements on hash aggregation with small groups
  • ARROW-11308 - [Rust][Parquet] Support decimal when writing parquet files
  • ARROW-11309 - [Release][C#] Use .NET 3.1 for verification
  • ARROW-11310 - [Rust] implement JSON writer
  • ARROW-11314 - [Release][APT][Yum] Add support for verifying arm64 packages
  • ARROW-11317 - [Rust] Include the prettyprint feature in CI Coverage
  • ARROW-11318 - [Rust] Support pretty printing timestamp, date, and time types
  • ARROW-11319 - [Rust][DataFusion] Improve test comparisons to record batch, remove test::format_batch
  • ARROW-11321 - [Rust][DataFusion] Fix DataFusion compilation error
  • ARROW-11325 - [Packaging][C#] Release Apache.Arrow.Flight and Apache.Arrow.Flight.AspNetCore
  • ARROW-11329 - [Rust] Don't rerun build.rs on every file change
  • ARROW-11330 - [Rust][DataFusion] add ExpressionVisitor to encode expression walking
  • ARROW-11332 - [Rust] Use MutableBuffer in take_string instead of Vec
  • ARROW-11333 - [Rust] Generalized creation of empty arrays.
  • ARROW-11336 - [C++][Doc] Improve Developing on Windows docs
  • ARROW-11338 - [R] Bindings for quantile and median
  • ARROW-11340 - [C++] Add vcpkg.json manifest to cpp project root
  • ARROW-11343 - [Rust][DataFusion] Simplified example with UDF.
  • ARROW-11346 - [C++][Compute] Implement quantile kernel benchmark
  • ARROW-11349 - [Rust] Add from_iter_values to create arrays from (non null) values
  • ARROW-11350 - [C++] Bump dependency versions
  • ARROW-11354 - [Rust] Speed-up cast of dates and times (2-4x)
  • ARROW-11355 - [Rust] Aligned Date DataType with specification.
  • ARROW-11358 - [Rust] Add benchmark for concatenating small arrays
  • ARROW-11360 - [Rust][DataFusion] Improve CSV "No files found" error message
  • ARROW-11361 - [Rust] Build MutableBuffer/Buffer from iterator of bools
  • ARROW-11362 - [Rust][DataFusion] Use iterator APIs in to_array_of_size to improve performance
  • ARROW-11365 - [Rust][Parquet] Logical type printer and parser
  • ARROW-11366 - [Datafusion] Implement constant folding for boolean literal expressions
  • ARROW-11367 - [C++] Implement t-digest approximate quantile utility
  • ARROW-11369 - [DataFusion] Split physical_plan/expressions.rs
  • ARROW-11372 - [Release] Support RC verification on macOS-ARM64
  • ARROW-11373 - [Python][Docs] Add example of specifying type for a column when reading csv file
  • ARROW-11374 - [Python] Make legacy pyarrow.filesystem / pyarrow.serialize warnings more visible (DeprecationWarning -> FutureWarning)
  • ARROW-11375 - [Rust] Fix deprecation warning in clippy
  • ARROW-11377 - [C++][CI] Add Thread Sanitizer nightly build
  • ARROW-11383 - [Rust] Faster bit AND and OR (2x)
  • ARROW-11386 - [Release] Fix post documents update script
  • ARROW-11389 - [Rust] make comments more consistent and fix typos
  • ARROW-11395 - [DataFusion] Support custom optimizers
  • ARROW-11401 - [Rust][DataFusion] Pass slices instead of Vec in DataFrame API
  • ARROW-11404 - [Rust][DataFusion] Upgrade to aHash 0.7 + minor cleanup
  • ARROW-11405 - [DataFusion] Support multiple custom logical nodes
  • ARROW-11406 - [CI][C++] Fix ccache caching on Travis-CI
  • ARROW-11408 - [Rust] Add window support to datafusion readme
  • ARROW-11411 - [Packaging][Linux] Disable arm64 nightly builds
  • ARROW-11414 - [Rust] Reduce copies in Schema::try_merge
  • ARROW-11417 - [Integration] Add integration tests for buffer compression
  • ARROW-11418 - [Doc] Add buffer compression to IPC support matrix
  • ARROW-11421 - [Rust][DataFusion] Support GROUP BY Date32
  • ARROW-11422 - [C#] add decimal support
  • ARROW-11423 - [R] value_counts and some StructArray methods
  • ARROW-11425 - [C++][Compute] Optimize quantile kernel for integers
  • ARROW-11426 - [Rust][DataFusion] EXTRACT support
  • ARROW-11428 - [Rust] Add power_scalar kernel
  • ARROW-11429 - Make string comparison kernels generic over Utf8 and LargeUtf8
  • ARROW-11430 - [Rust] zip kernel: combine arrays based on boolean mask
  • ARROW-11431 - [Rust][DataFusion] Support the HAVING clause.
  • ARROW-11435 - [Datafusion] allow creating ParquetPartition from external crate, make combine_filters public
  • ARROW-11436 - [Rust] Improved from_iter for primitive arrays (-20-30% for cast)
  • ARROW-11437 - [Rust] Removed duplicated code in benches
  • ARROW-11438 - [Rust][DataFusion] Support literal boolean values in DataFusion SQL
  • ARROW-11439 - [Rust] Add year support to temporal kernels
  • ARROW-11440 - [Rust][DataFusion] Add method to CsvExec to get CSV schema
  • ARROW-11442 - [Rust] Expose datetime conversion logic independently
  • ARROW-11443 - [Rust] Write datetime information for Date64 Type in csv writer
  • ARROW-11444 - [Rust][DataFusion] Accept slices as parameters
  • ARROW-11446 - [DataFusion] Added support for scalarValue in Builtin functions.
  • ARROW-11447 - [Rust] Add shift kernel for primitive types
  • ARROW-11449 - [CI][R][Windows] Use ccache
  • ARROW-11457 - [Rust] Make string comparison kernels generic over Utf8 and LargeUtf8
  • ARROW-11459 - [Rust] Added API to build ListArray of Primitives from an iterator
  • ARROW-11462 - [Developer] Remove needless quote from the default DOCKER_VOLUME_PREFIX
  • ARROW-11463 - [Python] Expose "allow_64bit" to IpcWriteOptions in pyarrow.
  • ARROW-11466 - [Go][Flight] adding Basic Auth handling for go flight client and server
  • ARROW-11467 - [R] Fix reference to json_table_reader() in R docs
  • ARROW-11468 - [R] Allow user to pass schema to read_json_arrow()
  • ARROW-11474 - [C++] Update bundled re2 version
  • ARROW-11476 - [Rust][DataFusion] Test running of TPCH benchmarks in CI
  • ARROW-11477 - [R][Doc] Reorganize and improve README and vignette content
  • ARROW-11478 - [R] Consider ways to make arrow.skip_nul option more user-friendly
  • ARROW-11479 - [Rust][Parquet] Add Method to get compressed size of columns from row group metadata
  • ARROW-11481 - [Rust] More cast implementations
  • ARROW-11484 - [Rust][DataFusion] Derive Clone for ExecutionContext
  • ARROW-11486 - [Website] Use Jekyll 4 and webpack to support Ruby 3.0 or later
  • ARROW-11489 - [Rust][DataFusion] Make DataFrame be Send + Sync
  • ARROW-11491 - [Rust] support JSON schema inference for nested list and struct
  • ARROW-11493 - [CI][Packaging][deb][RPM] Test built packages
  • ARROW-11500 - [R] Allow bundled build script to run on Solaris
  • ARROW-11501 - [C++] endianness check does not work on Solaris
  • ARROW-11504 - [Rust] Added checks to List DataType.
  • ARROW-11505 - [Rust] Add support for LargeUtf8 in csv-writer
  • ARROW-11507 - [R] Bindings for GetRuntimeInfo
  • ARROW-11510 - [Python] Add note that pip >= 19.0 is required to get binary packages
  • ARROW-11511 - [Rust] Replace Arc<ArrayData> by ArrayData in all arrays
  • ARROW-11512 - [Packaging][deb] Add missing gRPC dependency for Ubuntu 21.04
  • ARROW-11513 - [R] Bindings for sub/gsub
  • ARROW-11516 - [R] Allow all C++ compute functions to be called by name in dplyr
  • ARROW-11539 - [Developer][Archery] Change items_per_seconds units
  • ARROW-11541 - [C++][Compute] Implement tdigest kernel
  • ARROW-11542 - [Rust] fix validity bitmap buffer length count in json reader
  • ARROW-11544 - [Rust][DataFusion] Implement as_any for AggregateExpr
  • ARROW-11545 - [Rust][DataFusion] SendableRecordBatchStream should implement Sync
  • ARROW-11556 - [C++] Assorted benchmark-related improvements
  • ARROW-11557 - [Rust][Datafusion] Add deregister_table
  • ARROW-11559 - [C++] Add regression file
  • ARROW-11559 - [C++] Use smarter Flatbuffers verification parameters
  • ARROW-11561 - [Rust][DataFusion] Add Send + Sync to MemTable::load
  • ARROW-11563 - [Rust] Support Cast(Utf8, TimeStamp(Nanoseconds, None))
  • ARROW-11568 - [C++][Compute] Rewrite mode kernel
  • ARROW-11570 - [Rust] ScalarValue - support Date64
  • ARROW-11571 - [CI] Cancel stale Github Actions workflow runs
  • ARROW-11572 - [Rust] Add a kernel for division by single scalar
  • ARROW-11573 - [Developer][Archery] Google benchmark now reports run type
  • ARROW-11574 - [Rust][DataFusion] Upgrade sqlparser to support parsing all TPC-H queries
  • ARROW-11575 - [Developer][Archery] Expose execution time in benchmark results
  • ARROW-11576 - [Rust] Fix unused variable in Rust code example
  • ARROW-11580 - [C++] Add CMake option ARROW_DEPENDENCY_SOURCE=VCPKG
  • ARROW-11581 - [Packaging][C++] Formalize distribution through vcpkg
  • ARROW-11589 - [R] Add methods for modifying Schemas
  • ARROW-11590 - [C++] Move CSV background generator to IO thread pool
  • ARROW-11591 - [C++][Compute] Grouped aggregation
  • ARROW-11592 - [Rust] Fix typo in comment
  • ARROW-11594 - [Rust] Support pretty printing of NullArray
  • ARROW-11597 - [Rust] Split file in smaller ones.
  • ARROW-11598 - [Rust] Split buffer.rs in smaller files
  • ARROW-11599 - [Rust] Add function to create array with all nulls
  • ARROW-11601 - [C++][Python][Dataset] expose Parquet pre-buffer option
  • ARROW-11606 - [Rust][DataFusion] Add input schema to HashAggregateExec
  • ARROW-11610 - [C++] Download boost from sourceforge instead of bintray
  • ARROW-11611 - [C++] Update third party dependency mirrors
  • ARROW-11612 - [C++] Rebuild trimmed boost bundle for 1.75.0
  • ARROW-11613 - [R] Move nightly C++ builds off of bintray
  • ARROW-11616 - [Rust][DataFusion] Add collect_partitioned on DataFrame
  • ARROW-11621 - [CI][Gandiva][Linux] Fix Crossbow setup failure
  • ARROW-11626 - [Rust][DataFusion] Move DataFusion examples to their own project
  • ARROW-11627 - [Rust] Make allocator generic over type T
  • ARROW-11637 - [CI][Conda] Update nightly clean target platforms and packages list
  • ARROW-11641 - [CI] Use docker buildkit's inline cache to reuse build cache across different hosts
  • ARROW-11649 - [R] Add support for null_fallback to R
  • ARROW-11651 - [Rust][DataFusion] Implement Postgres String Functions: Length Functions
  • ARROW-11653 - [Rust][DataFusion] Postgres String Functions: ascii, chr, initcap, repeat, reverse, to_hex
  • ARROW-11655 - [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad
  • ARROW-11656 - [Rust][DataFusion] Remaining Postgres String functions
  • ARROW-11659 - [R] Preserve group_by .drop argument
  • ARROW-11662 - [C++] Support sorting decimal and fixed size binary data
  • ARROW-11664 - [Rust] cast to LargeUtf8
  • ARROW-11665 - [C++][Python] Improve docstrings for decimal and union types
  • ARROW-11666 - [Integration] Add endianness "gold" integration file for decimal256
  • ARROW-11667 - [Rust] Add documentation for utf8 comparison kernels
  • ARROW-11669 - [Rust][DataFusion] Remove concurrency field from GlobalLimitExec and SortExec
  • ARROW-11671 - [Rust][DataFusion] Clean up Expr doc comments and examples
  • ARROW-11677 - [C++][Docs] Add basic C++ datasets documentation
  • ARROW-11680 - [C++] Add vendored version of folly's spsc queue
  • ARROW-11683 - [R] Support dplyr::mutate()
  • ARROW-11685 - [C++] Fix typo: FutureStessTest -> FutureStressTest
  • ARROW-11688 - [Rust] Casts between Utf8 and LargeUtf8
  • ARROW-11690 - [Rust][DataFusion] Avoid expr copies while using builder methods
  • ARROW-11692 - [Rust][DataFusion] Improve OptimizerRule comments
  • ARROW-11693 - [C++] Add string length kernel
  • ARROW-11700 - [R] Internationalize error handling in tidy eval
  • ARROW-11701 - [R] Implement dplyr::relocate()
  • ARROW-11703 - [R] Implement dplyr::arrange()
  • ARROW-11704 - [R] Wire up dplyr::mutate() for datasets
  • ARROW-11707 - [Rust] support CSV schema inference without file IO
  • ARROW-11708 - [Rust] fix Rust 2021 linting warnings
  • ARROW-11709 - [Rust][DataFusion] Move expressions and inputs into LogicalPlan rather than helpers in util
  • ARROW-11710 - [Rust][DataFusion] Implement ExpressionRewriter
  • ARROW-11719 - [Rust][Datafusion] support creating memory table with merged schema
  • ARROW-11721 - [Rust] json schema inference to return Schema instead of SchemaRef
  • ARROW-11722 - [Rust] Improve error message in FFI cast.
  • ARROW-11724 - [C++] Resolve namespace collisions with protobuf 3.15
  • ARROW-11725 - [Rust][DataFusion] Make use of the new divide_scalar kernel in arrow
  • ARROW-11727 - [C++][FlightRPC] Estimate latency quantiles with TDigest
  • ARROW-11730 - [C++] Add implicit convenience constructors for constructing Future from Status/Result
  • ARROW-11733 - [Rust][DataFusion] Implement hash partitioning
  • ARROW-11734 - [C++] vendored safe-math.h does not compile on Solaris
  • ARROW-11735 - [R] Allow Parquet and Arrow Dataset to be optional components
  • ARROW-11736 - [R] Allow string compute functions to be optional
  • ARROW-11737 - [C++] Patch vendored xxhash for Solaris
  • ARROW-11738 - [Rust][DataFusion] Fix Concat and Trim Functions
  • ARROW-11740 - [C++] posix_memalign not declared in scope on Solaris
  • ARROW-11742 - [Rust][DataFusion] Add Expr::is_null and Expr::is_not_nu…
  • ARROW-11744 - [C++] Add xsimd dependency
  • ARROW-11745 - [C++] Add helper to generate random record batches by schema
  • ARROW-11750 - [Python][Dataset] Add support for project expressions
  • ARROW-11752 - [R] Replace usage of testthat::expect_is()
  • ARROW-11753 - [Rust][DataFusion] Add tests for when Datafusion qualified field names resolved
  • ARROW-11754 - [R] Support dplyr::compute()
  • ARROW-11761 - [C++] Increase public API testing
  • ARROW-11766 - [R] Better handling for missing compression codecs on Linux
  • ARROW-11768 - [CI][C++] Make s390x job required
  • ARROW-11773 - [Rust] Support writing well-formed JSON arrays as well as newline-delimited JSON streams
  • ARROW-11774 - [R] macOS one-line install
  • ARROW-11775 - [Rust][DataFusion] Feature Flags for Dependencies
  • ARROW-11777 - [Rust] impl AsRef for StringBuilder/BinaryBuilder
  • ARROW-11778 - [Rust] Cast from LargeUtf8 to Numerical and temporal types
  • ARROW-11779 - [Rust] make alloc module public
  • ARROW-11790 - [Rust][DataFusion][Expr]
  • ARROW-11794 - [Go] Add concurrent-safe ipc.FileReader.RecordAt(i)
  • ARROW-11795 - [MATLAB] Migrate MATLAB Interface for Apache Arrow design doc to Markdown
  • ARROW-11797 - [C++][Dataset] Provide batch stream Scanner methods
  • ARROW-11798 - [Integration] Update testing submodule
  • ARROW-11799 - [Rust] fix len of string and binary arrays created from unbound iterator
  • ARROW-11801 - [C++] Remove bad header guard in filesystem/type_fwd.h
  • ARROW-11803 - [Rust][Parquet] Support v2 LogicalType
  • ARROW-11806 - [Rust][DataFusion] Optimize join / inner join creation of indices
  • ARROW-11820 - [Rust] Added macro to create native types
  • ARROW-11822 - [Rust][Datafusion] Support case sensitive comparisons for functions and aggregates
  • ARROW-11824 - [Rust][Parquet] Use logical types in Arrow schema conversion
  • ARROW-11825 - [Rust][DataFusion] Add mimalloc as option to benchmarks
  • ARROW-11833 - [C++] Bump vendored fast_float
  • ARROW-11837 - [C++][Dataset] expose originating Fragment on ScanTask
  • ARROW-11838 - [C++] Support IPC reads with shared dictionaries.
  • ARROW-11839 - [C++] Use xsimd for generation of accelerated bit-unpacking
  • ARROW-11842 - [Rust][Parquet] Use clone_from in get_batch_with_dict
  • ARROW-11852 - [Docs] Update CONTRIBUTING to explain Contributor role
  • ARROW-11856 - [C++] Remove unused reference to RecordBatchStreamWriter
  • ARROW-11858 - [GLib][Gandiva] Add Gandiva::Filter and related functions
  • ARROW-11859 - [GLib][Ruby] Add garrow_array_concatenate()
  • ARROW-11861 - [R][Packaging] Apply changes in r/tools/autobrew upstream
  • ARROW-11864 - [R] Document arrow.int64_downcast option
  • ARROW-11870 - [Dev] Automatically run merge script in virtual environment
  • ARROW-11876 - [Website] Update governance page
  • ARROW-11877 - [C++] Add microbenchmark for SimplifyWithGuarantee
  • ARROW-11879 - [Rust][DataFusion] Make ExecutionContext::sql return dataframe with optimized plan
  • ARROW-11883 - [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map
  • ARROW-11887 - [C++] Add asynchronous read to streaming CSV reader
  • ARROW-11894 - [Rust][DataFusion] Change flight server example to use DataFrame API
  • ARROW-11895 - [Rust][DataFusion] Add support for more column statistics
  • ARROW-11898 - [Rust] Pretty print columns
  • ARROW-11899 - [Java] Refactor the compression codec implementation into core/Arrow specific parts
  • ARROW-11900 - [Website] Add Yibo to committer list
  • ARROW-11906 - [R] Make FeatherReader print method more informative
  • ARROW-11907 - [C++] Use our own executor in S3FileSystem
  • ARROW-11910 - [Packaging][Ubuntu] Drop support for 16.04
  • ARROW-11911 - [Website] Add protobuf vs arrow to FAQ
  • ARROW-11912 - [R] Remove args from FeatherReader$create
  • ARROW-11913 - [Rust] Improve performance of StringBuilder by delaying bitmap creation
  • ARROW-11920 - [R] Remove r/libarrow when make cleaning
  • ARROW-11921 - [R] Set LC_COLLATE in r/data-raw/codegen.R
  • ARROW-11924 - [C++] Add streaming version of FileSystem::GetFileInfo
  • ARROW-11925 - [R] Add between method for arrow_dplyr_query
  • ARROW-11927 - [Rust][DataFusion] Support Limit push down optimization
  • ARROW-11931 - [Go] bump to go1.15
  • ARROW-11935 - [C++] Add push generator
  • ARROW-11944 - [Developer] Fix archery's comparison of cached benchmark runs
  • ARROW-11949 - [Ruby] Accept raw Ruby objects as sort key and options
  • ARROW-11951 - [Rust] Remove OffsetSize::prefix
  • ARROW-11952 - [Rust] Make ArrayData --> GenericListArray fallible instead of panic!
  • ARROW-11954 - [C++] arrow/util/io_util.cc does not compile on Solaris
  • ARROW-11955 - [Rust][DataFusion] Support Union
  • ARROW-11958 - [GLib] Add garrow_chunked_array_combine()
  • ARROW-11959 - [Rust][DataFusion] Fix log line
  • ARROW-11962 - [Rust][DataFusion] Improve DataFusion docs
  • ARROW-11969 - [Rust][DataFusion] Improve Examples in documentation
  • ARROW-11972 - [C++][R][Python][Dataset] Extract IPC/Parquet fragment scan options
  • ARROW-11973 - [Rust][DataFusion] Boolean kleene kernels
  • ARROW-11977 - [Rust] Add documentation examples for sort kernel
  • ARROW-11982 - [Rust] Donate Ballista Distributed Compute Platform
  • ARROW-11984 - [C++][Gandiva] Implement SHA1 and SHA256 functions
  • ARROW-11987 - [C++][Gandiva] Implement trigonometric functions
  • ARROW-11988 - [C++][Gandiva] Implements last_day function
  • ARROW-11992 - [Rust][Parquet] Add upgrade notes on 4.0 rename of LogicalType
  • ARROW-11993 - [C++] Don't download xsimd if ARROW_SIMD_LEVEL=NONE
  • ARROW-11996 - [R] Make r/configure run successfully on Solaris
  • ARROW-11999 - [Java] Support parallel vector element search with user-specified comparator
  • ARROW-12000 - [Documentation] Add note about deviation from style guide on struct/classes
  • ARROW-12005 - [R] Fix a bash typo in configure
  • ARROW-12017 - [R][Documentation] Make proper developing arrow docs
  • ARROW-12019 - [Rust][Parquet] Update README for 2.6.0 support
  • ARROW-12020 - [Rust][DataFusion] Adding SHOW TABLES and SHOW COLUMNS + partial information_schema support to DataFusion
  • ARROW-12031 - [C++][CSV] infer CSV timestamps columns with fractional seconds
  • ARROW-12032 - [Rust] Optimize comparison kernels
  • ARROW-12034 - [Developer Tools] Formalize Minor PRs
  • ARROW-12037 - [Rust][DataFusion] Support catalogs and schemas for table namespacing
  • ARROW-12038 - [Rust][DataFusion] Upgrade hashbrown to 0.11
  • ARROW-12039 - [Nightly][Gandiva] Fix gandiva-jar-ubuntu nightly build failure
  • ARROW-12040 - [C++] Fix potential deadlock in recursive S3 walks
  • ARROW-12043 - [Rust][Parquet] Write FSB arrays
  • ARROW-12045 - [Go][Parquet] Initial Chunk of Parquet port to Go
  • ARROW-12047 - [Rust][Parquet] Cleanup clippy
  • ARROW-12048 - [Rust][DataFusion] Support Common Table Expressions
  • ARROW-12052 - [Rust] Add Child Data to Arrow's C FFI implementation. …
  • ARROW-12056 - [C++] Create sequencing AsyncGenerator
  • ARROW-12058 - [Python] Enable arithmetic operations on Expressions
  • ARROW-12068 - [Python] Stop using distutils
  • ARROW-12069 - [C++][Gandiva] Implement IN expressions for Decimal type
  • ARROW-12070 - [GLib] Drop support for GNU Autotools
  • ARROW-12071 - [GLib] Keep input stream reference of GArrowJSONReader
  • ARROW-12075 - [Rust][DataFusion] Add CTE + UNION ALL to supported list of SQL features
  • ARROW-12081 - [R] Bindings for utf8_length
  • ARROW-12082 - [R][Dataset] Allow create dataset from vector of file paths
  • ARROW-12094 - [C++][R] Fix re2 building on clang/libc++
  • ARROW-12097 - [C++] Modify BackgroundGenerator so it creates fewer threads
  • ARROW-12098 - [R] Catch C++ build failures on Linux
  • ARROW-12104 - [Go][Parquet] Second chunk of Ported Go Parquet code
  • ARROW-12106 - [Rust][DataFusion] Support SELECT * from information_schema.tables
  • ARROW-12107 - [Rust][DataFusion] Support SELECT * from information_schema.columns
  • ARROW-12108 - [Rust][DataFusion] Implement SHOW TABLES
  • ARROW-12109 - [Rust][DataFusion] Implement SHOW COLUMNS
  • ARROW-12110 - [Java] Implement ZSTD compression
  • ARROW-12111 - [Java] Generate flatbuffer files using flatc 1.12.0
  • ARROW-12116 - [Rust] Fix and ignore 1.51 clippy lints
  • ARROW-12119 - [Rust][DataFusion] Improve performance of to_array_of_size for primitives
  • ARROW-12120 - [Rust] Generate random arrays and batches
  • ARROW-12121 - [Rust][Parquet] Arrow writer benchmarks
  • ARROW-12123 - [Rust][DataFusion] Use smallvec for indices for better join performance
  • ARROW-12128 - [CI][Crossbow] Remove test-ubuntu-16.04-cpp job
  • ARROW-12131 - [CI][GLib] Ensure upgrading MSYS2
  • ARROW-12133 - [C++][Gandiva] Add option to disable targeting host cpu during llvm ir compilation
  • ARROW-12134 - [C++] Add match_substring_regex kernel
  • ARROW-12136 - [Rust][DataFusion] Reduce default batch_size to 8192
  • ARROW-12139 - [Python][Packaging] Use vcpkg to build macOS wheels
  • ARROW-12141 - [R] Bindings for grepl
  • ARROW-12143 - [CI] R builds should timeout and fail after some threshold and dump the output.
  • ARROW-12146 - [C++][Gandiva] Implement CONVERT_FROM(expression, replacement char) function
  • ARROW-12151 - [Docs] Add Jira component + summary conventions to the docs
  • ARROW-12153 - [Rust][Parquet] Return file stats after writing file
  • ARROW-12160 - [Rust] Add into_inner() to StreamWriter
  • ARROW-12164 - [Java] Make BaseAllocator.Config public
  • ARROW-12165 - [Rust] inline append functions of builders
  • ARROW-12168 - [Go][IPC] Implement Compression handling for Arrow IPC
  • ARROW-12170 - [Rust][DataFusion] Introduce repartition optimization
  • ARROW-12173 - [GLib] Remove #include <config.h>
  • ARROW-12176 - [C++] Fix some typos of cpp examples
  • ARROW-12187 - [C++][FlightRPC] Add compression benchmark for stream writing
  • ARROW-12188 - [Docs] Switch to pydata-sphinx-theme for the main sphinx docs
  • ARROW-12190 - [Rust][DataFusion] Implement parallel / partitioned hash join
  • ARROW-12192 - [Website] Use downloadable URL for archive download
  • ARROW-12193 - [Dev][Release] Use downloadable URL for archive download
  • ARROW-12194 - [Rust][Parquet] Bump zstd to v0.7
  • ARROW-12197 - [R] dplyr bindings for cast, dictionary_encode
  • ARROW-12200 - [R] Export and document list_compute_functions
  • ARROW-12204 - [Rust][CI] Reduce size of Rust build artifacts in integration test
  • ARROW-12206 - [Python][Docs] Fix Table docstrings
  • ARROW-12208 - [C++] Add the ability to run async tasks without using the CPU thread pool
  • ARROW-12210 - [Rust][DataFusion] Document SHOW TABLES / SHOW COLUMNS / Information Schema
  • ARROW-12214 - [Rust][DataFusion] Add tests for limit
  • ARROW-12215 - [C++] Allow null values in fixed-size binary columns read from CSV
  • ARROW-12217 - [C++] Cleanup cpp examples source files naming
  • ARROW-12222 - [Dev][Packaging] Include build url in the crossbow console report
  • ARROW-12224 - [Rust] Use stable rust for no default test, clean up CI tests
  • ARROW-12228 - [CI] Create base image for conda environments
  • ARROW-12236 - [R][CI] Add check that all docs pages are listed in _pkgdown.yml
  • ARROW-12237 - [Packaging][Debian] Add support for bullseye
  • ARROW-12238 - [JS] Remove trailing spaces and consistently add space after //
  • ARROW-12239 - [JS] Switch to yarn
  • ARROW-12242 - [Python][Doc] Tweak nightly build instructions
  • ARROW-12246 - [CI] Sync conda recipes with upstream feedstock
  • ARROW-12248 - [C++] Avoid looking up ARROW_DEFAULT_MEMORY_POOL environment variable too late
  • ARROW-12249 - [R][CI] Fix test-r-install-local nightlies
  • ARROW-12251 - [Rust] Add Ballista to CI
  • ARROW-12263 - [Dev][Packaging] Move Crossbow to Archery
  • ARROW-12269 - [JS] Move to eslint
  • ARROW-12274 - [JS] Document how to run tests without building bundles
  • ARROW-12277 - [Rust][DataFusion] Implement Sum/Count/Min/Max aggregates for Timestamp(,)
  • ARROW-12278 - [Rust][DataFusion] Use Timestamp(Nanosecond, None) for SQL TIMESTAMP Type
  • ARROW-12280 - [Developer] Remove @-mentions from commit messages in merge tool
  • ARROW-12281 - [JS] Remove shx, trash, and rimraf and update lerna for yarn
  • ARROW-12283 - [R] Bindings for basic type convert functions in dplyr verbs
  • ARROW-12286 - [C++] Create AsyncGenerator from Future<AsyncGenerator<T>>
  • ARROW-12287 - [C++] Create enumerating generator
  • ARROW-12288 - [C++] Create Scanner interface
  • ARROW-12289 - [C++] Create basic AsyncScanner implementation
  • ARROW-12303 - [JS] Use iterator instead of yield
  • ARROW-12304 - [R] Update news and polish docs for 4.0
  • ARROW-12305 - [JS] Update generate.py to python3 and new versions of pyarrow
  • ARROW-12309 - [JS] Make es2015 bundles the default
  • ARROW-12316 - [C++] Prefer mimalloc on Apple
  • ARROW-12317 - [Rust] JSON writer support for time, duration and date
  • ARROW-12320 - [CI] REPO arg missing from conda-cpp-valgrind
  • ARROW-12323 - [C++][Gandiva] Implement castTIME(timestamp) function
  • ARROW-12325 - [C++][CI] Nightly gandiva build failing due to failure of compiler to move return value
  • ARROW-12326 - [C++] Avoid needless c-ares detection
  • ARROW-12328 - [Rust][Ballista] Fix formatting
  • ARROW-12329 - [Rust][Ballista] Add Ballista README
  • ARROW-12332 - [Rust][Ballista] Add simple api server in scheduler
  • ARROW-12333 - [JS] Remove jest-environment-node-debug and do not emit from typescript by default
  • ARROW-12335 - [Rust][Ballista] Use latest DataFusion
  • ARROW-12337 - [Rust] add DoubleEndedIterator and ExactSizeIterator traits
  • ARROW-12351 - [CI][Ruby] Use ruby/setup-ruby instead of actions/setup-ruby
  • ARROW-12352 - [CI][R][Windows] Remove needless workaround for MSYS2
  • ARROW-12353 - [Packaging][deb] Rename -archive-keyring to -apt-source
  • ARROW-12354 - [Packaging][RPM] Use apache.jfrog.io/artifactory/ instead of apache.bintray.com/
  • ARROW-12356 - [Website] Update install page instructions to point to artifactory
  • ARROW-12361 - [Rust][DataFusion] Allow users to override physical optimization rules
  • ARROW-12367 - [C++] Stop producing when PushGenerator was destroyed
  • ARROW-12370 - [R] Bindings for power kernel
  • ARROW-12374 - [CI][C++][cron] Use Ubuntu 20.04 instead of 16.04
  • ARROW-12375 - [Release] Remove rebase post-release scripts
  • ARROW-12376 - [Dev] Log traceback for unexpected exceptions in archery trigger-bot
  • ARROW-12380 - [Rust][Ballista] Basic scheduler ui
  • ARROW-12381 - [Packaging][Python] macOS wheels are built with wrong package kind
  • ARROW-12383 - [JS] Upgrade dependencies
  • ARROW-12384 - [JS] Use let/const and clean up eslint rules
  • ARROW-12389 - [R][Docs] Add note about autocasting
  • ARROW-12395 - Create RunInSerialExecutor benchmark
  • ARROW-12396 - [Python][Docs] Clarify serialization/filesystem docstrings about deprecated status
  • ARROW-12397 - [Rust][DataFusion] Simplify readme example
  • ARROW-12398 - [Rust] remove redundant bound check in iterators
  • ARROW-12400 - [Rust] Re-enable tests in arrow::array::transform
  • ARROW-12402 - [Rust][DataFusion] Implement SQL metrics example
  • ARROW-12406 - [R] Fix checkbashism violation in configure
  • ARROW-12409 - [R] Remove LazyData from DESCRIPTION
  • ARROW-12419 - [Java] Remove download of flatc binary for s390x
  • ARROW-12420 - [C++/Dataset] Reading null columns as dictionary no longer possible
  • ARROW-12423 - [Docs] Remove Codecov badge
  • ARROW-12425 - [Rust] Fix new_null_array dictionary creation
  • ARROW-12432 - [Rust][DataFusion] Add metrics to SortExec
  • ARROW-12436 - [Rust][Ballista] Add watch capabilities to config backend trait
  • ARROW-12467 - [C++][Gandiva] Add support for LLVM12
  • ARROW-12477 - [Release] Download aarch64 miniforge
  • ARROW-12485 - [C++] Use mimalloc as the default memory allocator on macOS
  • ARROW-12488 - [GLib] Use g_memdup2() with GLib 2.68 or later
  • ARROW-12494 - [C++] ORC adapter fails to compile on GCC 4.8
  • ARROW-12506 - [Python] Improve modularity of pyarrow codebase to speedup compile time
  • ARROW-12652 - disable conda arm64 in nightly
  • PARQUET-1846 - [C++] Remove deprecated IO classes
  • PARQUET-1899 - [C++] Deprecate ReadBatchSpaced
  • PARQUET-1990 - [C++] Refuse to write ConvertedType::NA
  • PARQUET-1993 - [C++] expose way to wait for I/O to complete

Apache Arrow 3.0.0 (2021-01-25)

New Features and Improvements

  • ARROW-1846 - [C++][Compute] Implement "any" reduction kernel for boolean data
  • ARROW-4193 - [Rust] Add support for decimal data type
  • ARROW-4544 - [Rust] JSON nested struct reader
  • ARROW-4804 - [Rust] Parse Date32 and Date64 in CSV reader
  • ARROW-4960 - [R] Build r-arrow conda package in crossbow
  • ARROW-4970 - [C++][Parquet] Implement parquet::FileMetaData::Equals
  • ARROW-5336 - [C++] Implement arrow::Concatenate for dictionary-encoded arrays with unequal dictionaries
  • ARROW-5350 - [Rust] Allow filtering on simple lists
  • ARROW-5394 - [C++][Benchmark] IsIn and IndexIn benchmark for integer and string types
  • ARROW-5679 - [Python][CI] Remove Python 3.5 support
  • ARROW-5950 - [Rust][DataFusion] Add logger
  • ARROW-6071 - [C++] Generic binary-to-binary casts
  • ARROW-6697 - [Rust][DataFusion] Validate that all Parquet partitions have the same schema
  • ARROW-6715 - [Website] Describe "non-free" component is needed for Plasma packages in install page
  • ARROW-6883 - [C++][Python] Allow writing dictionary deltas
  • ARROW-6995 - [Packaging][Crossbow] The windows conda artifacts are not uploaded to GitHub releases
  • ARROW-7531 - [C++] Reduce header inclusion cost slightly
  • ARROW-7800 - [Python] implement iter_batches() method for ParquetFile and ParquetReader
  • ARROW-7842 - [Rust][Parquet] Arrow list reader
  • ARROW-8113 - [C++] Lighter weight variant<>
  • ARROW-8199 - [C++] Add support for multi-column sort indices on Table
  • ARROW-8289 - [Rust] Parquet Arrow writer with nested support
  • ARROW-8423 - [Rust][Parquet] Serialize Arrow schema metadata
  • ARROW-8425 - [Rust][Parquet] Correct temporal IO
  • ARROW-8426 - [Rust][Parquet] Add more support for converting Dicts
  • ARROW-8426 - [Rust][Parquet] Add support for writing dictionary types
  • ARROW-8853 - [Rust][Integration Testing] Enable Flight tests
  • ARROW-8876 - [C++] Implement casts from date types to Timestamp
  • ARROW-8883 - [Rust][Integration] Enable more tests
  • ARROW-9001 - [R] Box outputs as correct type in call_function
  • ARROW-9164 - [C++] Add embedded documentation to compute functions
  • ARROW-9187 - [R] Add bindings for arithmetic kernels
  • ARROW-9296 - [Rust][DataFusion] Address clippy errors clippy::unnecessary_unwrap, clippy::useless_format,
  • ARROW-9304 - [C++] Add "AppendEmpty" builder APIs for use inside StructBuilder::AppendNull
  • ARROW-9361 - [Rust] Move array types into their own modules
  • ARROW-9367 - [Python] Sorting on pyarrow data structures?
  • ARROW-9400 - [Python] Do not depend on conda-forge static libraries in Windows wheel builds
  • ARROW-9475 - [Java] Clean up usages of BaseAllocator, use BufferAllocator in…
  • ARROW-9489 - [C++][string][string] )
  • ARROW-9555 - [Rust][DataFusion] Implement physical node for inner join
  • ARROW-9564 - [Packaging] Vendor r-arrow-feedstock conda-forge recipe
  • ARROW-9674 - [Rust] Make the parquet read and writers Send
  • ARROW-9704 - [Java] TestEndianness.testLittleEndian supports little- and big-endian platforms
  • ARROW-9707 - [Rust][DataFusion] Re-implement threading model
  • ARROW-9709 - [Java] Test cases in arrow-vector takes care of endianness
  • ARROW-9728 - [Rust][Parquet] Nested definition & repetition for structs
  • ARROW-9747 - [Java][C++] Initial Support for 256-bit Decimals
  • ARROW-9771 - [Rust][DataFusion] treat predicates separated by AND separately in predicate pushdown
  • ARROW-9803 - [Go] Add initial support for s390x
  • ARROW-9804 - [FlightRPC] Flight auth redesign
  • ARROW-9828 - [Rust][DataFusion] Support filter pushdown optimisation for TableProvider implementations
  • ARROW-9861 - [Java] Support big-endian in DecimalVector
  • ARROW-9862 - [Java] Enable UnsafeDirectLittleEndian on a big-endian platform
  • ARROW-9911 - [Rust][DataFusion] SELECT <expression> with no FROM clause should produce a single row of output
  • ARROW-9945 - [C++][Dataset] Refactor Expression::Assume to return a Result
  • ARROW-9991 - [C++] Split kernels for strings/binary
  • ARROW-10002 - [Rust] Remove trait specialization from arrow crate
  • ARROW-10021 - [C++][Compute] Return top-n modes in mode kernel
  • ARROW-10032 - [Documentation] update C++ windows docs
  • ARROW-10079 - [Rust] Benchmark and improve count bits
  • ARROW-10095 - [Rust] Update rust-parquet-arrow-writer branch's encode_arrow_schema with ipc changes
  • ARROW-10097 - [C++] Persist SetLookupState in between usages of IsIn when filtering dataset batches
  • ARROW-10106 - [FlightRPC][Java] Expose onIsReady() callback
  • ARROW-10108 - [Rust][Parquet] Fix compiler warning about unused return value
  • ARROW-10109 - [Rust] Add support to the C data interface for primitive types and utf8
  • ARROW-10110 - [Rust] Add support to consume C Data Interface
  • ARROW-10131 - [C++][Dataset][Python] Lazily parse parquet metadata
  • ARROW-10135 - [Rust][Parquet] Refactor file module to help adding sources
  • ARROW-10143 - [C++] Rewrite Array(Range)Equals
  • ARROW-10144 - [Flight] Add support for using the TLS_SNI extension
  • ARROW-10149 - [Rust] Add support to external release of un-owned buffers
  • ARROW-10163 - [Rust][DataFusion] Add DictionaryArray coercion support
  • ARROW-10168 - [Rust][Parquet] Schema roundtrip - use Arrow schema from Parquet metadata when available
  • ARROW-10173 - [Rust][DataFusion] Implement support for direct comparison to scalar values
  • ARROW-10180 - [C++][Doc] Update dependency management docs
  • ARROW-10182 - [C++] Add basic continuation support to Future
  • ARROW-10191 - [Rust][Parquet] Add roundtrip Arrow -> Parquet tests for all supported Arrow DataTypes
  • ARROW-10197 - [Python][Gandiva] Execute expression on filtered data
  • ARROW-10203 - [Doc] Give guidance on big-endian support in the contributors docs
  • ARROW-10207 - [C++] Allow precomputing output string/list offsets in kernels
  • ARROW-10208 - [C++] Fix split string kernels on sliced input
  • ARROW-10216 - [Rust] Simd implementation for primitive min/max kernels
  • ARROW-10224 - [Python] Add support for Python 3.9 except macOS wheel and Windows wheel
  • ARROW-10225 - [Rust][Parquet] Fix null comparison in roundtrip
  • ARROW-10228 - [Julia] Contribute Julia implementation
  • ARROW-10236 - [Rust] Add can_cast_types to arrow cast kernel, use in DataFusion
  • ARROW-10241 - [C++][Compute] Add variance kernel benchmark
  • ARROW-10249 - [Rust] Support nested dictionaries inside list arrays
  • ARROW-10259 - [Rust] Add custom metadata to Field
  • ARROW-10261 - [Rust][Breaking] Change List datatype to Box<Field>
  • ARROW-10263 - [C++][Compute] Improve variance kernel numerical stability
  • ARROW-10268 - [Rust] Write out non-nested dictionaries in the IPC format
  • ARROW-10269 - [Rust] Update to 2020-11-14 nightly
  • ARROW-10277 - [C++] Support comparing scalars approximately
  • ARROW-10289 - [Rust] Read dictionaries in IPC streams
  • ARROW-10292 - [Rust][DataFusion] Simplify merge
  • ARROW-10295 - [Rust][DataFusion] Replace Rc<RefCell<>> by Box<> in accumulators.
  • ARROW-10300 - [Rust] Improve documentation for TPC-H benchmark
  • ARROW-10301 - [C++][Compute] Implement "all" reduction kernel for boolean data
  • ARROW-10302 - [Python] Don't double-package plasma-store-server
  • ARROW-10304 - [C++][Compute] Optimize variance kernel for integers
  • ARROW-10310 - [C++][Gandiva] Add single argument round() in Gandiva
  • ARROW-10311 - [Release] Update crossbow verification process
  • ARROW-10313 - [C++] Faster UTF8 validation for small strings
  • ARROW-10318 - [C++] Use pimpl idiom in CSV parser
  • ARROW-10319 - [Go][Flight] Add context to flight client auth handler
  • ARROW-10320 - [Rust][DataFusion] Migrated from batch iterators to batch streams.
  • ARROW-10322 - [C++][Dataset] Minimize Expression
  • ARROW-10323 - [Release][wheel] Add missing verification setup step
  • ARROW-10325 - [C++][Compute] Refine aggregate kernel registration
  • ARROW-10328 - [C++] Vendor fast_float number parsing library
  • ARROW-10330 - [Rust][DataFusion] Implement NULLIF() SQL function
  • ARROW-10331 - [Rust][DataFusion] Re-organize DataFusion errors
  • ARROW-10332 - [Rust] Allow CSV reader to iterate from start up to end
  • ARROW-10334 - [Rust][Parquet] NullArray roundtrip
  • ARROW-10336 - [Rust] Added FromIter and ToIter for string arrays
  • ARROW-10337 - [C++] More liberal parsing of ISO8601 timestamps with fractional seconds
  • ARROW-10338 - [Rust] Use const fn for applicable methods
  • ARROW-10340 - [Packaging][deb][RPM] Use Python 3.8 for pygit2
  • ARROW-10356 - [Rust][DataFusion] Add support for is_in
  • ARROW-10363 - [Python] Remove CMake bug workaround in manylinux
  • ARROW-10366 - [Rust][DataFusion] Do not buffer intermediate results in merge or HashAggregate
  • ARROW-10375 - [Rust] Removed PrimitiveArrayOps
  • ARROW-10378 - [Rust] Update take() kernel with support for LargeList.
  • ARROW-10381 - [Rust] Generalized Ordering for inter-array comparisons
  • ARROW-10382 - [Rust] Fix typos
  • ARROW-10383 - [Doc] fix typos
  • ARROW-10384 - [C++] Fix typos
  • ARROW-10385 - [C++][Gandiva] Add support for LLVM 11
  • ARROW-10389 - [Rust][DataFusion] Make the custom source implementation API more explicit
  • ARROW-10392 - [C++][Gandiva] Avoid string copy while evaluating IN expression
  • ARROW-10396 - [Rust][Parquet] Publicly export SliceableCursor and FileSource
  • ARROW-10398 - [Rust][Parquet] Re-Export parquet::record::api::Field
  • ARROW-10400 - [C++] Propagate TLS client peer_identity when using mutual TLS
  • ARROW-10402 - [Rust] Refactor array equality
  • ARROW-10407 - [C++] Add BasicDecimal256 division Support
  • ARROW-10408 - [Java] Bump Avro to 1.10.0
  • ARROW-10410 - [Rust] Some refactorings
  • ARROW-10416 - [R] Support Tables in Flight
  • ARROW-10422 - [Rust] Removed unused trait BinaryArrayBuilder
  • ARROW-10424 - [Rust] Minor simplification to the generic impl PrimitiveArray
  • ARROW-10428 - [FlightRPC][Java] Add support for HTTP cookies
  • ARROW-10445 - [Rust] Added DoubleEndedIterator to PrimitiveArrayIter
  • ARROW-10449 - [Rust] Make Dictionary::keys be an array
  • ARROW-10454 - [Rust][Datafusion] support creating ParquetExec from filelist and schema
  • ARROW-10455 - [Rust][CI] Fixed error in caching files
  • ARROW-10458 - [Rust][Datafusion] create_logical_plan should not require mutable reference
  • ARROW-10464 - [Rust][DataFusion] Add utility to convert TPC-H data from tbl to CSV and Parquet
  • ARROW-10466 - [Rust][Website] Update implementation status page
  • ARROW-10467 - [FlightRPC][Java] Add the ability to pass arbitrary client headers.
  • ARROW-10468 - [C++][Compute] Provide KernelExecutor instead of FunctionExecutor
  • ARROW-10476 - [Rust] Allow string arrays to be built from Option<&str> or Option<String>
  • ARROW-10477 - [Rust] Add iterator support for Binary arrays.
  • ARROW-10478 - [Dev][Release] Correct Java versions to 3.0.0-SNAPSHOT
  • ARROW-10481 - [R] Bindings to add, remove, replace Table columns
  • ARROW-10483 - [C++] Move Executor into a separate header
  • ARROW-10484 - [C++] Make Future<> more generic
  • ARROW-10487 - [FlightRPC][C++] Header-based auth in clients
  • ARROW-10490 - [C++][GLib] Fix range-loop-analysis warnings
  • ARROW-10492 - [Java][JDBC] Allow users to config the mapping between SQL types and Arrow types
  • ARROW-10504 - [C++] Suppress UBSAN pointer-overflow warning in RapidJSON
  • ARROW-10510 - [Rust][DataFusion] Benchmark COUNT(DISTINCT) queries.
  • ARROW-10515 - [Julia][Doc] Update lists of supported languages to include Julia
  • ARROW-10522 - [R] Allow rename Table and RecordBatch columns with names()
  • ARROW-10526 - [FlightRPC][C++] Client cookie middleware
  • ARROW-10530 - [R] Optionally use distro package in linuxlibs.R
  • ARROW-10531 - [Rust][DataFusion] Add schema and graphviz formatting for LogicalPlans and a PlanVisitor
  • ARROW-10539 - [Packaging][Python] Use GitHub Actions to build wheels for Windows
  • ARROW-10540 - [Rust] Extended filter kernel to all types and improved performance
  • ARROW-10541 - [C++] Add re2 library to core arrow / ARROW_WITH_RE2
  • ARROW-10542 - [C#][Flight] Add beginning on flight code for net core
  • ARROW-10543 - [Developer] Add a note about being patient after gitbox is enabled
  • ARROW-10552 - [Rust] Removed un-used Result
  • ARROW-10559 - [Rust][DataFusion] Split up logical_plan/mod.rs into sub modules
  • ARROW-10561 - [Rust] Simplified Buffer's write and write_bytes and fixed undefined behavior
  • ARROW-10562 - [Rust] Potential UB on unsafe code
  • ARROW-10566 - [C++] Allow validating ArrayData directly
  • ARROW-10567 - [C++] Add multiple perf runs options for higher precision reporting
  • ARROW-10572 - [Rust][DataFusion] Use aHash instead of FnvHashMap
  • ARROW-10574 - [Python][Parquet] Allow collections for 'in' / 'not in' filter (in addition to sets)
  • ARROW-10575 - [Rust] Rename union.rs to be consistent with other arrays
  • ARROW-10581 - [Doc] IPC dictionary reference to relevant section
  • ARROW-10582 - [Rust][DataFusion] Implement "repartition" operator
  • ARROW-10584 - [Rust][DataFusion] Add SQL support for JOIN ON syntax
  • ARROW-10585 - [Rust][DataFusion] Add join support to DataFrame and LogicalPlan
  • ARROW-10586 - [Rust] [DataFusion] Add join support to query planner
  • ARROW-10589 - [Rust] Implement AVX-512 bit and operation
  • ARROW-10590 - [Rust] Remove Date32(Millisecond) from casts
  • ARROW-10591 - [Rust] Add support for StructArray to MutableArrayData
  • ARROW-10595 - [Rust] Simplify inner loop of min/max kernels for non-null case
  • ARROW-10596 - [Rust] Improve take benchmark
  • ARROW-10598 - [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance
  • ARROW-10604 - [GLib][Ruby] Add support for 256-bit decimal
  • ARROW-10607 - [C++][Parquet] Add parquet support for decimal256.
  • ARROW-10609 - [Rust] Optimize min/max of non null strings
  • ARROW-10628 - [Rust] flag clippy warnings as errors
  • ARROW-10633 - [Rust][DataFusion] Dependency version updates
  • ARROW-10634 - [C#][CI] Change the build version from 2.2 to 3.1 in CI
  • ARROW-10636 - [Rust][Parquet] Switch to Rust Stable by removing specialization in parquet
  • ARROW-10637 - [Rust] Added examples to some boolean kernels.
  • ARROW-10638 - [Rust] Improved tests of boolean kernel.
  • ARROW-10639 - [Rust] Added examples to is_null kernel and simplified signature.
  • ARROW-10644 - [Python] Consolidate path/filesystem handling in pyarrow.dataset and pyarrow.fs
  • ARROW-10646 - [C++][FlightRPC] Disable flaky Flight test on Windows
  • ARROW-10648 - [Java] Prepare Java codebase for source release without requiring any git tags to be created or pushed
  • ARROW-10651 - [C++] Fix alloc-dealloc-mismatch in S3-related factory
  • ARROW-10652 - [C++][Gandiva] Make gandiva cache size configurable
  • ARROW-10653 - [Rust] Update toolchain nightly
  • ARROW-10654 - [Rust] Specialize parsing of floats / bools in CSV Reader
  • ARROW-10660 - [Rust] Implement AVX-512 bit or operation
  • ARROW-10665 - [Rust] like/nlike utf8 scalar fast paths, bug fixes in like/nlike
  • ARROW-10666 - [Rust][DataFusion] Support nested SELECT statements.
  • ARROW-10669 - [C++][Compute] Support scalar arguments to Boolean compute functions
  • ARROW-10672 - [Rust][DataFusion] Made Limit be computed on the stream.
  • ARROW-10673 - [Rust][DataFusion] Made sort not collect on execute.
  • ARROW-10674 - [Rust] Fix BigDecimal to be little endian; Add IPC Reader/Writer for Decimal type to allow integration tests
  • ARROW-10677 - [Rust] Fix CSV Boolean parsing + add tests to demonstrate supported csv parsing
  • ARROW-10679 - [Rust][DataFusion] Implement CASE WHEN physical expression
  • ARROW-10680 - [Rust][DataFusion] Add partial support for TPC-H query 12
  • ARROW-10682 - [Rust] Improve sort kernel performance by enabling inlining of is_valid calls
  • ARROW-10685 - [Rust][DataFusion] Added support for Join on filter-pushdown optimizer.
  • ARROW-10688 - [Rust][DataFusion] Implement CASE WHEN logical plan
  • ARROW-10689 - [Rust][DataFusion] Add SQL support for CASE WHEN
  • ARROW-10693 - [Rust][DataFusion] Add support to left join
  • ARROW-10696 - [C++] Add SetBitRunReader
  • ARROW-10697 - [C++] Add notes about bitmap readers
  • ARROW-10703 - [Rust][DataFusion] Compute build-side of hash join once
  • ARROW-10704 - [Rust][DataFusion] Remove Nested from expression enum
  • ARROW-10708 - [Packaging][deb] Add support for Ubuntu 20.10
  • ARROW-10709 - [C++][Python] Allow PyReadableFile::Read() to call pyobj.read_buffer()
  • ARROW-10712 - [Rust][DataFusion] Add tests to TPC-H benchmarks
  • ARROW-10717 - [Rust][DataFusion] Add support for right join
  • ARROW-10720 - [C++] Add Rescale support for BasicDecimal256
  • ARROW-10721 - [C#][CI] Use .NET 3.1 by default
  • ARROW-10722 - [Rust][DataFusion] Reduce overhead of some data types in aggregations / joins, improve benchmarks
  • ARROW-10723 - [Packaging][deb][RPM] Enable Parquet encryption
  • ARROW-10724 - [Dev Tools] Added labeler to PRs that need rebase.
  • ARROW-10725 - [Python][Compute] Expose sort options in Python bindings
  • ARROW-10728 - [Rust][DataFusion] Support USING in SQL
  • ARROW-10729 - [Rust][DataFusion] Add SQL support for JOIN using implicit syntax
  • ARROW-10732 - [Rust][DataFusion] Integrate DFSchema as a step towards supporting qualified column names
  • ARROW-10733 - [R] Improvements to Linux installation troubleshooting
  • ARROW-10740 - [Rust][DataFusion] Remove redundant clones found by clippy
  • ARROW-10741 - [Rust] Apply previously ignored clippy suggestions
  • ARROW-10742 - [Python] Check mask when creating array from numpy
  • ARROW-10745 - [Rust] Directly allocate padding bytes in filter context
  • ARROW-10747 - [Rust] CSV reader optimization
  • ARROW-10750 - [Rust][DataFusion] Add SQL support for LEFT and RIGHT join
  • ARROW-10752 - [GLib] Add garrow_schema_has_metadata()
  • ARROW-10754 - [GLib] Add support for metadata to GArrowField
  • ARROW-10755 - [Rust][Parquet] Add support for writing boolean type
  • ARROW-10756 - [Rust][DataFusion] Fix redundant clones
  • ARROW-10759 - [Rust][DataFusion] Implement string to date cast
  • ARROW-10763 - [Rust] Speed up take for primitive / boolean for non-null arrays
  • ARROW-10765 - [Rust] Optimize take string for non-null arrays
  • ARROW-10767 - [Rust] Speed up sum with nulls (non-simd)
  • ARROW-10770 - [Rust] JSON nested list reader
  • ARROW-10772 - [Rust] Speed up take by writing to buffer
  • ARROW-10775 - [Rust][DataFusion] Use ahash in join hashmap
  • ARROW-10776 - [C++] Allow STL iteration over concrete primitive arrays
  • ARROW-10781 - [Rust][DataFusion] add the 'Statistics' interface in data source
  • ARROW-10783 - [Rust][DataFusion] Implement Statistics for Parquet TableProvider
  • ARROW-10785 - [Rust] Optimize take string
  • ARROW-10786 - [Packaging][RPM] Drop support for CentOS 6
  • ARROW-10788 - [C++] Make S3 recursive tree walks parallel
  • ARROW-10789 - [Rust][DataFusion] Make TableProvider dynamically typed
  • ARROW-10790 - [C++] Improve ChunkedArray and Table sort_indices performance
  • ARROW-10792 - [Rust][CI] Modularize builds for faster build and smaller caches
  • ARROW-10795 - [Rust] Optimize specialization for datatypes
  • ARROW-10796 - [C++] Implement optimized RecordBatch sorting
  • ARROW-10800 - [Rust][Parquet] Provide access to the elements of parquet::record::{List, Map}
  • ARROW-10802 - [C++][NullType] in parquet column writer
  • ARROW-10808 - [Rust][DataFusion] Support nested expressions in aggregations.
  • ARROW-10809 - [C++] Use Datum for SortIndices() input
  • ARROW-10812 - [Rust] Make BooleanArray not a PrimitiveArray
  • ARROW-10813 - [Rust][DataFusion] Implement DFSchema
  • ARROW-10814 - [Packaging][deb] Remove support for Debian GNU/Linux Stretch
  • ARROW-10817 - [Rust][DataFusion] Implement TypedString and DATE coercion
  • ARROW-10820 - [Rust][DataFusion] Complete TPC-H Benchmark Queries
  • ARROW-10821 - [Rust][Datafusion] support negative expression
  • ARROW-10822 - [Rust][Datafusion] add simd feature flag to datafusion
  • ARROW-10824 - [Rust] Added PartialEq to null array
  • ARROW-10825 - [Rust] Added support for NullArray to MutableArrayData
  • ARROW-10826 - [Rust] Add support for FixedSizeBinaryArray to MutableArrayData
  • ARROW-10827 - [Rust] Move concat from builders to a compute kernel and make it faster (2-6x)
  • ARROW-10828 - [Rust][DataFusion] Address / fix clippy lints
  • ARROW-10829 - [Rust][DataFusion] Implement Into<Schema> for DFSchema
  • ARROW-10832 - [Rust][Arrow] generate src/ipc/gen/* with latest snapshot flatc.
  • ARROW-10836 - [Rust] Extend take kernel to FixedSizeListArray
  • ARROW-10838 - [Rust][CI] Add arrow build targeting wasm32
  • ARROW-10839 - [Rust][Data Fusion] Implement BETWEEN operator
  • ARROW-10843 - [C++] Add support for temporal types in sort family kernels
  • ARROW-10845 - [Python][CI] Build with nightly numpy and pandas artifacts
  • ARROW-10849 - [Python] Handle numpy deprecation warnings for builtin type aliases
  • ARROW-10851 - [C++] Reduce size of generated code for sort kernels
  • ARROW-10857 - [Packaging] Follow PowerTools repository name change on CentOS 8
  • ARROW-10858 - [C++] Add missing Boost dependency with Visual C++
  • ARROW-10861 - [Python] Update minimal NumPy version to 1.16.6
  • ARROW-10864 - [Rust] Use standard ordering for floats
  • ARROW-10865 - [Rust] Easier to use Schema -> DFSchema conversion
  • ARROW-10867 - [C++] Workaround gcc internal compiler error
  • ARROW-10869 - [GLib] Add garrow_*sortindices() and related options
  • ARROW-10870 - [Julia][Doc] Include Julia in project documentation
  • ARROW-10871 - [Julia][CI] Setup Julia testing via GitHub Actions
  • ARROW-10873 - [C++] Apple Silicon is reported as arm64 in CMake
  • ARROW-10874 - [Rust][DataFusion] Add statistics for MemTable, change statistics struct
  • ARROW-10877 - [Rust] [DataFusion] Add benchmark based on kaggle movies
  • ARROW-10878 - [Rust] Simplify extend_from_slice
  • ARROW-10879 - [Packaging][deb] Restore Debian GNU/Linux Buster support
  • ARROW-10881 - [C++] Fix EXC_BAD_ACCESS in PutSpaced
  • ARROW-10885 - [Rust][DataFusion] Optimize hash join build vs probe order based on number of rows
  • ARROW-10887 - [Doc][C++] Document C++ IPC API
  • ARROW-10889 - [Rust][Proposal] Add guidelines about usage of unsafe
  • ARROW-10890 - [Rust] [DataFusion] JOIN support
  • ARROW-10891 - [Rust][DataFusion] Enable / fix clone_on_copy, map_clone, or_fun_call
  • ARROW-10893 - [Rust][DataFusion] More clippy lints
  • ARROW-10896 - [C++][CMake] Rename internal RE2 package name to "re2" from "RE2"
  • ARROW-10900 - [Rust][DataFusion] Resolve TableScan provider eagerly
  • ARROW-10904 - [Python][CI][Packaging] Add support for Python 3.9 macOS wheels
  • ARROW-10905 - [Python] Add support for Python 3.9 windows wheels
  • ARROW-10908 - [Rust][DataFusion] Update relevant tpch-queries with BETWEEN
  • ARROW-10917 - [Doc] Update feature matrix for Rust
  • ARROW-10918 - [Doc][C++] Document supported Parquet features
  • ARROW-10927 - [Rust][Parquet] Add Decimal to ArrayBuilderReader
  • ARROW-10927 - [Rust][Parquet][REVERT]
  • ARROW-10927 - [Rust][Parquet] Add Decimal to ArrayBuilderReader
  • ARROW-10929 - [Rust] Change CI to use Stable Rust
  • ARROW-10933 - [Rust] Update readme files in regard to nightly rust
  • ARROW-10934 - [Python] Skip filesystem tests for in-memory fs for fsspec 0.8.5
  • ARROW-10938 - [Rust] upgrade dependency "flatbuffers" to 0.8
  • ARROW-10940 - [Rust] Extend sort kernel to ListArray
  • ARROW-10941 - [Doc] Document supported Parquet encryption features
  • ARROW-10944 - [Rust] Implement min/max aggregate kernels for BooleanArray
  • ARROW-10946 - [Rust] Simplified bit chunk iterator
  • ARROW-10947 - [Rust][DataFusion] Optimize UTF8 to Date32 Conversion
  • ARROW-10948 - [C++] Always use GTestConfig.cmake
  • ARROW-10949 - [Rust] Removed un-needed clone
  • ARROW-10951 - [Python][CI] Fix nightly pandas builds (pytest monkeypatch issue)
  • ARROW-10952 - [Rust] Add pre-commit hook
  • ARROW-10966 - [C++] Use FnOnce for ThreadPool's tasks instead of std::function
  • ARROW-10968 - [Rust][DataFusion] Don't build hash table for right side of join
  • ARROW-10969 - [Rust][DataFusion] Implement basic String ANSI SQL Functions
  • ARROW-10985 - [Rust] Update unsafe guidelines for adding JIRA references
  • ARROW-10986 - [Rust][DataFusion] Add average stats to TPC-H benchmarks
  • ARROW-10988 - [C++] Require CMake 3.5 or later
  • ARROW-10989 - [Rust] Iterate primitive buffers by slice
  • ARROW-10993 - [CI][macOS] Fix Python 3.9 installation by Homebrew
  • ARROW-10995 - [Rust][DataFusion] Limit ParquetExec concurrency when reading large number of files
  • ARROW-11004 - [FlightRPC][Python] Header-based auth in clients
  • ARROW-11005 - [Rust] Remove indirection from take kernel
  • ARROW-11008 - [Rust][DataFusion] Simplify count accumulator
  • ARROW-11009 - [C++] Allow changing default memory pool with an environment variable
  • ARROW-11010 - [Python] `np.float` deprecation warning in `_pandas_logical_type_map`
  • ARROW-11012 - [Rust][DataFusion] Make write_csv and write_parquet concurrent
  • ARROW-11015 - [CI][Gandiva] Move gandiva nightly build from Travis to GitHub Actions
  • ARROW-11018 - [Rust][DataFusion] Add support for column-level statistics, null count.
  • ARROW-11026 - [Rust] Run tests without requiring environment variables
  • ARROW-11028 - [Rust] Make a few pattern matches more idiomatic
  • ARROW-11029 - [Rust][DataFusion] Add documentation for code that determines number of rows per operator
  • ARROW-11032 - [C++][FlightRPC] Benchmark unix socket RPC
  • ARROW-11033 - [Rust] Csv writing performance improvements
  • ARROW-11034 - [Rust] remove rustfmt ignore list, fix format
  • ARROW-11035 - [Rust] Improved performance of casting to utf8
  • ARROW-11037 - [Rust] Optimized creation of string array from iterator.
  • ARROW-11038 - [Rust] Removed unused trait and Result.
  • ARROW-11039 - [Rust] Performance improvement for utf-8 to float cast
  • ARROW-11040 - [Rust] Simplified builders
  • ARROW-11042 - [Rust][DataFusion] Increase default batch size
  • ARROW-11043 - [C++] Add "is_nan" kernel
  • ARROW-11046 - [Rust][DataFusion] Support count_distinct in DataFrame API
  • ARROW-11049 - [Python] Expose alternate memory pools
  • ARROW-11052 - [Rust][DataFusion] Implement metrics for HashJoinExec
  • ARROW-11053 - [Rust] [DataFusion] Optimize joins with dynamic capacity for output batches
  • ARROW-11054 - [Rust][DataFusion] Move to sqlparser 0.7.0
  • ARROW-11055 - [Rust][DataFusion] Support date_trunc function
  • ARROW-11058 - [Rust][DataFusion] Implement coalesce batches operator
  • ARROW-11063 - [Rust][Breaking] Validate null counts when building arrays
  • ARROW-11064 - [Rust][DataFusion] Speed up hash join on smaller batches
  • ARROW-11072 - [Rust][Parquet] Support reading decimal from physical int types
  • ARROW-11076 - [Rust][DataFusion] Refactor usage of right indices in hash join
  • ARROW-11079 - [R] Catch up on changelog since 2.0
  • ARROW-11080 - [C++][Dataset] Improvements to implicit casting
  • ARROW-11082 - [Rust] C data interface to largeUTF8
  • ARROW-11086 - [Rust] Extend take implementation to more index types
  • ARROW-11091 - [Rust][DataFusion] Fix new clippy linting errors
  • ARROW-11095 - [Python] access pyarrow.RecordBatch field() and column() by string name
  • ARROW-11096 - [Rust][Large] binary
  • ARROW-11097 - [Rust] Minor simplification of some tests.
  • ARROW-11099 - [Rust] Remove unsafe value_slice and raw_values methods from primitive and boolean arrays
  • ARROW-11100 - [Rust] Speed up numeric to string cast using lexical_core
  • ARROW-11101 - [Rust] rewrite pre-commit hook
  • ARROW-11104 - [GLib] Add append_null/append_nulls to GArrowArrayBuilder and use them
  • ARROW-11105 - [Rust] Migrated MutableBuffer::freeze to From<MutableBuffer> for Buffer
  • ARROW-11109 - [GLib] Add garrow_array_builder_append_empty_value() and values()
  • ARROW-11110 - [Rust][Datafusion] ExecutionContext.table should take immutable reference
  • ARROW-11111 - [GLib] Add GArrowFixedSizeBinaryArrayBuilder
  • ARROW-11121 - [Developer] Use pull_request_target for PR JIRA integration
  • ARROW-11122 - [Rust] Added FFI support for date and time.
  • ARROW-11124 - [Doc] Update status matrix for Decimal256
  • ARROW-11125 - [Rust] Logical equality for list arrays
  • ARROW-11126 - [Rust] Document and test ARROW-10656
  • ARROW-11127 - [C++] ifdef unused cpu_info on non-x86 platforms
  • ARROW-11129 - [Rust][DataFusion] Use tokio for loading parquet
  • ARROW-11130 - [Website][CentOS 8][RHEL 8] Enable all required repositories by default
  • ARROW-11131 - [Rust] Improve performance of boolean_equal
  • ARROW-11136 - [R] Bindings for is.nan
  • ARROW-11137 - [Rust][DataFusion] Clippy needless_range_loop,needless_lifetimes
  • ARROW-11138 - [Rust][DataFusion] Add ltrim, rtrim to built-in functions
  • ARROW-11139 - [GLib] Add support for extension type
  • ARROW-11155 - [C++][Packaging] Move gandiva crossbow jobs off of Travis-CI
  • ARROW-11158 - [Julia] Implement Decimal256 support for Julia
  • ARROW-11159 - [Developer] Consolidate pull request related jobs
  • ARROW-11165 - [Rust][DataFusion] Document Postgres as standard SQL dialect
  • ARROW-11168 - [Rust][Doc] Fix cargo doc warnings
  • ARROW-11169 - [Rust] Add a comment explaining where float total_order algorithm came from
  • ARROW-11175 - [R] Small docs fixes
  • ARROW-11176 - [R] Expose memory pool name and document setting it
  • ARROW-11187 - [Rust][Parquet] Fix Build error by Pin specific parquet-format-rs version
  • ARROW-11188 - [Rust] Support crypto functions from PostgreSQL dialect
  • ARROW-11193 - [Java][Documentation] Add Java ListVector Documentation
  • ARROW-11194 - [Rust] Enable packed_simd for aarch64
  • ARROW-11195 - [Rust] [DataFusion] Built-in table providers should expose relevant fields
  • ARROW-11196 - [GLib] Add support for mock, HDFS and S3 file systems with factory function
  • ARROW-11198 - [Packaging][Python] Ensure setuptools version during build supports markdown
  • ARROW-11200 - [Rust][DataFusion] Physical operators and expressions should have public accessor methods
  • ARROW-11201 - [Rust][DataFusion] create_batch_empty - support more types
  • ARROW-11203 - [Developer][Website] Enable JIRA and pull request integration
  • ARROW-11204 - [C++] Fix build failures with bundled gRPC and Protobuf
  • ARROW-11205 - [GLib][Dataset] Add GADFileFormat and its family
  • ARROW-11209 - [Rust] DF - Better error message on unsupported GROUP BY
  • ARROW-11210 - [CI] Restore workflows that had been blocked by INFRA
  • ARROW-11212 - [Packaging][Python] Use vcpkg as dependency source for manylinux and windows wheels
  • ARROW-11213 - [Packaging][Python] Dockerize wheel building on windows
  • ARROW-11215 - [CI] Use named volumes by default for caching in docker-compose
  • ARROW-11218 - [R] Make SubTreeFileSystem print method more informative
  • ARROW-11219 - [CI][Ruby][MinGW] Reduce CI time
  • ARROW-11221 - [Rust] DF Implement GROUP BY support for Float32/Float64
  • ARROW-11231 - [Packaging][deb][RPM] Add support for mimalloc
  • ARROW-11234 - [CI][Ruby][macOS] Reduce CI time
  • ARROW-11236 - Bump Jackson to 2.11.4
  • ARROW-11240 - [Packaging][R] Add mimalloc to R packaging
  • ARROW-11242 - [CI] Remove CMake 3.2 job
  • ARROW-11245 - [C++][Gandiva] Add support for LLVM 11.1
  • ARROW-11247 - [C++] Infer date32 columns in CSV
  • ARROW-11256 - [Packaging][Linux] Don't buffer packaging output
  • ARROW-11272 - [Release][wheel] Remove unsupported Python 3.5 and manylinux1
  • ARROW-11273 - [Release][deb] Remove unsupported Debian GNU/Linux stretch
  • ARROW-11278 - [Release][NodeJS] Don't touch ~/.bash_profile
  • ARROW-11280 - [Release][APT] Fix minimal build example check
  • ARROW-11281 - [C++] Remove needless runtime RapidJSON dependency
  • ARROW-11282 - [Packaging][deb] Add missing libgflags-dev dependency
  • ARROW-11285 - [Release][APT] Add support for Ubuntu Groovy
  • ARROW-11292 - [Release][JS] Use Node.js LTS
  • ARROW-11293 - [C++] Don't require Boost and gflags with find_package(Arrow)
  • ARROW-11307 - [Release][Ubuntu][20.10] Add workaround for dependency issue
  • ARROW-11454 - [Website] [Rust] 3.0.0 Blog Post
  • PARQUET-1566 - [C++] Indicate if null count, distinct count are present in column statistics

Bug Fixes

  • ARROW-2616 - [Python] Cross-compiling Pyarrow
  • ARROW-6582 - [R] Arrow to R fails with embedded nuls in strings
  • ARROW-7363 - [Python] add combine_chunks method to ChunkedArray
  • ARROW-7909 - [Website] Add how to install on Red Hat Enterprise Linux
  • ARROW-8258 - [Rust] [Parquet] ArrowReader fails on some timestamp types
  • ARROW-9027 - [Python][Testing] Split parquet tests into multiple files + clean-up
  • ARROW-9479 - [JS] Fix Table.from for zero-item serialized tables, Table.empty for schemas containing compound types (List, FixedSizeList, Map)
  • ARROW-9636 - [Python] Update documentation about 'LZO' compression in parquet.write_table
  • ARROW-9690 - [Go] tests failing on s390x
  • ARROW-9776 - [R] read_feather causes segfault in R if file doesn't exist
  • ARROW-9897 - [C++][Gandiva] Added to_date function
  • ARROW-9897 - [C++][Gandiva] Revert - to_date function
  • ARROW-9898 - [C++][Gandiva] Fix linking issue with castINT/FLOAT functions
  • ARROW-9903 - [R] open_dataset freezes opening feather files on Windows
  • ARROW-9963 - [Python] Recognize datetime.timezone.utc as UTC on conversion python->pyarrow
  • ARROW-10039 - [Rust] Do not require memory alignment of buffers
  • ARROW-10042 - [Rust] Fix tests involving ArrayData/Buffer equality
  • ARROW-10080 - [R] Call gc() and try again in MemoryPool
  • ARROW-10122 - [Python] Fix to_pandas conversion with subset of columns and MultiIndex
  • ARROW-10145 - [C++][Dataset] Assert integer overflow in partitioning falls back to string
  • ARROW-10146 - [Python] Fix parquet FileMetadata.to_dict in case statistics is not set
  • ARROW-10174 - [Java] Fix reading/writing dict structs
  • ARROW-10177 - [CI][Gandiva] Nightly gandiva-jar-xenial fails
  • ARROW-10186 - [Rust] Tests fail when following instructions in README
  • ARROW-10247 - [C++][Dataset] Support writing datasets partitioned on dictionary columns
  • ARROW-10264 - [Python] Fix failing hdfs test
  • ARROW-10270 - [R] Fix CSV timestamp_parsers test on R-devel
  • ARROW-10283 - [Python] Define PY_SSIZE_T_CLEAN to deal with Python deprecation warning
  • ARROW-10293 - [Rust][DataFusion] Fixed benchmarks
  • ARROW-10294 - [Java] Resolve problems of DecimalVector APIs on ArrowBufs
  • ARROW-10298 - [Rust] Incorrect offset handling in iterator over dictionary keys
  • ARROW-10321 - [C++] Use check_cxx_source_compiles for AVX512 detect in compiler
  • ARROW-10333 - [Java] Get rid of org.apache.arrow.util in vector
  • ARROW-10345 - [C++][Compute] Fix NaN handling in sorting and topn kernels
  • ARROW-10346 - [Python] Ensure tests aren't affected by user-supplied AWS config
  • ARROW-10348 - [C++] Fix crash on invalid Parquet data
  • ARROW-10350 - [Rust] Fixes to publication metadata in Cargo.toml
  • ARROW-10353 - [C++] Fix handling of compression in Parquet data pages v2
  • ARROW-10358 - [R] Followups to 2.0.0 release
  • ARROW-10365 - [R] Remove duplicate setting of S3 flag on macOS
  • ARROW-10369 - [Dev] Fix archery release utility test cases
  • ARROW-10371 - [R] Linux system requirements check needs to support older cmake versions
  • ARROW-10386 - [R] List column class attributes not preserved in roundtrip
  • ARROW-10388 - [Java] Fix Spark integration build failure
  • ARROW-10390 - [Rust][Parquet] Ensure it is possible to create custom parquet writers
  • ARROW-10393 - [Rust] Apply fix for null reading in json reader for nested
  • ARROW-10394 - [Rust][Large] BinaryArray creation
  • ARROW-10397 - [C++] Update comment to match change made in b1a7a73ff2
  • ARROW-10399 - [R] Fix performance regression from cpp11::r_string
  • ARROW-10411 - [C++] Fix incorrect child array lengths for Concatenate of FixedSizeList
  • ARROW-10412 - [C++] Improve grpc_cpp_plugin detection
  • ARROW-10413 - [Rust][Parquet] Unignore some tests that are passing now
  • ARROW-10414 - [R] open_dataset doesn't work with absolute/expanded paths on Windows
  • ARROW-10426 - [C++] Allow writing large strings to Parquet
  • ARROW-10433 - [Python] Swapped the conditions for checking for fsspec filesystems
  • ARROW-10434 - [Rust] Fix debug formatting for arrays with lengths between 10 and 20.
  • ARROW-10441 - [Java] Prevent closure of shared channels for FlightClient
  • ARROW-10446 - [C++][Python] Roundtrip Timestamp ns with TzInfo correctly
  • ARROW-10448 - [Rust] Remove PrimitiveArray::new that can cause UB
  • ARROW-10453 - [Rust] [DataFusion] Performance degradation after removing specialization
  • ARROW-10461 - [Rust] Fix offset bug in remainder bits
  • ARROW-10462 - [Python] Fix usage of fsspec in ParquetDataset causing path issue on Windows
  • ARROW-10463 - [R] Better messaging for currently unsupported CSV options in open_dataset
  • ARROW-10470 - [R] Fix missing file error causing NYC taxi example to fail
  • ARROW-10471 - [CI][Python] Ensure we have tests with s3fs and run those on CI
  • ARROW-10472 - [Python] Test to confirm casting timestamp scalars to date type works
  • ARROW-10475 - [C++][FlightRPC] handle IPv6 hosts
  • ARROW-10480 - [Python] don't infer compression by extension for Parquet
  • ARROW-10482 - [Python] Fix compression per column in Parquet writing
  • ARROW-10491 - [FlightRPC][Java] Fix NPE when using makeContext
  • ARROW-10493 - [C++][Parquet] Fix offset lost in MaybeReplaceValidity
  • ARROW-10495 - [Packaging][deb] Move FindRE2.cmake to libarrow-dev
  • ARROW-10496 - [R][CI] Fix conda-r job
  • ARROW-10499 - [C++][Java] Fix ORC Java JNI Crash
  • ARROW-10502 - [C++/Python] CUDA detection messes up nightly conda-win builds
  • ARROW-10503 - [C++] Uriparser will not compile using Intel compiler
  • ARROW-10508 - [Java] Allow FixedSizeListVector to have empty children
  • ARROW-10509 - [C++] Define operator<<(ostream, ParquetException) for clang+Windows
  • ARROW-10511 - [Python] Fix to_pandas() conversion in case of metadata mismatch about timezone
  • ARROW-10518 - [C++][Gandiva] Adding NativeFunction::kCanReturnErrors to cast function in gandiva
  • ARROW-10519 - [Python] Fix deadlock when importing pandas from several threads
  • ARROW-10525 - [C++] Fix crash on unsupported IPC stream
  • ARROW-10532 - [Python] Fix metadata in Table.from_pandas conversion with specified schema with different column order
  • ARROW-10545 - [C++] Fix crash on invalid Parquet file (OSS-Fuzz)
  • ARROW-10546 - [Python] Deprecate DaskFileSystem/S3FSWrapper + stop using it internally
  • ARROW-10547 - [Rust][DataFusion] Do not lose Filters with UserDefined plan nodes
  • ARROW-10551 - [Rust] Fix unreproducible benches by seeding random number generator
  • ARROW-10558 - [Python] Fix python S3 filesystem tests interdependence
  • ARROW-10560 - [Python] Fix crash when creating array from huge string
  • ARROW-10563 - [Packaging][deb][RPM] Add missing dev package dependencies
  • ARROW-10565 - [Python] Table.from_batches and Table.from_pandas have argument Schema_schema in documentation instead of schema
  • ARROW-10568 - [C++][Parquet] Avoid crashing when OutputStream::Tell fails
  • ARROW-10569 - [C++] Improve table filtering performance
  • ARROW-10577 - [Rust][DataFusion] HashAggregator stream finishes unexpectedly after going to Pending state - tests
  • ARROW-10578 - [C++] Comparison kernels crashing for string array with null string scalar
  • ARROW-10610 - [C++] Updated vendored fast_float version to latest
  • ARROW-10616 - [Developer] Expand PR labeler to all supported languages
  • ARROW-10617 - [Python] Fix RecordBatchStreamReader iteration with Python 3.8
  • ARROW-10619 - [C++] Fix IPC validation regressions
  • ARROW-10620 - [Rust][Parquet] move column chunk range logic to metadata.rs
  • ARROW-10621 - [Java] Put required libraries into the common directory
  • ARROW-10622 - [R] Nameof should not use "void" as the crib
  • ARROW-10623 - [CI][R] Version 1.0.1 breaks data.frame attributes when reading file written by 2.0.0
  • ARROW-10624 - [R] Proactively remove "problems" attributes
  • ARROW-10627 - [Rust] Loosen cfg restrictions for wasm32
  • ARROW-10629 - [CI] Fix MinGW GitHub Actions jobs
  • ARROW-10631 - [Rust] Fixed error in computing equality of fixed-sized binary.
  • ARROW-10642 - [R] Can't get Table from RecordBatchReader with 0 batches
  • ARROW-10656 - [Rust] Allow schema validation to ignore field names and only check data types on new batch
  • ARROW-10656 - [Rust] Use DataType comparison without values
  • ARROW-10661 - [C#] Fix benchmarking project
  • ARROW-10662 - [Java] Avoid integer overflow for Json file reader
  • ARROW-10663 - [C++] Fix is_in and index_in behaviour
  • ARROW-10667 - [Rust][Parquet] Add a convenience type for writing Parquet to memory
  • ARROW-10668 - [R] Support for the .data pronoun
  • ARROW-10681 - [Rust] [DataFusion] TPC-H Query 12 fails with scheduler error
  • ARROW-10684 - [Rust] Inherit struct nulls in child null equality
  • ARROW-10690 - [Java] Fix ComplexCopier bug for list vector
  • ARROW-10692 - [Rust] Removed undefined behavior derived from null pointers
  • ARROW-10694 - [Python] ds.write_dataset() generates empty files for each final partition
  • ARROW-10699 - [C++] Fix BitmapUInt64Reader on big endian
  • ARROW-10701 - [Rust] Fix sort_limit_query_sql benchmark
  • ARROW-10705 - [Rust] Loosen restrictions on some lifetime annotations
  • ARROW-10710 - [Rust] Revert tokio upgrade, go back to 0.2
  • ARROW-10711 - [CI] Remove set-env from auto-tune to work with new GHA settings
  • ARROW-10719 - [C#] ArrowStreamWriter doesn't write schema metadata
  • ARROW-10746 - [C++] Bump gtest version + use GTEST_SKIP in tests
  • ARROW-10748 - [Java][JDBC] Support consuming timestamp data when time zone is not available
  • ARROW-10749 - [C++] Incorrect string format for Datum with the collection type
  • ARROW-10751 - [C++] Add RE2 to minimal build example
  • ARROW-10753 - [Rust][DataFusion] Fix parsing of negative numbers in DataFusion
  • ARROW-10757 - [Rust][CI] Fix CI failures
  • ARROW-10760 - [Rust][DataFusion] Fixed error in filter push down over joins
  • ARROW-10769 - [Rust] Use DataType comparison without values
  • ARROW-10774 - [R] Set minimum cpp11 version
  • ARROW-10777 - [Packaging][Python] Build sdist by Crossbow
  • ARROW-10778 - [Python] Fix RowGroupInfo.statistics for empty row groups
  • ARROW-10779 - [Java] Fix writeNull method in UnionListWriter
  • ARROW-10780 - [R] Update known R installation issues for CentOS 7
  • ARROW-10791 - [Rust] StreamReader, read_dictionary duplicating schema info
  • ARROW-10801 - [Rust][Flight] Support sending FlightData for Dictionaries with that of a RecordBatch
  • ARROW-10803 - Support R >= 3.3 and add CI
  • ARROW-10804 - [Rust] Removed some unsafe code from the parquet crate
  • ARROW-10807 - [Rust][DataFusion] Avoid double hashing
  • ARROW-10810 - [Rust] Improve comparison kernels performance
  • ARROW-10811 - [R][CI] Remove nightly centos6 build
  • ARROW-10823 - [Rust] Fixed error in MutableArrayData
  • ARROW-10830 - [Rust] avoid hard crash in json reader
  • ARROW-10833 - [Python] Allow pyarrow to be compiled on NumPy <1.16.6 and work on 1.20+
  • ARROW-10834 - [R] Fix print method for SubTreeFileSystem
  • ARROW-10837 - [Rust][DataFusion] Use Vec<u8> for hash keys
  • ARROW-10840 - [C++] FileMetaData does not have key_value_metadata when built from FileMetaDataBuilder
  • ARROW-10842 - [Rust] decouple IO from json reader, fix crash during json schema inference with invalid json
  • ARROW-10844 - [Rust][DataFusion] Allow joins after a table registration
  • ARROW-10850 - [R] Unrecognized compression type: LZ4
  • ARROW-10852 - [C++] AssertTablesEqual(verbose=true) segfaults if the le…
  • ARROW-10854 - [Rust][DataFusion] Simplify logical plan scans
  • ARROW-10855 - [Python][Numpy] ArrowTypeError after upgrading NumPy to 1.20.0rc1
  • ARROW-10856 - [R] CC and CXX environment variables passing to cmake
  • ARROW-10859 - [Rust][DataFusion] Made collect not require ExecutionContext
  • ARROW-10860 - [Java] Avoid integer overflow for generated classes in Vector
  • ARROW-10863 - [Python] Fix pandas skip in ExtensionArray.to_pandas test
  • ARROW-10863 - [Python] Fix ExtensionArray.to_pandas to use underlying storage array
  • ARROW-10875 - [Rust] simplify simd cfg check with cfg_aliases
  • ARROW-10876 - [Rust] validate row value type in json reader
  • ARROW-10897 - [Rust] Removed level of indirection.
  • ARROW-10907 - [Rust] Fix Cast UTF8 to Date64
  • ARROW-10913 - [Python][Doc] Code block typo in filesystems docs
  • ARROW-10914 - [Rust] Refactor simd arithmetic kernels to use chunked iteration
  • ARROW-10915 - [Rust] README.md: set the Env vars as absolute dirs; several minor fixes.
  • ARROW-10921 - `TypeError: 'coroutine' object is not iterable` when reading parquet partitions via s3fs >= 0.5 with pyarrow
  • ARROW-10930 - [Python] Add value_field property to LargeListType / FixedSizeListType
  • ARROW-10932 - [C++] BinaryMemoTable::CopyOffsets access out-of-bound address when data is empty
  • ARROW-10942 - [C++] Fix S3FileSystem::Impl::IsEmptyDirectory on Amazon
  • ARROW-10943 - [Rust][Parquet] Always init new RleDecoder
  • ARROW-10954 - [C++][Doc] PlasmaClient is thread-safe now
  • ARROW-10955 - [C++] Fix JSON reading of list(null) values
  • ARROW-10960 - [C++][FlightRPC] Default to empty buffer instead of null
  • ARROW-10962 - [FlightRPC][Java] fill in empty body buffer if needed
  • ARROW-10967 - [Rust] Add functions for test data to mod arrow::util::test_util
  • ARROW-10990 - [Rust] Refactor simd comparison kernels to avoid out of bounds reads
  • ARROW-10994 - [Rust][DataFusion] Add support for compression when writing Parquet files
  • ARROW-10996 - [Rust][Parquet] change return value type of get_arrow_schema_from_metadata()
  • ARROW-10999 - [Rust][Benchmarks] Use signed ints for TPC-H schema
  • ARROW-11014 - [Rust][DataFusion] Use correct statistics for ParquetExec
  • ARROW-11023 - [C++][CMake] Fix gRPC build issue
  • ARROW-11024 - [Python] Add test for List<Struct> data Parquet roundtrip
  • ARROW-11025 - [Rust] Fixed bench for binary boolean kernels
  • ARROW-11030 - [Rust][DataFusion] Concatenate left side batches to single batch in HashJoinExec
  • ARROW-11048 - [Rust] Add bench to MutableBuffer
  • ARROW-11050 - [R] Handle RecordBatch in write_parquet()
  • ARROW-11067 - [C++] Fix CSV null detection on large values
  • ARROW-11069 - [C++] Parquet writer incorrect data being written when data type is struct
  • ARROW-11073 - [Rust] fix lint error in /arrow/rust/arrow/src/ipc/reader.rs
  • ARROW-11083 - [CI] Ensure using Ubuntu 20.04 for dev.yml:release job
  • ARROW-11084 - [Rust] Fixed clippy
  • ARROW-11085 - [Rust] Migrated from action-rs to shell in github actions.
  • ARROW-11092 - [CI] (Temporarily) move offending workflows to separate files
  • ARROW-11102 - [Rust][DataFusion] fmt::Debug for ScalarValue(Utf8) is always quoted
  • ARROW-11113 - [Rust] support as_struct_array cast
  • ARROW-11114 - [Java] Fix Schema and Field metadata JSON serialization
  • ARROW-11132 - [CI] Use pip to install crossbow's dependencies for the comment bot
  • ARROW-11144 - [CI][C++][Python] Move to newer Hadoop version
  • ARROW-11152 - [CI][C++] Fix Homebrew numpy installation on macOS builds
  • ARROW-11162 - [C++][Parquet] Fix invalid cast on Decimal256 Parquet data
  • ARROW-11163 - [C++] Fix reading of compressed IPC/Feather files written with Arrow 0.17
  • ARROW-11166 - [Python] Add binding for ProjectOptions
  • ARROW-11171 - [Go] Fix building on s390x with noasm
  • ARROW-11189 - [Developer] support benchmark diff between JSONs
  • ARROW-11190 - [C++] Clean up compiler warnings
  • ARROW-11202 - [R][CI] Nightly builds not happening (or artifacts not exported)
  • ARROW-11224 - [R] don't test metadata serialization on old R versions
  • ARROW-11226 - [Python] Skip/workaround failing filesystem test with s3fs 0.5
  • ARROW-11227 - [Python] Fix to_pandas with ExtensionArray tests for pandas 0.24
  • ARROW-11229 - [C++][Dataset] Fix static build failure
  • ARROW-11230 - [R] Fix build failures on Windows when multiple libarrow binaries found
  • ARROW-11232 - [C++] Make Table::CombineChunks() handle table with zero column correctly
  • ARROW-11233 - [C++][Flight] Fix link error with bundled gRPC and Abseil
  • ARROW-11237 - [C++] Restore DCHECK definitions after GLog
  • ARROW-11250 - [Python] Inconsistent behavior calling ds.dataset()
  • ARROW-11251 - [CI] Make sure that devtoolset-8 is really installed + being used
  • ARROW-11253 - [R] Make sure that large metadata tests are reproducible
  • ARROW-11255 - [Packaging][Conda][macOS] Fix Python version
  • ARROW-11257 - [C++][Parquet] PyArrow Table contains different data after writing and reloading from Parquet
  • ARROW-11271 - [Rust][Parquet] Fix parquet list schema null conversion
  • ARROW-11274 - [Packaging][wheel][Windows] Fix wheels path for Gemfury
  • ARROW-11275 - [Packaging][wheel][Linux] Fix paths for Gemfury
  • ARROW-11283 - [Julia] Update Julia install link for 3.0 release
  • ARROW-11286 - [Release][Yum] Fix minimal build example check
  • ARROW-11287 - [Packaging][RPM] Add missing dependencies
  • ARROW-11301 - [C++] Fix reading Parquet LZ4-compressed files produced by Hadoop
  • ARROW-11302 - [Release][Python] Remove verification of python 3.5 wheel on macOS
  • ARROW-11306 - [Packaging][Ubuntu][16.04] Add missing libprotobuf-dev dependency
  • ARROW-11363 - C++ Library Build Failure with gRPC 1.34+
  • ARROW-11390 - [Python] pyarrow 3.0 issues with turbodbc
  • ARROW-11445 - Type conversion failure on numpy 1.20
  • ARROW-11450 - [Python] pyarrow<3 incompatible with numpy>=1.20.0
  • ARROW-11487 - [Python] Can't create array from Categorical with numpy 1.20
  • ARROW-11835 - [Python] PyArrow 3.0/Pip installation errors on Big Sur.
  • ARROW-12399 - Unable to load libhdfs
  • PARQUET-1935 - [C++] Fix bug in WriteBatchSpaced

Apache Arrow 2.0.0 (2020-10-19)

Bug Fixes

  • ARROW-2367 - [Python] ListArray has trouble with sizes greater than kMaximumCapacity
  • ARROW-4189 - [Rust] Added coverage report.
  • ARROW-4917 - [C++] orc_ep fails in cpp-alpine docker
  • ARROW-5578 - [C++][Flight] Flight does not build out of the box on Alpine Linux
  • ARROW-7226 - [Python][Doc] Add note re: JSON format support
  • ARROW-7384 - [Website] Fix search indexing warning reported by Google
  • ARROW-7517 - [C++] Builder does not honour dictionary type provided during initialization
  • ARROW-7663 - [Python] Raise better error message when passing mixed-type (int/string) Pandas dataframe to pyarrow Table
  • ARROW-7903 - [Rust][DataFusion] Migrated to sqlparser 0.6.1
  • ARROW-7957 - [Python] Handle new FileSystem in ParquetDataset by automatically using new implementation
  • ARROW-8265 - [Rust] [DataFusion] Table API collect() should not require context
  • ARROW-8394 - [JS] Upgrade to TypeScript 4.0.2, fix typings for TS 3.9+
  • ARROW-8735 - [Rust][Parquet] Allow arm 32 to use soft hash implementation
  • ARROW-8749 - [C++] IpcFormatWriter writes dictionary batches with wrong ID
  • ARROW-8773 - [Python] Preserve nullability of fields in schema.empty_table()
  • ARROW-9028 - [R] Should be able to convert an empty table
  • ARROW-9096 - [Python] Pandas roundtrip with dtype="object" underlying numeric column index
  • ARROW-9177 - [C++][Parquet] Tracking issue for cross-implementation LZ4 Parquet compression compatibility
  • ARROW-9414 - [Packaging][deb][RPM] Enable S3
  • ARROW-9462 - [Go] The Indentation after the first Record in arrjson writer is incorrect
  • ARROW-9463 - [Go] Make arrjson Writer close idempotent
  • ARROW-9490 - [Python][C++] Bug in pa.array when input mixes int8 with float
  • ARROW-9495 - [C++] Equality assertions don't handle Inf / -Inf properly
  • ARROW-9520 - [Rust][DataFusion] Add support for aliased aggregate exprs
  • ARROW-9528 - [Python] Honor tzinfo when converting from datetime
  • ARROW-9532 - [Python][Doc] Use Python3_EXECUTABLE instead of PYTHON_EXECUTABLE for finding Python executable
  • ARROW-9535 - [Python] Remove symlink fixes from conda recipe
  • ARROW-9536 - [Java] Miss parameters in PlasmaOutOfMemoryException.java
  • ARROW-9541 - [C++] CMakeLists requires UTF8PROC_STATIC when building static library
  • ARROW-9544 - [R] Fix version argument of write_parquet()
  • ARROW-9546 - [Python] Clean up Pandas Metadata Conversion test
  • ARROW-9548 - [Go] Test output files are not removed correctly
  • ARROW-9549 - [Rust] Fixed version in dependency in parquet.
  • ARROW-9554 - [Java] FixedWidthInPlaceVectorSorter sometimes produces wrong result
  • ARROW-9556 - [Python][C++] Segfaults in UnionArray with null values
  • ARROW-9560 - [Packaging] Add required conda-forge.yml
  • ARROW-9569 - [CI][R] Fix rtools35 builds for msys2 key change
  • ARROW-9570 - [Doc] Clean up sphinx sidebar
  • ARROW-9573 - [Python][Dataset] Provide read_table(ignore_prefixes=)
  • ARROW-9574 - [R] Cleanups for CRAN 1.0.0 release
  • ARROW-9575 - [R] gcc-UBSAN failure on CRAN
  • ARROW-9577 - [C++] Ignore EBADF error in posix_madvise()
  • ARROW-9583 - [Rust] Fix offsets in result of arithmetic kernels
  • ARROW-9588 - [C++] Partially support building with clang in an MSVC setting
  • ARROW-9589 - [C++/R] Forward declare structs as structs
  • ARROW-9592 - [CI] Update homebrew before calling brew bundle
  • ARROW-9596 - [CI][Crossbow] Fix homebrew-cpp again, again
  • ARROW-9597 - [C++] AddAlias in compute::FunctionRegistry should be synchronized
  • ARROW-9598 - [C++][Parquet] Fix writing nullable structs
  • ARROW-9599 - [CI] Appveyor toolchain build fails because CMake detects different C and C++ compilers
  • ARROW-9600 - [Rust] pin proc macro
  • ARROW-9600 - [Rust][Arrow] pin older version of proc-macro2 during build
  • ARROW-9602 - [R] Improve cmake detection in Linux build
  • ARROW-9603 - [C++] Fix parquet write to not assume leaf-array validity bitmaps have the same values as parent structs
  • ARROW-9606 - [C++][Dataset] Support "a"_.In(<>).Assume(<compound>)
  • ARROW-9609 - [C++][Dataset] CsvFileFormat reads all virtual columns as null
  • ARROW-9621 - [Python] Skip test_move_file for in-memory fsspec filesystem
  • ARROW-9622 - [Java] Fixed UnsupportedOperationException in complexcopier with null value in unionvector inside st…
  • ARROW-9628 - [Rust] Disable artifact caching for Mac OSX builds
  • ARROW-9629 - [Python] Fix kartothek integration tests by fixing dependencies
  • ARROW-9631 - [Rust] Make arrow not depend on flight
  • ARROW-9631 - [Rust] flight should depend on arrow, not the other way around
  • ARROW-9642 - [C++] Let MakeBuilder refer to DictionaryType's index_type for deciding the starting bit width of the indices
  • ARROW-9643 - [C++] Only register the SIMD variants when it's supported.
  • ARROW-9644 - [C++][Dataset] Don't apply ignore_prefixes to partition base_dir
  • ARROW-9652 - [Rust][DataFusion] Error message rather than panic for external csv tables with no column defs
  • ARROW-9653 - [Rust][DataFusion] Do not error in planner when SQL has multiple group by expressions
  • ARROW-9659 - [C++] Fix RecordBatchStreamReader when source is CudaBufferReader
  • ARROW-9660 - [C++] Revamp dictionary association in IPC
  • ARROW-9666 - [Python][wheel][Windows] Fix wheel build for Windows
  • ARROW-9670 - [C++][FlightRPC] don't hang if Close and Read called simultaneously
  • ARROW-9676 - [R] Error converting Table with nested structs
  • ARROW-9684 - [C++] Fix undefined behaviour on invalid IPC / Parquet input
  • ARROW-9692 - [Python] Fix distutils-related warning
  • ARROW-9693 - [CI][Docs] Nightly docs build fails
  • ARROW-9696 - [Rust][DataFusion] fix nested binary expressions
  • ARROW-9698 - [C++] Remove -DNDEBUG flag leak in .pc file
  • ARROW-9700 - [Python] fix create_library_symlinks for macos
  • ARROW-9712 - [Rust][DataFusion] Fix parquet error handling and general code improvements
  • ARROW-9714 - [Rust][DataFusion] Implement type coercion rule for limit and sort
  • ARROW-9716 - [Rust][DataFusion] Implement limit on concurrent threads in MergeExec
  • ARROW-9726 - [Rust][DataFusion] Do not create parquet reader thread until execute is called
  • ARROW-9727 - [C++] Fix crashes on invalid IPC input (OSS-Fuzz)
  • ARROW-9729 - [Java] Disable Error Prone when project is imported into …
  • ARROW-9733 - [Rust][DataFusion] Added support for COUNT/MIN/MAX on string columns
  • ARROW-9734 - [Rust][DataFusion] TableProvider.scan now returns partitions instead of iterators
  • ARROW-9741 - [Rust] [DataFusion] Incorrect count in TPC-H query 1 result set
  • ARROW-9743 - [R] Sanitize paths in open_dataset
  • ARROW-9744 - [Python] Fix build failure on aarch64
  • ARROW-9764 - [CI][Java] Fix wrong image name for push
  • ARROW-9768 - [Python] Check overflow in conversion of datetime objects to nanosecond timestamps
  • ARROW-9768 - [Rust][DataFusion] Rename PhysicalPlannerImpl to DefaultPhysicalPlanner
  • ARROW-9778 - [Rust][DataFusion] Implement Expr.nullable() and make consistent between logical and physical plans
  • ARROW-9783 - [Rust][DataFusion] Remove aggregate expression data type
  • ARROW-9785 - [Python] Fix excessively slow S3 options test
  • ARROW-9789 - [C++] Don't install jemalloc in parallel
  • ARROW-9790 - [Rust][Parquet] Increase test coverage in arrow_reader.rs
  • ARROW-9790 - [Rust][Parquet] Fix PrimitiveArrayReader boundary conditions
  • ARROW-9793 - [Rust][DataFusion] Fixed unit tests
  • ARROW-9797 - [Rust] AMD64 Conda Integration Tests is failing for the Master branch
  • ARROW-9799 - [Rust] [DataFusion] Implementation of physical binary expression get_type method is incorrect
  • ARROW-9800 - [Rust][Parquet] Remove println! when writing column statistics
  • ARROW-9801 - DictionaryArray with non-unique values are silently corrupted when written to a Parquet file
  • ARROW-9809 - [Rust][DataFusion] Fixed type coercion, supertypes and type checking.
  • ARROW-9814 - [Python] Fix crash in test_parquet::test_read_partitioned_directory_s3fs
  • ARROW-9815 - [Rust][DataFusion] Remove the use of Arc/Mutex to protect plan time structures
  • ARROW-9815 - [Rust][DataFusion] Add a trait for looking up scalar functions by name
  • ARROW-9815 - [Rust][DataFusion] Fixed deadlock caused by accessing the scalar functions' registry.
  • ARROW-9816 - [C++] Escape quotes in config.h
  • ARROW-9827 - [C++][Dataset] Skip parsing RowGroup metadata statistics when there is no filter
  • ARROW-9831 - [Rust][DataFusion] Fixed compilation error
  • ARROW-9840 - [Python] fs documentation out of date with code (FileStats -> FileInfo)
  • ARROW-9846 - [Rust] Master branch broken build
  • ARROW-9851 - [C++] Disable AVX512 runtime paths with Valgrind
  • ARROW-9852 - [C++] Add more IPC fuzz regression files
  • ARROW-9852 - [C++] Validate dictionaries fully when combining deltas
  • ARROW-9855 - [R] Fix bad merge/Rcpp conflict
  • ARROW-9859 - [C++] Decode username and password in URIs
  • ARROW-9864 - [Python] Support pathlib.path in pq.write_to_dataset
  • ARROW-9874 - [C++] Add sink-owning version of IPC writers
  • ARROW-9876 - [C++] Faster ARM build on Travis-CI
  • ARROW-9877 - [C++] Fix homebrew-cpp build fail on AVX512
  • ARROW-9879 - [Python] Add support for numpy scalars to ChunkedArray.__getitem__
  • ARROW-9882 - [C++/Python] Update OSX build to conda-forge-ci-setup=3
  • ARROW-9883 - [R] Fix linuxlibs.R install script for R < 3.6
  • ARROW-9888 - [Rust][DataFusion] Allow ExecutionContext to be shared between threads (again)
  • ARROW-9889 - [Rust][DataFusion] Implement physical plan for EmptyRelation
  • ARROW-9906 - [C++] Keep S3 filesystem alive through open file objects
  • ARROW-9913 - [C++] Make outputs of Decimal128::FromString independent of the presence of one another.
  • ARROW-9920 - [Python] Validate input to pa.concat_arrays() to avoid segfault
  • ARROW-9922 - [Rust] Add StructArray::TryFrom (+40%)
  • ARROW-9924 - [C++][Dataset] Enable per-column parallelism for single ParquetFileFragment scans
  • ARROW-9931 - [C++] Fix undefined behaviour on invalid IPC input
  • ARROW-9932 - [R] Arrow 1.0.1 R package fails to install on R3.4 over linux
  • ARROW-9936 - [Python] Fix / test relative file paths in pyarrow.parquet
  • ARROW-9937 - [Rust][DataFusion] Improved aggregations
  • ARROW-9943 - [C++] Recursively apply Arrow metadata when reading from Parquet
  • ARROW-9946 - [R] Check sink argument class in ParquetFileWriter
  • ARROW-9953 - [R] Declare minimum version for bit64
  • ARROW-9962 - [Python] Fix conversion to_pandas with tz-aware index column and fixed offset timezones
  • ARROW-9968 - [C++] Fix UBSAN build
  • ARROW-9969 - [C++] Fix RecordBatchBuilder with dictionary types
  • ARROW-9970 - [Go] fix checkptr failure in sum methods
  • ARROW-9972 - [CI] Work around grpc-re2 clash on Homebrew
  • ARROW-9973 - [Java] JDBC DateConsumer does not allow dates before epoch
  • ARROW-9976 - [Python] ArrowCapacityError when doing Table.from_pandas with large dataframe
  • ARROW-9990 - [Rust][DataFusion] Fixed the NOT operator
  • ARROW-9993 - [Python] Tzinfo - string roundtrip fails on pytz.StaticTzInfo objects
  • ARROW-9994 - [C++][Python] Auto chunking nested array containing binary-like fields result malformed output
  • ARROW-9996 - [C++] Dictionary is unset when calling DictionaryArray.GetScalar for null values
  • ARROW-10003 - [C++] Create parent dir for any destination fs in CopyFiles
  • ARROW-10008 - [C++][Dataset] Fix filtering/row group statistics of dict columns
  • ARROW-10011 - [C++] Make FindRE2.cmake re-entrant
  • ARROW-10012 - [C++] Make MockFileSystem thread-safe
  • ARROW-10013 - [FlightRPC][C++] fix setting generic client options
  • ARROW-10017 - [Java] Fix LargeMemoryUtil long conversion
  • ARROW-10022 - [C++] Fix divide by zero and overflow error for scalar arithmetic benchmark
  • ARROW-10027 - [C++] Fix Take array kernel for NullType
  • ARROW-10034 - [Rust] Fix Rust build on master
  • ARROW-10041 - [Rust] Added check of data type to GenericString::from.
  • ARROW-10047 - [CI] Conda integration tests failing with cmake error
  • ARROW-10048 - [Rust] Fixed error in computing min/max with null entries.
  • ARROW-10049 - [C++/Python] Sync conda recipe with conda-forge
  • ARROW-10060 - [Rust][DataFusion] Fixed error in which Errs were discarded in MergeExec.
  • ARROW-10062 - [Rust] Fix for null elems at key position in dictionary arrays
  • ARROW-10073 - [Python] Don't rely on dict item order in test_parquet_nested_storage
  • ARROW-10081 - [C++/Python] Fix bash syntax in drone.io conda builds
  • ARROW-10085 - [C++] Fix S3 region resolution on Windows
  • ARROW-10087 - [CI] Fix nightly docs job
  • ARROW-10098 - [R][Doc] Fix copy_files doc mismatch
  • ARROW-10104 - [Python] Separate tests into its own conda package
  • ARROW-10114 - [R] Segfault in to_dataframe_parallel with deeply nested structs
  • ARROW-10116 - [Python][Packaging] Fix gRPC linking error in macOS wheels builds
  • ARROW-10119 - [C++] Fix Parquet crashes on invalid input
  • ARROW-10121 - [C++] Fix emission of new dictionaries in IPC writer
  • ARROW-10124 - [C++] Don't restrict permissions when creating files
  • ARROW-10125 - [R] Int64 downcast check doesn't consider all chunks
  • ARROW-10130 - [C++][Dataset] Ensure ParquetFileFragment::SplitByRowGroup preserves the 'has_complete_metadata' status
  • ARROW-10136 - [Rust] Fix null handling in StringArray and BinaryArray filtering, add BinaryArray::from_opt_vec
  • ARROW-10137 - [C++][R] Move nameof.h into R subproject
  • ARROW-10147 - [Python] Pandas metadata fails if index name not JSON-serializable
  • ARROW-10150 - [C++] Fix crashes on invalid Parquet file
  • ARROW-10169 - [Rust] Pretty print null PrimitiveTypes as empty strings
  • ARROW-10175 - [CI] Fix nightly HDFS integration tests (ensure to use legacy dataset)
  • ARROW-10176 - [C++] Avoid using unformattable types for test parameters
  • ARROW-10178 - [CI] Remove patch to fix Spark master build
  • ARROW-10179 - [Rust] Fixed error in labeler
  • ARROW-10181 - [Rust] Skip compiling one test on 32 bit ARM architecture
  • ARROW-10188 - [Rust][DataFusion] Fixed DataFusion examples.
  • ARROW-10189 - [Doc] Fixed typo in C-Data interface example
  • ARROW-10192 - [Python] Always decode inner dictionaries when converting array to Pandas
  • ARROW-10193 - [Python] Segfault when converting to fixed size binary array
  • ARROW-10200 - [CI][Java] Fix a job failure for s390x Java on TravisCI
  • ARROW-10204 - [Rust] Filter kernel should only count bits in valid range
  • ARROW-10214 - [Python] Allow printing undecodable schema metadata
  • ARROW-10226 - [Rust] [Parquet] Parquet reader reading wrong columns in some batches within a parquet file
  • ARROW-10230 - [JS][Doc] JavaScript documentation fails to build
  • ARROW-10232 - FixedSizeListArray is incorrectly written/read to/from parquet
  • ARROW-10234 - [C++][Gandiva] Fix logic of round() for floats/decimals in Gandiva
  • ARROW-10237 - [C++] Duplicate dict values cause corrupt parquet
  • ARROW-10238 - [C#] List<Struct> is broken
  • ARROW-10239 - [C++] Add missing zlib dependency to aws-sdk-cpp
  • ARROW-10244 - [Python] Document pyarrow.dataset.parquet_dataset
  • ARROW-10248 - [Python][Dataset] Always apply Python's default write properties
  • ARROW-10262 - [C++] Fix TypeClass for BinaryScalar and LargeBinaryScalar
  • ARROW-10271 - [Rust] Update dependencies
  • ARROW-10279 - [Release][Python] Fix verification script to align with the new macos wheel platform tags
  • ARROW-10280 - [Packaging][Python] Fix macOS wheel artifact patterns
  • ARROW-10281 - [Python] Fix warnings when running tests
  • ARROW-10284 - [Python] Correctly suppress warning about legacy filesystem on import
  • ARROW-10285 - [Python] Fix usage of deprecated num_children in pyarrow.orc submodule
  • ARROW-10286 - [C++][FlightRPC] Make CMake output less confusing
  • ARROW-10288 - [C++] Fix compilation errors on 32-bit x86
  • ARROW-10290 - [C++] List POP_BACK is not available in older CMake versions
  • ARROW-10296 - [R] Data saved as integer64 loaded as integer
  • ARROW-10517 - [Python] Unable to read/write Parquet datasets with fsspec on Azure Blob
  • ARROW-11062 - [Java] When writing to flight stream, Spark's mapPartitions is not working

New Features and Improvements

  • ARROW-983 - [C++] Implement InputStream and OutputStream classes for interacting with socket connections
  • ARROW-1509 - [Python] Write serialized object as a stream of encapsulated IPC messages
  • ARROW-1644 - [C++][Parquet] Read and write nested Parquet data with a mix of struct and list nesting levels
  • ARROW-1669 - [C++] Consider adding Abseil (Google C++11 standard library extensions) to toolchain
  • ARROW-1797 - [C++] Implement binary arithmetic kernels for numeric arrays
  • ARROW-2164 - [C++] Clean up unnecessary decimal module refs
  • ARROW-3080 - [Python] Unify Arrow to Python object conversion paths
  • ARROW-3757 - [R] R bindings for Flight RPC client
  • ARROW-3850 - [Python] Support MapType and StructType for enhanced PySpark integration
  • ARROW-3872 - [R] Add ad hoc test of feather compatibility
  • ARROW-4046 - [Python/CI] Exercise large memory tests
  • ARROW-4248 - [C++][Plasma] Build on Windows / Visual Studio
  • ARROW-4685 - [C++] Update Boost to 1.69 in manylinux1 docker image
  • ARROW-4927 - [Rust] Update top level README to describe current functionality
  • ARROW-4957 - [Rust] [DataFusion] Implement get_supertype correctly
  • ARROW-4965 - [Python] Timestamp array type detection should use tzname of datetime.datetime objects
  • ARROW-5034 - [C#] ArrowStreamWriter and ArrowFileWriter implement sync WriteRecordBatch
  • ARROW-5123 - [Rust] Parquet derive for simple structs
  • ARROW-6075 - [FlightRPC] Handle uncaught exceptions in middleware
  • ARROW-6281 - [Python] Produce chunked arrays for nested types in pyarrow.array
  • ARROW-6282 - [Format] Support lossy compression
  • ARROW-6437 - [R] Add AWS SDK to system dependencies for macOS and Windows
  • ARROW-6535 - [C++] Status::WithMessage should accept variadic parameters
  • ARROW-6537 - [R] Pass column_types to CSV reader
  • ARROW-6972 - [C#] Support for StructArrays
  • ARROW-6982 - [R] Add bindings for compare and boolean kernels
  • ARROW-7136 - [Rust] Added caching to the docker image
  • ARROW-7218 - [Python] Conversion from boolean numpy scalars not working
  • ARROW-7302 - [C++] CSV: allow dictionary types in explicit column types
  • ARROW-7372 - [C++] Allow creating dictionary array from simple JSON
  • ARROW-7871 - [Python] Expose more compute kernels
  • ARROW-7960 - [C++] Add support for reading additional types
  • ARROW-8001 - [R][Dataset] Bindings for dataset writing
  • ARROW-8002 - [C++][Dataset][R] Support partitioned dataset writing
  • ARROW-8048 - [Python] Run memory leak tests nightly as follow up to ARROW-4120
  • ARROW-8172 - [C++] ArrayFromJSON for dictionary arrays
  • ARROW-8205 - [Rust][DataFusion] Added check to uniqueness of column names.
  • ARROW-8253 - [Rust] [DataFusion] Improve ergonomics of registering UDFs
  • ARROW-8262 - [Rust] [DataFusion] Add example that uses LogicalPlanBuilder
  • ARROW-8296 - [C++][Dataset] Add IpcFileWriteOptions
  • ARROW-8355 - [Python] Remove hard pandas dependency from FeatherDataset and minimize pandas dependency in test_feather.py
  • ARROW-8359 - [C++/Python] Enable linux-aarch64 builds
  • ARROW-8383 - [Rust] Allow easier access to keys array of a dictionary array
  • ARROW-8402 - [Java] Support ValidateFull methods in Java
  • ARROW-8493 - [C++][Parquet] Start populating repeated ancestor definition
  • ARROW-8494 - [C++][Parquet] Full support for reading mixed list and structs
  • ARROW-8581 - [C#] Accept and return DateTime from DateXXArray
  • ARROW-8601 - [Go][FOLLOWUP] Fix RAT violations related to Flight in Go
  • ARROW-8601 - [Go][Flight] Implement Flight RPC server and client
  • ARROW-8618 - [C++] Clean up some redundant std::move()s
  • ARROW-8678 - [C++/Python][Parquet] Remove old writer code path
  • ARROW-8712 - [R] Expose strptime timestamp parsing in read_csv conversion options
  • ARROW-8774 - [Rust] [DataFusion] Improve threading model
  • ARROW-8810 - [R] Add documentation about Parquet format, appending to stream format
  • ARROW-8824 - [Rust] [DataFusion] Implement new SQL parser
  • ARROW-8828 - [Rust] Implement SQL tokenizer
  • ARROW-8829 - [Rust] Implement SQL parser
  • ARROW-9010 - [Java] Framework and interface changes for RecordBatch IPC buffer compression
  • ARROW-9065 - [C++] Support parsing date32 in dataset partition folders
  • ARROW-9068 - [C++][Dataset] Simplify partitioning interface
  • ARROW-9078 - [C++] Parquet read / write extension type with nested storage type
  • ARROW-9104 - [C++] Parquet encryption tests should write files to a temporary directory instead of the testing submodule's directory
  • ARROW-9107 - [C++][Dataset] Support temporal partitioning fields
  • ARROW-9147 - [C++][Dataset] Support projection from null->any type
  • ARROW-9205 - [Documentation] Fix typos
  • ARROW-9266 - [Python][Packaging] Enable S3 support in macOS wheels
  • ARROW-9271 - [R] Preserve data frame metadata in round trip
  • ARROW-9286 - [C++] Add function "aliases" to compute::FunctionRegistry
  • ARROW-9328 - [C++][Gandiva] Add LTRIM, RTRIM, BTRIM functions for string
  • ARROW-9338 - [Rust] Add clippy instructions
  • ARROW-9344 - [C++][Flight] Measure latency quantiles
  • ARROW-9358 - [Integration] remove generated_large_batch.json
  • ARROW-9371 - [Java] Run vector tests for both allocators
  • ARROW-9377 - [Java] Support unsigned dictionary indices
  • ARROW-9387 - [R] Use new C++ table select method
  • ARROW-9388 - [C++] Division kernels
  • ARROW-9394 - [Python] Support pickling of Scalars
  • ARROW-9398 - [C++] Register SIMD sum variants to function instance.
  • ARROW-9402 - [C++] Rework portable wrappers for checked integer arithmetic
  • ARROW-9405 - [R] Switch to cpp11
  • ARROW-9412 - [C++] Add non-bundled dependencies to INTERFACE_LINK_LIBRARIES of static libarrow
  • ARROW-9429 - [Python] ChunkedArray.to_numpy
  • ARROW-9454 - [GLib] Add binding of some dictionary builders
  • ARROW-9465 - [Python] Improve ergonomics of compute module
  • ARROW-9469 - [Python] Make more objects weakrefable
  • ARROW-9487 - [Developer] Cover the archery release utilities with unittests
  • ARROW-9488 - [Release] Use the new changelog generation when updating the website
  • ARROW-9507 - [Rust][DataFusion] Implement Display for PhysicalExpr
  • ARROW-9508 - [Release][APT][Yum] Enable verification for arm64 binaries
  • ARROW-9516 - [Rust][DataFusion] refactor of column names
  • ARROW-9517 - [C++/Python] Add support for temporary credentials to S3Options
  • ARROW-9518 - [Python] Deprecate pyarrow serialization
  • ARROW-9521 - [Rust][DataFusion] Handle custom CSV file extensions
  • ARROW-9523 - [Rust] Improve filter kernel performance
  • ARROW-9534 - [Rust][DataFusion] Added support for lit to all supported rust types.
  • ARROW-9550 - [Rust] [DataFusion] Remove Rc<RefCell<_>> from hash aggregate operator
  • ARROW-9553 - [Rust] Release script doesn't bump parquet crate's arrow dependency version
  • ARROW-9557 - [R] Iterating over parquet columns is slow in R
  • ARROW-9559 - [Rust][DataFusion] Made function public
  • ARROW-9563 - [Dev][Release] Use archery's changelog generator when creating release notes for the website
  • ARROW-9568 - [CI][C++] Use msys2/setup-msys2
  • ARROW-9576 - [Python][Doc] Fix error in example code for extension types
  • ARROW-9580 - [JS][Doc] Fix syntax error in example code
  • ARROW-9581 - [Dev][Release] Bump next snapshot versions to 2.0.0
  • ARROW-9582 - [Rust] Implement memory size methods
  • ARROW-9585 - [Rust][DataFusion] Remove duplicated to-do line
  • ARROW-9587 - [FlightRPC][Java] clean up FlightStream/DoPut
  • ARROW-9593 - [Python] Add custom pickle reducers for DictionaryScalar
  • ARROW-9604 - [C++] Add aggregate min/max benchmark
  • ARROW-9605 - [C++] Speed up aggregate min/max compute kernels on integer types
  • ARROW-9607 - [C++][Gandiva] Add bitwise_and(), bitwise_or() and bitwise_not() functions for integers
  • ARROW-9608 - [Rust] Remove arrow flight from parquet's feature gating
  • ARROW-9615 - [Rust] Added kernel to compute length of a string.
  • ARROW-9617 - [Rust][DataFusion] Add length of string array
  • ARROW-9618 - [Rust][DataFusion] Made it easier to write optimizers
  • ARROW-9619 - [Rust][DataFusion] Add predicate push-down
  • ARROW-9632 - [Rust] add a func "new" for ExecutionContextSchemaProvider
  • ARROW-9638 - [C++][Compute] Implement mode kernel
  • ARROW-9639 - [Ruby] Add dependency version check
  • ARROW-9640 - [C++][Gandiva] Implement round() for integers and long integers
  • ARROW-9641 - [C++][Gandiva] Implement round() for floating point and double floating point numbers
  • ARROW-9645 - [Python] Deprecate pyarrow.filesystem in favor of pyarrow.fs
  • ARROW-9646 - [C++][Dataset] Support writing with ParquetFileFormat
  • ARROW-9650 - [Packaging][APT] Drop support for Ubuntu 19.10
  • ARROW-9654 - [Rust][DataFusion] Add EXPLAIN <SQL> statement
  • ARROW-9656 - [Rust][DataFusion] Better error messages for unsupported EXTERNAL TABLE types
  • ARROW-9658 - [Python] Python bindings for dataset writing
  • ARROW-9665 - [R] head/tail/take for Datasets
  • ARROW-9667 - [CI][Crossbow] Segfault in 2 nightly R builds
  • ARROW-9671 - [C++] Fix a bug in BasicDecimal128 constructor that interprets uint64_t integers with highest bit set as negative.
  • ARROW-9673 - [Rust][DataFusion] Add a param "dialect" for DFParser::parse_sql
  • ARROW-9678 - [Rust][DataFusion] Improve projection push down to remove unused columns
  • ARROW-9679 - [Rust][DataFusion] More efficient creation of final batch from HashAggregateExec
  • ARROW-9681 - [Java] Fix test failures of Arrow Memory - Core on big-endian platform
  • ARROW-9683 - [Rust][DataFusion] Add debug printing to physical plans and associated types
  • ARROW-9691 - [Rust][DataFusion] Make sql_statement_to_plan method public
  • ARROW-9695 - [Rust] Improve comments on LogicalPlan enum variants
  • ARROW-9699 - [C++][Compute] Optimize mode kernel for small integer types
  • ARROW-9701 - [CI][Java] Add a job for s390x Java on TravisCI
  • ARROW-9702 - [C++] Register bpacking SIMD to runtime path.
  • ARROW-9703 - [Developer][Archery] Restartable cherry-picking process for creating maintenance branches
  • ARROW-9706 - [Java] Tests of TestLargeListVector correctly read offset
  • ARROW-9710 - [C++] Improve performance of Decimal128::ToString by 10x, and make the implementation reusable for Decimal256.
  • ARROW-9711 - [Rust] Add new benchmark derived from TPC-H
  • ARROW-9713 - [Rust][DataFusion] Remove explicit panics
  • ARROW-9715 - [R] changelog/doc updates for 1.0.1
  • ARROW-9718 - [Python] ParquetWriter to work with new FileSystem API
  • ARROW-9721 - [Packaging][Python] Update wheel dependency files
  • ARROW-9722 - [Rust] Shorten key lifetime for dict lookup key
  • ARROW-9723 - [C++][Compute] Count NaN in mode kernel
  • ARROW-9725 - [Rust][DataFusion] SortExec and LimitExec re-use MergeExec
  • ARROW-9737 - [C++][Gandiva] Add bitwise_xor() for integers
  • ARROW-9739 - [CI][Ruby] Don't install gem documents
  • ARROW-9742 - [Rust][DataFusion] Improved DataFrame trait (formerly known as the Table trait)
  • ARROW-9751 - [Rust][DataFusion] Allow UDFs to accept multiple data types per argument
  • ARROW-9752 - [Rust][DataFusion] Add support for User-Defined Aggregate Functions.
  • ARROW-9753 - [Rust][DataFusion] Replaced Arc<Mutex<>> by Box<>
  • ARROW-9754 - [Rust][DataFusion] Implement async in ExecutionPlan trait
  • ARROW-9757 - [Rust][DataFusion] Add prelude.rs
  • ARROW-9758 - [Rust][DataFusion] Allow physical planner to be replaced
  • ARROW-9759 - [Rust][DataFusion] Implement DataFrame.sort()
  • ARROW-9760 - [Rust][DataFusion] Added DataFrame::explain
  • ARROW-9761 - [C/C++] Add experimental C stream interface
  • ARROW-9762 - [Rust][DataFusion] ExecutionContext::sql now returns DataFrame
  • ARROW-9769 - [Python] Un-skip tests with fsspec in-memory filesystems
  • ARROW-9775 - [C++] Automatic S3 region selection
  • ARROW-9781 - [C++] Fix valgrind uninitialized value warnings
  • ARROW-9782 - [C++][Dataset] More configurable Dataset writing
  • ARROW-9784 - [Rust][DataFusion] Make running TPCH benchmark repeatable
  • ARROW-9786 - [R] Unvendor cpp11 before release
  • ARROW-9788 - [Rust][DataFusion] Rename SelectionExec to FilterExec
  • ARROW-9792 - [Rust][DataFusion] Aggregate expression functions should not return result
  • ARROW-9794 - [C++] Add IsVendor API for CpuInfo
  • ARROW-9795 - [C++][Gandiva] Implement castTIMESTAMP(int64) in Gandiva
  • ARROW-9806 - [R] More compute kernel bindings
  • ARROW-9807 - [R] News update/version bump post-1.0.1
  • ARROW-9808 - [Python] Update read_table doc string
  • ARROW-9811 - [C++] Unchecked floating point division by 0 should succeed
  • ARROW-9813 - [C++] Disable semantic interposition
  • ARROW-9819 - [C++] Bump mimalloc to 1.6.4
  • ARROW-9821 - [Rust][DataFusion] Support for User Defined ExtensionNodes in the LogicalPlan
  • ARROW-9821 - [Rust][DataFusion] Make crate::logical_plan and crate::physical_plan modules
  • ARROW-9823 - [CI][C++][MinGW] Enable S3
  • ARROW-9832 - [Rust] [DataFusion] Refactor PhysicalPlan to remove Partition
  • ARROW-9833 - [Rust][DataFusion] TableProvider.scan now returns ExecutionPlan
  • ARROW-9834 - [Rust] [DataFusion] Remove Partition trait
  • ARROW-9835 - [Rust][DataFusion] Removed FunctionMeta and FunctionType
  • ARROW-9836 - [Rust][DataFusion] Improve API for usage of UDFs
  • ARROW-9837 - [Rust][DataFusion] Added provider for variable
  • ARROW-9838 - [Rust] [DataFusion] DefaultPhysicalPlanner should insert explicit MergeExec nodes
  • ARROW-9839 - [Rust][DataFusion] Implement ExecutionPlan.as_any
  • ARROW-9841 - [Rust] Update checked-in fbs files
  • ARROW-9844 - [CI] Add Go build job on s390x
  • ARROW-9845 - [Rust][Parquet] Move serde_json dependency to dev-dependencies as it is only used in tests
  • ARROW-9848 - [Rust] Implement 0.15 IPC alignment
  • ARROW-9849 - [Rust][DataFusion] Simplified argument types of ScalarFunctions.
  • ARROW-9850 - [Go] Defer should not be used inside a loop
  • ARROW-9853 - [RUST] Implement take kernel for dictionary arrays
  • ARROW-9854 - [R] Support reading/writing data to/from S3
  • ARROW-9858 - [Python][Docs] Add user guide for filesystems interface
  • ARROW-9863 - [C++][Parquet] Compile regexes only once
  • ARROW-9867 - [C++][Dataset] Add FileSystemDataset::filesystem property
  • ARROW-9868 - [C++][R] Provide CopyFiles for copying files between FileSystems
  • ARROW-9869 - [R] Implement full S3FileSystem/S3Options constructor
  • ARROW-9870 - [R] Friendly interface for filesystems (S3)
  • ARROW-9871 - [C++] Add uppercase to ARROW_USER_SIMD_LEVEL
  • ARROW-9873 - [C++][Compute] Optimize mode kernel for integers in small value range
  • ARROW-9875 - [Python] Let FileSystem.get_file_info accept a single path
  • ARROW-9884 - [R] Bindings for writing datasets to Parquet
  • ARROW-9885 - [Rust][DataFusion] Minor code simplification
  • ARROW-9886 - [Rust][DataFusion] Parameterized testing of physical cast.
  • ARROW-9887 - [Rust][DataFusion] Added support for complex return types for built-in functions
  • ARROW-9890 - [R] Add zstandard compression codec in macOS build
  • ARROW-9891 - [Rust][DataFusion] Made math functions accept f32.
  • ARROW-9892 - [Rust][DataFusion] Added concat of utf8
  • ARROW-9893 - [Python] Support parquet options in dataset writing
  • ARROW-9895 - [Rust] Improve sorting kernels
  • ARROW-9899 - [Rust][DataFusion] Switch from Box<Schema> --> SchemaRef (Arc<Schema>) to be consistent with the rest of Arrow
  • ARROW-9900 - [Rust][DataFusion] Switch from Box -> Arc in LogicalPlanNode
  • ARROW-9901 - [C++] Add hand-crafted Parquet to Arrow reconstruction tests
  • ARROW-9902 - [Rust][DataFusion] Add array() built-in function
  • ARROW-9904 - [C++] Unroll the loop of CountSetBits.
  • ARROW-9908 - [Rust] Add support for temporal types in JSON reader
  • ARROW-9910 - [Rust][DataFusion] Fixed error in type coercion of Variadic.
  • ARROW-9914 - [Rust][DataFusion] Document SQL Type --> Arrow type mapping
  • ARROW-9916 - [RUST] Avoid cloning array data
  • ARROW-9917 - [Python][Compute] Bindings for mode kernel
  • ARROW-9919 - [Rust][DataFusion] Speedup math operations by 15%+
  • ARROW-9921 - [Rust] Replace TryFrom by From in StringArray from Vec<Option<&str>> (+50%)
  • ARROW-9925 - [GLib] Add low level value readers for GArrowListArray family
  • ARROW-9926 - [GLib] Use placement new for GArrowRecordBatchFileReader
  • ARROW-9928 - [C++] Speed up integer parsing slightly
  • ARROW-9929 - [Dev] Autotune cmake-format
  • ARROW-9933 - [Developer] Add drone as a CI provider for crossbow
  • ARROW-9934 - [Rust] Shape and stride check in tensor
  • ARROW-9941 - [Python] Better string representation for extension types
  • ARROW-9944 - [Rust][DataFusion] Implement to_timestamp function
  • ARROW-9949 - [C++] Improve performance of Decimal128::FromString by 46%, and make the implementation reusable for Decimal256.
  • ARROW-9950 - [Rust][DataFusion] Made UDFs usable without a registry
  • ARROW-9952 - [Python] Optionally use pyarrow.dataset in parquet.write_to_dataset
  • ARROW-9954 - [Rust][DataFusion] Made aggregates support the same signatures as functions.
  • ARROW-9956 - [C++][Gandiva] Implementation of binary_string function in gandiva
  • ARROW-9957 - [Rust] Replace tempdir with tempfile
  • ARROW-9961 - [Rust][DataFusion] Make to_timestamp function parse timestamps without timezone offset as local
  • ARROW-9964 - [C++] Allow reading date types from CSV data
  • ARROW-9965 - [Java] Improve performance of BaseFixedWidthVector.setSafe by optimizing capacity calculations
  • ARROW-9966 - [Rust] Speedup kernels for sum,min,max by 10%-60%
  • ARROW-9967 - [Python] Add compute module docs + expose more option classes
  • ARROW-9971 - [Rust] Improve speed of take by 2x-3x (change scaling with batch size)
  • ARROW-9977 - [Rust][Large] StringArray
  • ARROW-9979 - [Rust] Fix arrow crate clippy lints
  • ARROW-9980 - [Rust][Parquet] Fix clippy lints
  • ARROW-9981 - [Rust][Flight] Expose IpcWriteOptions on utils
  • ARROW-9983 - [C++][Dataset][Python] Use larger default batch size than 32K for Datasets API
  • ARROW-9984 - [Rust][DataFusion] Minor cleanup DRY
  • ARROW-9986 - [Rust] allow to_timestamp to parse local times without fractional seconds
  • ARROW-9987 - [Rust][DataFusion] Improved docs for Expr
  • ARROW-9988 - [Rust][DataFusion] Added +-/* as operators to logical expressions.
  • ARROW-9992 - [C++][Python] Refactor python to arrow conversions based on a reusable conversion API
  • ARROW-9998 - [Python] Support pickling DictionaryScalar
  • ARROW-9999 - [Python] Support constructing dictionary array directly through pa.array()
  • ARROW-10000 - [C++][Python] Support constructing StructArray from list of key-value pairs
  • ARROW-10001 - [Rust][DataFusion] Added developer guide to README.
  • ARROW-10010 - [Rust] Speedup arithmetic (1.3-1.9x)
  • ARROW-10015 - [Rust] Simd aggregate kernels
  • ARROW-10016 - [Rust] Implement is null / is not null kernels
  • ARROW-10018 - [CI] Disable Sphinx and API documentation build on master
  • ARROW-10019 - [Rust] Add substring kernel
  • ARROW-10023 - [C++][Gandiva] Implement split_part function in gandiva
  • ARROW-10024 - [C++][Parquet] Create nested reading benchmarks
  • ARROW-10028 - [Rust] Simplified macro
  • ARROW-10030 - [Rust] Add support for FromIter and IntoIter for primitive types
  • ARROW-10035 - [C++] Update vendored libraries
  • ARROW-10037 - [C++] Workaround to force find AWS SDK to look for shared libraries
  • ARROW-10040 - [Rust] Iterate over and combine boolean buffers with arbitrary offsets
  • ARROW-10043 - [Rust][DataFusion] Implement COUNT(DISTINCT col)
  • ARROW-10044 - [Rust] Improved Arrow's README.
  • ARROW-10046 - [Rust][DataFusion] Made RecordBatchReader implement Iterator
  • ARROW-10050 - [C++][Gandiva] Implement concat() in Gandiva for up to 10 arguments
  • ARROW-10051 - [C++][Compute] Move kernel state when merging
  • ARROW-10054 - [Python] don't crash when slice offset > length
  • ARROW-10055 - [Rust] DoubleEndedIterator implementation for NullableIter
  • ARROW-10057 - [C++] Add hand-written Parquet nested tests
  • ARROW-10058 - [C++] Improve repeated levels conversion without BMI2
  • ARROW-10059 - [R][Doc] Give more advice on how to set up C++ build
  • ARROW-10063 - [Archery][CI] Fetch main branch in archery build only when it is a pull request
  • ARROW-10064 - [C++] Resolve compile warnings on Apple Clang 12
  • ARROW-10065 - [Rust] Simplify code (+500, -1k)
  • ARROW-10066 - [C++] Make sure default AWS region selection algorithm is used
  • ARROW-10068 - [C++] Add bundled external project for aws-sdk-cpp
  • ARROW-10069 - [Java] Support running Java benchmarks from command line
  • ARROW-10070 - [C++][Compute] Implement var and std aggregate kernel
  • ARROW-10071 - [R] segfault with ArrowObject from previous session, or saved
  • ARROW-10074 - [C++] Use string constructor instead of string_view.to_string
  • ARROW-10075 - [C++] Use nullopt from arrow::util instead of vendored namespace
  • ARROW-10076 - [C++] Use temporary directory facility in all unit tests
  • ARROW-10077 - [C++] Fix possible integer multiplication overflow
  • ARROW-10083 - [C++] Improve Parquet fuzz seed corpus
  • ARROW-10084 - [Rust][DataFusion] Added length of LargeStringArray and fixed undefined behavior.
  • ARROW-10086 - [Rust] Renamed min/max_large_string kernels
  • ARROW-10090 - [C++][Compute] Improve mode kernel
  • ARROW-10092 - [Dev][Go] Add grpc generated go files to rat exclusion list
  • ARROW-10093 - [R] Add ability to opt-out of int64 -> int demotion
  • ARROW-10096 - [Rust][DataFusion] Removed unused code
  • ARROW-10099 - [C++][Dataset] Simplify type inference for partition columns
  • ARROW-10100 - [C++][Python][Dataset] Add ParquetFileFragment::Subset method
  • ARROW-10102 - [C++] Refactor BasicDecimal128 Multiplication to use unsigned helper
  • ARROW-10103 - [Rust] Add contains kernel
  • ARROW-10105 - [FlightRPC] Add client option to disable certificate validation with TLS
  • ARROW-10120 - [C++] Add two-level nested Parquet read to Arrow benchmarks
  • ARROW-10127 - Update specification for Decimal to allow for 256-bits
  • ARROW-10129 - [Rust] Cargo build is rebuilding dependencies on arrow changes
  • ARROW-10134 - [Python][Dataset] Add ParquetFileFragment.num_row_groups
  • ARROW-10139 - [C++] Add support for building arrow_testing without building tests
  • ARROW-10148 - [Rust] Improved rust/lib.rs that is shown in docs.rs
  • ARROW-10151 - [Python] Add support for MapArray conversion to Pandas
  • ARROW-10155 - [Rust][DataFusion] Improved lib.rs docs
  • ARROW-10156 - [Rust] Added github action to label PRs for rust.
  • ARROW-10157 - [Rust] Add an example to the take kernel
  • ARROW-10160 - [Rust] Improve DictionaryType documentation (clarify which type is which)
  • ARROW-10161 - [Rust][DataFusion] DRYed code in tests
  • ARROW-10162 - [Rust] Add pretty print support for DictionaryArray
  • ARROW-10164 - [Rust] Add support for DictionaryArray to cast kernel
  • ARROW-10167 - [Rust][DataFusion] Support DictionaryArray in sql.rs tests, by using standard pretty printer
  • ARROW-10171 - [Rust][DataFusion] Added ExecutionContext::From<ExecutionContextState>
  • ARROW-10190 - [Website] Add Jorge to list of committers
  • ARROW-10196 - [C++] Add Future::DeferNotOk
  • ARROW-10199 - [Rust][Parquet] Release Parquet at crates.io to remove debug prints
  • ARROW-10201 - [C++][CI] Disable S3 in arm64 job on Travis CI
  • ARROW-10202 - [CI][Windows] Use sf.net mirror for MSYS2
  • ARROW-10205 - [Java][FlightRPC] Allow disabling server validation
  • ARROW-10206 - [C++][Python][FlightRPC] Allow disabling server validation
  • ARROW-10215 - [Rust][DataFusion] Renamed Source to SendableRecordBatchReader.
  • ARROW-10217 - [CI] Run fewer GitHub Actions jobs
  • ARROW-10227 - [Ruby] Use a table size as the default for parquet chunk_size
  • ARROW-10229 - [C++] Remove errant log line
  • ARROW-10231 - [CI] Unable to download minio in arm32v7 docker image
  • ARROW-10233 - [Rust] Make array_value_to_string available in all Arrow builds
  • ARROW-10235 - [Rust][DataFusion] Improve documentation for type coercion
  • ARROW-10240 - [Rust] Optionally load data into memory before running benchmark query
  • ARROW-10251 - [Rust][DataFusion] MemTable::load() now loads partitions in parallel
  • ARROW-10252 - [Python] Add option to skip inclusion of Arrow headers in Python installation
  • ARROW-10256 - [C++][Flight] Disable -Werror carefully
  • ARROW-10257 - [R] Prepare news/docs for 2.0 release
  • ARROW-10260 - [Python] Missing MapType in to_pandas_dtype()
  • ARROW-10265 - [CI] Use smaller build when cache doesn't exist on Travis CI
  • ARROW-10266 - [CI][macOS] Ensure using Python 3.8 with Homebrew
  • ARROW-10267 - [Python] Skip flight test if disable_server_verification feature is not available
  • ARROW-10272 - [Packaging][Python] Pin newer multibuild version to avoid updating homebrew
  • ARROW-10273 - [CI][Homebrew] Fix "brew audit" usage
  • ARROW-10287 - [C++] Avoid std::random_device
  • PARQUET-1845 - [C++] Add expected results of Int96 in big-endian
  • PARQUET-1878 - [C++] lz4 codec is not compatible with Hadoop Lz4Codec
  • PARQUET-1904 - [C++] Export file_offset in RowGroupMetaData

Apache Arrow 1.0.1 (2020-08-21)

Bug Fixes

  • ARROW-9535 - [Python] Remove symlink fixes from conda recipe
  • ARROW-9536 - [Java] Miss parameters in PlasmaOutOfMemoryException.java
  • ARROW-9544 - [R] Fix version argument of write_parquet()
  • ARROW-9549 - [Rust] Fixed version in dependency in parquet.
  • ARROW-9556 - [Python][C++] Segfaults in UnionArray with null values
  • ARROW-9560 - [Packaging] Add required conda-forge.yml
  • ARROW-9569 - [CI][R] Fix rtools35 builds for msys2 key change
  • ARROW-9570 - [Doc] Clean up sphinx sidebar
  • ARROW-9573 - [Python][Dataset] Provide read_table(ignore_prefixes=)
  • ARROW-9574 - [R] Cleanups for CRAN 1.0.0 release
  • ARROW-9575 - [R] gcc-UBSAN failure on CRAN
  • ARROW-9577 - [C++] Ignore EBADF error in posix_madvise()
  • ARROW-9589 - [C++/R] Forward declare structs as structs
  • ARROW-9592 - [CI] Update homebrew before calling brew bundle
  • ARROW-9596 - [CI][Crossbow] Fix homebrew-cpp again, again
  • ARROW-9598 - [C++][Parquet] Fix writing nullable structs
  • ARROW-9599 - [CI] Appveyor toolchain build fails because CMake detects different C and C++ compilers
  • ARROW-9600 - [Rust] pin proc macro
  • ARROW-9600 - [Rust][Arrow] pin older version of proc-macro2 during build
  • ARROW-9602 - [R] Improve cmake detection in Linux build
  • ARROW-9606 - [C++][Dataset] Support "a"_.In(<>).Assume(<compound>)
  • ARROW-9609 - [C++][Dataset] CsvFileFormat reads all virtual columns as null
  • ARROW-9621 - [Python] Skip test_move_file for in-memory fsspec filesystem
  • ARROW-9631 - [Rust] Make arrow not depend on flight
  • ARROW-9631 - [Rust] flight should depend on arrow, not the other way around
  • ARROW-9644 - [C++][Dataset] Don't apply ignore_prefixes to partition base_dir
  • ARROW-9659 - [C++] Fix RecordBatchStreamReader when source is CudaBufferReader
  • ARROW-9684 - [C++] Fix undefined behaviour on invalid IPC / Parquet input
  • ARROW-9700 - [Python] fix create_library_symlinks for macos
  • ARROW-9712 - [Rust][DataFusion] Fix parquet error handling and general code improvements
  • ARROW-9743 - [R] Sanitize paths in open_dataset
  • ARROW-10126 - [Python] Impossible to import pyarrow module in Python: "ImportError: DLL load failed: The specified procedure could not be found."
  • ARROW-10460 - [FlightRPC][Python] FlightRPC authentication mechanism changed and is undocumented, breaking current working code

New Features and Improvements

  • ARROW-9402 - [C++] Rework portable wrappers for checked integer arithmetic
  • ARROW-9563 - [Dev][Release] Use archery's changelog generator when creating release notes for the website
  • ARROW-9715 - [R] changelog/doc updates for 1.0.1
  • ARROW-9845 - [Rust] [Parquet] serde_json is only used in tests but isn't in dev-dependencies

Apache Arrow 1.0.0 (2020-07-24)

Bug Fixes

  • ARROW-1692 - [Java] UnionArray round trip not working
  • ARROW-3329 - [Python] Python tests for decimal to int and decimal to decimal casts
  • ARROW-3861 - [Python] ParquetDataset.read() respect specified columns and not include partition columns
  • ARROW-4018 - [C++] Fix RLE tests' failures on big-endian platforms
  • ARROW-4309 - [Documentation] Add a docker-compose entry which builds the documentation with CUDA enabled
  • ARROW-4600 - [Ruby] returns dictionary value
  • ARROW-5158 - [Packaging][Wheel] Symlink libraries in wheels
  • ARROW-5310 - [Python] better error message on creating ParquetDataset from empty directory
  • ARROW-5359 - [Python] Support non-nanosecond out-of-range timestamps in conversion to pandas
  • ARROW-5572 - [Python] ParquetDataset tests for new implementation (also ARROW-5310, ARROW-5666)
  • ARROW-5666 - [Python] Underscores in partition (string) values are dropped when reading dataset
  • ARROW-5744 - [C++] Allow Table::CombineChunks to leave string columns chunked
  • ARROW-5875 - [FlightRPC] integration tests for Flight features
  • ARROW-6235 - [R] Implement conversion from arrow::BinaryArray to R character vector
  • ARROW-6523 - [C++][Dataset] arrow_dataset target does not depend on anything
  • ARROW-6848 - [C++] Support building libraries targeting C++14 or higher
  • ARROW-7018 - [R] Non-UTF-8 data in Arrow <--> R conversion
  • ARROW-7028 - [R] Date roundtrip results in different R storage mode
  • ARROW-7084 - [C++] Check for full type equality in ArrayRangeEquals
  • ARROW-7173 - [Integration] Add test to verify Map field names can be arbitrary
  • ARROW-7208 - [Python][Parquet] Raise better error message when passing a directory path instead of a file path to ParquetFile
  • ARROW-7273 - [Python][C++][Parquet] Do not permit constructing a non-nullable null field in Python, catch this case in Arrow->Parquet schema conversion
  • ARROW-7480 - [Rust] [DataFusion] Query fails/incorrect when aggregated + grouped columns don't match the selected columns
  • ARROW-7610 - [Java] Finish support for 64 bit int allocations
  • ARROW-7654 - [Python] Ability to set column_types to a Schema in csv.ConvertOptions is undocumented
  • ARROW-7681 - [Rust] Explicitly seeking a BufReader will discard the internal buffer (2)
  • ARROW-7702 - [C++][Dataset] Provide (optional) deterministic order of batches
  • ARROW-7782 - [Python] Losing index information when using write_to_dataset with partition_cols
  • ARROW-7840 - [Java] [Integration] Java executables fail
  • ARROW-7843 - [Ruby] MSYS2 packages needed for Gandiva
  • ARROW-7925 - [C++][Docs] Better document use of IWYU, including new 'match' option
  • ARROW-7939 - [Python] crashes when reading parquet file compressed with snappy
  • ARROW-7967 - [CI][Crossbow] Pin macOS version in autobrew job to match CRAN
  • ARROW-8050 - [Python][Packaging] Do not include generated Cython source files in wheel packages
  • ARROW-8078 - [Python] Missing links in the docs regarding field and schema DataTypes
  • ARROW-8115 - [Python] Conversion when mixing NaT and datetime objects not working
  • ARROW-8251 - [Python] Preserve pandas index and extension dtypes in write_to_dataset roundtrip (also ARROW-7782)
  • ARROW-8344 - [C#] Bug-fixes to binary array plus other improvements
  • ARROW-8360 - [C++][Gandiva] Fixes date32 support for date/time functions
  • ARROW-8374 - [R] Table to vector of DictionaryType will error when Arrays don't have the same Dictionary per array
  • ARROW-8392 - [Java] Fix overflow related corner cases for vector value comparison
  • ARROW-8448 - [Packaging] Update linux-packages README
  • ARROW-8455 - [Rust] Parquet Arrow column read on partially compatible files FIX
  • ARROW-8455 - [Rust] Parquet Arrow column read on partially compatible files
  • ARROW-8471 - [C++][Integration] Represent 64 bit integers as strings
  • ARROW-8472 - [Go][Integration] Represent 64 bit integers as JSON::string
  • ARROW-8473 - [Rust] Untick "Statistics support"
  • ARROW-8480 - [Rust] Use NonNull well aligned pointer as Unique reference
  • ARROW-8503 - [Packaging][deb] Fix building apache-arrow-archive-keyring for RC
  • ARROW-8505 - [Release][C#] "sourcelink test" is failed by Apache.ArrowAssemblyInfo.cs
  • ARROW-8508 - [Rust] FixedSizeListArray improper offset for value
  • ARROW-8510 - [C++][Datasets] Do not use variant in WritePlan to fix compiler error with VS 2017
  • ARROW-8511 - [Release] In verify-release-candidate.bat, exit when CMake build fails, use Unity build
  • ARROW-8514 - [Developer][Release] Verify Python 3.5 Windows wheel
  • ARROW-8529 - [C++] Fix usage of NextCounts() on dictionary-encoded data
  • ARROW-8535 - [Rust] Specify arrow-flight version
  • ARROW-8536 - [Rust][Flight] Check in proto file, conditional build if file exists
  • ARROW-8537 - [C++] Revert Optimizing BitmapReader
  • ARROW-8539 - [CI] "AMD64 MacOS 10.15 GLib & Ruby" fails
  • ARROW-8554 - [C++][Benchmark] Fix building error "cannot bind lvalue"
  • ARROW-8556 - [R] zstd symbol not found if there are multiple installations of zstd
  • ARROW-8566 - [R] error when writing POSIXct to spark
  • ARROW-8568 - [C++] Fix decimal to decimal cast issues
  • ARROW-8577 - [Plasma][CUDA] Make CUDA initialization lazy
  • ARROW-8583 - [C++][Doc] Undocumented parameter in Dataset namespace
  • ARROW-8584 - [C++] Fix ORC link order
  • ARROW-8585 - [Packaging][Python] Windows wheels fail to build because of link error
  • ARROW-8586 - [R] installation failure on CentOS 7
  • ARROW-8587 - [C++] Fix linking Flight benchmarks
  • ARROW-8592 - [C++] Update docs to reflect LLVM 8
  • ARROW-8593 - [C++][Parquet] Fix build with musl libc
  • ARROW-8598 - [Rust] simd_compare_op creates buffer of incorrect length
  • ARROW-8602 - [C++][CMake] Fix ws2_32 link issue when cross-compiling on Linux
  • ARROW-8603 - [C++][Documentation] Add missing params comment
  • ARROW-8604 - [R][CI] Update CI to use R 4.0
  • ARROW-8608 - [C++] Update vendored 'variant.hpp' to fix CUDA 10.2
  • ARROW-8609 - [C++] Fix ORC Java JNI crash
  • ARROW-8610 - [Rust] DivideByZero when running arrow crate when simd feature is disabled
  • ARROW-8613 - [C++][Dataset][Python] Raise in discovery on unparsable partition expression
  • ARROW-8615 - [R] Error better and insist on RandomAccessFile in read_feather
  • ARROW-8617 - [Rust] Avoid loading simd_load_set_invalid which doesn't exist on aarch64
  • ARROW-8632 - [C++] Fix conversion error warning in array_union_test.cc
  • ARROW-8641 - [C++][Python] Sort included indices in IpcReader - Respect column selection in FeatherReader
  • ARROW-8643 - [Python] Fix failing pandas tests with DatetimeIndex on pandas master
  • ARROW-8644 - [Python] Restore ParquetDataset behaviour to always include partition column for dask compatibility
  • ARROW-8646 - [Java] Allow UnionListWriter to write null values
  • ARROW-8649 - [Java][Website] Java documentation on website is hidden
  • ARROW-8657 - [C++][Python] Add separate configuration for data pages
  • ARROW-8663 - [Documentation] Small correction to building.rst
  • ARROW-8680 - [Rust] Fix ComplexObjectArray null value shifting
  • ARROW-8684 - [Python] Workaround Cython type initialization bug
  • ARROW-8689 - [C++] Fix linking S3FS benchmarks
  • ARROW-8693 - [Python] Insert implicit cast in Dataset.get_fragments with filter
  • ARROW-8694 - [C++][Parquet] Relax string size limit when deserializing Thrift messages
  • ARROW-8701 - [Rust] Unresolved import `crate::compute::util::simd_load_set_invalid` on Raspberry Pi
  • ARROW-8704 - [C++] Fix Parquet undefined behaviour on invalid input
  • ARROW-8705 - copying null values from ComplexCopier
  • ARROW-8706 - [C++][Parquet] Tracking JIRA for PARQUET-1857 (unencrypted INT16_MAX Parquet row group limit)
  • ARROW-8710 - [Rust] Ensure right order of messages written, and flush stream when complete.
  • ARROW-8722 - [Dev] Pass environment variables to the container when running "archery docker run -e"
  • ARROW-8726 - [C++] Filename should not be part of DirectoryPartitioning
  • ARROW-8728 - [C++] Fix bitmap operation buffer overflow
  • ARROW-8729 - [C++][Dataset] Ensure non-empty batches when only virtual columns are projected
  • ARROW-8734 - [R] improve nightly build installation
  • ARROW-8741 - [Python][Packaging] Keep VS2015 for the Windows wheels
  • ARROW-8750 - [Python] Correctly default to lz4 compression for Feather V2 in Python
  • ARROW-8768 - [R][CI] Fix nightly as-cran spurious failure
  • ARROW-8775 - [C++][FlightRPC] fix integration tests
  • ARROW-8776 - [FlightRPC] Fix discrepancy between headers in Java and C++
  • ARROW-8798 - [C++] Fix Parquet crash on invalid input
  • ARROW-8799 - [C++][Parquet] NestedListReader needs to handle empty item batches
  • ARROW-8801 - [Python] Fix memory leak when converting datetime64-with-tz data to pandas
  • ARROW-8802 - [C++][Dataset] Preserve dataset schema's metadata on column projection
  • ARROW-8803 - [Java] Row count should be set before loading buffers in VectorLoader
  • ARROW-8808 - [Rust] Fix divide by zero error in builder
  • ARROW-8809 - [Rust] Fix JSON schema bug
  • ARROW-8811 - [Java] Fix CI
  • ARROW-8820 - [C++][Gandiva] fix date_trunc functions to return date types
  • ARROW-8821 - [Rust] fix type cast for nested binary expression using Like, NotLike, Not operators
  • ARROW-8825 - [C++] Mark parameter as unused
  • ARROW-8826 - [Crossbow] remote URL should always have .git
  • ARROW-8832 - [Python] Provide better error message when S3/HDFS is not enabled in installation
  • ARROW-8848 - [Ruby][CI] Fix MSYS2 update error
  • ARROW-8858 - [FlightRPC] ensure binary/multi-valued headers are properly exposed
  • ARROW-8860 - [C++] Fix IPC/Feather decompression for nested types (with child_data)
  • ARROW-8862 - [C++] NumericBuilder should use MemoryPool passed to CTOR
  • ARROW-8863 - [C++] Ensure that ArrayData::null_count is always set to 0 when using ArrayData::Make and supplying null validity bitmap
  • ARROW-8869 - [Rust][DataFusion] Add support for new scan nodes to type coercion rule
  • ARROW-8871 - [C++] Fix Gandiva for value_parsing.h refactor
  • ARROW-8872 - [CI] Restore ci/detect-changes.py
  • ARROW-8874 - [C++][Dataset] Scanner::ToTable race when ScanTask exit early with an error
  • ARROW-8878 - [R] try_download is confused when download.file.method isn't default
  • ARROW-8882 - [C#] Add .editorconfig to C# code
  • ARROW-8888 - [Python] Do not use thread pool when converting pandas columns that are definitely zero-copyable
  • ARROW-8889 - [Python] avoid SIGSEGV when comparing RecordBatch to None
  • ARROW-8892 - [C++][CI] CI builds for MSVC do not build benchmarks
  • ARROW-8909 - [Java] Out of order writes using setSafe
  • ARROW-8911 - [C++] Fix segfault when slicing ChunkedArray with zero chunks
  • ARROW-8924 - [C++][Gandiva] Avoid potential int overflow in castDATE_date32()
  • ARROW-8925 - [Rust][DataFusion] CsvExec::schema bug fix
  • ARROW-8930 - [C++] libz.so linking error with liborc.a
  • ARROW-8932 - [C++][CI] Fix link error at arrow-orc-adapter-test
  • ARROW-8946 - [Python] Add tests for parquet.write_metadata
  • ARROW-8948 - [Java][Integration] enable duplicate field names integration tests
  • ARROW-8951 - [C++] Fix compiler warnings on gcc8 in release builds
  • ARROW-8954 - [Website] ca-certificates should be listed in installation instructions
  • ARROW-8957 - [FlightRPC][C++] directly use IpcWriteOptions
  • ARROW-8959 - [Rust] Update benchmark to use new API (fixes broken build)
  • ARROW-8962 - [C++] Add explicit implementation for junk values
  • ARROW-8968 - [C++][Gandiva] set data layout for pre-compiled IR to llvm::module
  • ARROW-8975 - [FlightRPC][C++] try to fix MacOS flaky tests
  • ARROW-8977 - [R] Table$create with schema crashes with some dictionary index types
  • ARROW-8978 - [C++][CI] Fix valgrind warnings in cpp-conda-valgrind nightly build
  • ARROW-8980 - [Python] Ensure that ARROW:schema metadata key is scrubbed when converting Parquet schema back to Arrow schema
  • ARROW-8982 - [CI] Remove allow_failures for s390x on TravisCI
  • ARROW-8986 - [Archery][ursabot] Fix benchmark diff checkout of origin/master
  • ARROW-9000 - [Java] Update errorprone to 2.4.0
  • ARROW-9009 - [C++][Dataset] ARROW:schema should be removed from schema's metadata when reading Parquet files
  • ARROW-9013 - [C++] Validate CMake options
  • ARROW-9020 - [Python] read_json won't respect explicit_schema in parse_options
  • ARROW-9024 - [C++/Python] Install anaconda-client in conda-clean job
  • ARROW-9026 - [C++/Python] Force package removal from arrow-nightlies c…
  • ARROW-9037 - [C++] C-ABI: do not error out when importing array with null_count == -1
  • ARROW-9040 - [Python][Parquet] "_ParquetDatasetV2" fails to read with columns and use_pandas_metadata=True
  • ARROW-9057 - [Rust][Datafusion] Fix projection on in memory scan
  • ARROW-9059 - [Rust] Fix sign in array slice_data_docstring
  • ARROW-9066 - [Python] Raise correct error in isnull()
  • ARROW-9071 - [C++] Fixed a bug in MakeArrayOfNull
  • ARROW-9077 - [C++] Fix aggregate/scalar-compare benchmark null_percent calculation
  • ARROW-9080 - [C++] arrow::AllocateBuffer returns a Result<unique_ptr<Buffer>>
  • ARROW-9082 - [Rust] Stream reader fails when stream not ended with (opt…
  • ARROW-9084 - [C++] CMake is unable to find zstd target when ZSTD_SOURCE=SYSTEM
  • ARROW-9085 - [C++][CI] Fix Windows build
  • ARROW-9087 - [C++] Support additional HDFS options
  • ARROW-9098 - [C++] Fixed ToStructArray handling of 0 column RecordBatches
  • ARROW-9105 - [C++][Dataset][Python] Pass an explicit schema to split_by_row_groups
  • ARROW-9120 - [C++] Do not suppress linting on files with "codegen" in their name
  • ARROW-9121 - [C++] Forbid empty or root path in FileSystem::DeleteDirContents
  • ARROW-9122 - [C++] Properly handle sliced arrays in ascii_lower, ascii_upper kernels
  • ARROW-9126 - [C++] Fix building trimmed Boost bundle on Windows
  • ARROW-9127 - [Rust] Update thrift dependency to 0.13 (latest)
  • ARROW-9134 - [Python] Parquet partitioning degrades Int32 to float64
  • ARROW-9141 - [R] Update cross-package documentation links
  • ARROW-9142 - [C++] random::RandomArrayGenerator::Boolean "probability" misdocumented / incorrect
  • ARROW-9143 - [C++] Do not produce internal ArrayData with kUnknownNullCount in RecordBatch::Slice if source ArrayData::null_count is set to 0
  • ARROW-9146 - [C++][Dataset] Lazily store fragment physical schema
  • ARROW-9151 - [R][CI] Fix Rtools 4.0 build: pacman sync
  • ARROW-9160 - [C++] Implement contains for exact matches
  • ARROW-9174 - [Go] Fix table panic on 386
  • ARROW-9183 - [C++] Fix build with clang & old libstdc++.
  • ARROW-9184 - [Rust][Datafusion] table scan without projection should return all columns
  • ARROW-9194 - [C++] Array::GetScalar not implemented for decimal type
  • ARROW-9195 - [Java] Fixed UNSAFE.get from bytearray usage
  • ARROW-9209 - [C++] Benchmarks fail to build ARROW_IPC=OFF and ARROW_BUILD_TESTS=OFF
  • ARROW-9219 - [R] coerce_timestamps in Parquet write options does not work
  • ARROW-9221 - [Java] account for big-endian buffers in ArrowBuf.setBytes
  • ARROW-9223 - [Python] Propagate timezone information in pandas conversion
  • ARROW-9230 - [FlightRPC][Python] pass through all options in flight.connect
  • ARROW-9233 - [C++] Add NullType code paths for is_valid, is_null kernels
  • ARROW-9236 - [Rust] CSV WriterBuilder never writes header
  • ARROW-9237 - [R] 0.17 install on Arch Linux
  • ARROW-9238 - [C++][CI][FlightRPC] increase test coverage of round-robin under IPC and Flight
  • ARROW-9252 - [Integration] Factor out IPC integration tests into script, add back 0.14.1 "gold" files
  • ARROW-9260 - [CI] Fix non-amd64 job failures with Ubuntu 14.04 and 20.04
  • ARROW-9260 - [CI][TRIAGE] Disable self-hosted builds until ARM64v8 build can be fixed
  • ARROW-9261 - [Python] Fix CA certificate lookup with S3 filesystem on manylinux
  • ARROW-9274 - [Rust] Parse 64bit numbers from integration files as strings
  • ARROW-9282 - [R] Remove usage of EXTPTRPTR
  • ARROW-9284 - [Java] getMinorTypeForArrowType returns sparse minor type for dense union types
  • ARROW-9288 - [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartitioning
  • ARROW-9297 - [C++][Parquet] Support chunked row groups in RowGroupRecordBatchReader
  • ARROW-9298 - [C++] Fix crashes with invalid IPC input
  • ARROW-9303 - [R] Linux static build should always bundle dependencies
  • ARROW-9305 - [Python] Dependency load failure in Windows wheel build
  • ARROW-9315 - [Java] Fix the failure of testAllocationManagerType
  • ARROW-9317 - [Java] add few testcases for arrow-memory
  • ARROW-9326 - [Python] Remove setuptools pinning
  • ARROW-9326 - [FOLLOWUP] Use requirements-build.txt for installing setuptools (#7638)
  • ARROW-9326 - [Python][TRIAGE] Pin to setuptools version prior to distutils-related changes on July 3 (#7636)
  • ARROW-9330 - [C++] Fix crash and undefined behaviour on corrupt IPC input
  • ARROW-9334 - [Dev][Archery] Push ancestor docker images
  • ARROW-9336 - [Ruby] Add support for missing keys in StructArrayBuilder
  • ARROW-9343 - [C++][Gandiva] CastInt/Float from string functions should handle leading/trailing white spaces
  • ARROW-9347 - [Python] Fix mv in fsspec handler for directories
  • ARROW-9350 - [C++] Fix Valgrind failures
  • ARROW-9351 - [C++] Fix CMake 3.2 detection in option value validation
  • ARROW-9353 - [Python][CI] Disable known failures in dask integration tests
  • ARROW-9354 - [C++] Turbodbc latest fails to build in the integration tests
  • ARROW-9355 - [R] Fix -Wimplicit-int-float-conversion
  • ARROW-9360 - [CI][Crossbow] Nightly homebrew-cpp job times out
  • ARROW-9363 - [C++][Dataset] Preserve schema metadata in ParquetDatasetFactory
  • ARROW-9368 - [Python] Rename predicate argument to filter in split_by_row_group()
  • ARROW-9373 - [C++] Fix Parquet crash on invalid input (OSS-Fuzz)
  • ARROW-9380 - [C++] Fix Filter crashes and bug in kernels with NullHandling::OUTPUT_NOT_NULL
  • ARROW-9384 - [C++] Avoid memory blowup on invalid IPC input
  • ARROW-9385 - [Python] Fix JPype tests and JVM buffer lifetime
  • ARROW-9389 - [C++] Add binary metafunctions for the set lookup kernels isin and match that can be called with CallFunction
  • ARROW-9397 - [R] Pass CC/CXX et al. to cmake when building libarrow in Linux build
  • ARROW-9408 - [Integration] Fix Windows numpy datagen issues
  • ARROW-9409 - [CI][Crossbow] Nightly conda-r fails
  • ARROW-9410 - [CI][Crossbow] Fix homebrew-cpp again
  • ARROW-9413 - [Rust] Disable cmp_nan clippy error
  • ARROW-9415 - [C++] Arrow does not compile on Power9
  • ARROW-9416 - [Go] Add testcases for some datatypes
  • ARROW-9417 - [C++] Write length in IPC message by using little-endian
  • ARROW-9418 - [R] nyc-taxi Parquet files not downloaded in binary mode on Windows
  • ARROW-9419 - [C++] Expand fill_null function testing, test sliced arrays, fix some bugs
  • ARROW-9428 - [C++][Doc] Update buffer allocation documentation
  • ARROW-9436 - [C++][CI] Fix Valgrind failure
  • ARROW-9438 - [CI] Add spark patch to compile with recent Arrow Java changes
  • ARROW-9439 - [C++] Fix crash on invalid IPC input
  • ARROW-9440 - [Python] Expose Fill Null kernel
  • ARROW-9443 - [C++] Bundled bz2 build should only build libbz2
  • ARROW-9448 - [Java] fix empty ArrowBuf getting a null log in debug mode
  • ARROW-9449 - [R] Strip arrow.so
  • ARROW-9450 - [Python] Fix tests startup time
  • ARROW-9456 - [Python] Dataset segfault when not importing pyarrow.parquet
  • ARROW-9458 - [Python] Release GIL in ScanTask.execute
  • ARROW-9460 - [C++] Fix BinaryContainsExact for pattern with repeated characters
  • ARROW-9461 - [Rust] Fixed error in reading Date32 and Date64.
  • ARROW-9476 - [C++][Dataset] Fix incorrect dictionary association in HivePartitioningFactory
  • ARROW-9486 - [C++][Dataset] Support implicit cast of InExpression::set to dict
  • ARROW-9497 - [C++][Parquet] Fix fuzz failure case caused by malformed Parquet data
  • ARROW-9499 - [C++] AdaptiveIntBuilder::AppendNull does not increment the null count
  • ARROW-9500 - [C++] Do not use std::to_string to fix segfault on gcc 7.x in -O3 builds
  • ARROW-9501 - Add logic in timestampdiff() when end date is last day of…
  • ARROW-9503 - [Rust] Comparison of sliced arrays is wrong
  • ARROW-9504 - [C++/Python] Segmentation fault on ChunkedArray.take
  • ARROW-9506 - [Packaging][Python] Fix macOS wheel build failures
  • ARROW-9512 - [C++] Avoid variadic template unpack inside lambda to work around gcc 4.8 bug
  • ARROW-9524 - [CI][Gandiva] Fix c++ unit test failure in Gandiva nightly build
  • ARROW-9527 - [Rust] Removed un-used dev dependencies.
  • ARROW-10126 - [Python] Impossible to import pyarrow module in python. Generates this "ImportError: DLL load failed: The specified procedure could not be found."
  • PARQUET-1839 - Set values read for required column
  • PARQUET-1857 - [C++] Do not fail to read unencrypted files with over 32767 row groups. Change some DCHECKs causing segfaults to throw exceptions
  • PARQUET-1865 - [C++] Fix usages of C++17 extensions in parquet/encoding_benchmark.cc
  • PARQUET-1877 - [C++] Reconcile thrift limits
  • PARQUET-1882 - [C++] Buffered Reads should allow for 0 length

New Features and Improvements

  • ARROW-300 - [Format] Proposal for "trivial" IPC body buffer compression using either LZ4 or ZSTD codecs
  • ARROW-842 - [Python] Recognize pandas.NaT as null when converting object arrays with from_pandas=True
  • ARROW-971 - [C++][Compute] IsValid, IsNull kernels
  • ARROW-974 - [Website] Add Use Cases section to the website
  • ARROW-1277 - Completing integration tests for major implemented data types
  • ARROW-1567 - [C++] Implement "fill_null" function that replaces null values with a scalar value
  • ARROW-1570 - [C++] Define API for creating a kernel instance from function of scalar input and output with a particular signature
  • ARROW-1682 - [Doc] Expand S3/MinIO filesystem dataset documentation
  • ARROW-1796 - [Python] RowGroup filtering on file level
  • ARROW-2260 - [C++][Plasma] Use Gflags for command-line parsing
  • ARROW-2444 - [Python][C++] Better handle reading empty parquet files
  • ARROW-2702 - [Python] Change a couple of error types in numpy_to_arrow.cc
  • ARROW-2714 - [Python] Implement variable step slicing with Take
  • ARROW-2912 - [Website] Build more detailed Community landing page a la Apache Spark
  • ARROW-3089 - [Rust] Add ArrayBuilder for different Arrow arrays
  • ARROW-3134 - [C++] Implement n-ary iterator for a collection of chunked arrays with possibly different chunking layouts
  • ARROW-3154 - [Python] Expand documentation on Parquet metadata inspection and writing of _metadata
  • ARROW-3244 - [Python] Multi-file parquet loading without scan
  • ARROW-3275 - [Python] Add documentation about inspecting Parquet file metadata
  • ARROW-3308 - [R] Convert R character vector with data exceeding 2GB to Large type
  • ARROW-3317 - [R] Test/support conversions from data.frame with a single character column exceeding 2GB capacity of BinaryArray
  • ARROW-3446 - [R] Document mapping of Arrow <-> R types
  • ARROW-3509 - [C++] Standardize on using Field in Type/Array
  • ARROW-3520 - [C++] Add "list_flatten" vector kernel wrapper for Flatten method of ListArray types
  • ARROW-3688 - [Rust] Add append_values for primitive builders
  • ARROW-3764 - [C++] Port Python "ParquetDataset" business logic to C++
  • ARROW-3827 - [Rust] Implement UnionArray Updated
  • ARROW-4022 - [C++] Promote Datum variant out of compute namespace
  • ARROW-4221 - [C++][Python] Add canonical flag in COO sparse index
  • ARROW-4390 - [R] Serialize "labeled" metadata in Feather files, IPC messages
  • ARROW-4412 - [DOCUMENTATION] Add explicit version numbers to the arrow specification documents.
  • ARROW-4427 - [Doc] Move Confluence Wiki pages to the Sphinx docs
  • ARROW-4429 - [Doc] Add Git conventions to contributing guidelines
  • ARROW-4526 - [Java] Remove Netty references from ArrowBuf and move Allocator out of vector package
  • ARROW-5035 - [C#] ArrowBuffer.Builder<bool> is broken
  • ARROW-5082 - [Python] Substantially reduce Python wheel package and install size
  • ARROW-5143 - [Flight] Enable integration testing of batches with dictionaries
  • ARROW-5279 - [C++] Support reading delta dictionaries in IPC streams
  • ARROW-5377 - [C++] Make IpcPayload public and add GetPayloadSize
  • ARROW-5489 - [C++] Normalize kernels and ChunkedArray behavior
  • ARROW-5548 - [Documentation] http://arrow.apache.org/docs/latest/ is not latest
  • ARROW-5649 - [Integration][C++] Create integration test for extension types
  • ARROW-5708 - [C#] Null support for BooleanArray
  • ARROW-5760 - [C++] New compute::Take implementation for better performance, faster dispatch, smaller code size / faster compilation
  • ARROW-5854 - [Python] Expose compare kernels on Array class
  • ARROW-6052 - [C++] Split up arrow/array.h/cc into multiple files under arrow/array/, move ArrayData to separate header, make ArrayData::dictionary ArrayData
  • ARROW-6110 - [Java][Integration] Support LargeList Type and add integration test with C++
  • ARROW-6111 - [Java] Support LargeVarChar and LargeBinary types
  • ARROW-6439 - [R] Implement S3 file-system interface in R
  • ARROW-6456 - [C++] Possible to reduce object code generated in compute/kernels/take.cc?
  • ARROW-6501 - [C++] Remove non_zero_length_ field from SparseIndex class
  • ARROW-6521 - [C++] Add an API to query runtime build info
  • ARROW-6543 - [R] Support LargeBinary and LargeString types
  • ARROW-6602 - [Doc] Add a feature/implementation matrix
  • ARROW-6603 - [C#] Adds ArrayBuilder API to support writing null values + BooleanArray null support
  • ARROW-6645 - [Python] Use common boundschecking function for checking dictionary indices when converting to pandas
  • ARROW-6689 - [Rust] [DataFusion] Query execution enhancements for 1.0.0 release
  • ARROW-6691 - [Rust] [DataFusion] Use tokio and Futures instead of spawning threads
  • ARROW-6775 - [C++][Python] Implement list_value_lengths and list_parent_indices functions
  • ARROW-6776 - [Python] Need a lite version of pyarrow
  • ARROW-6800 - [C++] Add CMake option to build libraries targeting a C++14 or C++17 toolchain environment
  • ARROW-6839 - [Java] Add APIs to read and write "custom_metadata" field of IPC file footer (#7231)
  • ARROW-6856 - [C++] Use ArrayData instead of Array for ArrayData::dictionary
  • ARROW-6917 - [Archery][Release] Add support for JIRA curation, changelog generation and commit cherry-picking for maintenance releases
  • ARROW-6945 - [Rust][Integration] Run rust integration tests
  • ARROW-6959 - [C++] Clarify what signatures are preferred for compute kernels
  • ARROW-6978 - [R] Add bindings for sum and mean compute kernels
  • ARROW-6979 - [R] Enable jemalloc in autobrew formula
  • ARROW-7009 - [C++] Refactor filter/take kernels to use Datum instead of overloads
  • ARROW-7010 - [C++] Implement decimal-to-float casts
  • ARROW-7011 - [C++] Implement casts from float/double to decimal
  • ARROW-7012 - [C++] Add comments explaining high level detail about ChunkedArray class and questions about chunk sizes
  • ARROW-7068 - [C++] Add ListArray::offsets and LargeListArray::offsets returning boxed version of offsets as Int32Array/Int64Array
  • ARROW-7075 - [C++] Boolean kernels should not allocate in Call()
  • ARROW-7175 - [Website] Add a security page to track when vulnerabilities are patched
  • ARROW-7229 - [C++] Unify ConcatenateTables APIs
  • ARROW-7230 - [C++] Use vendored std::optional instead of boost::optional in Gandiva
  • ARROW-7237 - [C++] Use Result<T> in arrow/json APIs
  • ARROW-7243 - [Docs] Add common "implementation status" table to the README of each native language implementation, as well as top level README
  • ARROW-7285 - [C++] ensure C++ implementation meets clarified dictionary spec
  • ARROW-7300 - [C++][Gandiva] Implement functions to cast from strings to integers/floats
  • ARROW-7313 - [C++] Add function for retrieving a scalar from an array slot
  • ARROW-7371 - [GLib] Add GLib binding of Dataset
  • ARROW-7375 - [Python] Expose C++ MakeArrayOfNull
  • ARROW-7391 - [C++][Dataset] Remove Expression subclasses from bindings
  • ARROW-7495 - [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager (#6433)
  • ARROW-7605 - [C++] Create and install "dependency bundle" static library including jemalloc, mimalloc, and any BUNDLED static library so that static linking to libarrow.a is possible
  • ARROW-7607 - [C++] Example of using Arrow as a dependency of another CMake project
  • ARROW-7673 - [C++][Dataset] Revisit File discovery failure mode
  • ARROW-7676 - [Packaging][Python] Ensure that the static libraries are not built in the wheel scripts
  • ARROW-7699 - [Java] Support concatenating dense union vectors in batch
  • ARROW-7705 - [Rust] Initial sort implementation
  • ARROW-7717 - [CI] Have nightly integration test for Spark's latest release
  • ARROW-7759 - [C++][Dataset] Add CsvFileFormat
  • ARROW-7778 - [Integration][C++] Enable nested dictionaries
  • ARROW-7784 - [C++] Improve compilation time of arrow/array/diff.cc and reduce code size
  • ARROW-7801 - [Developer] Add issue_comment workflow to fix lint/style/codegen
  • ARROW-7803 - [R][CI] Autobrew/homebrew tests should not always install from master
  • ARROW-7831 - [Java] Fix build error from #6402
  • ARROW-7831 - [Java] do not allocate a new offset buffer if the slice starts at 0 since the relative offset pointer would be unchanged
  • ARROW-7902 - [Integration] Unskip nested dictionary integration tests
  • ARROW-7910 - [C++] Add internal GetPageSize() function
  • ARROW-7924 - [Rust] Add sort for float types
  • ARROW-7950 - [Python] Determine + test minimal pandas version + raise error when pandas is too old
  • ARROW-7955 - [Java] Support large buffer for file/stream IPC
  • ARROW-8020 - [Java] Implement vector validate functionality
  • ARROW-8023 - [Website] Write a blog post about the C data interface
  • ARROW-8025 - [C++][CI][FOLLOWUP] Fix test compilation failure due to conflicting changes in scalar_cast_test.cc
  • ARROW-8025 - [C++] Implement cast from String to Binary
  • ARROW-8046 - [Developer][Integration] Makefile.docker's target names are broken
  • ARROW-8062 - [C++][Dataset] Implement ParquetDatasetFactory
  • ARROW-8065 - [C++][Dataset] Refactor ScanOptions and Fragment relation
  • ARROW-8074 - [C++][Dataset][Python] FileFragments from buffers and NativeFiles
  • ARROW-8108 - [Java] Extract a common interface for dictionary encoders
  • ARROW-8111 - [C++] User-defined timestamp parser option to CSV, new TimestampParser interface, and strptime-compatible impl
  • ARROW-8114 - [Java][Integration] Enable custom_metadata integration test
  • ARROW-8121 - [Java] Enhance code style checking for Java code (add spaces after commas, semi-colons and type casts)
  • ARROW-8149 - [C++/Python] Enable CUDA Support in conda recipes
  • ARROW-8157 - [C++][Gandiva] Support building with LLVM 9
  • ARROW-8162 - [Format][Python] Add serialization for CSF sparse tensors to Python
  • ARROW-8169 - [Java] Improve the performance of JDBC adapter by allocating memory proactively
  • ARROW-8171 - [Java] Consider pre-allocating memory for fixed-width vector in Avro adapter iterator (#7211)
  • ARROW-8190 - [FlightRPC][C++] Expose IPC options
  • ARROW-8229 - [Java] Move ArrowBuf into the Arrow package (#6729)
  • ARROW-8230 - [Java] Remove netty dependency from arrow-memory (#7347)
  • ARROW-8261 - [Rust][DataFusion] Made limit accept integers and no longer accept expressions.
  • ARROW-8263 - [Rust][DataFusion] Added some documentation to available SQL functions.
  • ARROW-8281 - [R] Name collision of arrow.dll on Windows conda
  • ARROW-8283 - [Python] Limit FileSystemDataset constructor from fragments/paths, no filesystem interaction
  • ARROW-8287 - [Rust] Add "pretty" util to help with printing tabular output of RecordBatches
  • ARROW-8293 - [Python] Run flake8 on python/examples also
  • ARROW-8297 - [FlightRPC][C++] Implement Flight DoExchange for C++
  • ARROW-8301 - [R] Handle ChunkedArray and Table in C data interface
  • ARROW-8312 - [Java][Gandiva] support TreeNode in IN expression
  • ARROW-8314 - [Python] Add a Table.select method to select a subset of columns
  • ARROW-8318 - [C++][Dataset] Construct FileSystemDataset from fragments
  • ARROW-8399 - [Rust] Extend memory alignments to include other architectures
  • ARROW-8413 - [C++][Parquet] Refactor Generating validity bitmap for values column
  • ARROW-8422 - [Rust][Parquet] Arrow to Parquet schema conversion
  • ARROW-8430 - [CI] Configure self-hosted runners for Github Actions
  • ARROW-8434 - [C++] Avoid multiple schema deserializations in RecordBatchFileReader
  • ARROW-8440 - [C++] Refine SIMD header files
  • ARROW-8443 - [Gandiva][C++] Fix Trunc and Round output types.
  • ARROW-8447 - [C++][Dataset] Ensure deterministic row ordering in Scanner::ToTable
  • ARROW-8456 - [Release] Add python script to help curating JIRA
  • ARROW-8467 - [C++] Fix TestArrayImport tests for big-endian platforms
  • ARROW-8474 - [CI][Crossbow] Skip some nightlies we don't need to run
  • ARROW-8477 - [C++] Enable reading and writing of long filenames for Windows
  • ARROW-8481 - [Java] Provide an allocation manager based on Unsafe API
  • ARROW-8483 - [Ruby] Removed irrelevant bits of documentation in Arrow::Table
  • ARROW-8485 - [Integration][Java] Implement extension types integration
  • ARROW-8486 - [C++] Fix BitArray failures on big-endian platforms
  • ARROW-8487 - [FlightRPC] Provide a way to target a particular payload size
  • ARROW-8488 - [R] Remove VALUE_OR_STOP and STOP_IF_NOT_OK macros
  • ARROW-8496 - [C++] Refine ByteStreamSplitDecodeScalar
  • ARROW-8497 - [Archery] Add missing components to build options
  • ARROW-8499 - [C++][Dataset] In ScannerBuilder, batch_size will not wor…
  • ARROW-8500 - [C++] Add benchmark for using Filter on RecordBatch
  • ARROW-8501 - [Packaging][RPM] Upgrade devtoolset to 8 on CentOS 6
  • ARROW-8502 - [Release][APT][Yum] Ignore all Linux packages for arm64v8
  • ARROW-8504 - [C++] Add BitRunReader and use it in parquet
  • ARROW-8506 - [C++] Add tests to verify the encoded stream of RLE with bit_width > 8
  • ARROW-8507 - [Release] Detect .git directory automatically in changelog.py
  • ARROW-8509 - [GLib] Add low level record batch read/write functions
  • ARROW-8512 - [C++] Remove unused expression/operator prototype code
  • ARROW-8513 - [Python] Expose Take with Table input in Python
  • ARROW-8515 - [C++] Bitmap::ToString should group by bytes
  • ARROW-8516 - [Rust] Improve PrimitiveBuilder::append_slice performance
  • ARROW-8517 - [Release] Update Crossbow release verification tasks for 0.17.0 RC0
  • ARROW-8520 - [Developer] Use .asf.yaml to direct GitHub notifications to JIRA and mailing lists
  • ARROW-8521 - [Release] Update CHANGELOG.md to include patch releases
  • ARROW-8522 - [Release][Developer] Add option to bootstrap NPM when running release verification script
  • ARROW-8524 - [CI] Free up space on github actions
  • ARROW-8526 - [Python] Fix non-deterministic row order failure in dataset tests
  • ARROW-8531 - [C++] Deprecate ARROW_USE_SIMD CMake option
  • ARROW-8538 - [Packaging] Remove boost from homebrew formula
  • ARROW-8540 - [C++] Add memory allocation benchmarks
  • ARROW-8541 - [Release] Don't remove previous source releases automatically
  • ARROW-8542 - [Release] Fix checksum url in the website post release script
  • ARROW-8543 - [C++] Single pass coalescing algorithm + Rebase
  • ARROW-8544 - [CI][Crossbow] Add a status.json to the gh-pages summary of nightly builds to get around rate limiting
  • ARROW-8548 - [Website] 0.17 release post
  • ARROW-8549 - [R] Assorted post-0.17 release cleanups
  • ARROW-8550 - [CI] Don't run cron GHA jobs on forks
  • ARROW-8551 - [CI][Gandiva] Use LLVM 8 in gandiva linux build
  • ARROW-8552 - [Rust] Support iterating over parquet row columns
  • ARROW-8553 - [C++] Optimize unaligned bitmap operations
  • ARROW-8555 - [FlightRPC][Java] implement DoExchange
  • ARROW-8558 - [Rust][CI] GitHub Actions missing rustfmt
  • ARROW-8559 - [Rust] Consolidate Record Batch reader traits in main arrow crate
  • ARROW-8560 - [Rust] Docs for MutableBuffer resize are incorrect
  • ARROW-8561 - [C++][Gandiva] Stop using deprecated google::protobuf::MessageLite::ByteSize()
  • ARROW-8562 - [C++] IO: Parameterize I/O Coalescing using S3 metrics
  • ARROW-8563 - [Go] Minor change to make newBuilder public
  • ARROW-8564 - [Website] Add Ubuntu 20.04 LTS to supported package list
  • ARROW-8569 - [CI] Upgrade xcode version for testing homebrew formulae
  • ARROW-8571 - [C++] Switch AppVeyor image to VS 2017
  • ARROW-8572 - [Python] expose UnionArray fields to Python
  • ARROW-8573 - [Rust] Upgrade Rust to 1.44 nightly
  • ARROW-8574 - [Rust] Implement Debug for all plain types
  • ARROW-8575 - [Developer] Add issue_comment workflow to rebase a PR
  • ARROW-8590 - [Rust] Use arrow crate pretty util in DataFusion
  • ARROW-8591 - [Rust] Reverse lookup for a key in DictionaryArray
  • ARROW-8597 - [Rust] Lints and readability improvements for arrow crate
  • ARROW-8606 - [CI] Don't trigger all builds on a change to any file in ci/
  • ARROW-8607 - [R][CI] Unbreak builds following R 4.0 release
  • ARROW-8611 - [R] Can't install arrow 0.17 on Ubuntu 18.04 R 3.6.3
  • ARROW-8612 - [GLib] Add GArrowReadOptions and GArrowWriteOptions
  • ARROW-8616 - [Rust] Turn explicit SIMD off by default
  • ARROW-8619 - [C++] Use distinct enum values for MonthInterval, DayTimeInterval
  • ARROW-8622 - [Rust] Allow the parquet crate to be compiled on aarch64 platforms
  • ARROW-8623 - [C++][Gandiva] Reduce use of Boost, remove Boost headers from header files
  • ARROW-8624 - [Website] Install page should mention arrow-dataset packages
  • ARROW-8628 - [Dev] Wrap docker-compose commands with archery
  • ARROW-8629 - [Rust] Eliminate indirection of zero sized allocations
  • ARROW-8633 - [C++] Add ValidateAscii function
  • ARROW-8634 - [Java] Add Getting Started section to Java README
  • ARROW-8639 - [C++][Plasma] Require gflags
  • ARROW-8645 - [C++] Missing gflags dependency for plasma
  • ARROW-8647 - [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type
  • ARROW-8648 - [Rust] Optimize Rust CI Workflows
  • ARROW-8650 - [Rust][Website] Add documentation to Arrow website
  • ARROW-8651 - [Python][Dataset] Support pickling of Dataset objects
  • ARROW-8656 - [Python] Switch to VS2017 in the windows wheel builds
  • ARROW-8659 - [Rust] ListBuilder allocate with_capacity
  • ARROW-8660 - [C++][Gandiva] Reduce usage of Boost in Gandiva codebase
  • ARROW-8662 - [CI] Consolidate appveyor scripts
  • ARROW-8664 - [Java] Add flag to skip null check
  • ARROW-8668 - [Packaging][APT][Yum][ARM] Use Travis CI's ARM machine to build packages
  • ARROW-8669 - [C++] Add IpcWriteOptions argument to GetRecordBatchSize()
  • ARROW-8671 - [C++][FOLLOWUP] Fix ASAN/UBSAN bug found with IPC fuzz testing files
  • ARROW-8671 - [C++] Use new BodyCompression Flatbuffers member for IPC compression metadata
  • ARROW-8682 - [Ruby][Parquet] Add support for column level compression
  • ARROW-8687 - [Java] Remove references to io.netty.buffer.ArrowBuf
  • ARROW-8690 - [Python] Clean-up dataset+parquet tests now order is deterministic
  • ARROW-8692 - [C++] Avoid memory copies when downloading from S3
  • ARROW-8695 - [Java] Remove references to PlatformDependent in arrow-memory
  • ARROW-8696 - [Java] Convert tests to maven failsafe
  • ARROW-8699 - [R] Fix automatic r_to_py conversion
  • ARROW-8702 - [Packaging][C#] Build NuGet packages in release process
  • ARROW-8703 - [R] schema$metadata should be properly typed
  • ARROW-8707 - [CI] Docker push fails because of wrong dockerhub credentials
  • ARROW-8708 - [CI] Utilize github actions cache for docker-compose volumes
  • ARROW-8711 - [Python] Expose timestamp_parsers in csv.ConvertOptions
  • ARROW-8717 - [CI][Packaging] Add build dependency on boost to homebrew
  • ARROW-8720 - [C++] Fix checked_pointer_cast ifdef logic
  • ARROW-8721 - [CI] Fix R build matrix
  • ARROW-8723 - [Rust] Remove SIMD specific benchmark code
  • ARROW-8724 - [Packaging][deb][RPM] Use directory in host as build directory
  • ARROW-8725 - [Rust] remove redundant directory walk in parquet datasource
  • ARROW-8727 - [C++] Don't require stack allocation of any object to use StringConverter, hide behind ParseValue function
  • ARROW-8730 - [Rust] Use slice instead of &Vec for function args
  • ARROW-8733 - [C++][Dataset][Python] Expose RowGroupInfo statistics values
  • ARROW-8736 - [Rust][DataFusion] Table API should provide a schema() method
  • ARROW-8740 - [CI] Fix archery option in pandas master cron test
  • ARROW-8742 - [C++][Python] Add GRPC Mutual TLS for clients and server
  • ARROW-8743 - [CI][C++] Add a test job for s390x
  • ARROW-8744 - [Rust] handle channel close in parquet batch iterator
  • ARROW-8745 - [C++] Enhance Bitmap::ToString test for big-endian platforms
  • ARROW-8747 - [C++] Write compressed size in little-endian format for Feather V2
  • ARROW-8751 - [Rust] support empty parquet file in arrow array reader
  • ARROW-8752 - [Rust] remove unused hashmaps in build_array_reader
  • ARROW-8753 - [CI][C++] Add a test job for ARM
  • ARROW-8754 - [C++][CI] Enable additional tests on s390x
  • ARROW-8756 - [C++] Fix Bitmap Words tests' failures on big-endian platforms
  • ARROW-8757 - [C++][Plasma] Write Plasma header in little-endian format
  • ARROW-8758 - [R] Updates for compatibility with dplyr 1.0
  • ARROW-8759 - [C++][Plasma] Fix TestPlasmaSerialization.DeleteReply failure on big-endian platforms
  • ARROW-8762 - [C++] Use arrow::internal::BitmapAnd directly in Gandiva
  • ARROW-8763 - [C++] Add RandomAccessFile::WillNeed
  • ARROW-8764 - [C++] Make executor configurable in ReadAsync and ReadRangeCache
  • ARROW-8766 - [Python] Allow implementing filesystems in Python
  • ARROW-8769 - [C++][R] Add convenience accessor for StructScalar fields
  • ARROW-8770 - [C++][CI] Enable arrow-csv-test on s390x
  • ARROW-8772 - [C++] Unrolled aggregate dense for better speculative execution
  • ARROW-8777 - [Rust] Parquet.rs does not support reading fixed-size binary fields.
  • ARROW-8778 - [C++][Gandiva] Fix SelectionVector related failure on big-endian platform
  • ARROW-8779 - [R] Implement conversion to List<Struct>
  • ARROW-8781 - [CI][MinGW] Enable ccache
  • ARROW-8782 - [Rust] Add benchmark crate
  • ARROW-8783 - [Rust][DataFusion] Add ParquetScan and CsvScan variants in LogicalPlan
  • ARROW-8784 - [Rust][DataFusion] Remove use of Arc from LogicalPlan
  • ARROW-8785 - [Python][Packaging] Build the windows wheels with MIMALLOC enabled
  • ARROW-8786 - [Packaging][rpm] Use bundled zstd in the CentOS 8 build
  • ARROW-8788 - [C#] Introduce bit-packed builder for null support in builders
  • ARROW-8789 - [Rust] Add separate crate for integration test binaries
  • ARROW-8790 - [C++][CI] Enable arrow-flight-test on s390x
  • ARROW-8791 - [Rust] Allow creation of StringDictionaryBuilder with an existing array of dictionary values
  • ARROW-8792 - [C++][Python][R][GLib] New Array compute kernels implementation and execution framework
  • ARROW-8793 - [C++] Do not inline BitUtil::SetBitsTo
  • ARROW-8794 - [C++] Expand performance coverage of parquet to arrow reading
  • ARROW-8795 - [C++] Limited iOS support
  • ARROW-8800 - [C++] Split ChunkedArray into arrow/chunked_array.h/cc
  • ARROW-8804 - [R][CI] Followup to Rtools40 upgrade
  • ARROW-8814 - [Dev][Release] Binary upload script keeps raising locale warnings
  • ARROW-8815 - [Dev][Release] Binary upload script should retry on unexpected bintray request error
  • ARROW-8818 - [Rust] Failing to build on master due to Flatbuffers/Union issues
  • ARROW-8822 - [Rust][DataFusion] Add InMemoryScan to LogicalPlan
  • ARROW-8827 - [Rust] Add initial skeleton for Rust integration tests
  • ARROW-8830 - [GLib] Add support for Tell against not seekable GIO output stream
  • ARROW-8831 - [Rust] change simd_compare_op in comparison kernel to use bitmask SIMD operation to significantly improve performance
  • ARROW-8833 - [Rust] Implement VALIDATE mode in integration tests
  • ARROW-8834 - [Rust][Integration Testing] Implement stream-to-file, file-to-stream
  • ARROW-8835 - [Rust] Implement arrow-stream-to-file for integration testing
  • ARROW-8836 - [Website] Update copyright end year automatically
  • ARROW-8837 - [Rust] Implement Null data type
  • ARROW-8838 - [Rust] File reader fails to read header from valid files
  • ARROW-8839 - [Rust][DataFusion] support CSV schema inference in logical plan
  • ARROW-8840 - [Rust][DataFusion] implement std::error::Error trait for ExecutionError
  • ARROW-8841 - [C++] Add benchmark and unittest for encoding::PLAIN spaced
  • ARROW-8843 - [C++] Compare bitmaps in words
  • ARROW-8844 - [C++] Transfer bitmap in words
  • ARROW-8846 - [Dev][Python] Autoformat Python files with archery
  • ARROW-8847 - [C++] Pass task hints in Executor API
  • ARROW-8851 - [Python][Documentation] Fix FutureWarnings in Python Plas…
  • ARROW-8852 - [R] Post-0.17.1 adjustments
  • ARROW-8854 - [Rust][Integration Testing] Standardize error handling
  • ARROW-8855 - [Rust][Integration] Complete record_batch_from_json types
  • ARROW-8856 - [Rust][Integration] Return None from an empty IPC message
  • ARROW-8864 - [R] Add methods to Table/RecordBatch for consistency with data.frame
  • ARROW-8866 - [C++] Split UNION into SPARSE_UNION and DENSE_UNION
  • ARROW-8867 - [R] Support converting POSIXlt type
  • ARROW-8875 - [C++] use AWS SDK SetResponseStreamFactory to avoid a copy of bytes
  • ARROW-8877 - [Rust][DataFusion] introduce CsvReadOption struct to simplify UX
  • ARROW-8879 - [FlightRPC][Java] FlightStream should unwrap ExecutionExceptions
  • ARROW-8880 - [R][Linux] Make R Binary Install Friendlier
  • ARROW-8881 - [Rust] Add large binary, string and list support
  • ARROW-8885 - [R] Don't include everything everywhere
  • ARROW-8886 - [C#] Resize to negative length no longer permitted
  • ARROW-8887 - [Java] Avoid runaway doubling of buffer size for complex vectors
  • ARROW-8890 - [R] Fix C++ lint issues
  • ARROW-8895 - [C++] Test temporal types with Take and Filter, expand types supported by RandomArrayGenerator::ArrayOf
  • ARROW-8896 - [C++] Use Take to implement dictionary<T> to T casts
  • ARROW-8899 - [R] Add R metadata like pandas metadata for round-trip fidelity
  • ARROW-8901 - [C++] Reduce number of take kernels
  • ARROW-8903 - [C++] Implement optimized "unsafe take" for use with selection vectors for kernel execution
  • ARROW-8904 - [Python] Adapt to child->field API migration/deprecation
  • ARROW-8906 - [Rust][DataFusion] support schema inference from multiple CSV files
  • ARROW-8907 - [Rust] Implement scalar comparison operations
  • ARROW-8912 - [Ruby] Keep reference of Arrow::Buffer's data for GC
  • ARROW-8913 - [Ruby] Use "field" instead of "child"
  • ARROW-8914 - [C++] Keep BasicDecimal128 in native-endian order
  • ARROW-8915 - [Dev][Archery] Require Click 7
  • ARROW-8917 - [C++] Formalize "metafunction" concept. Add Take and Filter metafunctions, port R and Python bindings
  • ARROW-8918 - [C++][Python] Implement cast metafunction to allow use of "cast" with CallFunction, use in Python
  • ARROW-8922 - [C++] Add illustrative "ascii_upper" and "ascii_length" scalar string functions valid for Array and Scalar inputs
  • ARROW-8923 - [C++] Improve usability of arrow::compute::CallFunction
  • ARROW-8926 - [C++] Improve arrow/compute/*.h comments, correct typos and outdated language
  • ARROW-8927 - [C++] Support dictionary memo in CUDA IPC ReadRecordBatch functions
  • ARROW-8929 - [C++] Set the default for compute::Arity::VarArgs to 0
  • ARROW-8931 - [Rust] add lexical sort support to arrow compute kernel
  • ARROW-8933 - [C++] Trim redundant generated code from compute/kernels/vector_hash.cc
  • ARROW-8934 - [C++] Enable compute::Subtract with timestamp inputs to return duration
  • ARROW-8937 - [C++] Implement strptime scalar string to timestamp kernel
  • ARROW-8938 - [R] Provide binding for arrow::compute::CallFunction
  • ARROW-8940 - [Java] Fix the performance degradation of integration tests
  • ARROW-8941 - [C++/Python] Add cleanup script for arrow-nightlies conda repository
  • ARROW-8942 - [R] Detect compression in reading CSV/JSON
  • ARROW-8943 - [C++][Python][Dataset] Add partitioning support to ParquetDatasetFactory
  • ARROW-8950 - [C++] Avoid HEAD when possible in S3 filesystem
  • ARROW-8958 - [FlightRPC][Python] implement DoExchange
  • ARROW-8960 - [MINOR][FORMAT] fix typo
  • ARROW-8961 - [C++] Add utf8proc library to toolchain
  • ARROW-8963 - [C++][Parquet] optimize LeafReader::NextBatch to save memory
  • ARROW-8965 - [Python][Doc] Pyarrow documentation for pip nightlies references 404'd location
  • ARROW-8966 - [C++] Move arrow::ArrayData to a separate header file
  • ARROW-8969 - [C++] Reduce binary size of kernels/scalar_compare.cc.o by reusing more kernels between types, operators
  • ARROW-8970 - [C++] Reduce shared library / binary code size (umbrella issue)
  • ARROW-8972 - [Java] Support range value comparison for large varchar/varbinary vectors
  • ARROW-8973 - [Java] Support batch value appending for large varchar/varbinary vectors
  • ARROW-8974 - [C++] Simplify TransferBitmap
  • ARROW-8976 - [C++] compute::CallFunction can't Filter/Take with ChunkedArray
  • ARROW-8979 - [C++] Refine bitmap unaligned word access
  • ARROW-8984 - [R] Revise install guides now that Windows conda package exists
  • ARROW-8985 - [Format] Add Decimal::bitWidth field with default value of 128 for forward compatibility
  • ARROW-8989 - [C++][Doc] Document available compute functions
  • ARROW-8993 - [Rust] support reading non-seekable sources
  • ARROW-8994 - [C++] Disable include-what-you-use cpplint lint checks
  • ARROW-8996 - [C++] Add AVX version for aggregate sum/mean with runtime dispatch
  • ARROW-8997 - [Archery] Improve benchmark comparison formatting
  • ARROW-9004 - [C++][Gandiva] Support building with LLVM 10
  • ARROW-9005 - [Rust][Datafusion] support sort expression
  • ARROW-9007 - [Rust] Support appending array data to builders
  • ARROW-9011 - [Python][Packaging] Move the anaconda cleanup script to crossbow
  • ARROW-9014 - [Packaging] Bump the minor part of the automatically generated version in crossbow
  • ARROW-9015 - [Java] Make BaseAllocator package private
  • ARROW-9016 - [Java] Remove direct references to Netty/Unsafe Allocators
  • ARROW-9017 - [C++][Python] Refactor scalar bindings
  • ARROW-9018 - [C++] Remove APIs that were marked as deprecated in 0.17.0 and prior
  • ARROW-9021 - [Python] Add the filesystem explanation to parquet.read_table docstring
  • ARROW-9022 - [C++] Add/Sub/Mul arithmetic kernels with overflow check
  • ARROW-9029 - [C++] Implement BitBlockCounter for much faster block popcounts of bitmaps
  • ARROW-9030 - [Python] Remove pyarrow/compat.py, move some oft-used utility functions to pyarrow.lib
  • ARROW-9031 - [R] Implement conversion from Type::UINT64 to R vector
  • ARROW-9032 - [C++] Split up arrow/util/bit_util.h into multiple header files
  • ARROW-9034 - [C++] Implement "BinaryBitBlockCounter", add single-word functions to BitBlockCounter
  • ARROW-9042 - [C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior
  • ARROW-9043 - [Go][FOLLOWUP] Move license file copy to correct location
  • ARROW-9043 - [Go] Temporarily copy LICENSE.txt to go/
  • ARROW-9045 - [C++] Expand / improve Take and Filter benchmarks for enhanced baseline
  • ARROW-9046 - [C++][R] Put more things in type_fwds
  • ARROW-9047 - [Rust] Fix a segfault when setting zero bits in a zero-length bitset.
  • ARROW-9050 - [Release] Use 1.0.0 as the next version
  • ARROW-9051 - [GLib] Refer Array related objects from Array
  • ARROW-9052 - [CI][MinGW] Enable Gandiva
  • ARROW-9055 - [C++] Add sum/mean/minmax kernels for Boolean type
  • ARROW-9058 - [Packaging][wheel] Use sourceforge.net to download Boost
  • ARROW-9060 - [GLib] Add support for building Apache Arrow Datasets GLib with non-installed Apache Arrow Datasets
  • ARROW-9061 - [Packaging][APT][Yum][GLib] Add Apache Arrow Datasets GLib
  • ARROW-9062 - [Rust] json reader dictionary support
  • ARROW-9067 - [C++] Create reusable branchless / vectorized index boundschecking functions
  • ARROW-9070 - [C++] StructScalar needs field accessor methods
  • ARROW-9073 - [C++] Fix RapidJSON include directory detection with RapidJSONConfig.cmake
  • ARROW-9074 - [GLib] Add missing arrow-json check
  • ARROW-9075 - [C++] Optimized Filter implementation: faster performance + compilation, smaller code size
  • ARROW-9079 - [C++] Write benchmark for arithmetic kernels
  • ARROW-9083 - [R] collect int64, uint32, uint64 as R integer type if not out of bounds
  • ARROW-9086 - [CI][Homebrew] Enable Gandiva
  • ARROW-9088 - [Rust] Make prettyprint optional
  • ARROW-9089 - [Python] A PyFileSystem handler for fsspec-based filesystems
  • ARROW-9090 - [C++] Bump versions of bundled libraries
  • ARROW-9091 - [C++][Compute] Add default FunctionOptions
  • ARROW-9093 - [FlightRPC][C++][Python] expose generic gRPC transport options
  • ARROW-9094 - [Python] Bump versions of compiled dependencies in manylinux wheels
  • ARROW-9095 - [Rust] Spec-compliant NullArray
  • ARROW-9099 - [C++][Gandiva] Implement trim function for string
  • ARROW-9100 - [C++] Add ascii_lower kernel
  • ARROW-9101 - [Doc][C++] Document encoding expected for CSV data
  • ARROW-9102 - [Packaging] Upload built manylinux docker images
  • ARROW-9106 - [Python] Allow specifying CSV file encoding
  • ARROW-9108 - [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion
  • ARROW-9109 - [Python][Packaging] Enable S3 support in manylinux wheels
  • ARROW-9110 - [C++] Fix CPU cache size detection on macOS
  • ARROW-9112 - [R] Update autobrew script location
  • ARROW-9115 - [C++] Implementation of ascii_lower/ascii_upper by processing input data buffers in batch
  • ARROW-9116 - [C++][FOLLOWUP] Add 0-length test for BaseBinaryArray::total_values_length
  • ARROW-9116 - [C++] Add BaseBinaryArray::total_values_length
  • ARROW-9118 - [C++] Add more general BoundsCheck function that also checks for arbitrary lower limits in integer arrays
  • ARROW-9119 - [C++] Add support for building with system static gRPC
  • ARROW-9123 - [Python][wheel] Use libzstd.a explicitly
  • ARROW-9124 - [Rust][Datafusion] optimize DFParser::parse_sql to take query string as &str
  • ARROW-9125 - [C++] Add missing include for arrow::internal::ZeroMemory() for Valgrind
  • ARROW-9129 - [Python][JPype] Remove JPype version check
  • ARROW-9130 - [Python] Add deprecation wrapper for pyarrow.compat and guid function for Dask
  • ARROW-9131 - [C++] Faster ascii_lower and ascii_upper.
  • ARROW-9132 - [C++] Support Unique and ValueCounts on dictionary data with non-changing dictionaries, add ChunkedArray::Make validating constructor
  • ARROW-9133 - [C++] Add utf8_upper and utf8_lower
  • ARROW-9137 - [GLib] Add gparquet_arrow_file_reader_read_row_group()
  • ARROW-9138 - [Docs][Format] Make sure format version is hard coded in the docs
  • ARROW-9139 - [Python] Switch parquet.read_table to use new datasets API by default
  • ARROW-9144 - [CI] OSS-Fuzz build fails because recent changes in the google repository
  • ARROW-9145 - [C++] Implement BooleanArray::true_count and false_count, add Python bindings
  • ARROW-9152 - [C++] Specialized implementation of filtering Binary/LargeBinary-based types
  • ARROW-9153 - [Python] Add bindings for StructScalar
  • ARROW-9154 - [Developer] Use GitHub issue templates better
  • ARROW-9155 - [Archery] Less precise but faster default settings for "archery benchmark diff"
  • ARROW-9156 - [C++] Reducing the code size of the tensor module
  • ARROW-9157 - [Rust][Datafusion] create_physical_plan should take self as immutable reference
  • ARROW-9158 - [Rust][Datafusion] projection physical plan compilation should preserve nullability
  • ARROW-9159 - [Python] Implement Array.isnull/isvalid methods
  • ARROW-9162 - [Python] Expose Add/Subtract/Multiply arithmetic kernels
  • ARROW-9163 - [C++] Validate UTF8 contents of a StringArray
  • ARROW-9166 - [Website] Add overview page
  • ARROW-9167 - [Doc][Website] /docs/c_glib/index.html is overwritten
  • ARROW-9168 - [C++][Flight] Don't share TCP connection among clients
  • ARROW-9173 - [C++][Doc] Document how to use Arrow from a third-party CMake project
  • ARROW-9175 - [FlightRPC][C++] Expose peer to server
  • ARROW-9176 - [Rust] Fix for memory leaks in Arrow allocator
  • ARROW-9178 - [R] Improve documentation about CSV reader
  • ARROW-9179 - [R] Replace usage of iris dataset in tests
  • ARROW-9180 - [Developer] Remove usage of whitelist, blacklist, slave, etc.
  • ARROW-9181 - [C++] Instantiate fewer templates for cast kernels
  • ARROW-9182 - [C++] Use "applicator" namespace for some kernel execution functors. Streamline some applicator implementations
  • ARROW-9185 - [Java][Gandiva] Make llvm build optimisation configurable from java
  • ARROW-9188 - [C++] Use Brotli shared libraries if they are available
  • ARROW-9189 - [Website] Improve contributor guide
  • ARROW-9190 - [Website][C++] Add blog post on efforts to make building lighter and easier
  • ARROW-9191 - [Rust] Do not panic when milliseconds is less than zero as chrono can handle…
  • ARROW-9192 - [CI][Rust] Add support for running clippy
  • ARROW-9193 - [C++] Avoid spurious intermediate string copy in ToDateHolder
  • ARROW-9197 - [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size
  • ARROW-9201 - [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests
  • ARROW-9202 - [GLib] Add GArrowDatum
  • ARROW-9203 - [Packaging][deb] Add missing gir1.2-arrow-dataset-1.0.install
  • ARROW-9204 - [C++][Flight] Change records_per_stream to int64
  • ARROW-9206 - [C++][Flight] Add latency benchmark
  • ARROW-9207 - [Python] Clean-up internal FileSource class
  • ARROW-9210 - [C++] Use BitBlockCounter in array/visitor_inline.h
  • ARROW-9214 - [C++] Use separate functions for valid/not-valid values in VisitArrayDataInline
  • ARROW-9216 - [C++] Use BitBlockCounter for plain spaced encoding/decoding
  • ARROW-9217 - [C++] Cover 0.01% null for the plain spaced benchmark
  • ARROW-9220 - [C++] Make utf8proc optional even with ARROW_COMPUTE=ON
  • ARROW-9222 - [Format] Columnar.rst changes for removing validity bitmap from union types
  • ARROW-9224 - [Dev][Archery] clone local source with --shared
  • ARROW-9225 - [C++][Compute] Speed up counting sort
  • ARROW-9231 - [Format] Increment MetadataVersion from V4 to V5
  • ARROW-9234 - [GLib][CUDA] Add support for dictionary memo on reading record batch from buffer
  • ARROW-9241 - [C++] Add forward compatibility check for Decimal bit width
  • ARROW-9242 - [Java] Add forward compatibility check for Decimal bit width
  • ARROW-9247 - [Python] Expose total_values_length functions on BinaryArray, LargeBinaryArray
  • ARROW-9248 - [C++] Add "list_size" function that returns Int32Array/Int64Array giving list cell sizes
  • ARROW-9249 - [C++] Implement "list_parent_indices" vector function
  • ARROW-9250 - [C++] Instantiate fewer templates in IsIn, Match kernel implementations
  • ARROW-9251 - [C++] Relocate integration testing JSON code implementation to src/arrow/testing
  • ARROW-9254 - [C++] Split out CastNumberToNumberUnsafe function from scalar_cast_numeric, add data()/mutable_data() functions for accessing primitive scalar data opaquely
  • ARROW-9255 - [C++] Use CMake to build bundled Protobuf with CMake >= 3.7
  • ARROW-9256 - [C++] Incorrect variable name ARROW_CXX_FLAGS
  • ARROW-9258 - [FORMAT] Add V5 MetadataVersion to Schema.fbs
  • ARROW-9259 - [Format] Add language indicating that unsigned dictionary indices are supported but that signed integers are preferred
  • ARROW-9262 - [Packaging][Linux][CI] Use Ubuntu 18.04 to build ARM64 packages on Travis CI
  • ARROW-9263 - [C++] Promote compute aggregate benchmark size to 1M.
  • ARROW-9264 - [C++][Parquet] Refactor and modernize schema conversion code
  • ARROW-9265 - [C++] Allow writing and reading V4-compliant IPC data
  • ARROW-9268 - [C++] add string_is{alpnum,alpha...,upper} kernels
  • ARROW-9272 - [C++][Python] Reduce complexity in python to arrow conversion
  • ARROW-9276 - [Dev] Enable ARROW_CUDA when generating API documentations
  • ARROW-9277 - [C++] Fix docs of reading CSV files
  • ARROW-9278 - [C++][Python] Remove validity bitmap from Union types, update IPC read/write and integration tests
  • ARROW-9280 - [Rust][Parquet] Calculate page and column statistics
  • ARROW-9281 - [R] Turn off utf8proc in R builds
  • ARROW-9283 - [Python] Expose build info
  • ARROW-9287 - [C++] Support unsigned dictionary indices
  • ARROW-9289 - [R] Remove deprecated functions
  • ARROW-9290 - [Rust][Parquet] Add features to allow opting out of dependencies
  • ARROW-9291 - [R] Support fixed size binary/list types
  • ARROW-9292 - [Doc] Remove Rust from feature matrix
  • ARROW-9294 - [GLib] Add GArrowFunction and related objects
  • ARROW-9300 - [Java] Separate Netty Memory to its own module
  • ARROW-9306 - [Ruby] Add support for Arrow::RecordBatch.new(raw_table)
  • ARROW-9307 - [Ruby] Add Arrow::RecordBatchIterator#to_a
  • ARROW-9308 - [Format] Add Feature enum for forward compatibility.
  • ARROW-9316 - [C++] Use "Dataset" instead of "Datasets"
  • ARROW-9321 - [C++][Dataset] Populate statistics opportunistically
  • ARROW-9322 - [R] Dataset documentation polishing
  • ARROW-9323 - [Ruby] Add Red Arrow Dataset
  • ARROW-9327 - [Rust] Fix all clippy errors for arrow crate
  • ARROW-9329 - [C++][Gandiva] Implement castTimestampToDate function in gandiva
  • ARROW-9331 - [C++] Improve the performance of Tensor-to-SparseTensor conversion
  • ARROW-9333 - [Python] Expose more IPC options
  • ARROW-9335 - [Website] Update website for 1.0
  • ARROW-9337 - [R] On C++ library build failure, give an unambiguous message
  • ARROW-9339 - [Rust] Comments on SIMD in Arrow README are incorrect
  • ARROW-9340 - [R] Use CRAN version of decor package
  • ARROW-9341 - [GLib] Use arrow::Datum version Take()
  • ARROW-9345 - [C++][Dataset] Support casting scalars to dictionary scalars
  • ARROW-9346 - [C++][Python][Dataset] Add total_byte_size metadata to RowGroupInfo
  • ARROW-9362 - [Java] Support reading/writing V5 MetadataVersion
  • ARROW-9365 - [Go] Added the rest of the implemented array builders to NewBuilder
  • ARROW-9370 - [Java] Bump Netty version
  • ARROW-9374 - [C++][Python] Expose MakeArrayFromScalar
  • ARROW-9379 - [Rust] Add support for unsigned dictionary keys
  • ARROW-9383 - [Python] Support fsspec filesystems in Dataset API
  • ARROW-9386 - [Rust] RecordBatch.schema() should not return &Arc<Schema>
  • ARROW-9390 - [C++][Followup] Add underscores to is* string functions
  • ARROW-9390 - [Doc] Add missing file
  • ARROW-9390 - [C++][Doc] Review compute function names
  • ARROW-9391 - [Rust] Padding added to arrays causes float32s to be incorrectly cast to float64s in the case where a record batch only contains one row.
  • ARROW-9393 - [Doc] update supported types documentation for Java
  • ARROW-9395 - [Python] allow configuring MetadataVersion
  • ARROW-9399 - [C++] Add forward compatibility test to detect and raise error for future MetadataVersion
  • ARROW-9403 - [Python] add Array.tolist as alias of .to_pylist
  • ARROW-9407 - [Python] Recognize more pandas null sentinels in sequence type inference when converting to Arrow
  • ARROW-9411 - [Rust] Update dependencies
  • ARROW-9424 - [C++][Parquet] Disable writing files with LZ4 codec
  • ARROW-9425 - [Rust][DataFusion] Made ExecutionContext sharable and sync
  • ARROW-9427 - [Rust][DataFusion] Added ExecutionContext.tables()
  • ARROW-9437 - [Python][Packaging] Homebrew fails to install build dependencies in the macOS wheel builds
  • ARROW-9442 - [Python] Do not call Validate() in pyarrow_wrap_table
  • ARROW-9445 - [Python] Revert Array.equals changes + expose comparison ops in compute
  • ARROW-9446 - [C++] Add compiler id, version, and build flags to BuildInfo
  • ARROW-9447 - [Rust][DataFusion] Made ScalarUDF (Send + Sync)
  • ARROW-9452 - [Rust][DataFusion] Optimize ParquetScanExec
  • ARROW-9470 - [CI][Java] Run Maven in parallel
  • ARROW-9472 - [R] Provide configurable MetadataVersion in IPC API and environment variable to set default to V4 when needed
  • ARROW-9473 - [Doc] Polishing for 1.0
  • ARROW-9478 - [C++] Improve error message for unsupported casts
  • ARROW-9484 - [Docs] Update is* functions to be is_* in the compute docs
  • ARROW-9485 - [R] Better shared library stripping
  • ARROW-9493 - [Python] Enable dictionary encoding in read_table with datasets API
  • ARROW-9509 - [Release] Don't test Gandiva in the windows wheel verification script
  • ARROW-9511 - [Packaging][Release] Set conda packages' build number to 0
  • ARROW-9514 - [Python] The new Dataset API will not work with files on Azure Blob
  • ARROW-9519 - [Rust] Improved error message when getting a field by name.
  • ARROW-9529 - [Dev][Release] Improvements to release verification scripts
  • ARROW-9531 - [Packaging][Release] Update conda forge dependency pins
  • PARQUET-1820 - [C++] pre-buffer specified columns of row group
  • PARQUET-1843 - [C++] Drop duplicated assignment
  • PARQUET-1855 - [C++] Improve parquet *MetaData documentation
  • PARQUET-1861 - [Parquet][Documentation] Clarify buffered stream option

Apache Arrow 0.17.1 (2020-05-18)

Bug Fixes

  • ARROW-8503 - [Packaging][deb] Fix building apache-arrow-archive-keyring for RC
  • ARROW-8505 - [Release][C#] "sourcelink test" is failed by Apache.ArrowAssemblyInfo.cs
  • ARROW-8584 - [C++] Fix ORC link order
  • ARROW-8608 - [C++] Update vendored 'variant.hpp' to fix CUDA 10.2
  • ARROW-8609 - [C++] Fix ORC Java JNI crash
  • ARROW-8641 - [C++][Python] Sort included indices in IpcReader - Respect column selection in FeatherReader
  • ARROW-8657 - [C++][Python] Add separate configuration for data pages
  • ARROW-8684 - [Python] Workaround Cython type initialization bug
  • ARROW-8694 - [C++][Parquet] Relax string size limit when deserializing Thrift messages
  • ARROW-8704 - [C++] Fix Parquet undefined behaviour on invalid input
  • ARROW-8706 - [C++][Parquet] Tracking JIRA for PARQUET-1857 (unencrypted INT16_MAX Parquet row group limit)
  • ARROW-8728 - [C++] Fix bitmap operation buffer overflow
  • ARROW-8741 - [Python][Packaging] Keep VS2015 for the Windows wheels
  • ARROW-8750 - [Python] Correctly default to lz4 compression for Feather V2 in Python
  • PARQUET-1857 - [C++] Do not fail to read unencrypted files with over 32767 row groups. Change some DCHECKs causing segfaults to throw exceptions

New Features and Improvements

  • ARROW-7731 - [C++][Parquet] Support LargeListArray
  • ARROW-8501 - [Packaging][RPM] Upgrade devtoolset to 8 on CentOS 6
  • ARROW-8549 - [R] Assorted post-0.17 release cleanups
  • ARROW-8699 - [R] Fix automatic r_to_py conversion
  • ARROW-8758 - [R] Updates for compatibility with dplyr 1.0
  • ARROW-8786 - [Packaging][rpm] Use bundled zstd in the CentOS 8 build

Apache Arrow 0.17.0 (2020-04-20)

Bug Fixes

  • ARROW-1907 - [C++/Python] Feather format cannot accommodate string columns containing more than a total of 2GB of data
  • ARROW-2255 - [C++][Developer][Integration] Serialize custom field/schema metadata
  • ARROW-2587 - [Python][Parquet] Verify nested data can be written
  • ARROW-3004 - [Documentation] Builds docs for master rather than a pinned commit
  • ARROW-3543 - [R] Better support for timestamp format and time zones in R
  • ARROW-5265 - [Python][CI] Add integration test with kartothek
  • ARROW-5473 - [C++] Fix googletest_ep build failure on windows+ninja
  • ARROW-5981 - [C++] Propagate errors from MemoTable to DictionaryBuilder
  • ARROW-6528 - [C++] Spurious Flight test failures (port allocation failure)
  • ARROW-6547 - [C++] valgrind errors in diff-test
  • ARROW-6738 - [Java] Fix problems with current union comparison logic
  • ARROW-6757 - [Release] Use same CMake generator for C++ and Python when verifying RC, remove Python 3.5 from wheel verification
  • ARROW-6871 - [Java] Enhance TransferPair related parameters check and tests
  • ARROW-6872 - [Python] Fix empty table creation from schema with dictionary field
  • ARROW-6890 - [Rust] [Parquet] ArrowReader fails with seg fault
  • ARROW-6895 - [C++][Parquet] Do not reset dictionary in ByteArrayDictionaryRecordReader during incremental reads
  • ARROW-7008 - [C++] Check binary offsets and data buffers for nullness in validation. Produce valid arrays in DictionaryEncode on zero-length arrays
  • ARROW-7049 - [C++] Fix MinGW64 warning in FieldRef::Get
  • ARROW-7301 - [Java] Sql type DATE should correspond to DateDayVector
  • ARROW-7335 - [C++][Gandiva] Add day_time_interval functions: castBIGINT, extractDay
  • ARROW-7390 - [C++][Dataset] Fix RecordBatchProjector race
  • ARROW-7405 - [Java] ListVector isEmpty API is incorrect
  • ARROW-7466 - [CI][Java] Fix gandiva-jar-osx nightly build failure
  • ARROW-7467 - [Java] ComplexCopier does incorrect copy for Map nullable info
  • ARROW-7507 - [Rust] Bump Thrift version to 0.13 in parquet-format and parquet
  • ARROW-7520 - [R] Writing many batches causes a crash
  • ARROW-7546 - [Java] Use new implementation to concat vectors values in batch
  • ARROW-7624 - [Rust] Soundness issues via Buffer methods
  • ARROW-7628 - [Python] Clarify docs of csv reader skip_rows and nulls in strings
  • ARROW-7631 - [C++][Gandiva] return zero if there is an overflow while downscaling a decimal
  • ARROW-7672 - [C++] NULL pointer dereference bug
  • ARROW-7680 - [C++] Fix dataset.factory(...) with Windows paths
  • ARROW-7701 - [FlightRPC][C++] disable flaky MacOS test
  • ARROW-7713 - [Java] TastLeak was put at the wrong location
  • ARROW-7722 - [FlightRPC][Java] disable flaky Flight auth test
  • ARROW-7734 - [C++] check status details for nullptr in equality
  • ARROW-7740 - [C++] Fix StructArray::Flatten corruption
  • ARROW-7755 - [Python] Windows wheel cannot be installed on Python 3.8
  • ARROW-7758 - [Python] Safe cast to nanosecond timestamps in to_pandas conversion
  • ARROW-7760 - [Release] Fix verify-release-candidate.sh since pip3 seems to no longer be in miniconda, install miniconda unconditionally
  • ARROW-7762 - [Python] Do not ignore exception for invalid version in ParquetWriter
  • ARROW-7766 - [Python][Packaging] Windows py38 wheels are built with wrong ABI tag
  • ARROW-7772 - [R][C++][Dataset] Unable to filter on date32 object with date64 scalar
  • ARROW-7775 - [Rust] fix: Don't let safe code arbitrarily transmute readers and writers
  • ARROW-7777 - [Go] Fix StructBuilder and ListBuilder panics on index out of range
  • ARROW-7780 - [Release] Fix Windows wheel RC verification script given lack of "m" ABI tag in Python 3.8
  • ARROW-7781 - [C++] Improve message when referencing a missing field
  • ARROW-7783 - [C++] Set ARROW_COMPUTE=ON if ARROW_DATASET=ON
  • ARROW-7785 - [C++] Improve compilation performance of sparse tensor related code
  • ARROW-7786 - [R] Wire up check_metadata in Table.Equals method
  • ARROW-7789 - [R] Can't initialize arrow objects when R.oo package is loaded
  • ARROW-7791 - [C++][Parquet] Fix building error "cannot bind lvalue"
  • ARROW-7792 - [R] read_* functions should close connection to file
  • ARROW-7793 - [Java] Release accounted-for reservation memory to parent in case of leak
  • ARROW-7794 - [Rust][Flight] Remove hard-coded relative path to Flight.proto
  • ARROW-7794 - [Rust] Support releasing arrow-flight
  • ARROW-7797 - [Release][Rust] Fix arrow-flight's version in datafusion crate
  • ARROW-7802 - [C++][Python] Support LargeBinary and LargeString in the hash kernel
  • ARROW-7806 - [Python] Support LargeListArray and list<LargeBinaryArray> conversion to pandas.
  • ARROW-7807 - [R] Installation on RHEL 7 Cannot call io___MemoryMappedFile__Open()
  • ARROW-7809 - [R] vignette does not run on Win 10 nor ubuntu
  • ARROW-7813 - [Rust] Remove and fix unsafe code
  • ARROW-7815 - [C++] Improve input validation
  • ARROW-7827 - [Python] conda-forge pyarrow package does not have s3 enabled
  • ARROW-7832 - [R] Patches to 0.16.0 release
  • ARROW-7836 - [Rust] "allocate_aligned"/"reallocate" need to initialize memory to avoid UB
  • ARROW-7837 - [JAVA] copyFromSafe fails due to a bug in handleSafe
  • ARROW-7838 - [C++] Only link Boost libraries with tests, not libarrow.so
  • ARROW-7841 - [C++] Use ${HADOOP_HOME}/lib/native/ to find libhdfs.so again
  • ARROW-7844 - [R] Converter_List is not thread-safe
  • ARROW-7848 - [C++][Python][Doc] Add MapType API doc
  • ARROW-7852 - [Python] 0.16.0 wheels not compatible with older numpy
  • ARROW-7857 - [Python] Revert temporary changes to pandas extension array tests
  • ARROW-7861 - [C++][Parquet] Add fuzz regression corpus for parquet reader
  • ARROW-7884 - [C++] Relax concurrency rules around GetSize()
  • ARROW-7887 - [Rust] Add date/time/duration/timestamp types to filter kernel
  • ARROW-7889 - [Rust] Add support to datafusion-cli for parquet files.
  • ARROW-7899 - [Integration][Java] Fix Flight integration test client to verify each batch
  • ARROW-7908 - [R] Can't install package without setting LIBARROW_DOWNLOAD=true
  • ARROW-7922 - [CI][Crossbow] Nightly macOS wheel builds fail (brew bundle edition)
  • ARROW-7923 - [CI][Crossbow] macOS autobrew fails on homebrew-versions
  • ARROW-7926 - [Dev] Improve "archery lint" UI
  • ARROW-7928 - [Python] Update Python flight server and client examples for latest API
  • ARROW-7931 - [C++] Fix crash on corrupt Map array input (OSS-Fuzz)
  • ARROW-7936 - [Python] Fix and exercise tests on python 3.5
  • ARROW-7940 - [C++] Remove ARROW_USE_CLCACHE handling
  • ARROW-7944 - [Python] Test failures without Pandas
  • ARROW-7956 - [Python] Memory leak in pyarrow functions .ipc.serialize_pandas/deserialize_pandas
  • ARROW-7958 - [Java] Update Avro to version 1.9.2
  • ARROW-7962 - [R][Dataset] Followup to "Consolidate Source and Dataset classes"
  • ARROW-7968 - [C++] orc_ep build fails on 64-bit Raspbian
  • ARROW-7973 - [Developer][C++] ResourceWarnings in run_cpplint.py
  • ARROW-7974 - [C++][Developer] Fix linter warnings when PYTHONDEVMODE enabled
  • ARROW-7975 - [C++] Preserve intended buffer size by default when writing to IPC format
  • ARROW-7978 - [Dev] Do not run IWYU in Github Actions "lint" workflow
  • ARROW-7980 - [Python] Fix creation of tz-aware datetime dtype on first pandas import
  • ARROW-7981 - [C++][Dataset] Fix compilation on gcc 5.4
  • ARROW-7985 - [C++] Fix builder capacity check
  • ARROW-7990 - [Developer][C++] Add option to run "archery lint --iwyu" on all C++ files, not just the ones that you changed. Add "match" option to iwyu.sh
  • ARROW-7992 - [C++] Fix MSVC warning (#6525)
  • ARROW-7996 - [Python] Error serializing empty pandas DataFrame with pyarrow
  • ARROW-7997 - [Python] Schema equals method with inconsistent docs in pyarrow
  • ARROW-7999 - [C++] Fix crash on corrupt List / Map array input
  • ARROW-8000 - [C++] Fix compilation on gcc 4.8
  • ARROW-8003 - [C++] Use CMAKE_C_COMPILER when building bundled bzip2
  • ARROW-8006 - [C++] Initialize spaced data when reading nulls from Parquet
  • ARROW-8007 - [Python] Remove unused and defunct assert_get_object_equal in plasma tests
  • ARROW-8008 - [C++/Python] Set Python3_FIND_FRAMEWORK=LAST
  • ARROW-8009 - [Java] Fix the hash code methods for BitVector
  • ARROW-8011 - [C++] Fix buffer size when reading Parquet data to Arrow
  • ARROW-8013 - [Python][Packaging] Fix building manylinux wheels
  • ARROW-8021 - [Python] Install test requirements including pandas in Appveyor
  • ARROW-8029 - [R] rstudio/r-base:3.6-centos7 GHA build failing on master
  • ARROW-8036 - [C++] Avoid gtest 1.10 deprecation warnings
  • ARROW-8042 - [Python] Clean up docstring and error message when creating ChunkedArray with no chunks
  • ARROW-8057 - [Python] Do not compare schema metadata in Schema.equals and Table.equals by default
  • ARROW-8070 - [C++] Cast segfaults on unsupported cast from list<binary> to utf8
  • ARROW-8071 - [GLib] Fix build error with configure
  • ARROW-8075 - [R] Loading R.utils after arrow breaks some arrow functions
  • ARROW-8088 - [C++][Dataset] Support dictionary partition columns
  • ARROW-8091 - [CI][Crossbow] Fix nightly homebrew and R failures
  • ARROW-8092 - [CI][Crossbow] OSX wheels fail on bundled bzip2
  • ARROW-8094 - [CI][Crossbow] Nightly valgrind test fails
  • ARROW-8095 - [C++] Add support for string dictionary value with length
  • ARROW-8098 - [Go] Avoid unsafe unsafe.Pointer usage
  • ARROW-8099 - [Integration] archery integration --with-LANG flags don't work
  • ARROW-8101 - [FlightRPC][Java] Fix null arrays in Flight with no buffers
  • ARROW-8102 - [Dev] Crossbow's version detection doesn't work in the comment bot's scenario
  • ARROW-8105 - [Python] Fix segfault when shrunken masked array is passed to pyarrow.array
  • ARROW-8106 - [Python] Ensure extension array conversion tests pass with latest pandas
  • ARROW-8110 - [C#] BuildArrays fails if NestedType is included
  • ARROW-8112 - [FlightRPC][C++] make sure status codes round-trip through gRPC
  • ARROW-8119 - [Dev] Make Yaml optional dependency for archery
  • ARROW-8122 - [Python] Empty numpy arrays with shape cannot be deserialized
  • ARROW-8125 - [C++] Restore link between tests created with add_arrow_test and arrow-tests target
  • ARROW-8127 - [C++][Parquet] Incorrect column chunk metadata for multipage batch writes
  • ARROW-8128 - [C#] NestedType children serialized on wrong length
  • ARROW-8132 - [C++] Fix S3FileSystem tests on Windows
  • ARROW-8133 - [CI] Github Actions sometimes fail to checkout Arrow
  • ARROW-8136 - [Python] More robust inference of local relative path in dataset
  • ARROW-8136 - [Python] Restore creating a dataset from a relative path
  • ARROW-8138 - [C++] parquet::arrow::FileReader cannot read multiple RowGroups
  • ARROW-8139 - [C++] FileSystem enum causes attributes warning
  • ARROW-8142 - [C++][Compute] Explicit no chunks case for WrapDatumsLike
  • ARROW-8144 - [CI] Cmake 3.2 nightly build fails
  • ARROW-8154 - [Python] HDFS Filesystem does not set environment variables in pyarrow 0.16.0 release
  • ARROW-8159 - [Python] Support pandas.ExtensionDtype in Schema.from_pandas
  • ARROW-8166 - [C++] Fix AVX512 intrinsics failure with clang-8
  • ARROW-8176 - [FlightRPC] bind to a free port for integration tests
  • ARROW-8186 - [Python] Fix dataset expression operation with invalid scalar
  • ARROW-8188 - [R] Adapt to latest checks in R-devel
  • ARROW-8193 - [C++] Fix gcc 4.8 compilation error with non-copyable types in Iterator<T>::ToVector
  • ARROW-8197 - [Rust][DataFusion] Fix schema returned by physical plan
  • ARROW-8206 - [R] Minor fix for backwards compatibility on Linux installation
  • ARROW-8209 - [Python] Improve error message when trying to access duplicate Table column
  • ARROW-8213 - [Python][Dataset] Opening a dataset with a local incorrect path gives confusing error message
  • ARROW-8216 - [C++][Compute] Filter out nulls by default
  • ARROW-8217 - [R] Unskip previously failing test on Win32 in test-dataset.R from ARROW-7979
  • ARROW-8219 - [Rust] sqlparser crate needs to be bumped to version 0.2.5
  • ARROW-8223 - [Python] Schema.from_pandas breaks with pandas nullable integer dtype
  • ARROW-8233 - [CI][GLib][R] Fix timeout on MinGW
  • ARROW-8234 - [CI] Build timeouts on "AMD64 Windows RTools 35"
  • ARROW-8236 - [Rust] Linting GitHub Actions task failing
  • ARROW-8237 - [Python][Documentation] Minor corrections to python minimal build documentation
  • ARROW-8237 - [Python][Documentation] Review Python developer documentation, add Dockerfile showing minimal source build with conda and pip/virtualenv
  • ARROW-8238 - [C++] Fix FieldPath type definition
  • ARROW-8239 - [Java] fix param checks in splitAndTransfer method
  • ARROW-8245 - [Python][Parquet] Skip hidden directories when reading partitioned parquet files
  • ARROW-8254 - [Rust] [DataFusion] CLI is not working as expected
  • ARROW-8255 - [Rust][DataFusion] Bug fix for COUNT(*)
  • ARROW-8259 - [Rust][DataFusion] ProjectionPushDown now respects LIMIT
  • ARROW-8268 - [CI][Ruby] Enable Zstandard on Ubuntu 16.04
  • ARROW-8269 - [Python] Add pandas mark to test_parquet_row_group_fragments to fix nopandas build
  • ARROW-8270 - [Python][Flight] Update Python server example to support TLS
  • ARROW-8272 - [CI][Python] Fix test failure on Python 3.5
  • ARROW-8274 - [C++] Use LZ4 frame format for "LZ4" compression in IPC
  • ARROW-8276 - [C++][Dataset] Use Scanner for Fragment.to_table
  • ARROW-8280 - [C++] Use c-ares_INCLUDE_DIR
  • ARROW-8286 - [Python] Ensure to create FileSystemDataset when passing pathlib path
  • ARROW-8298 - [C++][MinGW] Fix gRPC detection
  • ARROW-8303 - [Python] Fix test failure on Python 3.5 caused by non-deterministic dict key ordering
  • ARROW-8304 - [Flight][Python] Fix client example with TLS
  • ARROW-8305 - [Java] ExtensionTypeVector should make sure underlyingVector not null
  • ARROW-8310 - [C++] Improve auto-retry in S3 tests
  • ARROW-8315 - [Python] Fix dataset tests on Python 3.5
  • ARROW-8323 - [C++] Add pragmas wrapping proto_utils.h to disable conversion warnings
  • ARROW-8326 - [C++] Use TYPED_TEST_SUITE instead of deprecated TYPED_TEST_CASE
  • ARROW-8327 - [FlightRPC][Java] check gRPC trailers for null
  • ARROW-8331 - [C++] Fix filter_benchmark.cc compilation
  • ARROW-8333 - [C++] Compile benchmarks in at least one C++ CI entry
  • ARROW-8334 - [C++][Gandiva] Missing DATE32 in LLVM Types
  • ARROW-8342 - [Python] Continue to return dict from "metadata" properties accessing KeyValueMetadata
  • ARROW-8345 - [Python] Ensure feather read/write can work without pandas installed
  • ARROW-8346 - [CI][GLib] Follow pkg-config change in Homebrew
  • ARROW-8349 - [CI][NIGHTLY:gandiva-jar-osx] Use latest pygit2
  • ARROW-8353 - [C++] Fix some compiler warnings in release builds
  • ARROW-8354 - [R] Fix segfault in Table to Array conversion
  • ARROW-8357 - [Rust][DataFusion] Add format dir to dockerfile for CLI
  • ARROW-8358 - [C++] Fix some clang-11 compiler warnings
  • ARROW-8365 - [C++] Error when writing files to S3 larger than 5 GB
  • ARROW-8366 - [Rust] Support releasing arrow-flight
  • ARROW-8369 - [CI] Fix crossbow wildcard groups
  • ARROW-8373 - [CI][GLib] Find gio-2.0 manually on macOS
  • ARROW-8380 - Export StringDictionaryBuilder from arrow::array crate
  • ARROW-8384 - [Python][C++] Allow configuring Kerberos ticket cache path
  • ARROW-8386 - [Python] Fix error when pyarrow.jvm gets an empty vector
  • ARROW-8388 - [C++][CI] Ensure Arrow compiles with GCC 4.8
  • ARROW-8397 - [C++] Fail to compile aggregate_test.cc on Ubuntu 16.04
  • ARROW-8406 - [C++][Python] Fix file URI handling
  • ARROW-8410 - [C++] Fix compilation errors on modest ARMv8 platforms (rockpro64, rpi4)
  • ARROW-8414 - [Python] Fix non-deterministic row order failure in parquet tests
  • ARROW-8415 - [C++][Packaging] Fix gandiva linux job
  • ARROW-8416 - [Python] Add feather alias for ipc format in dataset API
  • ARROW-8420 - [C++] Distinguish ARMv7 from ARMv8 in SetupCxxFlags.cmake
  • ARROW-8427 - [C++][Dataset] Only apply ignore_prefixes to selector results
  • ARROW-8428 - [C++] GCC 4.8 Implicit move-on-return failure in C++ tests
  • ARROW-8429 - [C++] Implement missing checks in IPC MessageDecoder
  • ARROW-8432 - [CI] Don't depend on a single apache mirror for dependencies
  • ARROW-8437 - [C++] Remove std::move return value from MakeRandomNullBitmap test utility
  • ARROW-8438 - [C++] Fix crash in io-memory-benchmark
  • ARROW-8439 - [Python] Update options usage in S3FileSystem docs
  • ARROW-8441 - [C++] Check invalid input in ipc::MessageDecoder
  • ARROW-8442 - [Python] Change NullType.to_pandas_dtype to return object instead of float64
  • ARROW-8460 - [Packaging][deb] Reduce disk usage on building packages
  • ARROW-8465 - [Packaging][Python] Windows py35 wheel build fails because of boost
  • ARROW-8466 - [Packaging] The python unittests are not running in the windows wheel builds
  • ARROW-8468 - [C++][Documentation] Fix the incorrect null bits description
  • ARROW-8469 - [Dev] Fix nightly docker tests on azure
  • ARROW-8478 - [Java] Revert "ARROW-7534"
  • ARROW-8498 - [Python] Schema.from_pandas fails on extension type, while Table.from_pandas works
  • PARQUET-1780 - [C++] Set ColumnMetadata.encoding_stats field
  • PARQUET-1788 - Remove UBSan when rep/def levels are null
  • PARQUET-1797 - [C++] Fix fuzzer issues
  • PARQUET-1799 - [C++] Stream API: Relax schema checking when reading
  • PARQUET-1810 - [C++] Fix undefined behaviour on invalid enum values (OSS-Fuzz)
  • PARQUET-1813 - [C++] Remove debug print statement from parquet-arrow-schema-test
  • PARQUET-1819 - [C++] Refactor decoding
  • PARQUET-1819 - [C++] Fix crashes on invalid input
  • PARQUET-1823 - [C++] Invalid RowGroup returned by parquet::arrow::FileReader
  • PARQUET-1824 - [C++] Fix crashes and undefined behaviour on invalid input
  • PARQUET-1829 - [C++] Fix crashes on invalid input (OSS-Fuzz)
  • PARQUET-1831 - [C++] Fix crashes on invalid input (OSS-Fuzz)
  • PARQUET-1835 - [C++] Fix crashes on invalid input

New Features and Improvements

  • ARROW-590 - [Integration][C++] Implement union types
  • ARROW-1470 - [C++] Add BufferAllocator abstract interface
  • ARROW-1560 - [C++] Kernel implementations for "match" function
  • ARROW-1571 - [C++][Compute] Optimize sorting integers in small value range
  • ARROW-1581 - [Packaging] Tooling to make nightly wheels available for install
  • ARROW-1582 - [Python] Set up + document nightly conda builds for macOS
  • ARROW-1636 - [C++][Integration] Implement integration test parsing in C++ for null type, add integration test data generation
  • ARROW-2447 - [C++] Device and MemoryManager API
  • ARROW-2882 - [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets
  • ARROW-3054 - [Packaging] Tooling to enable nightly conda packages to be updated to some anaconda.org channel
  • ARROW-3410 - [C++][Python] Add streaming CSV reader.
  • ARROW-3750 - [R] Pass various wrapped Arrow objects created in Python into R with zero copy via reticulate
  • ARROW-4120 - [Python] Testing utility for checking for "macro" memory leaks detectible with psutil.Process
  • ARROW-4226 - [C++] Add sparse CSF tensor support
  • ARROW-4286 - [C++/R] Namespace vendored Boost
  • ARROW-4304 - [Rust] Enhance documentation for arrow
  • ARROW-4428 - [R] Feature flags for R build
  • ARROW-4482 - [Website] Add blog archive page
  • ARROW-4815 - [Rust][DataFusion] Add support for SQL wildcard operator
  • ARROW-5357 - [Rust] Change Buffer::len to represent total bytes instead of used bytes
  • ARROW-5405 - [Documentation] Move integration testing documentation to Sphinx docs, add instructions for JavaScript
  • ARROW-5497 - [Release] Build and publish R/Java/JS docs
  • ARROW-5501 - [R] Reorganize read/write file/stream functions
  • ARROW-5510 - [C++][Python][R][GLib] Implement Feather "V2" using Arrow IPC file format
  • ARROW-5563 - [Format] Update integration test JSON format documentation
  • ARROW-5585 - [Go] Rename TypeEquals to TypeEqual
  • ARROW-5742 - [CI][C++] Add nightly Valgrind build
  • ARROW-5757 - [Python] Remove Python 2.7 support
  • ARROW-5949 - [Rust] Implement Dictionary Array
  • ARROW-6165 - [Integration] Run integration tests on multiple cores
  • ARROW-6176 - [Python] Basic implementation of arrow_ext_class, in pure Python
  • ARROW-6275 - [C++] Deprecate RecordBatchReader::ReadNext
  • ARROW-6393 - [C++] Add EqualOptions support in SparseTensor::Equals
  • ARROW-6479 - [C++] Inline errors from externalprojects on failure
  • ARROW-6510 - [Python][Filesystem] Expose nanosecond resolution mtime
  • ARROW-6666 - [Rust] Datafusion parquet string literal support
  • ARROW-6724 - [C++] Allow simpler BufferOutputStream creation
  • ARROW-6821 - [C++][Parquet] Do not require Thrift compiler when building (but still require library)
  • ARROW-6823 - [C++][Python][R] Support metadata in the feather format?
  • ARROW-6829 - [Docs] Migrate integration test docs to Sphinx, fix instructions after ARROW-6466
  • ARROW-6837 - [C++] Add APIs to read and write "custom_metadata" field of IPC file footer
  • ARROW-6841 - [C++] Migrate to LLVM 8
  • ARROW-6875 - [FlightRPC] implement criteria for ListFlights
  • ARROW-6915 - [Developer] Do not overwrite point release fix versions with merge tool
  • ARROW-6947 - [Rust][DataFusion] Scalar UDF support
  • ARROW-6996 - [Python] Expose boolean filter kernel on ChunkedArray/RecordBatch/Table
  • ARROW-7044 - [Release] Create a post release script for the home-brew formulas
  • ARROW-7048 - [Java] Support for combining multiple vectors under VectorSchemaRoot
  • ARROW-7063 - [C++][Python] Add metadata output and toggle in PrettyPrint, add pyarrow.Schema.to_string, disable metadata output by default
  • ARROW-7073 - [Java] Support concatenating vector values in batch
  • ARROW-7080 - [C++][Parquet] Read and write "field_id" attribute in Parquet files, propagate to Arrow field metadata. Assorted additional changes
  • ARROW-7091 - [C++] Move DataType factory decls to type_fwd.h
  • ARROW-7119 - [C++][CI] Show automatic backtraces
  • ARROW-7201 - [GLib][Gandiva] Add support for BooleanNode
  • ARROW-7202 - [R][CI] Improve rwinlib building on CI to stop re-downloading dependencies
  • ARROW-7222 - [Python][Release] Wipe any existing generated Python API documentation when updating website
  • ARROW-7233 - [C++] Use Result<T> in remaining value-returning IPC APIs
  • ARROW-7256 - [C++] Remove ARROW_MEMORY_POOL_DEFAULT macro
  • ARROW-7330 - [C++] Migrate Arrow Cuda to Result<T>
  • ARROW-7332 - [C++][Python] Propagate Arrow Status through Parquet errors
  • ARROW-7336 - [C++][Compute] fix minmax kernel options
  • ARROW-7338 - [C++] Improve InMemoryDataSource to support generator instead of static list
  • ARROW-7365 - [Python] Convert FixedSizeList in to_pandas
  • ARROW-7373 - [C++][Dataset] Remove FileSource
  • ARROW-7400 - [Java] Avoid the worst case for quick sort
  • ARROW-7412 - [C++][Dataset] Provide FieldRef to disambiguate field references
  • ARROW-7419 - [Python] Support SparseCSCMatrix
  • ARROW-7427 - [Python] Support SparseCSFTensor
  • ARROW-7428 - [Format][C++] Add serialization for CSF sparse tensors
  • ARROW-7444 - [GLib] Add LocalFileSystem support
  • ARROW-7462 - [C++] Add CpuInfo detection for Arm64 Architecture
  • ARROW-7491 - [Java] Improve the performance of aligning
  • ARROW-7499 - [C++] CMake should collect libs when making static build
  • ARROW-7501 - [C++] CMake build_thrift should build flex and bison if necessary
  • ARROW-7515 - [C++] Rename nonexistent and non_existent to not_found
  • ARROW-7524 - [C++][CI] Enable Parquet in the VS2019 GHA job
  • ARROW-7530 - [Developer] Do not include list of PR commits in commit message when using PR merge tool
  • ARROW-7534 - [Java] Create a new java/contrib module
  • ARROW-7547 - [C++][Dataset][Python] Add ParquetFileFormat options
  • ARROW-7555 - [Python] Drop support for python 2.7
  • ARROW-7587 - [C++][Compute] Implement nth_to_indices kernel
  • ARROW-7608 - [C++][Dataset] Add the ability to list files in FileSystemSource
  • ARROW-7615 - [CI][Gandiva] Ensure gandiva_jni library has only a whitelisted set of shared dependencies
  • ARROW-7616 - [Java] Support comparing value ranges for dense union vector
  • ARROW-7625 - [Parquet][GLib] Add support for writer properties
  • ARROW-7641 - [R] Make dataset vignette have executable code:
  • ARROW-7662 - [R] Support creating ListArray from R list
  • ARROW-7664 - [C++] Rework FileSystemFromUri
  • ARROW-7675 - [R][CI] Move Windows CI from Appveyor to GHA
  • ARROW-7679 - [R] Cleaner interface for creating UnionDataset
  • ARROW-7684 - [Rust] Example Flight client and server for DataFusion
  • ARROW-7685 - [Developer] Add support for GitHub Actions to Crossbow
  • ARROW-7691 - [C++] Check non-scalar Flatbuffers fields are not null
  • ARROW-7708 - [Developer][Release] Include PARQUET issues in release changelogs by scraping git history
  • ARROW-7712 - [CI][Crossbow] Delete fuzzit jobs
  • ARROW-7720 - [C++][Python] Add check_metadata argument to Table.equals
  • ARROW-7725 - [C++] Add infrastructure for unity builds and precompiled headers
  • ARROW-7726 - [CI][C++] Use boost binaries on Windows GHA build
  • ARROW-7729 - [Python][CI] Pin pandas version to 0.25 in the dask integration test
  • ARROW-7733 - [Developer] Download new enough Go locally in release verification script
  • ARROW-7735 - [Release][Python] Use pip to install dependencies for wheel verification
  • ARROW-7736 - [Release] Retry binary download on transient error
  • ARROW-7739 - [GLib] Use placement new to initialize shared_ptr object in private structs
  • ARROW-7741 - [C++] Adds parquet write support for nested types
  • ARROW-7742 - [GLib] Add support for MapArray
  • ARROW-7745 - [Doc][C++] Update Parquet documentation
  • ARROW-7749 - [C++] Link more tests together
  • ARROW-7750 - [Release] Make the source release verification script restartable
  • ARROW-7751 - [Release] macOS wheel verification also needs arrow-testing
  • ARROW-7752 - [Release] Enable and test dataset in the verification script
  • ARROW-7754 - [C++] Make Result<> faster
  • ARROW-7761 - [C++][Python] Support S3 URIs
  • ARROW-7764 - [C++] Don't keep a null bitmap in ArrayData if null_count == 0
  • ARROW-7771 - [Developer] Use ARROW_TMPDIR environment variable in the verification scripts instead of TMPDIR
  • ARROW-7774 - [Packaging][Python] Update macos and windows wheel filenames
  • ARROW-7787 - [Rust] Added .collect to Table API
  • ARROW-7788 - [C++][Parquet] Enable Arrow Schema to Parquet Schema for missing types
  • ARROW-7790 - [Website] Update how to install Linux packages
  • ARROW-7795 - [Rust] Added support for NOT
  • ARROW-7796 - [R] write_* functions should invisibly return their inputs
  • ARROW-7799 - [R][CI] Remove flatbuffers from homebrew formulae
  • ARROW-7804 - [C++][R] Compile error on macOS 10.11
  • ARROW-7812 - [Packaging][Python] Use LLVM 8 in manylinux1 wheels
  • ARROW-7817 - [CI] macOS R autobrew nightly failed on installing dependency from source
  • ARROW-7819 - [C++][Gandiva] Add DumpIR to Filter/Projector object
  • ARROW-7824 - [C++][Dataset] WriteFragments to disk
  • ARROW-7828 - [Release] Remove SSH keys for internal use
  • ARROW-7829 - [R] Test R bindings on clang
  • ARROW-7833 - [R] Make install_arrow() actually install arrow
  • ARROW-7834 - [Release] Post release task for updating the documentations
  • ARROW-7839 - [Python][Dataset] Expose IPC format in python bindings
  • ARROW-7846 - [Python][Dev] Remove dependencies on six
  • ARROW-7847 - [Website] Write a blog post about fuzzing
  • ARROW-7849 - [Packaging][Python] Remove the remaining py27 crossbow wheel tasks from the nightlies
  • ARROW-7858 - [C++][Python] Support casting from ExtensionArray
  • ARROW-7859 - [R] Minor patches for CRAN submission 0.16.0.2
  • ARROW-7860 - [C++] Support cast to/from halffloat
  • ARROW-7862 - [R] Linux installation should run quieter by default
  • ARROW-7863 - [C++][Python][CI] Ensure running HDFS related tests
  • ARROW-7864 - [R] Make sure bundled installation works even if there are system packages
  • ARROW-7865 - [R] Test builds on latest Linux versions
  • ARROW-7868 - [Crossbow] Reduce GitHub API query parallelism
  • ARROW-7869 - [Python] Remove boost::system and boost::filesystem from Python wheels
  • ARROW-7872 - [C++/Python] Support conversion of list of structs to pandas
  • ARROW-7874 - [Python][Archery] Validate docstrings with numpydoc
  • ARROW-7876 - [R] Installation fails in the documentation generation image
  • ARROW-7877 - [Packaging] Fix crossbow deployment to github artifacts
  • ARROW-7879 - [C++][Doc] Add doc for the Device API
  • ARROW-7880 - [CI][R] R sanitizer job is not really working
  • ARROW-7881 - [C++] Fix -Wpedantic warnings
  • ARROW-7882 - [C++][Gandiva] Optimise like function for substring pattern
  • ARROW-7886 - [C++][Dataset][Python][R] Consolidate Source and Dataset classes
  • ARROW-7888 - [Python] Update pyarrow.jvm to support jpype 0.7+
  • ARROW-7890 - [C++] Add Future implementation
  • ARROW-7891 - [C++][GLib][Python][R] Make uniform use of check_metadata=false default. Add Py/R/GLib bindings for RecordBatch::Equals with check_metadata
  • ARROW-7892 - [Python] Add FileSystemDataset.format attribute
  • ARROW-7895 - [Python] Remove more python 2.7 cruft
  • ARROW-7896 - [C++] Refactor from #include guards to #pragma once
  • ARROW-7897 - [Packaging] Temporarily disable artifact uploading until we fix the deployment issues
  • ARROW-7898 - [Python] Reduce the number of docstring violations using numpydoc
  • ARROW-7904 - [C++][Python] Revamp metadata display, change show_metadata to verbose_metadata
  • ARROW-7907 - [Python] Add test case for previously failing code involving slicing a 0-length ChunkedArray
  • ARROW-7912 - [Format] C data interface
  • ARROW-7913 - [C++][Python][R] C++ implementation of C data interface
  • ARROW-7915 - [CI][Python] Enable development mode in tests
  • ARROW-7916 - [C++] Project IPC batches to materialized fields only
  • ARROW-7917 - [C++] Find Python 3 in CMake configuration
  • ARROW-7919 - [R] install_arrow() should conda install if appropriate
  • ARROW-7920 - [R] Fill in some missing input validation
  • ARROW-7921 - [Go] Add Reset method to various components and clean up comments.
  • ARROW-7927 - [C++] Fix 'cpu_info.cc' compilation warning.
  • ARROW-7929 - [C++] Align CMake target names to upstreams
  • ARROW-7930 - [CI][Python] Test jpype integration
  • ARROW-7932 - [Rust] implement array_reader for temporal types
  • ARROW-7934 - [C++] Fix UriEscape for empty string
  • ARROW-7935 - [Java] Remove Netty dependency for BufferAllocator and ReferenceManager
  • ARROW-7937 - [Python][Packaging] Remove boost from the macos wheels
  • ARROW-7941 - [Rust][DataFusion] Add support for named columns in logical plan
  • ARROW-7943 - [C++][Parquet] Add code to generate rep/def levels for nested arrays
  • ARROW-7947 - [Rust][Flight][DataFusion] Implement get_schema example
  • ARROW-7949 - [Git] Ignore macOS specific file: 'Brewfile.lock.json'
  • ARROW-7951 - [Python] Expose BYTE_STREAM_SPLIT in pyarrow
  • ARROW-7959 - [Ruby] Add support for Ruby 2.3 again
  • ARROW-7963 - [C++][Dataset][Python] Expose Dataset Fragments to Python
  • ARROW-7965 - [Python] Refine higher level dataset API
  • ARROW-7966 - [FlightRPC][C++] Validate individual batches in integration
  • ARROW-7969 - [Packaging] Use cURL to upload artifacts
  • ARROW-7970 - [Packaging][Python] Use system boost to build the macOS wheels
  • ARROW-7971 - [Rust] Create rowcount utility
  • ARROW-7977 - [C++] Rename fs::FileStats to fs::FileInfo
  • ARROW-7979 - [C++] Add experimental buffer compression to IPC write path. Add "field" selection to read path. Migrate some APIs to Result<T>. Read/write Message metadata
  • ARROW-7982 - [C++] Add function VisitArrayDataInline() helper
  • ARROW-7983 - [CI][R] Nightly builds should be more verbose when they fail
  • ARROW-7984 - [R] Check for valid inputs in more places
  • ARROW-7986 - [Python] pa.Array.from_pandas cannot convert pandas.Series containing pyspark.ml.linalg.SparseVector
  • ARROW-7987 - [CI][R] Fix for verbose nightly builds
  • ARROW-7988 - [R] Fix on.exit calls in reticulate bindings
  • ARROW-7991 - [C++][Plasma] Allow option for evicting if full when creating an object
  • ARROW-7993 - [Java] Support decimal type in ComplexCopier
  • ARROW-7994 - [CI][C++][GLib][Ruby] Move MinGW CI to GitHub Actions from AppVeyor
  • ARROW-7995 - [C++] Add facility to coalesce and cache reads
  • ARROW-7998 - [C++][Plasma] Make Seal requests synchronous
  • ARROW-8005 - [Tools] Update apache mirror links
  • ARROW-8014 - [C++] Provide CMake targets exercising tests with a label
  • ARROW-8016 - [Developer] Fix jira-python deprecation warning in merge_arrow_pr.py
  • ARROW-8018 - [C++][Parquet] Parquet Modular Encryption
  • ARROW-8024 - [R] Bindings for BinaryType and FixedSizeBinaryType
  • ARROW-8026 - [Python] Support memoryview as a value type for creating binary-like arrays
  • ARROW-8027 - [Integration] Add test case for duplicated field names
  • ARROW-8028 - [Go] Allow duplicate field names in schemas and nested types
  • ARROW-8030 - [Plasma] Uniform comments style
  • ARROW-8035 - [Developer][Integration] Add integration tests for extension types
  • ARROW-8039 - [Python] Use dataset API in existing parquet readers and tests
  • ARROW-8044 - [CI][NIGHTLY:gandiva-jar-osx] Pin pygit2 at 1.0.3 for OSX
  • ARROW-8055 - [GLib][Ruby] Add some metadata bindings to GArrowSchema
  • ARROW-8058 - [Dataset] Relax DatasetFactory discovery validation
  • ARROW-8059 - [Python] Make FileSystem objects serializable
  • ARROW-8060 - [Python] Make dataset Expression objects serializable
  • ARROW-8061 - [C++][Dataset] Provide RowGroup fragments for ParquetFileFormat
  • ARROW-8063 - [Python][Dataset] Start user guide for pyarrow.dataset
  • ARROW-8064 - [Dev] Implement Comment bot via Github actions
  • ARROW-8069 - [C++] Should the default value of "check_metadata" arguments of Equals methods be "true"?
  • ARROW-8072 - [Plasma] Add const for plasma protocol
  • ARROW-8077 - [Python][Packaging] Add Windows Python 3.5 wheel build script
  • ARROW-8079 - [Python] Implement a wrapper for KeyValueMetadata, duck-typing dict where relevant
  • ARROW-8080 - [C++] Add ARROW_SIMD_LEVEL option
  • ARROW-8082 - [Plasma] Add JNI list() interface
  • ARROW-8083 - [GLib] Add support for Peek() to GIOInputStream
  • ARROW-8086 - [Java] Support writing decimal from big endian byte array in UnionListWriter
  • ARROW-8087 - [C++][Dataset] Partitioning schema fields follow paths' segment ordering
  • ARROW-8096 - [C++][Gandiva] fix TreeExprBuilder::MakeNull to create node for interval type
  • ARROW-8097 - [Dev] Comment bot's crossbow command acts on the master branch
  • ARROW-8103 - [R] Make default Linux build more minimal
  • ARROW-8104 - [C++] Don't install bundled Thrift
  • ARROW-8107 - [Packaging][APT] Use HTTPS for LLVM APT repository for Debian GNU/Linux stretch
  • ARROW-8109 - [Packaging][APT] Drop support for Ubuntu Disco
  • ARROW-8117 - [Datafusion][Rust] allow cast SQLTimestamp to Timestamp
  • ARROW-8118 - [R] dim method for FileSystemDataset
  • ARROW-8120 - [Packaging][APT] Add support for Ubuntu Focal
  • ARROW-8123 - [Rust][DataFusion] Add LogicalPlanBuilder
  • ARROW-8124 - [Rust] Update library dependencies
  • ARROW-8126 - [C++][Compute] Add nth-to-indices kernel benchmark
  • ARROW-8129 - [C++][Compute] Refine compare sort kernel
  • ARROW-8130 - [C++][Gandiva] fix dex visitor to handle interval type
  • ARROW-8140 - [Dev] Follow class name change
  • ARROW-8141 - [C++] speed unpack1_32 using intrinsics API
  • ARROW-8145 - [C++] Rename FileSystem::GetTargetInfos to GetFileInfo
  • ARROW-8146 - [C++] Add per-filesystem facility to sanitize a path
  • ARROW-8150 - [Rust] Allow writing custom FileMetaData k/v pairs
  • ARROW-8151 - [Dataset][Benchmarking] benchmark S3File performance
  • ARROW-8153 - [Packaging] Update the conda feedstock files and upload artifacts to Anaconda
  • ARROW-8158 - [Java] Getting length of data buffer and base variable width vector
  • ARROW-8164 - [C++][Dataset] Provide Dataset::ReplaceSchema()
  • ARROW-8165 - [Packaging] Make nightly wheels available on a PyPI server
  • ARROW-8167 - [CI] Add support for skipping builds with skip pattern in pull request title
  • ARROW-8168 - [Java][Plasma] Improve Java Plasma client off-heap memory usage
  • ARROW-8177 - [Rust] Make schema_to_fb_offset public because it is very useful!
  • ARROW-8178 - [C++] Update to Flatbuffers 1.12.0
  • ARROW-8179 - [R] Windows build script tweaking for nightly packaging on GHA
  • ARROW-8181 - [Java][FlightRPC] Expose transport error metadata
  • ARROW-8182 - [Packaging] Increment the version number detected from the latest git tag
  • ARROW-8183 - [C++][Python][FlightRPC] Expose transport error metadata
  • ARROW-8184 - [Packaging] Use arrow-nightlies organization name on Anaconda and Gemfury to host the nightlies
  • ARROW-8185 - [Packaging] Document the available nightly wheels and conda packages
  • ARROW-8187 - [R] Make test assertions robust to i18n
  • ARROW-8191 - [Packaging][APT] Fix cmake removal in Debian GNU/Linux Stretch
  • ARROW-8192 - [C++] script for unpack avx512 intrinsics code
  • ARROW-8194 - [CI] Run tests in parallel on Github Actions
  • ARROW-8195 - [CI][C++][MSVC] Use preinstalled Boost
  • ARROW-8198 - [C++] Format Diff of NullArrays
  • ARROW-8200 - [GLib] Rename garrow_file_system_target_info{,s}() to ...fileinfo{,s}()
  • ARROW-8203 - [C#] Use the latest SourceLink
  • ARROW-8204 - [Rust][DataFusion] Add support for aliased expressions in SQL
  • ARROW-8207 - [Packaging][wheel] Use LLVM 8 in manylinux2010 and manylinux2014
  • ARROW-8215 - [CI][GLib] Fix install error on macOS
  • ARROW-8218 - [C++] Decompress record batch messages in parallel at field level. Only allow LZ4_FRAME, ZSTD compression
  • ARROW-8220 - [Python] Make dataset FileFormat objects serializable
  • ARROW-8222 - [C++] Use bcp to make a slim boost for bundled build
  • ARROW-8224 - [C++] Remove APIs deprecated prior to 0.16.0
  • ARROW-8225 - [Rust] Continuation marker check was in wrong location.
  • ARROW-8225 - [Rust] Rust Arrow IPC reader must respect continuation markers.
  • ARROW-8227 - [C++] Refine SIMD feature definitions
  • ARROW-8231 - [Rust] Parse parquet key_value_metadata
  • ARROW-8232 - [Python] Deprecate pyarrow.open_stream and pyarrow.open_file APIs in favor of accessing via pyarrow.ipc namespace
  • ARROW-8235 - [C++][Compute] Filter out nulls by default
  • ARROW-8241 - [Rust] Add Schema convenience methods index_of and field_with_name
  • ARROW-8242 - [C++] Flight fails to compile on GCC 4.8
  • ARROW-8243 - [Rust][DataFusion] Fix inconsistency in LogicalPlanBuilder api
  • ARROW-8244 - [Python] Fix parquet.write_to_dataset to set file path in metadata_collector
  • ARROW-8246 - [C++] Add -Wa,-mbig-obj to CXXFLAGS on MinGW if it is supported
  • ARROW-8247 - [Python] Expose Parquet writing "engine" setting in pyarrow.parquet.write_table
  • ARROW-8249 - [Rust][DataFusion] Table API now uses LogicalPlanBuilder
  • ARROW-8252 - [CI][Ruby] Add Ubuntu 20.04
  • ARROW-8256 - [Rust][DataFusion] Update CLI documentation for 0.17.0 release
  • ARROW-8264 - [Rust][DataFusion] Add utility for printing batches
  • ARROW-8266 - [C++] Provide backup mirrors for thrift externalproject
  • ARROW-8267 - [CI][GLib] Fix build error on Ubuntu 16.04
  • ARROW-8271 - [Packaging] Allow wheel upload failures to gemfury
  • ARROW-8275 - [Python] Update Feather documentation for V2, Python IPC API cleanups / deprecations
  • ARROW-8277 - [Python] Implemented __eq__, __repr__, and provided a wrapper of Take() for RecordBatch
  • ARROW-8279 - [C++] Do not export Codec implementation symbols, remove codec-specific headers
  • ARROW-8288 - [Python] Expose with_ modifiers on DataType
  • ARROW-8290 - [Python] Improve FileSystemDataset constructor
  • ARROW-8291 - [Packaging] Conda nightly builds can't locate Numpy
  • ARROW-8292 - [Python] Allow to manually specify schema in dataset() function
  • ARROW-8294 - [Flight] Add DoExchange to Flight.proto
  • ARROW-8295 - [C++][Dataset] Push down projection to IpcReadOptions
  • ARROW-8299 - [C++] Reusable "optional ParallelFor" function for optional use of multithreading
  • ARROW-8300 - [R] Documentation and changelog updates for 0.17
  • ARROW-8307 - [Python] Add memory_map= option to pyarrow.feather.read_table
  • ARROW-8308 - [Rust] Implement DoExchange on examples
  • ARROW-8309 - [CI] C++/Java/Rust workflows should trigger on changes to Flight.proto
  • ARROW-8311 - [C++] Add push style stream format reader
  • ARROW-8316 - [CI] Set docker-compose to use docker-cli instead of docker-py for building images
  • ARROW-8319 - [CI] Install thrift compiler in the debian build
  • ARROW-8320 - [Format] Add clarification to CDataInterface.rst regarding memory alignment of buffers
  • ARROW-8321 - [CI] Use bundled thrift in Fedora 30 build
  • ARROW-8322 - [CI] Fix C# workflow file syntax
  • ARROW-8325 - [R][CI] Stop including boost in R windows bundle
  • ARROW-8329 - [Documentation][C++] Undocumented FilterOptions argument in Filter kernel
  • ARROW-8330 - [Documentation] The post release script generates the documentation with a development version
  • ARROW-8332 - [C++] Don't require Thrift compiler for Parquet build
  • ARROW-8335 - [Release] Add crossbow jobs to run release verification
  • ARROW-8336 - [Packaging][deb] Use libthrift-dev on Debian 10 and Ubuntu 19.10 or later
  • ARROW-8341 - [Packaging][deb] Reduce disk usage on building packages
  • ARROW-8343 - [GLib] Add GArrowRecordBatchIterator
  • ARROW-8347 - [C++] Migrate Array methods to Result<T>
  • ARROW-8351 - [R][CI] Store the Rtools-built Arrow C++ library as a build artifact
  • ARROW-8352 - [R] Add install_pyarrow()
  • ARROW-8356 - [Developer] Support * wildcards with "crossbow submit" via GitHub actions
  • ARROW-8361 - [C++] Add Result<T> APIs to Buffer methods and functions
  • ARROW-8362 - [Crossbow] Ensure that the locally generated version is used in the docker tasks
  • ARROW-8367 - [C++] Deprecate Buffer::FromString(..., MemoryPool*)
  • ARROW-8368 - [C++][C Data Interface] Move several child arrays
  • ARROW-8370 - [C++] Migrate type/schema APIs to Result<T>
  • ARROW-8371 - [Crossbow] Implement and exercise sanity checks for tasks.yml
  • ARROW-8372 - [C++] Migrate Table and RecordBatch APIs to Result<T>
  • ARROW-8375 - [CI][R] Make Windows tests more verbose in case of segfault
  • ARROW-8376 - [R] Add experimental interface to ScanTask/RecordBatch iterators
  • ARROW-8387 - [Rust] Make schema_to_fb public
  • ARROW-8389 - [Integration] Run tests in parallel
  • ARROW-8390 - [R] Expose schema unification features
  • ARROW-8393 - [C++][Gandiva] Make gandiva function registry case-insensitive
  • ARROW-8396 - [Rust] Removes libc dependency
  • ARROW-8398 - [Python] Remove deprecated API usage from python tests
  • ARROW-8401 - [C++] Add byte-stream-split AVX2/AVX512 implementation
  • ARROW-8403 - [C++] Add ToString() to ChunkedArray, Table and RecordBatch
  • ARROW-8407 - [Rust] Add documentation for Dictionary data type
  • ARROW-8408 - [Python] Add memory_map argument to feather.read_feather
  • ARROW-8409 - [R] Add R wrappers for getting and setting global CPU thread pool capacity
  • ARROW-8412 - [C++][Gandiva] Fix gandiva date_diff function definitions
  • ARROW-8433 - [R] Add feather alias for ipc format in dataset API
  • ARROW-8444 - [Documentation] Fix spelling errors across the codebase
  • ARROW-8449 - [R] Use CMAKE_UNITY_BUILD everywhere
  • ARROW-8450 - [Integration][C++] Implement large offsets types
  • ARROW-8457 - [C++] Add expected results for ArrowSchema in big-endian
  • ARROW-8458 - [C++] Prefer the original mirrors for the bundled thirdparty dependencies
  • ARROW-8461 - [Packaging][deb] Use zstd package for Ubuntu Xenial
  • ARROW-8463 - [CI] Balance the nightly test builds between CircleCI, Azure and Github
  • ARROW-8679 - [Python] supporting pandas sparse series in pyarrow
  • PARQUET-458 - [C++][Parquet] Add support for reading/writing DataPageV2 format
  • PARQUET-1663 - [C++] Provide API to check the presence of repeated fields
  • PARQUET-1716 - [C++] Add BYTE_STREAM_SPLIT encoder and decoder
  • PARQUET-1770 - [C++][CI] Add fuzz target for reading Parquet files
  • PARQUET-1785 - [C++] Implement ByteStreamSplitDecoder::DecodeArrow and refactor tests
  • PARQUET-1786 - [C++] Improve ByteStreamSplit decoder using SSE2
  • PARQUET-1806 - [C++] Improve fuzzing seed corpus
  • PARQUET-1825 - [C++] Fix compilation error in column_io_benchmark.cc
  • PARQUET-1828 - [C++] Use SSE2 for the ByteStreamSplit encoder
  • PARQUET-1840 - [C++] Stop Early on DecodeSpaced

Apache Arrow 0.16.0 (2020-02-07)

Bug Fixes

  • ARROW-3783 - [R] Incorrect collection of float type
  • ARROW-3962 - [Go] Handle null values in CSV
  • ARROW-4470 - [Python] Pyarrow using considerably more memory when reading partitioned Parquet file
  • ARROW-4998 - [R] R package fails to install on OSX
  • ARROW-5575 - [C++] Split Targets.cmake for each module
  • ARROW-5655 - [Python] Table.from_pydict/from_arrays not using types in specified schema correctly
  • ARROW-5680 - [Rust][DataFusion] GROUP BY sql tests are now deterministic
  • ARROW-6157 - [C++] Array data validation
  • ARROW-6195 - [C++] Detect Apache mirror without Python
  • ARROW-6298 - [Rust] [CI] Examples are not being tested in CI
  • ARROW-6320 - [C++] Arrow utilities are linked statically
  • ARROW-6429 - [CI][Crossbow] Nightly spark integration job fails
  • ARROW-6445 - [CI][Crossbow] Nightly Gandiva jar trusty job fails
  • ARROW-6567 - [Rust][DataFusion] Wrap aggregate in projection when needed
  • ARROW-6581 - [C++] Fix fuzzit job submission
  • ARROW-6704 - [C++] Check for out of bounds timestamp in unsafe cast
  • ARROW-6708 - [C++] Fix hardcoded boost library names
  • ARROW-6728 - [C#] Support reading and writing Date32 and Date64 arrays
  • ARROW-6736 - [Rust][DataFusion] Evaluate the input to the aggregate expression just once per batch
  • ARROW-6740 - [C++] Unmap MemoryMappedFile as soon as possible
  • ARROW-6745 - [Rust] Fix a variety of minor typos.
  • ARROW-6749 - [Python] Let Array.to_numpy use general conversion code with zero_copy_only=True
  • ARROW-6750 - [Python] Silence S3 error logs by default
  • ARROW-6761 - [Rust] Travis build now uses the correct Rust toolchain
  • ARROW-6762 - [C++] Support reading JSON files with no newline at end
  • ARROW-6785 - [JS] Remove superfluous child assignment
  • ARROW-6786 - [C++] arrow-dataset-file-parquet-test is slow
  • ARROW-6795 - [C#] Fix for reading large (2GB+) files
  • ARROW-6798 - [CI] [Rust] Improve build times by caching dependencies in the Docker image
  • ARROW-6801 - [Rust] Arrow source release tarball is missing benchmarks
  • ARROW-6806 - [C++][Python] Fix crash validating an IPC-originating empty array
  • ARROW-6808 - [Ruby] Ensure requiring suitable MSYS2 package
  • ARROW-6809 - [Ruby] Gem does not install on macOS due to glib2 3.3.7 compilation failure
  • ARROW-6812 - [Java] Fix License header
  • ARROW-6813 - [Ruby] Arrow::Table.load with headers=true leads to exception in Arrow 0.15
  • ARROW-6820 - [Format] Update Map type child to "entries"
  • ARROW-6834 - [C++][TRIAGE] Pin gtest version 1.8.1 to unblock Appveyor builds
  • ARROW-6835 - [Archery][CMake] Restore ARROW_LINT_ONLY cmake option
  • ARROW-6842 - [Website] Jekyll error building website
  • ARROW-6844 - [C++][Parquet] Fix regression in reading List types with item name that is not "item"
  • ARROW-6846 - [C++] Build failures with glog enabled
  • ARROW-6857 - [C++] Fix DictionaryEncode for zero-chunk ChunkedArray
  • ARROW-6859 - [CI][Nightly] Disable docker layer caching for CircleCI tasks
  • ARROW-6860 - [Python][C++] Do not link shared libraries monolithically to pyarrow.lib, add libarrow_python_flight.so
  • ARROW-6861 - [C++] Fix length/null_count/capacity accounting through Reset and AppendIndices in DictionaryBuilder
  • ARROW-6864 - [C++] Add compression-related compile definitions before adding any unit tests
  • ARROW-6867 - [FlightRPC][Java] clean up default executor
  • ARROW-6868 - [Go] Fix slicing struct arrays
  • ARROW-6869 - [C++] Do not return invalid arrays from DictionaryBuilder::Finish when reusing builder. Add "FinishDelta" method and "ResetFull" method
  • ARROW-6873 - [Python] Remove stale CColumn references
  • ARROW-6874 - [Python] Fix memory leak when converting to Pandas object data
  • ARROW-6876 - [C++][Parquet] Use shared_ptr to avoid copying ReaderContext struct, fix performance regression with reading many columns
  • ARROW-6877 - [C++] Add additional Boost versions to support 1.71 and the presumed next 2 future versions
  • ARROW-6878 - [Python] Fix creating array from list of dicts with bytes keys
  • ARROW-6882 - [C++] Ensure the DictionaryArray indices has no dictionary data
  • ARROW-6885 - [Python] Remove superfluous skipped timedelta test
  • ARROW-6886 - [C++] Fix arrow::io nvcc compiler warnings
  • ARROW-6898 - [Java][hotfix] fix ArrowWriter memory leak
  • ARROW-6898 - [Java] Fix potential memory leak in ArrowWriter and several test classes
  • ARROW-6899 - [Python] Decode dictionary-encoded List children to dense when converting to pandas
  • ARROW-6901 - [Rust][Parquet] Increment total_num_rows when closing a row group
  • ARROW-6903 - [Python] Attempt to fix Python wheels with introduction of libarrow_python_flight, disabling of pyarrow.orc
  • ARROW-6905 - [Gandiva][Crossbow] Use xcode9.4 for osx builds, do not build dataset, filesystem
  • ARROW-6910 - [C++][Python] Set jemalloc default configuration to release dirty pages back to the OS more aggressively (dirty_decay_ms and muzzy_decay_ms set to 0 by default); add a C++/Python option to configure this
  • ARROW-6913 - [R] Potential bug in compute.cc
  • ARROW-6914 - [CI] docker-clang-format nightly failing
  • ARROW-6922 - [Python] Compat with pandas for MultiIndex.levels.names
  • ARROW-6925 - [C++] Only add -stdlib flag on MacOS when using clang.
  • ARROW-6929 - [C++] Remove first offset==0 check from Validate()
  • ARROW-6937 - [Packaging][Python] Fix conda linux and OSX wheel nightly builds
  • ARROW-6938 - [Packaging][Python] Disable bz2 in Windows wheels and build ZSTD in bundled mode to triage linking issues
  • ARROW-6948 - [Rust][Parquet] Fix boolean array in arrow reader.
  • ARROW-6950 - [C++][Dataset] Add dataset benchmark example
  • ARROW-6957 - [CI][Crossbow] Nightly R with sanitizers build fails installing dependencies
  • ARROW-6962 - [C++][CI] Stop compiling with -Weverything
  • ARROW-6966 - [Go] Set a default memset for when the platform doesn't set one
  • ARROW-6977 - [C++] Disable jemalloc background_thread on macOS
  • ARROW-6983 - [C++] Fix ThreadedTaskGroup lifetime issue
  • ARROW-6989 - [Python] Check for out of range precision decimals in python conversion
  • ARROW-6992 - [C++] Undefined Behavior sanitizer build option fails with GCC
  • ARROW-6999 - [Python] Fix unnamed index when specifying schema in Table.from_pandas
  • ARROW-7013 - [C++] arrow-dataset pkgconfig is incomplete
  • ARROW-7020 - [Java] Fix the bugs when calculating vector hash code
  • ARROW-7021 - [Java] UnionFixedSizeListWriter decimal type should check writer index
  • ARROW-7022 - [Python] Fix handling of pandas Index and Period/Interval extension arrays in pa.array (together with ARROW-7023)
  • ARROW-7023 - [Python] pa.array does not use "from_pandas" semantics for pd.Index
  • ARROW-7024 - [CI][R] Update R dependencies for Conda build
  • ARROW-7027 - [Python] Correctly raise error in pa.table(..) on invalid input
  • ARROW-7033 - [C++] Set SDKROOT automatically on macOS
  • ARROW-7045 - [R] Preserve factor in Parquet roundtrip
  • ARROW-7050 - [R] Fix compiler warnings in R bindings
  • ARROW-7053 - [Python] setuptools-scm produces incorrect version at apache-arrow-0.15.1 tag
  • ARROW-7056 - [Python] Fix test_fs failures when S3 not enabled
  • ARROW-7059 - [C++][Parquet] Mostly fix performance regression when reading Parquet file with many columns
  • ARROW-7074 - [C++] ASSERT_OK_AND_ASSIGN should use ASSERT_OK instead of EXPE…
  • ARROW-7077 - [C++] Casting dictionary to unrelated value type shouldn't crash
  • ARROW-7087 - [Python] Metadata disappear from pandas dataset
  • ARROW-7097 - [Rust][CI] Apply rustfmt nightly
  • ARROW-7100 - [C++][HDFS] Fix search directories for libjvm.so
  • ARROW-7105 - [CI][Crossbow] Nightly homebrew-cpp job fails
  • ARROW-7106 - [Java] Fix the problem that flight perf test hangs endlessly
  • ARROW-7117 - [C++][CI] Fix the hanging C++ tests in Windows 2019
  • ARROW-7128 - [CI] Use proper version for fedora tests in GitHub actions cron jobs
  • ARROW-7133 - [CI] Allow GH Actions to run on all branches
  • ARROW-7142 - [C++] GCC compilation failures in nightlies
  • ARROW-7152 - [Java] Delete useless class DiffFunction
  • ARROW-7157 - [R] Add validation, helpful error message to Object$new()
  • ARROW-7158 - [C++] Use compiler information provided by CMake
  • ARROW-7163 - [Doc] Fix double-and typos
  • ARROW-7164 - [CI] Dev cron github action is failing every 15 minutes
  • ARROW-7167 - [CI][Python] Add nightly tests for additional pandas versions to Github Actions
  • ARROW-7168 - [Python] Respect the specified dictionary type for pd.Categorical conversion
  • ARROW-7170 - [C++] Fix linking with bundled ORC
  • ARROW-7180 - [CI] Java builds are not triggered on the master branch
  • ARROW-7181 - [C++] Fix an Arrow module search bug with pkg-config
  • ARROW-7183 - [CI][Crossbow] Re-skip r-sanitizer nightly tests
  • ARROW-7187 - [C++][Doc] doxygen broken on master because of @
  • ARROW-7188 - [C++][Doc] doxygen broken on master: missing param implicit_casts
  • ARROW-7189 - [CI][Crossbow] Nightly conda osx builds fail
  • ARROW-7194 - [Rust] Fix CSV writer recursion issues
  • ARROW-7199 - [Java] Fix ConcurrentModificationException in BaseAllocator::getChildAllocators
  • ARROW-7200 - [C++][Flight] Enable the server to serve to remote clients
  • ARROW-7209 - [Python] Fix tests on pandas master related to extension dtype conversion
  • ARROW-7212 - [Go] add missing Release to benchmark code
  • ARROW-7214 - [Python] Fix pickling of DictionaryArray
  • ARROW-7217 - [CI][Python] Use correct python version in Github Actions
  • ARROW-7225 - [C++] Fix *std::move(Result<T>) for move-only T
  • ARROW-7249 - [CI] Release test fails in master due to new arrow-flight Rust crate
  • ARROW-7250 - [C++] Define constexpr symbols explicitly in StringToFloatConverter::Impl
  • ARROW-7253 - [CI] Fix failure in release test
  • ARROW-7254 - [Java] BaseVariableWidthVector#setSafe appears to make value offsets inconsistent
  • ARROW-7264 - [Java] RangeEqualsVisitor type check is not correct
  • ARROW-7266 - [C++] Fix ArrayDataVisitor on sliced binary-like array
  • ARROW-7271 - [C++][Flight] Use the single parameter version of SetTotalBytesLimit
  • ARROW-7281 - [C++] Make Adaptive builders' length match expectations
  • ARROW-7282 - [Python] IO functions should raise the right exceptions
  • ARROW-7291 - [Dev] Fix FORMAT_DIR
  • ARROW-7294 - [Python] converted_type_name_from_enum(): Incorrect name for INT_64
  • ARROW-7295 - [R] Fix bad test that causes failure on R < 3.5
  • ARROW-7298 - [C++] Fix thirdparty dependency downloader script
  • ARROW-7314 - [Python] Fix compiler warning in pyarrow.union
  • ARROW-7318 - [C#] TimestampArray serialization failure
  • ARROW-7320 - [C++] Specify CMAKE_INSTALL_LIBDIR for gbenchmark
  • ARROW-7327 - [CI] Failing C GLib and R buildbot builders
  • ARROW-7328 - [CI] GitHub Actions should trigger on changes to GitHub Actions configuration
  • ARROW-7341 - [CI] Unbreak nightly Conda R job
  • ARROW-7343 - [Java][FlightRPC] prevent leak in DoGet
  • ARROW-7349 - [C++] Fix the bug of parsing string hex values
  • ARROW-7353 - [C++] Ignore -Wmissing-braces when building with clang
  • ARROW-7354 - [C++] Fix crash in test-io-hdfs
  • ARROW-7355 - [CI] Environment variables are defined twice for the fuzzit builds
  • ARROW-7358 - [CI] [Dev] [C++] ccache disabled on conda-python-hdfs
  • ARROW-7359 - [C++][Gandiva] Don't throw error for locate function for start position exceeding string length
  • ARROW-7360 - [R] Can't use dplyr filter() with variables defined in parent scope
  • ARROW-7361 - [Rust] Build directory is not passed to ci/scripts/rust_test.sh
  • ARROW-7362 - [Python][C++] Added ListArray.Flatten() that properly flattens a ListArray
  • ARROW-7374 - [Dev][C++] Fix cuda-cpp docker build
  • ARROW-7381 - [C++] Unbreak manylinux1 wheels after Iterator refactor
  • ARROW-7386 - [C#] Array offset does not work properly
  • ARROW-7388 - [Python] Skip HDFS tests if libhdfs cannot be located
  • ARROW-7389 - [Python][Packaging] Remove pyarrow.s3fs import check from the recipe
  • ARROW-7393 - [Plasma] Fix plasma executable name in plasma_java build
  • ARROW-7395 - [C++] Do not warn or error on logical "or" with constants
  • ARROW-7397 - [C++][JSON] Fix white space length detection error
  • ARROW-7404 - [C++][Gandiva] Fix utf8 char length error on Arm64
  • ARROW-7406 - [Java] NonNullableStructVector#hashCode should pass hasher to child vectors
  • ARROW-7407 - [Python] Declare NumPy a PEP517 build dependency
  • ARROW-7408 - [C++] Fix compilation of reference benchmarks
  • ARROW-7435 - [C++] Validate all list / binary offsets in ValidateFull()
  • ARROW-7436 - [Archery] Enable more benchmark binaries in archery benchmark
  • ARROW-7437 - [Java] ReadChannel#readFully does not set writer index correctly
  • ARROW-7442 - [Ruby] Add abstract type check to Arrow::DataType.resolve
  • ARROW-7447 - [Java] ComplexCopier does incorrect copy in some cases
  • ARROW-7450 - [C++] Also link boost_filesystem when using static test linkage
  • ARROW-7458 - [GLib] Fix incorrect build dependency in Makefile
  • ARROW-7471 - [CI][Python] Run flake8 on Cython files
  • ARROW-7472 - [Java] Fix some incorrect behavior in UnionListWriter
  • ARROW-7478 - [Rust][DataFusion] Group by expression ignored unless paired with aggregate expression
  • ARROW-7492 - [CI][Crossbow] Nightly homebrew-cpp job fails on Python installation
  • ARROW-7497 - [Python] Stop relying on (deprecated) pandas.util.testing, move to pandas.testing
  • ARROW-7500 - [C++][Dataset] Remove std::regex usage
  • ARROW-7503 - [Rust][Parquet] Fix build failures
  • ARROW-7506 - [Java] JMH benchmarks should be called from main methods
  • ARROW-7508 - [C#] DateTime32 Reading is Broken
  • ARROW-7510 - [C++] Make ArrayData::null_count thread-safe
  • ARROW-7516 - [C#] Fix .NET Benchmarks
  • ARROW-7518 - [Python] Use PYARROW_WITH_HDFS when building wheels, conda packages
  • ARROW-7527 - [Python] Fix pandas/feather tests for unsupported types with pandas master
  • ARROW-7528 - [Python] Remove usage of deprecated pd.np and pd.datetime in tests
  • ARROW-7535 - [C++] Fix ASAN failures in Array::Validate()
  • ARROW-7543 - [R] Fixes R arrow::write_parquet() documentation code examples
  • ARROW-7545 - [C++] [Dataset] Scanning dataset with dictionary type hangs
  • ARROW-7551 - [FlightRPC][C++] Flight test on macOS fails due to Homebrew gRPC
  • ARROW-7552 - [C++][CI] Disable timing-sensitive tests on public CI
  • ARROW-7554 - [C++] Add support for building on FreeBSD
  • ARROW-7559 - [Rust] Incorrect index check assertion in StringArray and BinaryArray
  • ARROW-7561 - [Doc][Python] Add missing conda_env_gandiva.yml in python.rst
  • ARROW-7563 - [Rust] failed to select a version for `byteorder`
  • ARROW-7582 - [Rust][Flight] Unable to compile arrow.flight.protocol.rs
  • ARROW-7583 - [FlightRPC][C++] relax auth tests due to nondeterminism
  • ARROW-7591 - [Python] Fix DictionaryArray.to_numpy() to return decoded numpy array
  • ARROW-7592 - [C++] Fix crashes on corrupt IPC input
  • ARROW-7593 - [CI][Python] Python datasets failing / not run on CI
  • ARROW-7595 - [R][CI] R appveyor job fails due to pacman compression change
  • ARROW-7596 - [Python] Only permit zero-copy DataFrame block construction when split_blocks=True
  • ARROW-7599 - [Java] Fix build break due to change in RangeEqualsVisitor
  • ARROW-7603 - [Packaging][RPM] Add workaround for LLVM on CentOS 8
  • ARROW-7611 - [Packaging][Python] Fix artifacts patterns for wheel
  • ARROW-7612 - [Packaging][Python] Fix artifacts path for Conda on Windows
  • ARROW-7614 - [Python] Limit size of data in test_parquet.py::test_set_data_page_size
  • ARROW-7618 - [C++] Fix crashes or undefined behaviour on corrupt IPC input
  • ARROW-7620 - [Rust] Remove call to flatc
  • ARROW-7621 - [Doc] Fix doc build
  • ARROW-7634 - [Python] Run pyarrow.dataset tests on Appveyor + fix failures to parse Windows file paths
  • ARROW-7638 - [C++][Dataset] Fix a segfault in DirectoryPartitioningFactory
  • ARROW-7639 - [R] Cannot convert Dictionary Array to R when values aren't strings
  • ARROW-7640 - [C++][Dataset][Parquet] Detect missing compression support
  • ARROW-7647 - [C++] Repair JSON parser's handling of ListArrays
  • ARROW-7650 - [C++][Dataset] enable dataset tests on Windows
  • ARROW-7651 - [CI][Crossbow] Nightly macOS wheel builds fail
  • ARROW-7652 - [Python][Dataset] Use implicit cast in ScannerBuilder.filter
  • ARROW-7661 - [Python] Test for optimal CSV chunking
  • ARROW-7689 - [FlightRPC][C++] bump bundled gRPC to 1.25 to fix MacOS test failure
  • ARROW-7690 - [R] Cannot write parquet to OutputStream
  • ARROW-7693 - [CI] Fix test name for Spark integration, add new tests
  • ARROW-7709 - [Python] Preserve column name in conversion from Table column to pandas for non-ns timestamps
  • ARROW-7714 - [Release] Add missing variable expansion
  • ARROW-7718 - [Release] Fix auto-retry in the binary release script
  • ARROW-7723 - [Python] Triage untested functional regression when converting tz-aware timestamp inside struct to pandas/NumPy format
  • ARROW-7727 - [Python] Unable to read a ParquetDataset when schema validation is on.
  • ARROW-8135 - [Python] Problem importing PyArrow on a cluster
  • ARROW-8638 - Arrow Cython API Usage Gives an error when calling CTable API Endpoints
  • PARQUET-1692 - [C++] Don't use the same CMake variable name for thirdparty version and found version
  • PARQUET-1692 - [C++] LogicalType::FromThrift error on Centos 7 RPM
  • PARQUET-1693 - [C++] Fix parquet examples with compression define guards
  • PARQUET-1702 - [C++] Make BufferedRowGroupWriter compatible with parquet encryption
  • PARQUET-1706 - [C++] Wrong dictionary_page_offset when writing only data pages via BufferedPageWriter
  • PARQUET-1707 - [C++] parquet-arrow-test fails with UBSAN
  • PARQUET-1709 - [C++] Avoid unnecessary temporary std::shared_ptr copies
  • PARQUET-1715 - [C++] Add the Parquet code samples to CI + Refactor Parquet Encryption Samples
  • PARQUET-1720 - [C++] JSONPrint not showing version correctly
  • PARQUET-1747 - [C++] Access to ColumnChunkMetaData fails when encryption is on
  • PARQUET-1766 - [C++] Handle parquet::Statistics NaNs and -0.0f as per upstream parquet-mr
  • PARQUET-1772 - [C++] ParquetFileWriter: Data overwritten in append mode

New Features and Improvements

  • ARROW-412 - [Format][Documentation] Clarify that Buffer.size in Flatbuffers should reflect the actual memory size rather than the padded size
  • ARROW-501 - [C++] Implement concurrent / buffering InputStream for streaming data use cases
  • ARROW-772 - [C++] Implement take kernel functions
  • ARROW-843 - [C++][Dataset] Ensure Schemas are unified in DataSourceDiscovery
  • ARROW-976 - [C++][Python] Provide API for defining and reading Parquet datasets with more ad hoc partition schemes
  • ARROW-1036 - [C++] Define abstract API for filtering Arrow streams (e.g. predicate evaluation)
  • ARROW-1119 - [Python/C++] Implement NativeFile interfaces for Amazon S3
  • ARROW-1175 - [Java] Implement/test dictionary-encoded subfields
  • ARROW-1456 - [Python] Run s3fs unit tests in Travis CI
  • ARROW-1562 - [C++] Numeric kernel implementations for add
  • ARROW-1638 - [Java] IPC roundtrip for null type
  • ARROW-1900 - [C++] Add kernel for min / max
  • ARROW-2428 - [Python] Support pandas ExtensionArray in Table.to_pandas conversion
  • ARROW-2602 - [Packaging] Automate build of development docker containers
  • ARROW-2863 - [Python] Add context manager APIs to RecordBatch*Writer/Reader classes
  • ARROW-3085 - [Rust] Add an adapter for parquet.
  • ARROW-3408 - [C++] Add CSV option to automatically attempt dict encoding
  • ARROW-3444 - [Python] Add Array/ChunkedArray/Table nbytes attribute
  • ARROW-3706 - [Rust] Add record batch reader trait.
  • ARROW-3789 - [Python] Use common conversion path for Arrow to pandas.Series/DataFrame. Zero copy optimizations for DataFrame, add split_blocks and self_destruct options
  • ARROW-3808 - [R] Array extract, including Take method
  • ARROW-3813 - [R] lower level construction of Dictionary Arrays
  • ARROW-4059 - [Rust] Parquet/Arrow Integration
  • ARROW-4091 - [C++] Curate default list of CSV null spellings
  • ARROW-4208 - [CI/Python] Have automated tests for S3
  • ARROW-4219 - [Rust][Parquet] Initial support for arrow reader.
  • ARROW-4223 - [Python] Support scipy.sparse integration
  • ARROW-4224 - [Python] Support integration with pydata/sparse library
  • ARROW-4225 - [Format][C++] Add CSC sparse matrix support
  • ARROW-4722 - [C++] Implement Bitmap class to modularize handling of bitmaps
  • ARROW-4748 - [Rust][DataFusion] Optimize GROUP BY aggregate queries
  • ARROW-4930 - [C++] Improve find_package() support
  • ARROW-5180 - [Rust] IPC Support
  • ARROW-5181 - [Rust] Initial support for Arrow File reader
  • ARROW-5182 - [Rust] Arrow IPC file writer
  • ARROW-5227 - [Rust] [DataFusion] Re-implement query execution with an extensible physical query plan
  • ARROW-5277 - [C#] MemoryAllocator.Allocate(length: 0) doesn't return null
  • ARROW-5333 - [C++] Clamp build option summary width to 90
  • ARROW-5366 - [Rust] Duration and Interval Arrays
  • ARROW-5400 - [Rust] Test/ensure that reader and writer support zero-length record batches
  • ARROW-5445 - [Website] Remove language that encourages pinning a version
  • ARROW-5454 - [C++] Implement Take on ChunkedArray for DataFrame use
  • ARROW-5502 - [R] file readers should mmap
  • ARROW-5508 - [C++] Create reusable Iterator<T> interface
  • ARROW-5523 - [Python][Packaging] Use HTTPS consistently for downloading wheel dependencies
  • ARROW-5712 - [C++][Parquet] Arrow time32/time64/timestamp ConvertedType not being restored properly
  • ARROW-5767 - [Format] Permit dictionary replacements in IPC protocol
  • ARROW-5801 - [CI] Dockerize (add to docker-compose) all Travis CI Linux tasks
  • ARROW-5802 - [CI][Archery] Dockerify lint utilities
  • ARROW-5804 - [C++] Dockerize C++ CI job with conda-forge toolchain, code coverage from Travis CI
  • ARROW-5805 - [Python] Dockerize (add to docker-compose) Python Travis CI job
  • ARROW-5806 - [CI] Dockerize (add to docker-compose) Integration tests Travis CI entry
  • ARROW-5807 - [JS] Dockerize NodeJS Travis CI entry
  • ARROW-5808 - [GLib][Ruby] Dockerize (add to docker-compose) current GLib + Ruby Travis CI entry
  • ARROW-5809 - [CI][Rust] Travis runs dockerized Rust build
  • ARROW-5810 - [Go] Dockerize Travis CI Go build
  • ARROW-5831 - [Release] Add Python program to download binary artifacts in parallel, allow abort/resume
  • ARROW-5839 - [Python] Test manylinux2010 in CI
  • ARROW-5855 - [Python] Support for Duration (timedelta) type
  • ARROW-5859 - [Python] Support ExtensionArray.to_numpy using storage array
  • ARROW-5971 - [Website] Blog post introducing Arrow Flight
  • ARROW-5994 - [CI] [Rust] Create nightly releases of the Rust implementation
  • ARROW-6003 - [C++] Better input validation and error messaging in CSV reader
  • ARROW-6074 - [FlightRPC][Java] Middleware
  • ARROW-6091 - [Rust][DataFusion] Implement physical execution plan for LIMIT
  • ARROW-6109 - [Integration] Docker image for integration testing can't be built on windows
  • ARROW-6112 - [Java] Support int64 buffer lengths in Java
  • ARROW-6184 - [Java] Provide hash table based dictionary encoder
  • ARROW-6251 - [Developer] Add PR merge tool to apache/arrow-site
  • ARROW-6257 - [C++] Add fnmatch compatible globbing function
  • ARROW-6274 - [Rust][DataFusion] Add support for writing results to CSV
  • ARROW-6277 - [C++][Parquet] Support direct DictionaryArray write of all parquet types
  • ARROW-6283 - [Rust][DataFusion] Implement Context::write_csv to write partitioned CSV results
  • ARROW-6285 - [GLib] Add support for LargeBinary and LargeString types
  • ARROW-6286 - [GLib] Add support for LargeList type
  • ARROW-6299 - [C++] Simplify FileFormat classes to singletons
  • ARROW-6321 - [Python] Ability to create ExtensionBlock on conversion to pandas
  • ARROW-6340 - [R] Implements low-level bindings to Dataset classes
  • ARROW-6341 - [Python] Implement low-level bindings for Dataset
  • ARROW-6352 - [Java] Add implementation of DenseUnionVector
  • ARROW-6367 - [C++][Gandiva] Implement string reverse
  • ARROW-6378 - [C++][Dataset] Implement recursive TreeDataSource
  • ARROW-6386 - [C++][Documentation] Explicit documentation of null slot interpretation
  • ARROW-6394 - [Java] Support conversions between delta vector and partial sum vector
  • ARROW-6396 - [C++] Add overloads of Boolean kernels implementing Kleene logic
  • ARROW-6398 - [C++] Consolidate ScanOptions and ScanContext
  • ARROW-6405 - [Python] Add std::move wrapper for use in Cython
  • ARROW-6452 - [Java] Override ValueVector toString() method
  • ARROW-6463 - [C++][Python] Rename arrow::fs::Selector to FileSelector
  • ARROW-6466 - [Integration][CI] Move integration test code to archery integration command. Dockerize integration tests
  • ARROW-6468 - [C++] Remove unused hashing routines
  • ARROW-6473 - Dictionary encoding format clarifications/future proofing
  • ARROW-6503 - [C++] Add an argument of memory pool object to SparseTensorConverter
  • ARROW-6508 - [C++] Add Tensor and SparseTensor factory function with validations
  • ARROW-6515 - [C++] Clean type_traits.h definitions
  • ARROW-6578 - [C++] Allow casting number to string
  • ARROW-6592 - [Java] Add support for skipping decoding of columns/field in Avro converter
  • ARROW-6594 - [Java] Support logical type encodings from Avro
  • ARROW-6598 - [Java] Sort the code for ApproxEqualsVisitor and provide an interface for custom vector equality
  • ARROW-6608 - [C++] Make default for ARROW_HDFS to be OFF
  • ARROW-6610 - [C++] Add cmake option to disable filesystem layer
  • ARROW-6611 - [C++] Make ARROW_JSON=OFF the default
  • ARROW-6612 - [C++] Add ARROW_CSV CMake build flag
  • ARROW-6619 - [Ruby] Add support for building Gandiva::Expression by Arrow::Schema#build_expression
  • ARROW-6624 - [C++][Python] Add SparseTensor.ToTensor() method
  • ARROW-6625 - [C++][Python] Allow concat_tables to null fill missing columns
  • ARROW-6631 - [C++] Do not build any compression libraries by default in C++ build
  • ARROW-6632 - [C++] Do not build with ARROW_COMPUTE=on and ARROW_DATASET=on by default
  • ARROW-6633 - [C++] Vendor double-conversion library
  • ARROW-6634 - [C++][FOLLOWUP] Remove Flatbuffers EP remnants from C++ Dockerfiles
  • ARROW-6634 - [C++] Vendor Flatbuffers and check in compiled sources
  • ARROW-6635 - [C++] Disable glog integration by default
  • ARROW-6636 - [C++] Do not build command line tools by default
  • ARROW-6637 - [Packaging][FOLLOWUP] Enable necessary components in Autobrew build for R
  • ARROW-6637 - [C++] Further streamline default build, add ARROW_CSV CMake option
  • ARROW-6646 - [Go] Write no IPC buffer metadata for NullType
  • ARROW-6650 - [Rust][Integration] Compare integration JSON with schema & batch
  • ARROW-6656 - [Rust][Datafusion] Add MAX, MIN expressions
  • ARROW-6657 - [Rust][DataFusion] Add Count Aggregate Expression
  • ARROW-6658 - [Rust][Datafusion] Implement AVG expression
  • ARROW-6659 - [Rust][DataFusion] Refactor of HashAggregateExec to support custom merge
  • ARROW-6662 - [Java] Implement equals/approxEquals API for VectorSchemaRoot
  • ARROW-6671 - [C++][Python] Use more consistent names for sparse tensor items
  • ARROW-6672 - [Java] Extract a common interface for dictionary builders
  • ARROW-6685 - [C++] Ignore trailing slashes in S3FS
  • ARROW-6686 - [CI] Pull and push docker images to speed up the nightly builds
  • ARROW-6688 - [Packaging] Include s3 support in the conda packages
  • ARROW-6690 - [Rust][DataFusion] Optimize aggregates without GROUP BY to use SIMD
  • ARROW-6692 - [Rust][DataFusion] Update examples to use physical query plan
  • ARROW-6693 - [Rust] [DataFusion] Update unit tests to use physical query plan
  • ARROW-6694 - [Rust][DataFusion] Integration tests now use physical query plan
  • ARROW-6695 - [Rust][DataFusion] Remove legacy code for executing logical plan
  • ARROW-6696 - [Rust][DataFusion] Implement simple math operations in physical query plan
  • ARROW-6700 - [Rust][DataFusion] Use new Arrow Parquet reader
  • ARROW-6707 - [Java] Improve the performance of JDBC adapters by using nullable information
  • ARROW-6710 - [Java] Add JDBC adapter test to cover cases which contains some null values
  • ARROW-6711 - [C++] Consolidate Filter and Expression
  • ARROW-6721 - [Java] Avro adapter benchmark only runs once in JMH
  • ARROW-6722 - [Java] Provide a uniform way to get vector name
  • ARROW-6729 - [C++] Prevent data copying in StlStringBuffer
  • ARROW-6730 - [CI] Use GitHub Actions for "C++ with clang 7" docker image
  • ARROW-6731 - [CI] [Rust] Set up Github Action to run Rust tests
  • ARROW-6732 - [Java] Implement quick sort in a non-recursive way to avoid stack overflow
  • ARROW-6741 - [Release] Update changelog.py to use APACHE_ prefixed JIRA_USERNAME and JIRA_PASSWORD environment variables
  • ARROW-6742 - [C++] Remove boost::filesystem dependency in hdfs_internal.cc
  • ARROW-6743 - [C++] Remove usage of boost::filesystem
  • ARROW-6744 - [Rust] Publicly expose JsonEqual
  • ARROW-6754 - [C++] Merge allocator.h into stl.h
  • ARROW-6758 - [Developer] Install local NodeJS via nvm when running release verification
  • ARROW-6764 - [C++] Create a readahead iterator
  • ARROW-6767 - [JS] Lazily bind batches in scan/scanReverse
  • ARROW-6768 - [C++][Dataset] Add method to convert from Scanner to Table
  • ARROW-6769 - [Dataset][C++] End to end test
  • ARROW-6770 - [CI][Travis] Download Minio quietly
  • ARROW-6777 - [GLib][CI] Unpin gobject-introspection gem
  • ARROW-6778 - [C++] Support cast for DurationType
  • ARROW-6782 - [C++] Do not require Boost for minimal C++ build
  • ARROW-6784 - [C++][R] Move filter and take for ChunkedArray, RecordBatch, and Table from Rcpp to C++ library
  • ARROW-6787 - [CI][C++] Decommission "C++ with clang 7 and system packages" Travis CI job
  • ARROW-6788 - [CI][Dev] Exercise merge script tests
  • ARROW-6789 - [Python] Improve ergonomics by automatically boxing Action and Result in do_action RPC
  • ARROW-6790 - [Release] Enable selected integration tests in release verification
  • ARROW-6793 - [R] Arrow C++ binary packaging for Linux
  • ARROW-6797 - [Release] Use a separately cloned arrow-site repository in the website post release script
  • ARROW-6802 - [Packaging][deb][RPM] Update qemu-user-static package URL
  • ARROW-6803 - [Rust][DataFusion] Performance optimization for single partition aggregate queries
  • ARROW-6804 - [CI][Rust] Migrate Travis job to Github Actions
  • ARROW-6807 - [Java][FlightRPC] Expose gRPC service & client
  • ARROW-6810 - [Website] Add docs for R package 0.15 release
  • ARROW-6811 - [R] Assorted post-0.15 release cleanups
  • ARROW-6814 - [C++] Resolve compiler warnings that occurred on release build
  • ARROW-6822 - [Website] merge_pr.py is published
  • ARROW-6824 - [Plasma] Allow creation of multiple objects through a single IPC in Plasma Store
  • ARROW-6825 - [C++] Rework CSV reader IO around readahead iterator
  • ARROW-6831 - [R] Update R macOS/Windows builds for change in cmake compression defaults
  • ARROW-6832 - [R] Implement Codec::IsAvailable
  • ARROW-6833 - [R][CI] Add crossbow job for full R autobrew macOS build
  • ARROW-6836 - [Format][KeyValue] field to the Footer table in File.fbs
  • ARROW-6843 - [Website] Disable deploy on pull request
  • ARROW-6847 - [C++] Add range_expression adapter to Iterator
  • ARROW-6850 - [Java] Jdbc converter support Null type
  • ARROW-6852 - [C++] Fix build issue on memory-benchmark
  • ARROW-6853 - [Java] Support vector and dictionary encoder use different hasher for calculating hashCode
  • ARROW-6855 - [FlightRPC][C++][Python] Flight middleware for C++/Python
  • ARROW-6862 - [Developer] Check pull request title
  • ARROW-6863 - [Java] Provide parallel searcher
  • ARROW-6865 - [Java] Improve the performance of comparing an ArrowBuf against a byte array
  • ARROW-6866 - [Java] Improve the performance of calculating hash code for struct vector
  • ARROW-6879 - [Rust] Add explicit SIMD for sum kernel
  • ARROW-6880 - [Rust] Add explicit SIMD for min/max kernel
  • ARROW-6881 - [Rust] Remove "array_ops" in favor of the "compute" sub-module
  • ARROW-6884 - [Python] Format friendlier message in Python when a server-side RPC handler fails
  • ARROW-6887 - [Java] Create prose documentation for using ValueVectors
  • ARROW-6888 - [Java] Support copy operation for vector value comparators
  • ARROW-6889 - [Java] ComplexCopier enable FixedSizeList type & fix RangeEqualsVisitor StackOverFlow
  • ARROW-6891 - [Rust][Parquet] utf8 support for arrow reader.
  • ARROW-6902 - [C++][Compute] Add String/Binary support to Compare kernel
  • ARROW-6904 - [Python] Add support for MapArray
  • ARROW-6907 - [Plasma] Allow Plasma to send batched notifications.
  • ARROW-6911 - [Java] Provide composite comparator
  • ARROW-6912 - [Java] Extract a common base class for avro converter consumers
  • ARROW-6916 - [Developer] Sort tasks by name in Crossbow e-mail report
  • ARROW-6918 - [R] Make docker-compose setup faster
  • ARROW-6919 - [Python] Expose more builders in Cython
  • ARROW-6920 - [Packaging] Build python 3.8 wheels
  • ARROW-6926 - [Python] Support __sizeof__ protocol for Python objects
  • ARROW-6927 - [C++] Add gRPC version check
  • ARROW-6928 - [Rust] Add support for FixedSizeListArray
  • ARROW-6930 - [Java] Create utility class for populating vector values used for test purpose only
  • ARROW-6932 - [Java] Incorrect log on known extension type
  • ARROW-6933 - [Java] Support linear dictionary encoder
  • ARROW-6936 - [Python] Improve error message when unwrapping object fails
  • ARROW-6942 - [Developer] Add support for Parquet in pull request check by GitHub Actions
  • ARROW-6943 - [Website] Translate Apache Arrow Flight introduction to Japanese
  • ARROW-6944 - [Rust] Add String, FixedSizeBinary types
  • ARROW-6949 - [Java] Fix promotable writer to handle nullvectors
  • ARROW-6951 - [C++][Dataset] Column projection in ParquetFragment
  • ARROW-6952 - [C++][Dataset] Implement predicate pushdown with ParquetFileFragment
  • ARROW-6954 - [Python][CI] Add Python 3.8 to CI matrix
  • ARROW-6960 - [R] Add lz4 and zstd to R PKGBUILD
  • ARROW-6961 - [C++][Gandiva] Add string lower function in Gandiva
  • ARROW-6963 - [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from travis builds
  • ARROW-6964 - [C++][Dataset] Add multithread support to Scanner::ToTable
  • ARROW-6965 - [C++][Dataset] Optionally expose partition keys as columns
  • ARROW-6967 - [C++][Dataset] IN, IS_VALID filter expressions
  • ARROW-6969 - [C++][Dataset] ParquetScanTask defer memory usage
  • ARROW-6970 - [Packaging][RPM] Add support for CentOS 8
  • ARROW-6973 - [C++][ThreadPool] Use perfect forwarding in Submit
  • ARROW-6975 - [C++] Put make_unique in its own header
  • ARROW-6980 - [R] dplyr backend for RecordBatch/Table
  • ARROW-6984 - [C++] Update LZ4 to 1.9.2 for CVE-2019-17543
  • ARROW-6986 - [R] Add basic Expression class
  • ARROW-6987 - [CI] Travis OSX failing to install sdk headers
  • ARROW-6991 - [Packaging][deb] Add support for Ubuntu 19.10
  • ARROW-6994 - [C++] Fix aggressive RSS inflation on macOS when jemalloc background_thread is not enabled
  • ARROW-6997 - [Packaging][RPM] Add apache-arrow-release
  • ARROW-7000 - [C++][Gandiva] Handle empty inputs in string upper, lower functions
  • ARROW-7003 - [Rust] Generate flatbuffers files in docker build image
  • ARROW-7004 - [Plasma] Make it possible to bump up object in LRU cache
  • ARROW-7006 - [Rust] Bump flatbuffers version to avoid vulnerability
  • ARROW-7007 - [C++] Add use_mmap option to LocalFS
  • ARROW-7014 - [Developer][Release] Add "wheels" verification option to verify-release-candidate.sh for Linux and macOS
  • ARROW-7015 - [Developer] Write script to verify macOS wheels given local environment with conda or virtualenv
  • ARROW-7016 - [Developer][Python] Add Windows batch script to test Python wheels for release candidate
  • ARROW-7019 - [Java] Improve the performance of loading validity buffers
  • ARROW-7026 - [Java] Remove assertions in MessageSerializer/vector/writer/reader
  • ARROW-7031 - [Python] Correct LargeListArray.offsets attribute
  • ARROW-7031 - [Python] Expose the offsets of a ListArray in python
  • ARROW-7032 - [Release] Run the python unit tests in the release verification script
  • ARROW-7034 - [CI][Crossbow] Skip known nightly failures
  • ARROW-7035 - [R] Default arguments are unclear in write_parquet docs
  • ARROW-7036 - [C++] Version up ORC to avoid compile errors
  • ARROW-7037 - [C++] Compile error on the combination of protobuf >= 3.9 and clang
  • ARROW-7039 - [Python] Fix pa.table/record_batch typecheck to work without pandas
  • ARROW-7047 - [C++] Insert implicit casts in ScannerBuilder::Finish
  • ARROW-7052 - [C++] Fix linking of datasets example when ARROW_BUILD_SHARED=OFF
  • ARROW-7054 - [Docs] Enable overriding project version with environment variable when building Sphinx docs
  • ARROW-7057 - [C++] Add API to parse URI query strings
  • ARROW-7058 - [C++] FileSystemDataSourceDiscovery should apply partition schemes relative to its base dir
  • ARROW-7060 - [R] Post-0.15.1 cleanup
  • ARROW-7061 - [C++][Dataset] Add ignore file options to FileSystemDataSourceDiscovery
  • ARROW-7062 - [C++][Dataset] Ensure ParquetFileFormat::Open catch parqu…
  • ARROW-7064 - [R] Support null type using vctrs::unspecified()
  • ARROW-7066 - [Python] Allow returning ChunkedArray in __arrow_array__
  • ARROW-7067 - [CI] Disable code coverage on Travis-CI
  • ARROW-7069 - [C++][Dataset] Replace ConstantPartitionScheme with PrefixDictionaryPartitionScheme
  • ARROW-7070 - [Packaging][deb] Update package names for 1.0.0
  • ARROW-7072 - [Java] Support concatenating validity bits efficiently
  • ARROW-7082 - [Packaging][deb] Add apache-arrow-archive-keyring package
  • ARROW-7086 - [C++] Provide a wrapper for invoking factories to produce a Result
  • ARROW-7092 - [R] Add vignette for dplyr and datasets
  • ARROW-7093 - [R] Support creating ScalarExpressions for more data types
  • ARROW-7094 - [C++] FileSystemDataSource should use an owning pointer for fs::Filesystem
  • ARROW-7095 - [R] Require an explicit call to pull Datasets into memory
  • ARROW-7096 - [C++] Unified ConcatenateTables APIs
  • ARROW-7098 - [Java] Improve the performance of comparing two memory blocks
  • ARROW-7099 - [C++] Disambiguate function calls in csv parser test
  • ARROW-7101 - [CI] Refactor docker-compose setup and use it with GitHub Actions
  • ARROW-7103 - [R] Various minor cleanups
  • ARROW-7107 - [C++][MinGW] Enable Flight on AppVeyor
  • ARROW-7110 - [GLib] Add filter support for GArrowTable, GArrowChunkedArray, and GArrowRecordBatch
  • ARROW-7111 - [GLib] Add take support for GArrowTable, GArrowChunkedArray, and GArrowRecordBatch
  • ARROW-7113 - [Rust] Add unowned buffer.
  • ARROW-7116 - [CI] Use the docker repository provided by apache organization
  • ARROW-7120 - [C++][CI] Add .ccache to the docker-compose volume mounts
  • ARROW-7146 - [R][CI] Various fixes and speedups for the R docker-compose setup
  • ARROW-7147 - [C++][Dataset] Refactor dataset's API to use Result<T>
  • ARROW-7148 - [C++][Dataset] Major API cleanup
  • ARROW-7149 - [C++] Remove experimental status on filesystem APIs
  • ARROW-7155 - [Java][CI] add maven wrapper to make setup process simple
  • ARROW-7159 - [CI] Run HDFS tests as cron task
  • ARROW-7160 - [C++] Update string_view backport
  • ARROW-7161 - [C++] Migrate filesystem APIs from Status to Result
  • ARROW-7162 - [C++] Cleanup warnings in cmake_modules/SetupCxxFlags.cmake
  • ARROW-7166 - [Java] Remove redundant code for Jdbc adapters
  • ARROW-7169 - [C++] Vendor uriparser library
  • ARROW-7171 - [Ruby] Pass Array<Boolean> for Arrow::Table#filter
  • ARROW-7172 - [C++][Dataset] Improve format of Expression::ToString
  • ARROW-7176 - [C++] Fix arrow::ipc compiler warning
  • ARROW-7178 - [C++] Vendor forward compatible std::optional
  • ARROW-7185 - [R][Dataset] Add bindings for IN, IS_VALID expressions
  • ARROW-7186 - [R] Add inline comments to document the dplyr code
  • ARROW-7192 - [Rust] Implement Flight crate
  • ARROW-7193 - [Rust] Arrow stream reader
  • ARROW-7195 - [Ruby] Improve #filter, #take, and #is_in
  • ARROW-7196 - [Ruby] Remove needless BinaryArrayBuilder#append_values
  • ARROW-7197 - [Ruby] Suppress keyword argument related warnings with Ruby 2.7
  • ARROW-7204 - [C++][Dataset] Implicit cast support for InExpression
  • ARROW-7206 - [Java] Avoid string concatenation when calling Preconditions#checkArgument
  • ARROW-7207 - [Rust] Update generated fbs files
  • ARROW-7210 - [C++][R] Allow Numeric <-> Temporal Scalar casts
  • ARROW-7211 - [Rust] Support byte buffers as a parquet sink
  • ARROW-7216 - [Java] Improve the performance of setting/clearing individual bits
  • ARROW-7219 - [Python][CI] Test with pickle5 installed
  • ARROW-7227 - [Python] Added a python wrapper for ConcatenateTablesWithPromotions
  • ARROW-7228 - [Python] Added a python wrapper for RecordBatch.FromStructArray()
  • ARROW-7235 - [C++] Add Result<T> APIs to IO layer
  • ARROW-7236 - [C++] Add Result<T> APIs to arrow/csv
  • ARROW-7240 - [C++] Add Result<T> APIs to arrow/util
  • ARROW-7246 - [CI][Python] Use Python 3 for docker-compose
  • ARROW-7247 - [CI][Python] Fix wheel build error on macOS
  • ARROW-7248 - [Rust] Automatically Generate IPC Messages
  • ARROW-7255 - [CI] Re-enable source release test on pull request
  • ARROW-7257 - [CI] Fix Homebrew formula audit error by openssl
  • ARROW-7258 - [CI] Fix fuzzit build directory
  • ARROW-7259 - [Java] Support subfield encoder use different hasher
  • ARROW-7260 - [CI] Remove Ubuntu 14.04 test job
  • ARROW-7261 - [Python] Add Python support for Fixed Size List type
  • ARROW-7262 - [C++][Gandiva] Added replace function
  • ARROW-7263 - [C++][Gandiva] Implemented locate function
  • ARROW-7268 - [Rust] Add custom_metadata field from IPC message to Schema.
  • ARROW-7269 - [Python] Add ORC to api documentation
  • ARROW-7270 - [Go] preserve CSV reading behaviour, improve memory usage
  • ARROW-7274 - [C++] Add Result<T> APIs to Decimal class
  • ARROW-7275 - [Ruby] Add support for Arrow::ListDataType.new(data_type)
  • ARROW-7276 - [Ruby][...]
  • ARROW-7277 - [Java][Doc] Add discussion about vector lifecycle
  • ARROW-7279 - [C++] Rename UnionArray::type_ids to type_codes
  • ARROW-7284 - [Java] ensure java implementation meets clarified dictionary spec
  • ARROW-7289 - [C#] ListType constructor argument is redundant
  • ARROW-7290 - [C#] Implement ListArray Builder
  • ARROW-7292 - [CI][C++] Add ASAN / UBSAN run
  • ARROW-7293 - [Dev][C++] Persist ccache in docker-compose build volumes
  • ARROW-7296 - [Python] Add ORC api documentation
  • ARROW-7299 - [GLib] Use Result instead of Status
  • ARROW-7303 - [C++] Refactor CSV benchmarks to use Result APIs
  • ARROW-7306 - [C++] Add Result-returning version of FileSystemFromUri
  • ARROW-7307 - [CI][GLib] Ensure generating documentation
  • ARROW-7309 - [Python] Support HDFS federation viewfs
  • ARROW-7310 - [Python] Expose HDFS implementation for pyarrow.fs
  • ARROW-7311 - [Python] Return filesystem and path from URI
  • ARROW-7312 - [Rust] Implement std::error::Error for ArrowError.
  • ARROW-7317 - [C++] Migrate Iterator to a Result API
  • ARROW-7319 - [C++] Refactor Iterator<T> to yield Result<T>
  • ARROW-7321 - [CI][GLib] Disable development mode
  • ARROW-7322 - [CI][Python] Fall back to arrowdev dockerhub organization for manylinux images
  • ARROW-7323 - [CI][Rust] Use the same toolchain
  • ARROW-7324 - [Rust] Add timezone to timestamp
  • ARROW-7325 - [Rust][Parquet] Update to parquet-format 2.6 and thrift 0.12
  • ARROW-7329 - [Java] AllocationManager: Allow managing different types …
  • ARROW-7333 - [CI][Rust] Remove duplicated nightly job
  • ARROW-7334 - [CI][Python] Use Python 3 on macOS
  • ARROW-7339 - [CMake] Thrift version not respected in CMake configuration version.txt
  • ARROW-7340 - [CI] Prune defunct appveyor build setup
  • ARROW-7344 - [Packaging][Python] Build manylinux2014 wheels
  • ARROW-7346 - [CI] Explicit usage of ccache across the builds
  • ARROW-7347 - [C++] Update bundled Boost to 1.71.0
  • ARROW-7348 - [Rust] Add api to return null bitmap buffer.
  • ARROW-7351 - [Developer] Only suggest cpp-* versions by default for PARQUET issues in merge tool
  • ARROW-7357 - [Go] migrate to x/xerrors
  • ARROW-7366 - [C++][Dataset] Use PartitionSchemeDiscovery in DataSourceDiscovery
  • ARROW-7367 - [Python] Use np.full instead of np.array.repeat in ParquetDatasetPiece
  • ARROW-7368 - [Ruby] Use :arrow_file and :arrow_streaming for format name
  • ARROW-7369 - [GLib] Add garrow_table_combine_chunks
  • ARROW-7370 - [C++] Fix old Protobuf with AUTO detection failure
  • ARROW-7377 - [C++][Dataset] Add ScanOptions::MaterializedFields
  • ARROW-7378 - [C++][Gandiva] Fix loop vectorization in gandiva
  • ARROW-7379 - [C++] Introduce SchemaBuilder companion class and Field::IsCompatibleWith
  • ARROW-7380 - [C++][Dataset] Implement DatasetFactory
  • ARROW-7382 - [C++][Dataset] Insert missing directories in FileSystemDataSourceDiscovery::Make
  • ARROW-7387 - [C#] Support ListType Serialization
  • ARROW-7392 - [Packaging] Add conda packaging tasks for python 3.8
  • ARROW-7398 - [Packaging][Python] Conda builds are failing on macOS
  • ARROW-7399 - [C++][Gandiva] set Mcpu based on host cpu
  • ARROW-7402 - [C++] Add more information on CUDA error
  • ARROW-7403 - [C++][JSON] Enable Rapidjson on Arm64 Neon
  • ARROW-7410 - [Doc][Python] Document filesystem API
  • ARROW-7411 - [C++][Flight] Improve the output of Arrow Flight benchmark
  • ARROW-7413 - [Python] Expose and test the partitioning discovery
  • ARROW-7414 - [R][Dataset] Implement *PartitionSchemeDiscovery in R
  • ARROW-7415 - [C++][Dataset] implement IpcFormat
  • ARROW-7416 - [R][Nightly] Fix macos-r-autobrew build on R 3.6.2
  • ARROW-7417 - [C++] Add a docker-compose entry for CUDA 10.1
  • ARROW-7418 - [C++] Fix build error on Ubuntu 16.04
  • ARROW-7420 - [C++] Migrate tensor related APIs to Result-returning version
  • ARROW-7429 - [Java] Enhance code style checking for Java code (remove consecutive spaces)
  • ARROW-7430 - [Python] Add more docstrings to dataset bindings
  • ARROW-7431 - [Python] Add dataset API to reference docs
  • ARROW-7432 - [Python] Add higher level open_dataset function
  • ARROW-7439 - [C++][Dataset] Remove pointer aliases
  • ARROW-7449 - [GLib] Make GObject Introspection optional
  • ARROW-7452 - [GLib] Make GArrowTimeDataType abstract
  • ARROW-7453 - [Ruby]
  • ARROW-7454 - [Ruby] Add support for saving/loading TSV
  • ARROW-7455 - [Ruby] Use Arrow::DataType.resolve for all GArrowDataType input
  • ARROW-7456 - [C++] Add support for YYYY-MM-DDThh and YYYY-MM-DDThh:mm timestamp formats
  • ARROW-7457 - [Doc] fix typos
  • ARROW-7459 - [Python] Fix document lint error
  • ARROW-7460 - [Rust] Improve some kernel performance
  • ARROW-7461 - [Java] fix typos
  • ARROW-7463 - [Doc] fix a broken link and typo
  • ARROW-7464 - [C++] Refine CpuInfo singleton with std::call_once
  • ARROW-7465 - [C++] Add Arrow memory benchmark for Arm64
  • ARROW-7468 - [Python] fix typos
  • ARROW-7469 - [C++] Improve division related bit operations
  • ARROW-7470 - [JS] fix typos
  • ARROW-7474 - [Ruby] Improve CSV save performance
  • ARROW-7475 - [Rust] Arrow IPC Stream writer
  • ARROW-7477 - [Java][FlightRPC] set up gRPC reflection metadata
  • ARROW-7479 - [Rust][Ruby][R] Fix typos
  • ARROW-7481 - [C#] fix typo
  • ARROW-7482 - [C++] Fix typos
  • ARROW-7484 - [C++][Gandiva] Fix typos
  • ARROW-7485 - [C++][Plasma] Fix typos
  • ARROW-7487 - [Developer] Fix typos
  • ARROW-7488 - [GLib] Fix typos and broken links
  • ARROW-7489 - [CI] Fix typos
  • ARROW-7490 - [Java] Avro converter should convert attributes and props to FieldType metadata
  • ARROW-7493 - [Python] Expose sum kernel in pyarrow.compute and support ChunkedArray inputs
  • ARROW-7498 - [Dataset] Rename core classes before stable API
  • ARROW-7502 - [Integration] Remove Spark patch not needed
  • ARROW-7513 - [JS][tutorial] - Rich cols part 1
  • ARROW-7514 - [C#] Make GetValueOffset Obsolete
  • ARROW-7519 - [Python] Build wheels, conda packages with dataset support
  • ARROW-7521 - [Rust] Remove tuple on FixedSizeList
  • ARROW-7523 - [Developer] Relax clang-tidy check
  • ARROW-7526 - [C++][Compute] Optimize small integer sorting
  • ARROW-7532 - [CI] Unskip brew test after Homebrew fixes it upstream
  • ARROW-7537 - [CI][R] Nightly macOS autobrew job should be more verbose if it fails
  • ARROW-7538 - [Java] Clarify actual and desired size in AllocationManager
  • ARROW-7540 - [C++] Install license files and README
  • ARROW-7541 - [GLib] Install license files
  • ARROW-7542 - [CI][C++] Use $(sysctl -n hw.ncpu) instead of $(nproc) on macOS
  • ARROW-7549 - [Java] Reorganize Flight modules to keep top level clean/organized
  • ARROW-7550 - [R][CI] Run donttest examples in CI
  • ARROW-7557 - [C++][Compute] Validate sorting stability
  • ARROW-7558 - [Packaging][deb][RPM] Use the host owner and group for artifacts
  • ARROW-7560 - [Rust] Reduce Rc/Refcell usage
  • ARROW-7565 - [Website] Add support for download URL redirect
  • ARROW-7566 - [CI] Use more recent Miniconda on AppVeyor
  • ARROW-7567 - [Java] Fix races in checkstyle upgrade
  • ARROW-7567 - [Java] Bump Checkstyle from 6.19 to 8.19
  • ARROW-7568 - [Java] Bump Apache Avro from 1.9.0 to 1.9.1
  • ARROW-7569 - [Python] Add API to map Arrow types to pandas ExtensionDtypes in to_pandas conversions
  • ARROW-7570 - [Java] Fix high severity issues
  • ARROW-7571 - [Java] Correct minimal Java version on README
  • ARROW-7572 - [Java] Enforce Maven 3.3+ as mentioned in README
  • ARROW-7573 - [Rust] Reduce boxing and cleanup
  • ARROW-7575 - [R] Linux binary packaging followup
  • ARROW-7576 - [C++][Dev] Improve fuzzing setup
  • ARROW-7577 - [CI][C++] Check OSS-Fuzz build in Github Actions
  • ARROW-7578 - [R] Add support for datasets with IPC files and with multiple sources
  • ARROW-7580 - [Website] 0.16 release post
  • ARROW-7581 - [R] Documentation/polishing for 0.16 release
  • ARROW-7590 - [C++] Don't ignore managed files in thirdparty
  • ARROW-7597 - [C++] More compact CMake configuration summary
  • ARROW-7600 - [C++][Parquet] failing disabled unittest for nested parquet.
  • ARROW-7601 - [Doc][C++] Update fuzzing doc
  • ARROW-7602 - [Archery] Add more archery build options
  • ARROW-7613 - [Rust] Remove redundant :: prefixes
  • ARROW-7622 - [Format] Mark Tensor and SparseTensor fields required
  • ARROW-7623 - [C++] Update generated flatbuffers code
  • ARROW-7626 - [Parquet][GLib] Add support for version macros
  • ARROW-7627 - [C++][Gandiva] Optimize string truncate function
  • ARROW-7629 - [C++][CI] Add fuzz regression files to arrow-testing
  • ARROW-7630 - [C++][CI] Check fuzz crash regressions in CI
  • ARROW-7632 - [C++][CI] Add extension type data to IPC fuzz seed corpus
  • ARROW-7635 - [C++] Add pkg-config support for each component
  • ARROW-7636 - [Python] Clean-up the pyarrow.dataset.partitioning() API
  • ARROW-7644 - Add vcpkg installation instructions
  • ARROW-7645 - [Packaging][deb][RPM] Fix arm64 packaging build
  • ARROW-7648 - [C++] Sanitize local paths on Windows
  • ARROW-7658 - [R] Support dplyr filtering on date/time
  • ARROW-7659 - [Rust] Reduce Rc usage
  • ARROW-7660 - [C++][Gandiva] Optimise castVarchar(string, int) function for single byte characters
  • ARROW-7665 - [R] Build in parallel in linuxLibs.R
  • ARROW-7666 - [Packaging][deb] Always use Ninja to reduce build time
  • ARROW-7667 - [Packaging][deb] Add ubuntu-eoan to nightly jobs
  • ARROW-7668 - [Packaging][RPM] Use Ninja if possible to reduce build time
  • ARROW-7670 - [Python][Dataset] More ergonomic API
  • ARROW-7671 - [Python][Dataset] Add bindings for the DatasetFactory
  • ARROW-7674 - [Dev] Add helpful message for captcha challenge in merge_arrow_pr.py
  • ARROW-7682 - [Packaging] Add support for arm64 APT/Yum repositories
  • ARROW-7683 - [Packaging] Set 0.16.0 as the next version
  • ARROW-7686 - [Packaging][deb][RPM] Include more arrow-*.pc
  • ARROW-7687 - [C++] Fix dead links in README
  • ARROW-7692 - [Rust] Simplify some Option / Result pattern matches
  • ARROW-7694 - [Packaging][deb][RPM] Add support for RC to repository packages
  • ARROW-7695 - [Release] Update java versions to 0.16-SNAPSHOT
  • ARROW-7696 - [Release] Add support for running unit test on release branch
  • ARROW-7697 - [Release] Add a test for updating Linux packages by 00-prepare.sh
  • ARROW-7710 - [Release][C#] Add support for redirecting .NET download URL
  • ARROW-7711 - [C#] Make Date32 test independent of system timezone
  • ARROW-7715 - [Release][APT] Ignore some arm64 verifications
  • ARROW-7716 - [Packaging][APT] Use the "main" component for Ubuntu 19.10
  • ARROW-7719 - [Python][Dataset] Table equality check occasionally fails
  • ARROW-7724 - [Release][Yum] Ignore some arm64 verifications
  • ARROW-7743 - [Rust] [Parquet] Support reading timestamp micros
  • ARROW-7768 - [Rust] Implement Length and TryClone traits for Cursor<Vec<u8>> in reader.rs
  • ARROW-8015 - [Python] Build 0.16.0 wheel install for Windows + Python 3.5 and publish to PyPI
  • PARQUET-517 - [C++] Use arrow::MemoryPool for all heap allocations
  • PARQUET-1300 - [C++] Implement encrypted Parquet read and write support
  • PARQUET-1664 - [C++] Provide API to return metadata string from FileMetadata.
  • PARQUET-1678 - [C++] Provide classes for reading/writing using input/output operators
  • PARQUET-1688 - [C++] StreamWriter/StreamReader can't be built with g++ 4.8.5 on CentOS 7
  • PARQUET-1689 - [C++] Stream API: Allow for columns/rows to be skipped when reading
  • PARQUET-1701 - [C++] Stream API: Add support for optional fields
  • PARQUET-1704 - [C++] Add re-usable encryption buffer to SerializedPageWriter
  • PARQUET-1705 - [C++] Disable shrink-to-fit on the re-usable decryption buffer
  • PARQUET-1712 - [C++] Stop using deprecated APIs in examples
  • PARQUET-1721 - [C++][Parquet] Add missing arrow dependency to parquet.pc
  • PARQUET-1734 - [C++] Fix typo
  • PARQUET-1769 - [C++] Update parquet.thrift to parquet-format 2.8.0

Apache Arrow 0.15.1 (2019-11-01)

Bug Fixes

  • ARROW-6464 - [Java] Refactor FixedSizeListVector#splitAndTransfer with slice API (#5293)
  • ARROW-6728 - [C#] Support reading and writing Date32 and Date64 arrays
  • ARROW-6740 - [C++] Unmap MemoryMappedFile as soon as possible
  • ARROW-6762 - [C++] Support reading JSON files with no newline at end
  • ARROW-6795 - [C#] Fix for reading large (2GB+) files
  • ARROW-6806 - [C++][Python] Fix crash validating an IPC-originating empty array
  • ARROW-6809 - [Ruby] Gem does not install on macOS due to glib2 3.3.7 compilation failure
  • ARROW-6813 - [Ruby] Arrow::Table.load with headers=true leads to exception in Arrow 0.15
  • ARROW-6834 - [C++][TRIAGE] Pin gtest version 1.8.1 to unblock Appveyor builds
  • ARROW-6844 - [C++][Parquet] Fix regression in reading List types with item name that is not "item"
  • ARROW-6857 - [C++] Fix DictionaryEncode for zero-chunk ChunkedArray
  • ARROW-6860 - [Python][C++] Do not link shared libraries monolithically to pyarrow.lib, add libarrow_python_flight.so
  • ARROW-6861 - [C++] Fix length/null_count/capacity accounting through Reset and AppendIndices in DictionaryBuilder
  • ARROW-6869 - [C++] Do not return invalid arrays from DictionaryBuilder::Finish when reusing builder. Add "FinishDelta" method and "ResetFull" method
  • ARROW-6873 - [Python] Remove stale CColumn references
  • ARROW-6874 - [Python] Fix memory leak when converting to Pandas object data
  • ARROW-6876 - [C++][Parquet] Use shared_ptr to avoid copying ReaderContext struct, fix performance regression with reading many columns
  • ARROW-6877 - [C++] Add additional Boost versions to support 1.71 and the presumed next 2 future versions
  • ARROW-6878 - [Python] Fix creating array from list of dicts with bytes keys
  • ARROW-6882 - [C++] Ensure the DictionaryArray indices have no dictionary data
  • ARROW-6886 - [C++] Fix arrow::io nvcc compiler warnings
  • ARROW-6898 - [Java] Fix potential memory leak in ArrowWriter and several test classes
  • ARROW-6903 - [Python] Attempt to fix Python wheels with introduction of libarrow_python_flight, disabling of pyarrow.orc
  • ARROW-6905 - [Gandiva][Crossbow] Use xcode9.4 for osx builds, do not build dataset, filesystem
  • ARROW-6910 - [C++][Python] Set jemalloc default configuration to release dirty pages back to the OS more aggressively (dirty_decay_ms and muzzy_decay_ms set to 0 by default); add C++/Python option to configure this
  • ARROW-6922 - [Python] Compat with pandas for MultiIndex.levels.names
  • ARROW-6937 - [Packaging][Python] Fix conda linux and OSX wheel nightly builds
  • ARROW-6938 - [Packaging][Python] Disable bz2 in Windows wheels and build ZSTD in bundled mode to triage linking issues
  • ARROW-6962 - [C++][CI] Stop compiling with -Weverything
  • ARROW-6977 - [C++] Disable jemalloc background_thread on macOS
  • ARROW-6983 - [C++] Fix ThreadedTaskGroup lifetime issue
  • ARROW-7422 - [Python] Improper CPU flags failing pyarrow install in ARM devices
  • ARROW-7423 - Pyarrow ARM install fails from source with no clear error
  • ARROW-9349 - [Python] parquet.read_table causes crashes on Windows Server 2016 w/ Xeon Processor

New Features and Improvements

  • ARROW-6610 - [C++] Add cmake option to disable filesystem layer
  • ARROW-6661 - [Java] Implement APIs like slice to enhance VectorSchemaRoot (#5470)
  • ARROW-6777 - [GLib][CI] Unpin gobject-introspection gem
  • ARROW-6852 - [C++] Fix build issue on memory-benchmark
  • ARROW-6927 - [C++] Add gRPC version check
  • ARROW-6963 - [Packaging][Wheel][OSX] Use crossbow's command to deploy artifacts from travis builds

Apache Arrow 0.15.0 (2019-10-05)

New Features and Improvements

  • ARROW-453 - [C++] Filesystem implementation for Amazon S3
  • ARROW-517 - [C++] array comparison, uses D**2 space Myers
  • ARROW-750 - [Format][C++] Add LargeBinary and LargeString types
  • ARROW-1324 - [C++] Add support for bundled Boost with MSVC
  • ARROW-1561 - [C++] Kernel implementations for IsIn
  • ARROW-1566 - [C++] Implement non-materializing sort kernels
  • ARROW-1741 - [C++] Add DictionaryArray::CanCompareIndices
  • ARROW-1786 - [Format] List expected on-wire buffer layouts for each kind of Arrow physical type in specification
  • ARROW-1789 - [Format] Consolidate specification documents and improve clarity for new implementation authors
  • ARROW-1875 - [Java] Write 64-bit ints as strings in integration test JSON files
  • ARROW-2006 - [C++] Add option to trim excess padding when writing IPC messages
  • ARROW-2431 - [Rust] Schema fidelity
  • ARROW-2769 - [Python] Deprecate and rename add_metadata methods
  • ARROW-2931 - [Crossbow] Windows builds are attempting to run linux and osx packaging tasks
  • ARROW-3032 - [C++] Clean up Numpy-related headers
  • ARROW-3204 - [R] Enable R package to be made available on CRAN
  • ARROW-3243 - [C++] Upgrade jemalloc to version 5
  • ARROW-3246 - [C++][Python][Parquet] Direct writing of DictionaryArray to Parquet columns, automatic decoding to Arrow
  • ARROW-3325 - [Python][FOLLOWUP] In Python 2.7, a class's __doc__ member is not writable (#5018)
  • ARROW-3325 - [Python][Parquet] Add "read_dictionary" argument to parquet.read_table, ParquetDataset to enable direct-to-DictionaryArray reads
  • ARROW-3531 - [Python] add Schema.field() method / deprecate field_by_name
  • ARROW-3538 - [Python] ability to override the automated assignment of uuid for filenames when writing datasets
  • ARROW-3579 - [Crossbow] Unintuitive error message when remote branch has not been pushed
  • ARROW-3643 - [Rust] optimize BooleanBufferBuilder::append_slice
  • ARROW-3710 - [Crossbow][Python] Run nightly tests against pandas master
  • ARROW-3772 - [C++][Parquet] Write Parquet dictionary indices directly to DictionaryBuilder rather than routing through dense form
  • ARROW-3777 - [C++] Add Slow input streams and slow filesystem
  • ARROW-3817 - [R] Extract methods for RecordBatch and Table
  • ARROW-3829 - [Python] add __arrow_array__ protocol to support third-party array classes in conversion to Arrow
  • ARROW-3943 - [R] Write vignette for R package
  • ARROW-4036 - [C++] Pluggable Status message, by exposing an abstract delegate class.
  • ARROW-4095 - [C++] Optimize DictionaryArray::Transpose() for trivial transpositions
  • ARROW-4111 - [Python] Create time types from Python sequences of integers
  • ARROW-4218 - [Rust][Parquet] Initial support for array reader.
  • ARROW-4220 - [Python] Add buffered IO benchmarks with simulated high latency, allow duck-typed files in input_stream/output_stream
  • ARROW-4365 - [Rust][Parquet] Implement arrow record reader.
  • ARROW-4398 - [C++][Python][Parquet] Improve BYTE_ARRAY PLAIN encoding write performance. Add BYTE_ARRAY write benchmarks
  • ARROW-4473 - [Website] Add instructions to do a test-deploy of Arrow website and fix bugs
  • ARROW-4507 - [Format] Create outline and introduction for new document.
  • ARROW-4508 - [Format] Copy content from Layout.rst to new document.
  • ARROW-4509 - [Format] Copy content from Metadata.rst to new document.
  • ARROW-4510 - [Format] copy content from IPC.rst to new document.
  • ARROW-4511 - [Format][Docs] Revamp Format documentation, consolidate columnar format docs into a more coherent single document. Add Versioning/Stability page
  • ARROW-4648 - [Doc] Add documentation about C++ file naming
  • ARROW-4648 - [C++] Use underscores in source file names
  • ARROW-4649 - [C++/CI/R] Add nightly job that tests the homebrew formula
  • ARROW-4752 - [Rust] Add explicit SIMD vectorization for the divide kernel
  • ARROW-4810 - [Format][C++] Add LargeList type
  • ARROW-4841 - [C++] Add arrowOptions.cmake with options used to build arrow
  • ARROW-4860 - [C++] Build AWS C++ SDK for Windows in conda-forge
  • ARROW-5134 - [R][CI] Run nightly tests against multiple R versions
  • ARROW-5211 - [Format] Missing documentation under `Dictionary encoding` section on MetaData page
  • ARROW-5216 - [CI] Add Appveyor badge to README
  • ARROW-5307 - [CI][GLib] Enable GTK-Doc
  • ARROW-5337 - [C++] Add RecordBatch::field method, possibly deprecate "column"
  • ARROW-5343 - [C++] Refactor dictionary unification to incremental interface, and use Buffer for transpose map allocations
  • ARROW-5344 - [C++] Use ArrayDataVisitor in dict-to-anything cast
  • ARROW-5351 - [Rust] Take kernel
  • ARROW-5358 - [Rust] Implement equality check for ArrayData and Array
  • ARROW-5380 - [C++] Fix memory alignment UBSan errors.
  • ARROW-5439 - [Java] Utilize stream EOS in File format
  • ARROW-5444 - [Release][Website] After 0.14 release, update what is an "official" release
  • ARROW-5458 - [C++] Apache Arrow parallel CRC32c computation optimization
  • ARROW-5480 - [Python] Add unit test asserting specifically that pandas.Categorical roundtrips to Parquet format without special options
  • ARROW-5483 - [Java] add ValueVector constructors that take Field object
  • ARROW-5494 - [Python] Create FileSystem bindings
  • ARROW-5505 - [R] Normalize file and class names, stop masking base R functions, add vignette, improve documentation
  • ARROW-5527 - [C++] Uses Buffer/Builder in HashTable and MemoTable
  • ARROW-5558 - [C++] Support Array::View on arrays with non-zero offset
  • ARROW-5559 - [C++] Add an IpcOptions structure
  • ARROW-5564 - [C++] Use uriparser from conda-forge
  • ARROW-5579 - [Java] Shade flatbuffers
  • ARROW-5580 - [C++][Gandiva] Correct definitions of timestamp functions in Gandiva
  • ARROW-5588 - [C++] Better support for building union arrays
  • ARROW-5594 - [C++] add UnionArrays support to Take/Filter kernels
  • ARROW-5610 - [Python] define extension types in Python
  • ARROW-5646 - [Crossbow][Documentation] Move the user guide to the Sphinx documentation
  • ARROW-5681 - [FlightRPC] Add Flight-specific error APIs
  • ARROW-5686 - [R] Review R Windows CI build
  • ARROW-5716 - [Developer] Improve merge PR script to attribute multiple authors
  • ARROW-5717 - [Python] Unify variable dictionaries when converting to pandas
  • ARROW-5719 - [Java] Support in-place vector sorting
  • ARROW-5722 - [Rust] Implement Debug for List/Struct/BinaryArray
  • ARROW-5734 - [Python] Dispatch to Table.from_arrays from pyarrow.table factory function
  • ARROW-5736 - [Format][C++] Support small bit-width indices in sparse tensor
  • ARROW-5741 - [JS] Make numeric vector from functions consistent with TypedArray.from
  • ARROW-5743 - [C++] Add cmake option and macros for enabling large memory tests
  • ARROW-5746 - [Website] Move website source out of apache/arrow
  • ARROW-5747 - [C++] Improve CSV header and column names options
  • ARROW-5758 - [C++][Gandiva][Java] Support casting decimals to varchar and vice versa
  • ARROW-5762 - [JS] Align Map type impl with the spec
  • ARROW-5777 - [C++] Add microbenchmark for some Decimal128 operations
  • ARROW-5778 - [Java] Extract the logic for vector data copying to the super classes
  • ARROW-5784 - [Release][GLib] Replace c_glib/ after running c_glib/autogen.sh in dev/release/02-source.sh
  • ARROW-5786 - [Release] Use arrow-jni profile to run "mvn release:perform"
  • ARROW-5788 - [Rust] Use both "path" and "version" for internal dependencies
  • ARROW-5789 - [C++] Minor fixes for warnings, remove unused ubsan.cc
  • ARROW-5792 - [Rust] Add TypeVisitor for parquet type.
  • ARROW-5798 - [Packaging][deb] Update doc architecture
  • ARROW-5800 - [R] Dockerize R Travis CI tests so they can be run anywhere via docker-compose
  • ARROW-5803 - [CI] Dockerize C++ with clang 7 Travis CI
  • ARROW-5812 - [Java] Refactor method name and param type in BaseIntVector
  • ARROW-5813 - [C++] Fix TensorEquals for different contiguous tensors
  • ARROW-5814 - [Java] Implement a <Object, int> HashMap for DictionaryEncoder
  • ARROW-5827 - [C++] Require c-ares CMake config
  • ARROW-5828 - [C++] Add required Protocol Buffers versions check
  • ARROW-5830 - [C++] Stop using memcmp in TensorEquals for tensors with float values
  • ARROW-5832 - [Java] Support search operations for vector data
  • ARROW-5833 - [C++] Factor out Status-enriching code
  • ARROW-5834 - [Java] Apply new hash map in DictionaryEncoder
  • ARROW-5835 - [Java] Support Dictionary Encoding for binary type
  • ARROW-5841 - [Website] Add 0.14.0 release note
  • ARROW-5842 - [Java] Revise the semantic of lastSet in ListVector
  • ARROW-5843 - [Java] Improve the readability and performance of BitVectorHelper#getNullCount
  • ARROW-5844 - [Java] Support comparison & sort for more numeric types
  • ARROW-5846 - [Java] Create Avro adapter module and add dependencies
  • ARROW-5853 - [Python] Expose boolean filter kernel on Array
  • ARROW-5861 - [Java] Initial implement to convert Avro record with primitive types
  • ARROW-5862 - [Java] Provide dictionary builder
  • ARROW-5864 - [Python] Simplify Result class cython wrapper
  • ARROW-5865 - [Release] Helper script to rebase PRs on master
  • ARROW-5866 - [C++] Remove duplicate library in cpp/Brewfile
  • ARROW-5867 - [C++][Gandiva] add support for cast int to decimal
  • ARROW-5872 - [C++][Gandiva] Support mod(double, double) function in Gandiva
  • ARROW-5876 - [C++][Python] add basic auth flight proto message to C++ and Python
  • ARROW-5877 - [FlightRPC] Fix Python<->Java auth issues
  • ARROW-5880 - [C++][Parquet] Use TypedBufferBuilder instead of ArrayBuilder in writer.cc
  • ARROW-5881 - [Java] Provide functionalities to efficiently determine if a validity buffer has completely 1 bits/0 bits
  • ARROW-5883 - [Java] Support dictionary encoding for List and Struct type
  • ARROW-5888 - [C++][Parquet][Python] Restore timezone metadata when original Arrow schema has been stored in Parquet metadata
  • ARROW-5891 - [C++][Gandiva] Remove duplicates in function registry
  • ARROW-5892 - [C++][Gandiva] Support function aliases
  • ARROW-5893 - [C++][Python][GLib][Ruby][MATLAB][R] Remove arrow::Column class
  • ARROW-5897 - [Java] Remove duplicated logic in MapVector
  • ARROW-5898 - [Java] Provide functionality to efficiently compute hash code for arbitrary memory segment
  • ARROW-5900 - [Java] Bounds check for decimal args.
  • ARROW-5901 - [Rust] Add equals to json arrays.
  • ARROW-5902 - [Java] Implement hash table and equals & hashCode API for dictionary encoding
  • ARROW-5903 - [Java] Optimise set methods in decimal vector
  • ARROW-5904 - [Java][Plasma] Fix compilation of Plasma Java client
  • ARROW-5906 - [CI] Turn off ARROW_VERBOSE_THIRDPARTY_BUILD by default in Docker builds
  • ARROW-5908 - [C#] ArrowStreamWriter doesn't align buffers to 8 bytes
  • ARROW-5909 - [Java] Optimize ByteFunctionHelpers equals & compare logic
  • ARROW-5911 - [Java] Make ListVector and MapVector create reader lazily
  • ARROW-5917 - [Java] Redesign the dictionary encoder
  • ARROW-5918 - [Java] Add get to BaseIntVector interface
  • ARROW-5919 - [R] Test R-in-conda as a nightly build
  • ARROW-5920 - [Java] Support sort & compare for all variable width vectors
  • ARROW-5924 - [Plasma] return a replica of GpuProcessHandle::ptr when create or get an object
  • ARROW-5934 - [Python] Bundle arrow's LICENSE with the wheels
  • ARROW-5937 - [Release] Stop parallel binary upload
  • ARROW-5938 - [Release] Create branch for adding release note automatically
  • ARROW-5939 - [Release] Add support for generating vote email template separately
  • ARROW-5940 - [Release] Add support for re-uploading sign/checksum for binary artifacts
  • ARROW-5941 - [Release] Avoid re-uploading already uploaded binary artifacts
  • ARROW-5943 - [GLib][Gandiva] Add support for function aliases
  • ARROW-5944 - [C++][Gandiva] Remove 'div' alias for 'divide'
  • ARROW-5945 - [Rust][DataFusion] Table trait can now be used to build real queries
  • ARROW-5947 - [Rust][DataFusion] Remove serde crate dependency
  • ARROW-5948 - [Rust] [DataFusion] create_logical_plan should not call optimizer
  • ARROW-5955 - [Plasma] Support setting memory quotas per plasma client for better isolation
  • ARROW-5957 - [C++][Gandiva] Implement div function in Gandiva
  • ARROW-5958 - [Python] Link zlib statically in the wheels
  • ARROW-5961 - [R] Be able to run R-only tests even without C++ library
  • ARROW-5962 - [CI][Python] Remove manylinux1 builds from Travis CI
  • ARROW-5967 - [Java] DateUtility#timeZoneList is not correct
  • ARROW-5970 - [Java] Provide pointer to Arrow buffer
  • ARROW-5974 - [C++] Support reading concatenated compressed streams
  • ARROW-5975 - [C++][Gandiva] support castTIMESTAMP(date)
  • ARROW-5976 - [C++] RETURN_IF_ERROR(ctx) should be namespaced
  • ARROW-5977 - [C++][Python] Allow specifying which columns to include
  • ARROW-5979 - [FlightRPC] Expose opaque (de)serialization of protocol types
  • ARROW-5985 - [Developer] Do not suggest setting Fix Version for patch releases by default
  • ARROW-5986 - [Java] Code cleanup for dictionary encoding
  • ARROW-5988 - [Java] Avro adapter implement simple Record type
  • ARROW-5997 - [Java] Support dictionary encoding for Union type
  • ARROW-5998 - [Java] Open a document to track the API changes
  • ARROW-6000 - [Python] Add support for LargeString and LargeBinary types
  • ARROW-6008 - [Release] Stop parallel binary artifacts upload
  • ARROW-6009 - [JS] Ignore NPM errors in the javascript release script
  • ARROW-6013 - [Java] Support range searcher
  • ARROW-6017 - [FlightRPC] Enable creating Flight Locations for unknown schemes
  • ARROW-6020 - [Java] Refactor ByteFunctionHelper#hash with new added ArrowBufHasher
  • ARROW-6021 - [Java] Extract copyFrom and copyFromSafe methods to ValueVector interface
  • ARROW-6022 - [Java] Support equals API in ValueVector to compare two vectors equal
  • ARROW-6023 - [C++][Gandiva] Add functions in Gandiva
  • ARROW-6024 - [Java] Provide more hash algorithms
  • ARROW-6026 - [Doc] Add CONTRIBUTING.md
  • ARROW-6030 - [Java] Efficiently compute hash code for ArrowBufPointer
  • ARROW-6031 - [Java] Support iterating a vector by ArrowBufPointer
  • ARROW-6034 - [C++][Gandiva] Add string functions in Gandiva
  • ARROW-6035 - [Java] Avro adapter support convert nullable value
  • ARROW-6036 - [GLib] Add support for skip rows and column_names CSV read option
  • ARROW-6037 - [GLib] Add a missing version macro
  • ARROW-6039 - [GLib] Add garrow_array_filter()
  • ARROW-6041 - [Website] Blog post announcing R library availability on CRAN
  • ARROW-6042 - [C++][Parquet] Add Dictionary32Builder that always returns 32-bit dictionary indices
  • ARROW-6045 - [C++] Add benchmark for double and float encoding/decoding, as well as NaN encoding
  • ARROW-6048 - [C++] Add ChunkedArray::View method that dispatches to Array::View
  • ARROW-6049 - [C++] Support view from one dictionary type to another in Array::View
  • ARROW-6053 - [Python] Fix pyarrow's RecordBatchStreamReader::Open2 type signature
  • ARROW-6063 - [FlightRPC] implement half-closed semantics for DoPut
  • ARROW-6065 - [C++][Parquet] Clean up parquet/arrow/reader.cc, reduce code duplication, improve readability
  • ARROW-6069 - [Rust][Parquet] Add converter.
  • ARROW-6070 - [Java] Avoid creating new schema before IPC sending
  • ARROW-6077 - [C++][Parquet] Build Arrow "schema tree" from Parquet schema to help with nested data implementation
  • ARROW-6078 - [Java] Implement dictionary-encoded subfields for List type
  • ARROW-6079 - [Java] Implement/test UnionFixedSizeListWriter for FixedSizeListVector
  • ARROW-6080 - [Java] Support search operation for BaseRepeatedValueVector
  • ARROW-6083 - [Java] Refactor Jdbc adapter consume logic
  • ARROW-6084 - [Python] Support LargeList
  • ARROW-6085 - [Rust][DataFusion] Add traits for physical query plan
  • ARROW-6086 - [Rust][DataFusion] Add support for partitioned Parquet data sources
  • ARROW-6087 - [Rust] [DataFusion] Implement parallel execution for CSV scan
  • ARROW-6088 - [Rust][DataFusion] Projection execution plan
  • ARROW-6089 - [Rust][DataFusion] Implement physical plan for "selection" operator
  • ARROW-6090 - [Rust][DataFusion] Physical plan for HashAggregate
  • ARROW-6093 - [Java] reduce branches in algo for first match in VectorRangeSearcher
  • ARROW-6094 - [FlightRPC] Add Flight RPC method getFlightSchema
  • ARROW-6096 - [C++] conditionally use boost regex for gcc < 4.9
  • ARROW-6097 - [Java] Avro adapter implement unions type
  • ARROW-6100 - [Rust] Pin to specific nightly rust for reproducible/stable builds
  • ARROW-6101 - [Rust][DataFusion] Parallel execution of physical query plan
  • ARROW-6102 - [Testing] Add partitioned CSV file to arrow-testing repo
  • ARROW-6104 - [Rust][DataFusion] Remove use of bare trait objects
  • ARROW-6105 - [C++][Parquet][Python] Add test case showing dictionary-encoded subfields in nested type
  • ARROW-6113 - [Java] Support vector deduplicate function
  • ARROW-6115 - [Python] Support LargeBinary and LargeString in conversion to python
  • ARROW-6118 - [Java] Replace google Preconditions with Arrow Preconditions
  • ARROW-6121 - [Tools] Improve merge tool ergonomics
  • ARROW-6125 - [Python] Remove Python APIs deprecated in 0.14.x and prior
  • ARROW-6127 - [Website] Add favicons and meta tags
  • ARROW-6128 - [C++] Suppress a class-memaccess warning
  • ARROW-6130 - [Release] Use 0.15.0 as the next release
  • ARROW-6134 - [C++][Gandiva] Add concat function in Gandiva
  • ARROW-6137 - [C++][Gandiva] Use snprintf instead of stringstream in castVARCHAR(timestamp)
  • ARROW-6137 - [C++][Gandiva] Change output format of castVARCHAR(timestamp) in Gandiva
  • ARROW-6138 - [C++] Add a basic (single RecordBatch) implementation of Dataset
  • ARROW-6139 - [Documentation][R] Build R docs (pkgdown) site and add to arrow-site
  • ARROW-6141 - [C++] Enable memory-mapping a file region
  • ARROW-6142 - [R] Install instructions on linux could be clearer
  • ARROW-6143 - [Java] Unify the copyFrom and copyFromSafe methods for all vectors
  • ARROW-6144 - [C++][Gandiva] Implement random functions in Gandiva
  • ARROW-6155 - [Java] Extract a super interface for vectors whose elements reside in continuous memory segments
  • ARROW-6156 - [Java] Support compare semantics for ArrowBufPointer
  • ARROW-6161 - [C++][Dataset] Implements ParquetFragment
  • ARROW-6162 - [C++][Gandiva] Do not truncate string in castVARCHAR_utf8 if output length is zero
  • ARROW-6164 - [Docs][Format] Document project versioning schema and forward/backward compatibility policies
  • ARROW-6172 - [Java] Provide benchmarks to set IntVector with different methods
  • ARROW-6177 - [C++] Add Array::Validate()
  • ARROW-6180 - [C++][Parquet] Add RandomAccessFile::GetStream that returns InputStream that reads a file segment independent of the file's state, fix concurrent buffered Parquet column reads
  • ARROW-6181 - [R] Only allow R package to install without libarrow on linux
  • ARROW-6183 - [R] Document that you don't have to use tidyselect if you don't want
  • ARROW-6185 - [Java] Provide hash table based dictionary builder
  • ARROW-6187 - [C++] Fallback to storage type when writing ExtensionType to Parquet
  • ARROW-6188 - [GLib] Add garrow_array_is_in()
  • ARROW-6192 - [GLib] Use the same SO version as C++
  • ARROW-6194 - [Java] Add non-static approach in DictionaryEncoder making it easy to extend and reuse
  • ARROW-6196 - [Ruby] Add support for building Arrow::TimeNNArray by .new
  • ARROW-6197 - [GLib] Add garrow_decimal128_rescale()
  • ARROW-6199 - [Java] Avro adapter avoid potential resource leak.
  • ARROW-6203 - [GLib] Add garrow_array_sort_to_indices()
  • ARROW-6204 - [GLib] Add garrow_array_is_in_chunked_array()
  • ARROW-6206 - [Java][Docs] Document environment variables/java properties
  • ARROW-6209 - [Java] Extract set null method to the base class for fixed width vectors
  • ARROW-6212 - [Java] Support vector rank operation
  • ARROW-6216 - [C++][Parquet] Expose codec compression level to user, add to Parquet writer properties
  • ARROW-6217 - [Website] Remove needless _site/ directory
  • ARROW-6219 - [Java] Add API for JDBC adapter that can convert less than the full result set at a time
  • ARROW-6220 - [Java] Add API to avro adapter to limit number of rows returned at a time.
  • ARROW-6225 - [Website] Update arrow-site/README and any other places to point website contributors in right direction
  • ARROW-6229 - [C++][Dataset] implement FileSystemBasedDataSource
  • ARROW-6230 - [R] Reading in Parquet files are 20x slower than reading fst files in R
  • ARROW-6231 - [C++] Allow generating CSV column names
  • ARROW-6232 - [C++] Rename Argsort kernel to SortToIndices
  • ARROW-6237 - [R] Allow compilation flags to be passed for R package with ARROW_R_CXXFLAGS
  • ARROW-6238 - [C++][Dataset] Implement SimpleDataSource, SimpleDataFragment and SimpleScanTask
  • ARROW-6240 - [Ruby] Arrow::Decimal128Array#get_value returns BigDecimal
  • ARROW-6242 - [C++][Dataset] Implement Dataset, Scanner and ScannerBuilder
  • ARROW-6243 - [C++][Dataset] Filter expressions
  • ARROW-6244 - [C++][Dataset] Add partition key to DataSource interface
  • ARROW-6246 - [Website] Add link to R documentation site
  • ARROW-6247 - [Java] Provide a common interface for float4 and float8 vectors
  • ARROW-6249 - [Java] Remove useless class ByteArrayWrapper
  • ARROW-6250 - [Java] Implement ApproxEqualsVisitor comparing approx for floating point
  • ARROW-6252 - [C++][Python] Add Array::Diff in C++ and Array.diff in Python to return diff as string
  • ARROW-6253 - [Python] Expose "enable_buffered_stream" option from parquet::ReaderProperties in pyarrow.parquet.read_table
  • ARROW-6258 - [R] Add macOS build scripts
  • ARROW-6260 - [Website] Use deploy key on Travis to build and push to asf-site
  • ARROW-6262 - [Developer] Show JIRA issue before merging
  • ARROW-6264 - [Java] There is no need to consider byte order in ArrowBufHasher
  • ARROW-6265 - [Java] Avro adapter implement Array/Map/Fixed type
  • ARROW-6267 - [Ruby] Add Arrow::Time for Arrow::Time{32,64}DataType value
  • ARROW-6271 - [Rust][DataFusion] Add example for running SQL against Parquet
  • ARROW-6272 - [Rust][DataFusion] Add register_parquet convenience method to ExecutionContext
  • ARROW-6278 - [R] Read parquet files from raw vector
  • ARROW-6279 - [Python] Add Table.slice, getitem support to match RecordBatch, Array, others
  • ARROW-6284 - [C++] Allow references in std::tuple when converting tuple to arrow array
  • ARROW-6287 - [Rust][DataFusion] TableProvider.scan() returns thread-safe BatchIterator
  • ARROW-6288 - [Java] Implement TypeEqualsVisitor comparing vector type equals considering names and metadata
  • ARROW-6289 - [Java] Add empty() in UnionVector to create instance
  • ARROW-6292 - [C++] Add option to use the mimalloc allocator
  • ARROW-6294 - [C++] Use hyphen for plasma-store-server executable
  • ARROW-6295 - [Rust][DataFusion] ExecutionError Cannot compare Float32 with Float64
  • ARROW-6296 - [Java] Cleanup JDBC interfaces and eliminate one memcopy for binary/varchar fields
  • ARROW-6297 - [Java] Compare ArrowBufPointers by unsigned integers
  • ARROW-6300 - [C++] Add Abort() method to streams
  • ARROW-6303 - [Rust] Add a feature to disable SIMD
  • ARROW-6304 - [Java][Doc] Add a description to each module
  • ARROW-6306 - [Java] Support stable sort by stable comparators
  • ARROW-6310 - [C++] Write 64-bit integers as strings in JSON integration test files
  • ARROW-6311 - [Java] Make ApproxEqualsVisitor accept DiffFunction to make it more flexible
  • ARROW-6313 - [Format] Tracking for ensuring flatbuffer serialized values are aligned in stream/files.
  • ARROW-6314 - [C#] Implement IPC message format alignment changes, provide backwards compatibility and "legacy" option to emit old message format
  • ARROW-6314 - [C++] Implement IPC message format alignment changes, provide backwards compatibility and "legacy" option to emit old message format
  • ARROW-6315 - [Java] Make change to ensure flatbuffer reads are aligned
  • ARROW-6316 - [Go] implement new ARROW format with 32b-aligned buffers
  • ARROW-6317 - [JS] Implement IPC message format alignment changes
  • ARROW-6318 - [Integration] Run tests against pregenerated files
  • ARROW-6319 - [C++] Move the core of NumericTensor<T>::Value() to Tensor::Value<T>()
  • ARROW-6326 - [C++] Nullable fields when converting std::tuple to Table
  • ARROW-6328 - [Developer][crossbow] Click.option-s should have help text
  • ARROW-6329 - [Format] Add a padding for Flatbuffer alignment, use 8-byte EOS
  • ARROW-6331 - [Java] Incorporate ErrorProne into the java build
  • ARROW-6334 - [Java] Improve the dictionary builder API to return the position of the value in the dictionary
  • ARROW-6335 - [Java] Improve the performance of DictionaryHashTable
  • ARROW-6336 - [Python] Add notes to pyarrow.serialize/deserialize to clarify that these functions do not read or write the standard IPC protocol
  • ARROW-6337 - [R] Changed as_tibble to as_dataframe in the R package
  • ARROW-6338 - [R] Type function names don't match type names
  • ARROW-6342 - [Python] Add pyarrow.record_batch factory function with same basic API / semantics as pyarrow.table
  • ARROW-6346 - [GLib] Add garrow_array_view()
  • ARROW-6347 - [GLib] Add garrow_array_diff()
  • ARROW-6350 - [Ruby] Remove Arrow::Struct and use Hash instead
  • ARROW-6351 - [Ruby] Improve Arrow#values performance
  • ARROW-6353 - [Python][C++] Expose compression_level option to parquet.write_table
  • ARROW-6355 - [Java] Make range equal visitor reusable
  • ARROW-6356 - [Java] Avro adapter implement Enum type and nested Record
  • ARROW-6357 - [C++] Issue S3 file writes in the background by default
  • ARROW-6358 - [C++] Add FileSystem::DeleteDirContents
  • ARROW-6360 - [R] Update support for compression
  • ARROW-6362 - [C++] Allow customizing S3 credentials provider
  • ARROW-6365 - [R] Should be able to coerce numeric to integer with schema
  • ARROW-6366 - [Java] Make field vectors final explicitly
  • ARROW-6368 - [C++][Dataset] Add interface for "projecting" RecordBatch from one schema to another, inserting null values where needed
  • ARROW-6373 - [C++] Make FixedWidthBinaryBuilder consistent with other fixed width builders in zeroing memory when appending null batches
  • ARROW-6375 - [C++] Extend ConversionTraits to allow efficiently appending list values in STL API
  • ARROW-6379 - [C++] Write no IPC buffer metadata for NullType
  • ARROW-6381 - [C++] BufferOutputStream::Write does extra work that slows down small writes
  • ARROW-6383 - [Java] Report outstanding child allocators on close
  • ARROW-6384 - [C++] Bump dependency versions
  • ARROW-6385 - [C++] Use xxh3 instead of custom hashing code for non-tiny strings
  • ARROW-6391 - [Python][Flight] Add built-in methods on FlightServerBase to start server and wait for it to be available
  • ARROW-6397 - [C++][CI] Generate minio server connect string
  • ARROW-6401 - [Java] Implement dictionary-encoded subfields for Struct type
  • ARROW-6402 - [C++] Suppress sign-compare warning with g++ 9.2.1
  • ARROW-6403 - [Python] Expose FileReader::ReadRowGroups() to Python
  • ARROW-6408 - [Rust] use "if cfg!" pattern
  • ARROW-6413 - [R] Support autogenerating column names
  • ARROW-6415 - [R] Remove usage of R CMD config CXXCPP
  • ARROW-6416 - [Python] Improve API & documentation regarding chunksizes
  • ARROW-6417 - [C++][Parquet] Miscellaneous optimizations yielding slightly better Parquet binary read performance
  • ARROW-6419 - [Website] Blog post about Parquet dictionary performance work coming in 0.15.x release
  • ARROW-6422 - [Gandiva] Fix double-conversion linker issue
  • ARROW-6426 - [FlightRPC][C++][Java] Expose gRPC configuration knobs
  • ARROW-6427 - [GLib] Add support for column names autogeneration CSV read option
  • ARROW-6438 - [R] Add bindings for filesystem API
  • ARROW-6447 - [C++] Allow rest of arrow_objlib to build in parallel while memory_pool.cc is waiting on jemalloc_ep
  • ARROW-6450 - [C++] Use 2x reallocation strategy in BufferBuilder instead of 1.5x
  • ARROW-6451 - [Format] Add clarifications to Columnar.rst about the contents of "null" slots in Varbinary or List arrays
  • ARROW-6453 - [C++] More informative error messages with S3
  • ARROW-6454 - [LICENSE] Add LLVM's license due to static linkage
  • ARROW-6458 - [Java] Remove value boxing/unboxing for ApproxEqualsVisitor
  • ARROW-6460 - [Java] Add benchmark and large fake data UT for avro adapter
  • ARROW-6462 - [C++] Fix build error on CentOS 6 x86_64 with bundled double-conversion
  • ARROW-6465 - [Python] Improvement to Windows build instructions
  • ARROW-6474 - [Python] Add option to use legacy / pre-0.15 IPC message format and to set the default using PYARROW_LEGACY_IPC_FORMAT environment variable
  • ARROW-6475 - [C++] Don't try to dictionary encode dictionary arrays
  • ARROW-6477 - [Packaging][Crossbow] Use Azure Pipelines to build linux packages
  • ARROW-6480 - [Crossbow] Summary report e-mailer with polling logic
  • ARROW-6484 - [Java] Enable create indexType for DictionaryEncoding according to dictionary value count
  • ARROW-6487 - [Rust][DataFusion] Introduce common test module
  • ARROW-6489 - [Developer][Documentation] Fix merge script and readme
  • ARROW-6490 - [Java][Memory] Log error for leak in allocator close
  • ARROW-6491 - [Java][Hotfix] fix master fail caused by ErrorProne
  • ARROW-6494 - [C++][Dataset] Implement basic PartitionScheme
  • ARROW-6504 - [Python][Packaging] Add mimalloc to conda packages for better performance
  • ARROW-6505 - [Website] Add new committers
  • ARROW-6518 - [Packaging][Python] Flight failing in OSX Python wheel builds
  • ARROW-6519 - [Java] Use IPC continuation prefix as part of 8-byte EOS
  • ARROW-6524 - [Developer][Packaging] Nightly build report's subject should contain Arrow
  • ARROW-6525 - [C++] Avoid aborting in CloseFromDestructor()
  • ARROW-6526 - [C++] Poison data in debug mode
  • ARROW-6527 - [C++] Add OutputStream::Write(Buffer)
  • ARROW-6531 - [Python] Add detach() method to buffered streams
  • ARROW-6532 - [R] write_parquet() uses writer properties (general and arrow specific)
  • ARROW-6533 - [R] Compression codec should take a "level"
  • ARROW-6534 - [Java] Fix typos and spelling
  • ARROW-6539 - [R] Provide mechanism to write out old format
  • ARROW-6540 - [R] Add Validate() methods
  • ARROW-6541 - [Format][C++] Update Columnar.rst for two-part EOS, update C++ implementation
  • ARROW-6542 - [R] Add View() method to array types
  • ARROW-6544 - [R] Documentation/polishing for 0.15 release
  • ARROW-6545 - [Go] update IPC writer to use two-part EOS
  • ARROW-6546 - [C++] Add missing FlatBuffers source dependency
  • ARROW-6549 - [C++] Switch to jemalloc 5.2.x
  • ARROW-6556 - [Python] Fix warning for pandas SparseDataFrame removal
  • ARROW-6556 - [Python] Handle future removal of pandas SparseDataFrame
  • ARROW-6557 - [Python] Always return pandas.Series from Array/ChunkedArray.to_pandas. Add mechanism to preserve "column names" from RecordBatch, Table as Series.name
  • ARROW-6558 - [C++] Refactor Iterator to type erased handle
  • ARROW-6559 - [Developer][C++] Add option to pass ARROW_PACKAGE_PREFIX when using 'archery benchmark'
  • ARROW-6563 - [Rust][DataFusion] MergeExec
  • ARROW-6569 - [Website] Add support for auto deployment by GitHub Actions
  • ARROW-6570 - [Python] Use Arrow's allocators for creating NumPy array instead of leaving it to NumPy
  • ARROW-6580 - [Java] Support comparison for unsigned integers
  • ARROW-6584 - [Python][Wheel] Bundle zlib again with the windows wheels
  • ARROW-6588 - [C++] Suppress class-memaccess warning with g++ 9.2.1
  • ARROW-6589 - [C++] Error propagation, tests for /MakeArray(OfNulls|FromScalar)/
  • ARROW-6590 - [C++] Do not require ARROW_JSON to build ARROW_IPC when unit tests are off
  • ARROW-6591 - [R] Ignore .Rhistory files in source control
  • ARROW-6599 - [Rust][DataFusion] Add aggregate traits and SUM implementation to physical query plan
  • ARROW-6601 - [Java] Improve JDBC adapter performance & add benchmark
  • ARROW-6605 - [C++][Filesystem] Add recursion depth control to fs::Selector
  • ARROW-6606 - [C++] Add PathTree tree structure
  • ARROW-6609 - [C++] Add Dockerfile for minimal C++ build
  • ARROW-6613 - [C++] Remove dependency on boost::filesystem
  • ARROW-6614 - [C++][Dataset] Implement FileSystemDataSourceDiscovery
  • ARROW-6616 - [Website] Release announcement blog post for 0.15
  • ARROW-6621 - [Rust][DataFusion] Run DataFusion examples in CI
  • ARROW-6629 - [Doc][C++] Add filesystem docs
  • ARROW-6630 - [Doc] Document C++ file formats
  • ARROW-6644 - [JS] Amend NullType IPC protocol to append no buffers
  • ARROW-6647 - [C++] Stop using member initializer for shared_ptr
  • ARROW-6648 - [Go] Expose the bitutil package
  • ARROW-6649 - [R] print methods for Array, ChunkedArray, Table, RecordBatch
  • ARROW-6653 - [Developer] Add support for auto JIRA link on pull request
  • ARROW-6655 - [Python] Filesystem bindings for S3
  • ARROW-6664 - [C++] Add CMake option to build without SSE4.2 instructions
  • ARROW-6665 - [Rust][DataFusion] Implement physical expression for numeric literal types
  • ARROW-6667 - [Python] remove cyclical object references in pyarrow.parquet
  • ARROW-6668 - [Rust][DataFusion] Implement CAST expression
  • ARROW-6669 - [Rust][DataFusion] Implement binary expression for physical plan
  • ARROW-6675 - [JS] Add scanReverse function to dataFrame and filteredDataframe
  • ARROW-6683 - [Python] Test for fastparquet <-> pyarrow cross-compatibility
  • ARROW-6725 - [CI] Disable 3rdparty fuzzit nightly builds
  • ARROW-6735 - [C++] Suppress sign-compare warning with g++ 9.2.1
  • ARROW-6752 - [Go] implement Stringer for Null array
  • ARROW-6755 - [Release] Improvements to Windows release verification script
  • ARROW-6771 - [Packaging][Python] Missing pytest dependency from conda and wheel builds
  • PARQUET-1468 - [C++] Clean up ColumnReader/internal::RecordReader code duplication

Bug Fixes

  • ARROW-1184 - [Java] Dictionary.equals is not working correctly
  • ARROW-2041 - [Python] pyarrow.serialize has high overhead for list of NumPy arrays
  • ARROW-2248 - [Python] Nightly or on-demand HDFS test builds
  • ARROW-2317 - [Python] Fix C linkage warning with Cython
  • ARROW-2490 - [C++] Normalize input stream concurrency
  • ARROW-3176 - [Python] Overflow in Date32 column conversion to pandas
  • ARROW-3203 - [C++] Build error on Debian Buster
  • ARROW-3651 - [Python] Handle 'datetime' logical type when reconstructing pandas columns from custom metadata
  • ARROW-3652 - [Python][Parquet] Add unit test exhibiting that pandas.CategoricalIndex survives roundtrip to Parquet format
  • ARROW-3762 - [Python] Add large_memory unit test exercising BYTE_ARRAY overflow edge cases from ARROW-3762
  • ARROW-3933 - [C++][Parquet] Handle non-nullable struct children when reading Parquet file, better error messages
  • ARROW-4187 - [C++] Enable file-benchmark on Windows
  • ARROW-4746 - [C++/Python] PyDateTime_Date wrongly cast to PyDateTime_DateTime
  • ARROW-4836 - [C++] Support Tell() on compressed streams
  • ARROW-4848 - [C++] Static libparquet not compiled with -DARROW_STATIC on Windows
  • ARROW-4880 - [Python] Rehabilitate ASV benchmark build scripts
  • ARROW-4883 - [Python] read_csv() returns garbage if given file object in text mode
  • ARROW-5028 - [Python] Avoid malformed ListArray types caused by reaching StringBuilder capacity when converting from Python sequence
  • ARROW-5072 - [Python] write_table fails silently on S3 errors
  • ARROW-5085 - [C++][Parquet][Python] Do not allow reading to dictionary type unless we have implemented support for it
  • ARROW-5086 - [Python][Parquet] Opt in to file memory-mapping when reading Parquet files rather than opting out
  • ARROW-5089 - [C++/Python] Writing dictionary encoded columns to parquet is extremely slow when using chunk size
  • ARROW-5103 - [Python] Segfault when using chunked_array.to_pandas on arrays of different types (edge case)
  • ARROW-5125 - [Python] Round-trip extreme dates on windows
  • ARROW-5161 - [Python] Cannot convert struct type from Pandas object column
  • ARROW-5220 - [Python] Follow-up to improve error messages and docs for from_pandas schema argument
  • ARROW-5220 - [Python] Specified schema in from_pandas also includes the index
  • ARROW-5292 - [C++] Work around symbol visibility issues so building static libraries is not necessary when building unit tests on WIN32 platform
  • ARROW-5300 - [C++] Remove the ARROW_NO_DEFAULT_MEMORY_POOL macro
  • ARROW-5374 - [Python][C++] Improve ipc.read_record_batch docstring, fix IPC message type error messages generated in C++
  • ARROW-5414 - [C++] default to release build on windows
  • ARROW-5450 - [Python] Always return datetime.datetime in TimestampValue.as_py for units other than nanoseconds
  • ARROW-5471 - [C++][Gandiva] Array offset is ignored in Gandiva projector
  • ARROW-5522 - [Packaging][Documentation] Comments out of date in python/manylinux1/build_arrow.sh
  • ARROW-5525 - [C++] Add Continuous Fuzzing Integration setup with Fuzzit
  • ARROW-5560 - [C++][Plasma] Cannot create Plasma object after OutOfMemory error
  • ARROW-5562 - [C++][Parquet] Write negative zero or small epsilons as positive zero when computing Parquet statistics
  • ARROW-5630 - [C++][Parquet] Fix RecordReader accounting for repeated fields with non-nullable leaf
  • ARROW-5638 - [C++][CMake] Fixes for xcode project builds
  • ARROW-5651 - [Python] Fix incorrect conversion from strided NumPy array
  • ARROW-5682 - [Python] Raise error when trying to convert non-string dtype to string
  • ARROW-5731 - [CI] Switch turbodbc branch for integration testing
  • ARROW-5753 - [Rust] Fix test failure in CI code coverage
  • ARROW-5772 - [GLib][Plasma][CUDA] Fix a bug where data could not be retrieved
  • ARROW-5775 - [C++] Fix thread-unsafe cached data
  • ARROW-5776 - [Gandiva][Crossbow] Use commit id instead of fetch head.
  • ARROW-5790 - [Python] Raise error when trying to convert 0-dim array in pa.array
  • ARROW-5817 - [Python] Use pytest mark for flight tests
  • ARROW-5823 - [Rust] CI scripts miss --all-targets cargo argument
  • ARROW-5824 - [Gandiva][C++] Fix decimal null literals.
  • ARROW-5836 - [Java][FlightRPC] Skip Flight domain socket test when path too long
  • ARROW-5838 - [C++] Delegate OPENSSL_ROOT_DIR to bundled gRPC
  • ARROW-5848 - [C++] SO versioning schema after release 1.0.0
  • ARROW-5849 - [C++] Fix compiler warnings on mingw32
  • ARROW-5850 - [CI][R] R appveyor job is broken after release
  • ARROW-5851 - [C++] Fix compilation of reference benchmarks
  • ARROW-5856 - [Python][Packaging] Fix use of C++ / Cython API from wheels
  • ARROW-5860 - [Java][Vector] Fix decimal utils to handle negative values.
  • ARROW-5863 - [Python] Use atexit module for extension type finalization to avoid segfault
  • ARROW-5868 - [Python] Correctly remove liblz4 shared libraries from manylinux2010 image so lz4 is statically linked
  • ARROW-5870 - [C++][Docs] Refine source build instructions, do not tell people to install flex/bison if they don't need them
  • ARROW-5873 - [Python] Guard for passed None in Schema.equals
  • ARROW-5874 - [Python] Fix macOS wheels to depend on system or Homebrew OpenSSL
  • ARROW-5878 - [C++][Parquet] Restore pre-0.14.0 Parquet forward compatibility by adding option to unconditionally set TIMESTAMP_MICROS/TIMESTAMP_MILLIS ConvertedType
  • ARROW-5884 - [Java] Fix the get method of StructVector
  • ARROW-5886 - [Python][Packaging] Manylinux1/2010 compliance issue with libz
  • ARROW-5887 - [C#] ArrowStreamWriter writes FieldNodes in wrong order
  • ARROW-5889 - [C++][Parquet] Add property to indicate origin from converted type to TimestampLogicalType
  • ARROW-5894 - [Gandiva][C++] Added a linker script for libgandiva.so to restrict libstdc++ symbols.
  • ARROW-5899 - [Python][Packaging] Build and link uriparser statically in Windows wheel builds
  • ARROW-5910 - [Python] Support non-seekable streams in ipc.read_tensor, ipc.read_message, add Message.serialize_to method
  • ARROW-5921 - [C++] Fix multiple nullptr related crashes in IPC
  • ARROW-5923 - [C++][Parquet] Reword comment about UBSan and Int96 in writer.cc
  • ARROW-5925 - [Gandiva][C++] fix rounding in decimal to int cast
  • ARROW-5930 - [Python] Make Flight server init phase explicit
  • ARROW-5930 - [FlightRPC][Python] Disable Flight test causing segfault in Travis
  • ARROW-5935 - [C++] ArrayBuilder::type() should be kept accurate
  • ARROW-5946 - [Rust][DataFusion] Fix bug in projection push down logic
  • ARROW-5952 - [Python] fix conversion of chunked dictionary array with 0 chunks
  • ARROW-5959 - [CI] report branch+commit to fuzzit
  • ARROW-5960 - [C++] Fix Boost dependencies link order
  • ARROW-5963 - [R] R Appveyor job does not test changes in the C++ library
  • ARROW-5964 - [C++][Gandiva] Remove overflow check after rounding in BasicDecimal128::FromDouble
  • ARROW-5965 - [Python] Regression: segfault when reading hive table with v0.14
  • ARROW-5966 - [Python] Also use ChunkedStringBuilder when converting NumPy string types to Arrow StringType
  • ARROW-5968 - [Java] Remove duplicate Preconditions check in JDBC adapter
  • ARROW-5969 - [R] Fix R lint Failures
  • ARROW-5973 - [Java] Variable width vectors' get methods should return null when the underlying data is null
  • ARROW-5978 - [FlightRPC][Java] Properly release buffers in Flight integration client
  • ARROW-5989 - [C++] Accommodate openjdk-8 path search prefix
  • ARROW-5990 - [Python] add bounds check to RowGroupMetaData.column
  • ARROW-5992 - [C++][Python] Support String->Binary in Array::View. Add Python bindings for Array::View
  • ARROW-5993 - [Python] Reading a dictionary column from Parquet results in disproportionate memory usage
  • ARROW-5996 - [Java] Avoid potential resource leak in flight service
  • ARROW-5999 - [C++] decouple Iterator from ARROW_DATASETS
  • ARROW-6002 - [C++][Gandiva] test casting int64 to decimal
  • ARROW-6004 - [C++] Turn non-ignored empty CSV lines into null/empty values
  • ARROW-6005 - [C++] extend GetRecordBatchReader test to cover reading a single row group
  • ARROW-6006 - [C++] Do not fail to read empty IPC stream with schema having dictionary types
  • ARROW-6012 - [C++] Fall back on known Apache mirror for Thrift downloads
  • ARROW-6015 - [Python] Add note to python/README.md about installing Visual C++ Redistributable on Windows when using pip
  • ARROW-6016 - [Python] Fix get_library_dirs() when Arrow installed as a system package
  • ARROW-6029 - [R] Improve R docs on how to fix library version mismatch
  • ARROW-6032 - [C++] Ensure 64-bit pointer alignment in CountSetBits()
  • ARROW-6038 - [C++] Faster type equality
  • ARROW-6040 - [Java] Dictionary entries are required in IPC streams even when empty
  • ARROW-6046 - [C++] Do not write excess varbinary offsets in IPC messages from sliced BinaryArray
  • ARROW-6047 - [Rust] Rust nightly 1.38.0 builds failing
  • ARROW-6050 - [Java] Update out-of-date java/flight/README.md
  • ARROW-6054 - [Python] Fix the type erasure bug when serializing structured type ndarray.
  • ARROW-6058 - [C++][Parquet] Validate whole ColumnChunk raw data reads so that underlying filesystem issues are caught earlier
  • ARROW-6059 - [Python] Regression memory issue when calling pandas.read_parquet
  • ARROW-6060 - [C++] ChunkedBinaryBuilder should only grow when necessary, address runaway memory use in Parquet binary column read
  • ARROW-6061 - [C++] Add ARROW_JSON feature flag for configuring arrow builds without RapidJSON
  • ARROW-6066 - [Website] Fix blog post author header
  • ARROW-6067 - [Python] Fix failing large memory Python tests
  • ARROW-6068 - [C++] Allow passing Field instances to StructArray::Make
  • ARROW-6073 - [C++] Reset Decimal128Builder in Finish().
  • ARROW-6082 - [Python] check type of the index_type passed to pa.dictionary()
  • ARROW-6092 - [Python] Fix C++ arrow-python-test on Python 2.7
  • ARROW-6095 - [C++] Fix unit test build when only building static libraries, add cpp-static-only to tests.yml
  • ARROW-6108 - [C++] Workaround Windows CRT crash on invalid locale
  • ARROW-6116 - [C++][Gandiva] Fix bug in TimedTestFilterAdd2
  • ARROW-6117 - [Java] Fix the set method of FixedSizeBinaryVector
  • ARROW-6119 - [Python] PyArrow wheel import fails on Windows Python 3.7
  • ARROW-6120 - [C++] Forbid use of <iostream> in public header files
  • ARROW-6126 - [C++] Return error when an IPC stream terminates in the middle of receiving dictionaries
  • ARROW-6132 - [Python] validate result in ListArray.from_arrays
  • ARROW-6135 - [C++] Make KeyValueMetadata::Equals() order-insensitive
  • ARROW-6136 - [FlightRPC][Java] don't double-close response stream
  • ARROW-6145 - [Java] UnionVector created by MinorType#getNewVector could not keep field type info properly
  • ARROW-6148 - [Packaging] Improve aarch64 support
  • ARROW-6152 - [C++][Parquet] Add parquet::ColumnWriter::WriteArrow method, refactor
  • ARROW-6153 - [R] Address parquet deprecation warning
  • ARROW-6158 - [C++/Python] Validate child array types with type fields of StructArray
  • ARROW-6159 - [C++] Properly indent first line of PrettyPrint with Schema
  • ARROW-6160 - [Java] AbstractStructVector#getPrimitiveVectors fails to work with complex child vectors
  • ARROW-6166 - [Go] Fix index out of bounds panic when slicing a slice
  • ARROW-6167 - [R] macOS binary R packages on CRAN don't have arrow_available
  • ARROW-6168 - [C++] IWYU docker-compose job is broken
  • ARROW-6170 - [R] Faster docker-compose build
  • ARROW-6171 - [R][CI] Fix R library search path
  • ARROW-6174 - [C++] Validate chunks in ChunkedArray::Validate. Fix validation of sliced ListArray, values null checks
  • ARROW-6175 - [Java] Fix MapVector#getMinorType and extend AbstractContainerVector addOrGet complex vector API
  • ARROW-6178 - [Developer] Keep prompting for authors in merge script for multi-author PRs if given bad input
  • ARROW-6182 - [R] Add note to README about r-arrow conda installation
  • ARROW-6186 - [Packaging][deb] Add missing headers to libplasma-dev for Ubuntu 16.04
  • ARROW-6190 - [C++] Define and declare functions regardless of NDEBUG
  • ARROW-6193 - [GLib] Add missing require in test
  • ARROW-6200 - [Java] Method getBufferSizeFor in BaseRepeatedValueVector/ListVector not correct
  • ARROW-6202 - [Java] Add unit test for large resultsets
  • ARROW-6205 - [C++] ARROW_DEPRECATED warning when including io/interfaces.h
  • ARROW-6208 - [Java] Correct byte order before comparing in ByteFunctionHelpers
  • ARROW-6210 - [Java] remove equals API from ValueVector
  • ARROW-6211 - [Java] Remove dependency on RangeEqualsVisitor from ValueVector interface
  • ARROW-6214 - [R] Add R sanitizer docker image
  • ARROW-6215 - [Java] Fix case when ZeroVector is compared against other vector types
  • ARROW-6218 - [Java] Add UINT type test in integration to avoid potential overflow
  • ARROW-6223 - [C++] Configuration error with Anaconda Python 3.7.4
  • ARROW-6224 - [Python] fix deprecated usage of .data (previously Column.data)
  • ARROW-6227 - [Python] Apply from_pandas option in pyarrow.array consistently across types
  • ARROW-6234 - [Java] ListVector hashCode() is not correct
  • ARROW-6241 - [Java] Failures on master
  • ARROW-6255 - [Rust] [Parquet] Cannot use any published parquet crate due to parquet-format breaking change
  • ARROW-6259 - [C++] Add -Wno-extra-semi-stmt when compiling with clang 8 to work around Flatbuffers bug, suppress other new LLVM 8 warnings
  • ARROW-6263 - [Python] Use RecordBatch::Validate in RecordBatch.from_arrays. Normalize API vs. Table.from_arrays. Add record_batch factory function
  • ARROW-6266 - [Java] Resolve the ambiguous method overload in RangeEqualsVisitor
  • ARROW-6268 - [Java] Empty buffers to have a valid address.
  • ARROW-6269 - [C++] check decimal precision in IPC code
  • ARROW-6270 - [C++] check buffer_index bounds in IpcComponentSource.GetBuffer
  • ARROW-6290 - [Rust][DataFusion] Fix bug in type coercion rule
  • ARROW-6291 - [C++] Do not override ARROW_PARQUET if other PARQUET options are enabled
  • ARROW-6293 - [Rust] datafusion 0.15.0-SNAPSHOT error
  • ARROW-6301 - [C++][Python] Prevent ExtensionType-related race condition in Python process teardown by exposing shared_ptr to global "ExtensionTypeRegistry"
  • ARROW-6302 - [C++][Parquet][Python] Restore ordered type property when reading dictionary type with serialized Arrow schema
  • ARROW-6309 - [C++][Parquet] Stop needless static linking
  • ARROW-6323 - [R] Expand file paths when passing to readers
  • ARROW-6325 - [Python] fix conversion of strided boolean arrays
  • ARROW-6330 - [C++] Include missing API headers
  • ARROW-6332 - [Java][C++][Gandiva] Misc fixes for varwidth vector allocation.
  • ARROW-6339 - [Python] Raise ValueError when accessing unset statistics
  • ARROW-6343 - [Java][Vector] Fix allocation helper.
  • ARROW-6344 - [C++][Gandiva] Handle multibyte characters in substring function
  • ARROW-6345 - [C++][Python] "ordered" flag seemingly not taken into account when comparing DictionaryType values for equality
  • ARROW-6348 - [R] arrow::read_csv_arrow namespace error when package not loaded
  • ARROW-6354 - [C++] Fix failing build when ARROW_PARQUET=OFF
  • ARROW-6363 - [R] segfault in Table__from_dots with unexpected schema
  • ARROW-6364 - [R] Handling unexpected input to time64() et al.
  • ARROW-6369 - [C++] Handle Array.to_pandas case for type=list<bool>
  • ARROW-6371 - [Doc] Row to columnar conversion example mentions arrow::Column in comments
  • ARROW-6372 - [Rust][Datafusion] Casting from Un-signed to Signed Integers not supported
  • ARROW-6376 - [Developer] Use target ref of PR when merging instead of hard-coding "master"
  • ARROW-6387 - [Archery] Errors with make
  • ARROW-6392 - [FlightRPC][Python] check type of list_flights result
  • ARROW-6395 - [Python] Bug when using bool arrays with stride greater than 1
  • ARROW-6406 - [C++] Fix jemalloc URL for offline build in thirdparty/versions.txt
  • ARROW-6411 - [Python][Parquet] Improve performance of DictEncoder::PutIndices
  • ARROW-6412 - [C++] Improve TCP port allocation in tests
  • ARROW-6418 - [C++][Plasma] Remove cmake project directive for plasma
  • ARROW-6423 - [C++] Fix crash when trying to instantiate Snappy CompressedOutputStream
  • ARROW-6424 - [C++] Fix IPC fuzzing test name
  • ARROW-6425 - [C++] ValidateArray fail for slice of list array
  • ARROW-6428 - [CI][Crossbow] Nightly turbodbc job fails
  • ARROW-6430 - [CI][Crossbow] Nightly R docker job fails
  • ARROW-6431 - [Python] Test suite fails without pandas installed
  • ARROW-6432 - [CI][Crossbow] Remove alpine nightly crossbow jobs
  • ARROW-6433 - [Java][CI] Fix java docker image
  • ARROW-6434 - [CI][Crossbow] Nightly HDFS integration job fails
  • ARROW-6435 - [Python] Use pandas null coding consistently on List and Struct types
  • ARROW-6440 - [Packaging][deb] Follow plasma-store-server name change
  • ARROW-6441 - [Packaging][RPM] Follow plasma-store-server name change
  • ARROW-6442 - [CI][Crossbow] Nightly gandiva jar osx build fails
  • ARROW-6443 - [CI][Crossbow] Nightly conda osx builds fail
  • ARROW-6444 - [CI][Crossbow] Nightly conda Windows builds fail (time out)
  • ARROW-6446 - [OSX][Python][Wheel] Turn off ORC feature in the wheel building scripts
  • ARROW-6449 - [R] io "tell()" methods are inconsistently named and untested
  • ARROW-6457 - [C++] Always set CMAKE_BUILD_TYPE if it is not defined
  • ARROW-6461 - [Java] Prevent EchoServer from closing the client socket after writing
  • ARROW-6472 - [Java] ValueVector#accept may have a potential cast exception
  • ARROW-6476 - [Java][CI] Fix java docker build script
  • ARROW-6478 - [C++] Revert to jemalloc stable-4 until we understand 5.2.x performance issues
  • ARROW-6481 - [C++] Avoid copying large ConvertOptions
  • ARROW-6488 - [Python] fix equality with pyarrow.NULL to return NULL
  • ARROW-6492 - [Python] Handle pandas_metadata created by fastparquet with missing field_name
  • ARROW-6502 - [GLib][CI] Pin gobject-introspection gem to 3.3.7
  • ARROW-6506 - [C++] Fix validation of ExtensionArray with struct storage type
  • ARROW-6509 - [C++][Gandiva] Re-enable Gandiva JNI tests and fix Travis CI failure
  • ARROW-6509 - [Java][CI] Upgrade maven-surefire-plugin to version 3.0.0-M3, disable Gandiva JNI unit tests temporarily
  • ARROW-6520 - [Python] More consistent handling of specified schema when creating Table
  • ARROW-6522 - [Python] Fix failing pandas tests on older pandas / older python
  • ARROW-6530 - [CI][Crossbow][R] Nightly R job doesn't install all dependencies
  • ARROW-6550 - [C++] Filter expressions PR failing manylinux package builds
  • ARROW-6551 - [Python] Dask Parquet integration test failure
  • ARROW-6552 - [C++] boost::optional in STL test fails compiling in gcc 4.8.2
  • ARROW-6560 - [Python] Fix nopandas integration tests
  • ARROW-6561 - [Python] Fix python tests to pass on pandas master
  • ARROW-6562 - [GLib] Fix returning wrong sliced data of GArrowBuffer
  • ARROW-6564 - [Python] Do not require pandas for invoking Array.array
  • ARROW-6565 - [Rust][DataFusion] Fix intermittent test failure
  • ARROW-6568 - [C++] ChunkedArray constructor needs type when chunks is empty
  • ARROW-6572 - [C++] Fix Parquet decoding returning uninitialized data
  • ARROW-6573 - [Python] Add test case to probe additional behavior in schema-data mismatch in Table.from_pydict
  • ARROW-6576 - [R] Fix sparklyr integration tests
  • ARROW-6586 - [Python][Packaging] Windows wheel builds failing with "DLL load failure"
  • ARROW-6597 - [Python] Sanitize Python datetime handling
  • ARROW-6618 - [Python] Fix read_message() segfault on end of stream
  • ARROW-6620 - [Python][CI] pandas-master build failing due to removal of "to_sparse" method
  • ARROW-6622 - [R] Normalize paths for filesystem API on Windows
  • ARROW-6623 - [CI][Python] Dask docker integration test broken perhaps by statistics-related change
  • ARROW-6639 - [Packaging][RPM] Add support for CentOS 7 on aarch64
  • ARROW-6640 - [C++] Do not reset buffer_pos_ in BufferedInputStream/OutputStream when enlarging buffer
  • ARROW-6641 - [C++] Remove Deprecated WriteableFile warning
  • ARROW-6642 - [Python] Link parent objects in Parquet's metadata and statistics objects
  • ARROW-6651 - Fix conda R job
  • ARROW-6652 - [Python] Fix ChunkedArray.to_pandas to retain timezone
  • ARROW-6652 - [Python] Fix Array.to_pandas to retain timezone
  • ARROW-6660 - [Rust][DataFusion] Minor docs update for 0.15.0 release
  • ARROW-6670 - [CI][R] Fixes for R nightly jobs
  • ARROW-6674 - [Python] Fix or ignore the test warnings
  • ARROW-6677 - [FlightRPC][C++] Document Flight in C++
  • ARROW-6678 - [C++][Parquet] Binary data stored in Parquet metadata must be base64-encoded to be UTF-8 compliant
  • ARROW-6679 - [RELEASE] Add license info for the autobrew scripts
  • ARROW-6682 - [C#] Ensure file footer block lengths are always 8 byte aligned.
  • ARROW-6687 - [Rust][DataFusion] Add regression tests for np.nan parquet file
  • ARROW-6687 - [Rust][DataFusion] Bug fix in DataFusion Parquet reader
  • ARROW-6701 - [C++][R] Lint failing on R cpp code
  • ARROW-6703 - [Packaging][Linux] Restore ARROW_VERSION environment variable
  • ARROW-6705 - [Rust][DataFusion] README has invalid github URL
  • ARROW-6709 - [JAVA] Jdbc adapter currentIndex should increment when va…
  • ARROW-6714 - [R] Fix untested RecordBatchWriter case
  • ARROW-6716 - [Rust] Bump nightly to nightly-2019-09-25 to fix CI
  • ARROW-6748 - [RUBY] gem compilation error
  • ARROW-6751 - [CI] ccache doesn't cache on Travis-CI
  • ARROW-6760 - [C++] JSON: improve error message when column changed type
  • ARROW-6773 - [C++] Filter kernel returns invalid data when filtering with an Array slice
  • ARROW-6796 - Certain moderately-sized (~100MB) default-Snappy-compressed Parquet files take an enormous amount of memory and a long time to load by pyarrow.parquet.read_table
  • ARROW-7112 - Wrong contents when initializing a pyarrow.Table from boolean DataFrame
  • PARQUET-1623 - [C++] Fix invalid memory access encountered when reading some parquet files
  • PARQUET-1631 - [C++] ParquetInputWrapper::GetSize returns Tell
  • PARQUET-1640 - [C++] Fix crash in parquet-encoding-benchmark

Apache Arrow 0.14.1 (2019-07-22)

Bug Fixes

  • ARROW-5775 - [C++] Fix thread-unsafe cached data
  • ARROW-5790 - [Python] Raise error when trying to convert 0-dim array in pa.array
  • ARROW-5791 - [C++] Fix infinite loop with more than 32768 columns.
  • ARROW-5816 - [Release] Do not curl in background in verify-release-candidate.sh
  • ARROW-5836 - [Java][FlightRPC] Skip Flight domain socket test when path too long
  • ARROW-5838 - [C++] Delegate OPENSSL_ROOT_DIR to bundled gRPC
  • ARROW-5849 - [C++] Fix compiler warnings on mingw32
  • ARROW-5850 - [CI][R] R appveyor job is broken after release
  • ARROW-5851 - [C++] Fix compilation of reference benchmarks
  • ARROW-5856 - [Python][Packaging] Fix use of C++ / Cython API from wheels
  • ARROW-5863 - [Python] Use atexit module for extension type finalization to avoid segfault
  • ARROW-5868 - [Python] Correctly remove liblz4 shared libraries from manylinux2010 image so lz4 is statically linked
  • ARROW-5873 - [Python] Guard for passed None in Schema.equals
  • ARROW-5874 - [Python] Fix macOS wheels to depend on system or Homebrew OpenSSL
  • ARROW-5878 - [C++][Parquet] Restore pre-0.14.0 Parquet forward compatibility by adding option to unconditionally set TIMESTAMP_MICROS/TIMESTAMP_MILLIS ConvertedType
  • ARROW-5886 - [Python][Packaging] Manylinux1/2010 compliance issue with libz
  • ARROW-5887 - [C#] ArrowStreamWriter writes FieldNodes in wrong order
  • ARROW-5889 - [C++][Parquet] Add property to indicate origin from converted type to TimestampLogicalType
  • ARROW-5899 - [Python][Packaging] Build and link uriparser statically in Windows wheel builds
  • ARROW-5921 - [C++] Fix multiple nullptr related crashes in IPC
  • PARQUET-1623 - [C++] Fix invalid memory access encountered when reading some parquet files

New Features and Improvements

  • ARROW-5101 - [Packaging] Avoid bundling static libraries in Windows conda packages
  • ARROW-5380 - [C++] Fix memory alignment UBSan errors.
  • ARROW-5564 - [C++] Use uriparser from conda-forge
  • ARROW-5609 - [C++] Set CMP0068 CMake policy to avoid macOS warnings
  • ARROW-5784 - [Release][GLib] Replace c_glib/ after running c_glib/autogen.sh in dev/release/02-source.sh
  • ARROW-5785 - [Rust] Make the datafusion cli dependencies optional
  • ARROW-5787 - [Release][Rust] Use local modules to verify RC
  • ARROW-5793 - [Release] Avoid duplicated known host SSH error in dev/release/03-binary.sh
  • ARROW-5794 - [Release] Skip uploading already uploaded binaries
  • ARROW-5795 - [Release] Add missing waits on uploading binaries
  • ARROW-5796 - [Release][APT] Update expected package list
  • ARROW-5797 - [Release][APT] Update supported distributions
  • ARROW-5820 - [Release] Remove undefined variable check from verify script
  • ARROW-5827 - [C++] Require c-ares CMake config
  • ARROW-5828 - [C++] Add required Protocol Buffers versions check
  • ARROW-5866 - [C++] Remove duplicate library in cpp/Brewfile
  • ARROW-5877 - [FlightRPC] Fix Python<->Java auth issues
  • ARROW-5904 - [Java][Plasma] Fix compilation of Plasma Java client
  • ARROW-5908 - [C#] ArrowStreamWriter doesn't align buffers to 8 bytes
  • ARROW-5934 - [Python] Bundle arrow's LICENSE with the wheels
  • ARROW-5937 - [Release] Stop parallel binary upload
  • ARROW-5938 - [Release] Create branch for adding release note automatically
  • ARROW-5939 - [Release] Add support for generating vote email template separately
  • ARROW-5940 - [Release] Add support for re-uploading sign/checksum for binary artifacts
  • ARROW-5941 - [Release] Avoid re-uploading already uploaded binary artifacts
  • ARROW-5958 - [Python] Link zlib statically in the wheels

Apache Arrow 0.14.0 (2019-07-04)

New Features and Improvements

  • ARROW-258 - [Format] clarify definition of Buffer in context of RPC, IPC, File
  • ARROW-653 - [Python / C++] Add debugging function to print an array's buffer contents in hexadecimal
  • ARROW-767 - [C++] Filesystem abstraction
  • ARROW-835 - [Format][C++][Java] Create a new Duration type
  • ARROW-840 - [Python] Expose extension types
  • ARROW-973 - [Website] Add FAQ page
  • ARROW-1012 - [C++] Configurable batch size for parquet RecordBatchReader
  • ARROW-1207 - [C++] Implement MapArray, MapBuilder, MapType classes, and IPC support
  • ARROW-1261 - [Java] Add MapVector with reader and writer
  • ARROW-1278 - [Integration] Adding integration tests for fixed_size_list
  • ARROW-1279 - [Integration] Enable MapType integration tests
  • ARROW-1280 - [C++] add fixed size list type
  • ARROW-1349 - [Packaging] Provide APT and Yum repositories
  • ARROW-1496 - [JS] Upload coverage data to codecov.io
  • ARROW-1558 - [C++] Implement boolean filter (selection) kernel, rename comparison kernel-related functions
  • ARROW-1587 - [Format] Add metadata for user-defined logical types
  • ARROW-1774 - [C++] Add Array::View()
  • ARROW-1833 - [Java] Add accessor methods for data buffers that skip null checking
  • ARROW-1957 - [Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit
  • ARROW-1983 - [C++][Parquet] Add AppendRowGroups and WriteMetaDataFile methods
  • ARROW-2057 - [Python] Expose option to configure data page size threshold in parquet.write_table
  • ARROW-2102 - [C++] Implement Take kernel
  • ARROW-2103 - [C++] Implement take kernel functions - string/binary value type
  • ARROW-2104 - [C++] take kernel functions for nested types
  • ARROW-2105 - [C++] Implement take kernel functions - properly handle special indices
  • ARROW-2186 - [C++] Clean up architecture specific compiler flags
  • ARROW-2217 - [C++] Add option to use dynamic linking for compression library dependencies
  • ARROW-2298 - [Python] Add unit tests to assert that float64 with NaN values can be safely coerced to integer types when converting from pandas
  • ARROW-2412 - [Integration] Add nested dictionary test case, skipped for now
  • ARROW-2467 - [Rust] Add generated IPC code
  • ARROW-2517 - [Java] Add list<decimal> writer
  • ARROW-2618 - [Rust] Bitmap constructor should accept a flag for default state (0 or 1)
  • ARROW-2667 - [C++/Python] Add pandas-like take method to Array
  • ARROW-2707 - [C++] Add Table::Slice
  • ARROW-2709 - [Python] write_to_dataset poor performance when splitting
  • ARROW-2730 - [C++] Set up CMAKE_C_FLAGS more thoughtfully instead of using CMAKE_CXX_FLAGS
  • ARROW-2796 - [C++] Simplify version script used for linking
  • ARROW-2818 - [Python] Better error message when trying to convert sparse pandas data to arrow Table
  • ARROW-2835 - [C++] Make file position undefined after ReadAt()
  • ARROW-2969 - [R] Convert between StructArray and "nested" data.frame column containing data frame in each cell
  • ARROW-2981 - [C++] improve clang-tidy usability
  • ARROW-2984 - [JS] Refactor release verification script to share code with main source release verification script
  • ARROW-3040 - [Go] add support for comparing Arrays
  • ARROW-3041 - [Go] add support for TimeArray
  • ARROW-3052 - [C++] Detect Apache ORC C++ libraries in system/conda toolchain, add to conda requirements
  • ARROW-3087 - [C++] Implement Compare filter kernel
  • ARROW-3144 - [C++/Python] Move "dictionary" member from DictionaryType to ArrayData to allow for variable dictionaries
  • ARROW-3150 - [Python] Enable Flight in Python wheels for Linux and Windows
  • ARROW-3166 - [C++] Consolidate IO interfaces used in arrow/io and parquet-cpp
  • ARROW-3191 - [Java] Make ArrowBuf work with arbitrary underlying memory
  • ARROW-3200 - [C++] Support dictionaries in Flight streams
  • ARROW-3290 - [C++] Toolchain support for secure gRPC
  • ARROW-3294 - [C++][Flight] Support Flight on Windows
  • ARROW-3314 - [R] Set -rpath using pkg-config when building
  • ARROW-3330 - [C++] Spawn multiple Flight performance servers in flight-benchmark to test parallel get performance
  • ARROW-3419 - [C++] Run include-what-you-use checks as nightly build
  • ARROW-3459 - [C++][Gandiva] Add support for variable length output vectors
  • ARROW-3475 - [C++] Allow builders to finish to the corresponding array type
  • ARROW-3570 - [Packaging] Don't bundle test data files with python wheels
  • ARROW-3572 - [Crossbow] Raise more helpful exception if Crossbow queue has an SSH origin URL
  • ARROW-3671 - [Go] implement MonthInterval and DayTimeInterval
  • ARROW-3676 - [Go] implement Decimal128 array
  • ARROW-3679 - [Go] implement read/write IPC for Decimal128
  • ARROW-3680 - [Go] implement Float16 array
  • ARROW-3686 - [Python] support masked arrays in pa.array
  • ARROW-3702 - [R] POSIXct mapped to DateType not TimestampType?
  • ARROW-3714 - [CI] Run RAT checks in pre-commit hooks
  • ARROW-3729 - [C++][Parquet] Use logical annotations in Arrow Parquet reader/writer
  • ARROW-3732 - [R] Add functions to write RecordBatch or Schema to Message value, then read back
  • ARROW-3758 - [R] Build R library and dependencies on Windows in Appveyor CI
  • ARROW-3759 - [R][CI] Build and test (no libarrow) on Windows in Appveyor
  • ARROW-3767 - [C++] Add cast from null to any other type
  • ARROW-3780 - [R] Failed to fetch data: invalid data when collecting int16
  • ARROW-3791 - [C++ / Python] Add boolean type inference to the CSV parser
  • ARROW-3794 - [R] Consider mapping INT8 to integer() not raw()
  • ARROW-3804 - [R] Support older versions of R runtime
  • ARROW-3810 - [R] type= argument for Array and ChunkedArray
  • ARROW-3811 - [R] Support inferring data.frame column as StructArray in array constructors
  • ARROW-3814 - [R] RecordBatch$from_arrays()
  • ARROW-3815 - [R] Refine record batch factory
  • ARROW-3848 - [R] allow nbytes to be missing in RandomAccessFile$Read()
  • ARROW-3897 - [MATLAB] Add MATLAB support for writing numeric datatypes to a Feather file
  • ARROW-3904 - [C++/Python] Validate scale and precision of decimal128 type
  • ARROW-4013 - [Docs][C++] Add how to build on MSYS2
  • ARROW-4020 - [Release] Add a post release script to remove RC
  • ARROW-4047 - [Python] Document use of int96 timestamps and options in Parquet docs
  • ARROW-4086 - [Java] Add apis to debug memory alloc failures
  • ARROW-4121 - [C++] Refactor memory allocation from InvertKernel
  • ARROW-4159 - [C++] Build with -Wdocumentation when using clang and BUILD_WARNING_LEVEL=CHECKIN
  • ARROW-4194 - [Format][Docs] Remove duplicated / out-of-date logical type information from documentation
  • ARROW-4302 - [C++] Add OpenSSL to C++ build toolchain (#4384)
  • ARROW-4337 - [C#] Implemented Fluent API for building arrays and record batches
  • ARROW-4343 - [C++] Add docker-compose test for gcc 4.8 / Ubuntu 14.04 (Trusty), expand Xenial/16.04 Dockerfile to test Flight
  • ARROW-4356 - [CI] Add integration (docker) test for turbodbc
  • ARROW-4369 - [Packaging] Release verification script should test linux packages via docker
  • ARROW-4452 - [Python] Serialize sparse torch tensors
  • ARROW-4453 - [Python] Create Cython wrappers for SparseTensor
  • ARROW-4467 - [Rust][DataFusion] Create a REPL & Dockerfile for DataFusion
  • ARROW-4503 - [C#] Eliminate allocations in ArrowStreamReader when reading from a Stream
  • ARROW-4504 - [C++] Reduce number of C++ unit test executables from 128 to 82
  • ARROW-4505 - [C++] adding pretty print for dates, times, and timestamps
  • ARROW-4566 - [Flight] Add option to run Flight benchmark against separate server
  • ARROW-4596 - [Rust][DataFusion] Implement COUNT
  • ARROW-4622 - [C++][Python] MakeDense and MakeSparse in UnionArray should accept a vector of Field
  • ARROW-4625 - [Flight][Java] Add method to await Flight server termination in Java
  • ARROW-4626 - [Flight] Add application-defined metadata to DoGet/DoPut
  • ARROW-4627 - [Flight] Add application metadata field to DoPut
  • ARROW-4701 - [C++] Add JSON chunker benchmarks
  • ARROW-4702 - [C++] Update dependency versions
  • ARROW-4708 - [C++] add multithreaded json reader
  • ARROW-4708 - [C++] refactoring JSON parser to prepare for multithreaded impl
  • ARROW-4714 - [C++][JAVA] Providing JNI interface to Read ORC file via Arrow C++
  • ARROW-4717 - [C#] Consider exposing ValueTask instead of Task
  • ARROW-4719 - [C#] Implement ChunkedArray, Column and Table in C#
  • ARROW-4741 - [Java] Add missing type javadoc and enable checkstyle
  • ARROW-4787 - [C++] Add support for Null in MemoTable and related kernels
  • ARROW-4788 - [C++] Less verbose API for constructing StructArray
  • ARROW-4800 - [C++] Introduce a Result<T> class
  • ARROW-4805 - [Rust] Write temporal arrays to CSV
  • ARROW-4806 - [Rust] Temporal array casts
  • ARROW-4824 - [Python] Fix error checking in read_csv()
  • ARROW-4827 - [C++] Implement benchmark comparison
  • ARROW-4847 - [Python] Add pyarrow.table factory function
  • ARROW-4904 - [C++] Move implementations in arrow/ipc/test-common.h into libarrow_testing
  • ARROW-4911 - [R] Progress towards completing windows support
  • ARROW-4912 - [C++] add method for easy renaming of a Table's columns
  • ARROW-4913 - [Java][Memory] Add additional methods for observing allocations.
  • ARROW-4945 - [Flight] Enable integration tests in Travis
  • ARROW-4956 - [C#] Allow ArrowBuffers to wrap external Memory
  • ARROW-4959 - [C++][Gandiva][Crossbow] Gandiva crossbow packaging changes.
  • ARROW-4968 - [Rust] Assert that struct array field types match data in…
  • ARROW-4971 - [Go] Add type equality test function
  • ARROW-4972 - [Go] implement ArrayEquals
  • ARROW-4973 - [Go] implement ArraySliceEqual
  • ARROW-4974 - [Go] implement ArrayApproxEqual
  • ARROW-4990 - [C++] Support Array-Array comparison
  • ARROW-4993 - [C++] Add simple build configuration summary
  • ARROW-5000 - [Python] Fix 'SO' DeprecationWarning in setup.py
  • ARROW-5007 - [C++] Remove DCHECK in intrinsic headers
  • ARROW-5020 - [CI] Split Gandiva-related packages into separate .yml file
  • ARROW-5027 - [Python] Python bindings for JSON reader
  • ARROW-5037 - [Rust] [DataFusion] Refactor aggregate module
  • ARROW-5038 - [Rust][DataFusion] Implement AVG aggregate function
  • ARROW-5039 - [Rust][DataFusion] Re-implement CAST support
  • ARROW-5040 - [C++] ArrayFromJSON can't parse Timestamp from strings
  • ARROW-5045 - [Rust] Code coverage silently failing in CI
  • ARROW-5053 - [Rust][DataFusion] Use ARROW_TEST_DATA env var
  • ARROW-5054 - [Release][Flight] Test Flight in Linux/macOS release verification scripts
  • ARROW-5056 - [Packaging] Adjust conda recipes to use ORC conda-forge package on unix systems
  • ARROW-5061 - [Release] Improve 03-binary performance
  • ARROW-5062 - [Java][FlightRPC] Shade com.google.guava usage in Flight
  • ARROW-5063 - [FlightRPC][Java] Test that Flight client connections are independent
  • ARROW-5064 - [Release] Pass PKG_CONFIG_PATH to glib in the verification script
  • ARROW-5066 - [Integration] Add flags to enable/disable implementations in integration/integration_test.py
  • ARROW-5071 - [Archery] Implement running benchmark suite
  • ARROW-5076 - [Release] Improve post binary upload performance
  • ARROW-5077 - [Rust] Change Cargo.toml to use release versions
  • ARROW-5078 - [Documentation] Sphinx build fails due to RemovedInSphinx30Warning
  • ARROW-5079 - [Release] Add a script that releases C# package
  • ARROW-5080 - [Release] Add a script that releases Rust packages
  • ARROW-5081 - [C++] Use PATH_SUFFIXES when searching for dependencies
  • ARROW-5083 - [Developer] PR merge script improvements: set already-released Fix Version, display warning when no components set
  • ARROW-5088 - [C++] Only add -Werror in debug builds. Add C++ documentation about compiler warning levels
  • ARROW-5091 - [Flight] Rename FlightGetInfo message to FlightInfo
  • ARROW-5093 - [Packaging] Add support for selective binary upload
  • ARROW-5094 - [Packaging] Add APT/Yum verification scripts
  • ARROW-5102 - [C++] Reduce header dependencies
  • ARROW-5108 - [Go] implement reading primitive arrays from Arrow file
  • ARROW-5109 - [Go] implement reading binary/string arrays from Arrow file
  • ARROW-5110 - [Go] implement reading struct arrays from Arrow file
  • ARROW-5111 - [Go] implement reading list arrays from Arrow file
  • ARROW-5112 - [Go] implement writing IPC Arrow stream/file
  • ARROW-5113 - [C++] Fix DoPut with dictionary arrays, add tests
  • ARROW-5115 - [JS] Add Vector Builders and high-level stream primitives
  • ARROW-5116 - [Rust] move kernel related files under compute/kernels
  • ARROW-5124 - [C++] Add support for Parquet in MinGW build
  • ARROW-5126 - [Rust][Parquet] Convert parquet column desc to arrow data type
  • ARROW-5127 - [Rust][Parquet] Add page iterator.
  • ARROW-5136 - [Flight] Call options
  • ARROW-5137 - [Flight] Implement auth API
  • ARROW-5145 - [C++] More input validation in release mode
  • ARROW-5150 - [Ruby] Add Arrow::Table#raw_records
  • ARROW-5155 - [GLib][Ruby] Add support for building union arrays from data type
  • ARROW-5157 - [Website] Add MATLAB to powered by Apache Arrow website
  • ARROW-5162 - [Rust][Parquet] Rename mod reader to arrow.
  • ARROW-5163 - [Gandiva] Cast timestamp/date are incorrectly evaluating year 0097 to 1997
  • ARROW-5164 - [Gandiva][C++] Introduce murmur32 for 32 bit types.
  • ARROW-5165 - [Python] update dev installation docs for --build-type + validate in setup.py
  • ARROW-5168 - [GLib] Add garrow_array_take()
  • ARROW-5171 - [C++] Use LESS instead of LOWER in compare enum
  • ARROW-5172 - [Go] implement reading fixed-size binary arrays from Arrow file
  • ARROW-5178 - [Python] Add Table.from_pydict()
  • ARROW-5179 - [Python] Return plain dicts, not OrderedDict, on Python 3.7+
  • ARROW-5185 - [C++] Add support for Boost with CMake configuration file
  • ARROW-5187 - [Rust] Add ability to convert StructArray to RecordBatch
  • ARROW-5188 - [Rust] Add temporal types to struct builders
  • ARROW-5189 - [Rust][Parquet] Format / display individual fields within a parquet row
  • ARROW-5190 - [R] Discussion: tibble dependency in R package
  • ARROW-5191 - [Rust] Expose CSV and JSON reader schemas
  • ARROW-5203 - [GLib] Add support for Compare filter
  • ARROW-5204 - [C++] Improve builder performance
  • ARROW-5212 - [Go] Support reserve for the data buffer in the BinaryBuilder
  • ARROW-5218 - [C++] Improve build when third-party library locations are specified
  • ARROW-5219 - [C++] Build protobuf_ep in parallel when using Ninja build
  • ARROW-5222 - [Python] Revise pyarrow installation instructions for macOS
  • ARROW-5225 - [Java] Improve performance of BaseValueVector#getValidityBufferSizeFromCount
  • ARROW-5226 - [Gandiva] Add cmp functions for decimals
  • ARROW-5238 - [Python] Convert arguments to pyarrow.dictionary
  • ARROW-5241 - [Python] expose option to disable writing statistics to parquet file
  • ARROW-5250 - [Java] Add javadoc comments to public methods, remove style check suppression.
  • ARROW-5252 - [C++] Use standard-compliant std::variant backport
  • ARROW-5256 - [C++] Add support for LLVM 7.1
  • ARROW-5257 - [Website] Update site to use "official" Apache Arrow logo, add clearly marked links to logo
  • ARROW-5258 - [C++/Python] Collect file metadata of dataset pieces
  • ARROW-5261 - [C++] Add missing scalar definitions for Intervals
  • ARROW-5262 - [Python] Fix typo
  • ARROW-5264 - [Java] Allow enabling/disabling boundary checking by environmental variable
  • ARROW-5266 - [Go] implement read/write IPC for Float16
  • ARROW-5268 - [GLib] Add GArrowJSONReader
  • ARROW-5269 - [C++][Archery] Mark relevant benchmarks as regression
  • ARROW-5275 - [C++] Generic filesystem tests
  • ARROW-5281 - [Rust] Extract DataPageBuilder to test common
  • ARROW-5284 - [Rust] Replace libc with std::alloc for memory allocation
  • ARROW-5286 - [Python] support struct type in from_pandas
  • ARROW-5288 - [Documentation] Enhance the contribution guidelines page
  • ARROW-5289 - [C++] Move arrow/util/concatenate* to arrow/array
  • ARROW-5290 - [Java] Provide a flag to enable/disable null-checking in vector's get methods
  • ARROW-5291 - [Python] Add wrapper for take kernel on Array
  • ARROW-5298 - [Rust] Add debug implementation for buffer data.
  • ARROW-5299 - [C++] ListArray comparison is incorrect
  • ARROW-5309 - [Python] clarify that Schema.append returns new object
  • ARROW-5311 - [C++] use more specific error status types in take
  • ARROW-5313 - [Format] Comments on Field table are a bit confusing
  • ARROW-5317 - [Rust][Parquet] impl IntoIterator for SerializedFileReader
  • ARROW-5319 - [C++][CI][travis skip]
  • ARROW-5321 - [Gandiva][C++] add isnull impl for string types
  • ARROW-5323 - [CI][skip travis]
  • ARROW-5328 - [R] Add shell scripts to do a full package rebuild and test locally
  • ARROW-5329 - [MATLAB] Add support for building MATLAB interface to Feather directly within MATLAB
  • ARROW-5334 - [C++] Ensure all type classes end with "Type"
  • ARROW-5335 - [Python] Raise exception on variable dictionaries in conversion to Python/pandas
  • ARROW-5339 - [C++] Add jemalloc URL to thirdparty/versions.txt so download_dependencies.sh gets it
  • ARROW-5341 - [C++][Documentation] developers/cpp.rst should mention documentation warnings
  • ARROW-5342 - [Format] Formalize "extension types" in Arrow protocol metadata
  • ARROW-5346 - [C++] Revert changes to vendored datetime library
  • ARROW-5349 - [C++][Parquet] Add method to set file path in a parquet::FileMetaData instance
  • ARROW-5361 - [R] Follow DictionaryType/DictionaryArray changes from ARROW-3144
  • ARROW-5363 - [GLib] Fix coding styles
  • ARROW-5364 - [C++] Use ASCII rather than UTF-8 in BuildUtils.cmake comment
  • ARROW-5365 - [C++][CI] Enable ASAN/UBSAN in CI
  • ARROW-5368 - [C++] Disable jemalloc by default with MinGW
  • ARROW-5369 - [C++] Add support for glog on Windows
  • ARROW-5370 - [C++] Use system uriparser if available
  • ARROW-5372 - [GLib] Add support for null/boolean values CSV read option
  • ARROW-5378 - [C++] Local filesystem implementation
  • ARROW-5384 - [Go] implement FixedSizeList array
  • ARROW-5389 - [C++] Add Temporary Directory facility
  • ARROW-5392 - [C++][CI] Disable static build with MinGW on AppVeyor
  • ARROW-5393 - [R] Add tests and example for read_parquet()
  • ARROW-5395 - [C++] Utilize stream EOS in File format
  • ARROW-5396 - [JS] Support files and streams with no record batches
  • ARROW-5401 - [CI][skip appveyor]
  • ARROW-5404 - [C++] force usage of nonstd::sv_lite::string_view instead of std::string_view
  • ARROW-5407 - [C++] Allow building only integration test targets
  • ARROW-5413 - [C++] Skip UTF8 BOM in CSV files
  • ARROW-5415 - [Release] Release script should update R version everywhere
  • ARROW-5416 - [Website] Add Homebrew to project installation page
  • ARROW-5418 - [CI][R] Run code coverage and report to codecov.io
  • ARROW-5420 - [Java] Implement or remove getCurrentSizeInBytes in Variab…
  • ARROW-5427 - [Python] pandas conversion preserve_index=True to force RangeIndex serialization
  • ARROW-5428 - [C++] Add option to set "read extent" in arrow::io::BufferedInputStream
  • ARROW-5429 - [Java] Provide alternative buffer allocation policy
  • ARROW-5432 - [Python] Add NativeFile.read_at()
  • ARROW-5433 - [C++][Parquet] Improve parquet-reader columns information, strip trailing whitespace from test case
  • ARROW-5434 - [Memory][Java] Introduce wrappers for backward compatibility.
  • ARROW-5436 - [Python] parquet.read_table add filters keyword
  • ARROW-5438 - [JS] EOS bytes for sequential readers
  • ARROW-5441 - [C++] Implement FindArrowFlight.cmake
  • ARROW-5442 - [Website] Clarify what makes a release artifact "official"
  • ARROW-5443 - [Crossbow] Turn parquet build off for Gandiva.
  • ARROW-5447 - [Ruby] Ensure flushing test gz file
  • ARROW-5449 - [C++] Test extended-length paths on Windows
  • ARROW-5451 - [C++][Gandiva] Support cast/round functions for decimal
  • ARROW-5452 - [R] Add API documentation website (pkgdown)
  • ARROW-5461 - [Java] Add micro-benchmarks for Float8Vector and allocators
  • ARROW-5463 - [Rust] Add AsRef trait for Buffer.
  • ARROW-5464 - [Archery] Fix default diff --benchmark-filter
  • ARROW-5465 - [Crossbow] Support writing submitted job definition yaml to a file
  • ARROW-5466 - [Java] Dockerize Java builds in Travis CI, run multiple JDKs in single entry
  • ARROW-5467 - [Go] implement read/write IPC for Time32/64 arrays
  • ARROW-5468 - [Go] implement read/write IPC for Timestamp arrays
  • ARROW-5469 - [Go] implement read/write IPC for Date32/64 arrays
  • ARROW-5470 - [CI] Fix Travis-CI R job that broke with the local fs patch
  • ARROW-5472 - [Development] Add warning to PR merge tool if no JIRA component is set
  • ARROW-5474 - [C++] Document Boost 1.58 as minimum supported version, add docker-compose entry for it, fix broken cpp/Dockerfile* builds
  • ARROW-5475 - [Python] Add Python binding for arrow::Concatenate
  • ARROW-5476 - [Java][Memory] Fix Netty Arrow Buf.
  • ARROW-5477 - [C++] Check required RapidJSON version
  • ARROW-5478 - [Packaging] Drop Ubuntu 14.04 support
  • ARROW-5481 - [GLib] Add "error" parameter document
  • ARROW-5485 - [C++] Install libraries from googletest_ep into build output directory on non-Windows platforms.
  • ARROW-5485 - [Crossbow] Disable unit tests in Gandiva macOS crossbow job until underlying issue resolved
  • ARROW-5486 - [GLib] Add binding of gandiva::FunctionRegistry and related things
  • ARROW-5488 - [R] Workaround when C++ lib not available
  • ARROW-5490 - [C++] Remove ARROW_BOOST_HEADER_ONLY
  • ARROW-5491 - [C++] Remove unnecessary semicolons following MACRO definitions
  • ARROW-5492 - [R] Add "col_select" argument to read_* functions to read subset of columns
  • ARROW-5495 - [C++] Update some dependency URLs from http to https
  • ARROW-5496 - [R][CI] Fix relative paths in R codecov.io reporting
  • ARROW-5498 - [C++][CI] Fix Flatbuffers related error with MinGW
  • ARROW-5499 - [R] Alternate bindings for when libarrow is not found
  • ARROW-5500 - [R] read_csv_arrow() signature should match readr::read_csv()
  • ARROW-5503 - [R] Add read_json()
  • ARROW-5504 - [R] Move use_threads argument to global option
  • ARROW-5509 - [R] Add basic write_parquet
  • ARROW-5511 - [Packaging] Enable Flight in Conda packages
  • ARROW-5512 - [C++] Rough API skeleton for C++ Datasets API / framework
  • ARROW-5513 - [Java] Refactor method name for getstartOffset to use camel case
  • ARROW-5516 - [Python][Documentation] Development page for pyarrow has a missing dependency in using pip
  • ARROW-5518 - [Java] Set VectorSchemaRoot rowCount to 0 on allocateNew and clear
  • ARROW-5524 - [C++] Turn off PARQUET_BUILD_ENCRYPTION in CMake if OpenSSL not found (#4494)
  • ARROW-5526 - [GitHub] Add more prominent notice to ISSUE_TEMPLATE.md to direct bug reports to JIRA
  • ARROW-5529 - [Flight] Allow serving with multiple TLS certificates
  • ARROW-5531 - [Python] Implement Array.from_buffers for varbinary and nested types, add DataType.num_buffers property
  • ARROW-5533 - [C++][Plasma] make plasma client thread safe
  • ARROW-5534 - [GLib] Add garrow_table_concatenate()
  • ARROW-5535 - [GLib] Add garrow_table_slice()
  • ARROW-5537 - [JS] Support delta dictionaries in RecordBatchWriter and DictionaryBuilder
  • ARROW-5538 - [C++] Restrict minimum OpenSSL version to 1.0.2
  • ARROW-5541 - [R] Casts from negative int32 to uint32 and uint64 are now safe
  • ARROW-5544 - [Archery] Don't return non-zero on regressions
  • ARROW-5545 - [C++][Docs] Clarify expectation of UTC values for timestamps with time zones
  • ARROW-5547 - [C++][FlightRPC] Support pkg-config for Arrow Flight
  • ARROW-5552 - [Go] make Schema, Field and simpleRecord implement Stringer
  • ARROW-5554 - [Python] Added a python wrapper for arrow::Concatenate()
  • ARROW-5555 - [R] Add install_arrow() function to assist the user in obtaining C++ runtime libraries
  • ARROW-5556 - [Doc][Python] Document JSON reader
  • ARROW-5557 - [C++] Add VisitBits benchmark
  • ARROW-5565 - [Python][Docs] Add instructions how to use gdb to debug C++ libraries when running Python unit tests
  • ARROW-5567 - [C++] Fix build error of memory-benchmark
  • ARROW-5571 - [R] Rework handling of ARROW_R_WITH_PARQUET
  • ARROW-5574 - [R] documentation error for read_arrow()
  • ARROW-5581 - [Java] Provide interfaces and initial implementations for vector sorting
  • ARROW-5582 - [Go] implement RecordEqual
  • ARROW-5586 - [R] convert Array of LIST type to R lists
  • ARROW-5587 - [Java] Add more style check rule for Java code
  • ARROW-5590 - [R] Run "no libarrow" R build in the same CI entry if possible
  • ARROW-5591 - [Go] implement read/write IPC for Duration & Intervals
  • ARROW-5597 - [Packaging] Add Flight deb packages
  • ARROW-5600 - [R] R package namespace cleanup
  • ARROW-5602 - [Java][Gandiva] Add tests for round/cast
  • ARROW-5604 - [Go] improve coverage of TypeTraits
  • ARROW-5609 - [C++] Set CMP0068 CMake policy to avoid macOS warnings
  • ARROW-5612 - [Python][Doc] Add prominent note that date_as_object option changed with Arrow 0.13
  • ARROW-5621 - [Go] implement read/write IPC for Decimal128 arrays
  • ARROW-5622 - [C++][Dataset] Support pkg-config for Arrow Datasets
  • ARROW-5625 - [R] convert Array of struct type to data frame columns
  • ARROW-5632 - [Doc] Basic instructions for using Xcode with Arrow
  • ARROW-5633 - [Python] Enable bz2 in Linux wheels
  • ARROW-5635 - [C++] Added a Compact() method to Table.
  • ARROW-5637 - [Java][C++][Gandiva] Complete In Expression Support
  • ARROW-5639 - [Java] Remove floating point computation from getOffsetBufferValueCapacity
  • ARROW-5641 - [GLib] Remove enums files generated by GNU Autotools from Git targets
  • ARROW-5643 - [FlightRPC] Add ability to override SSL hostname checking
  • ARROW-5650 - [Python] Update manylinux dependency versions
  • ARROW-5652 - [CI] Fix lint docker image
  • ARROW-5653 - [CI] Fix cpp docker image
  • ARROW-5656 - [Python][Packaging] Fix macOS wheel builds, add Flight support
  • ARROW-5659 - [C++] Add support for finding OpenSSL installed by Homebrew
  • ARROW-5660 - [GLib][CI] Use Xcode 10.2
  • ARROW-5661 - [Gandiva][C++] support hash functions for decimals in gandiva
  • ARROW-5662 - [C++] Add support for BOOST_SOURCE=AUTO|BUNDLED|SYSTEM
  • ARROW-5663 - [Packaging][RPM] Update CentOS packages for 0.14.0
  • ARROW-5664 - [Crossbow] Execute nightly crossbow tests on CircleCI instead of Travis
  • ARROW-5668 - [C++/Python] Include 'not null' in schema fields pretty print
  • ARROW-5669 - [Python][Packaging] Add ARROW_TEST_DATA env variable to Crossbow Linux Wheel build
  • ARROW-5670 - [Crossbow] get_apache_mirror.py fails with TLS error on macOS with Python 3.5
  • ARROW-5671 - [Crossbow] macOS Python wheels failing
  • ARROW-5672 - [Java] Refactor redundant method modifier
  • ARROW-5683 - [R] Add snappy to Rtools Windows builds
  • ARROW-5684 - [Packaging][deb] Add support for Ubuntu 19.04
  • ARROW-5685 - [Packaging][deb] Add support for Apache Arrow Datasets
  • ARROW-5687 - [C++] Remove remaining uses of ARROW_BOOST_VENDORED
  • ARROW-5690 - [Packaging][Python] Fix macOS wheel building
  • ARROW-5694 - [Python] Support list of Decimals in conversion to pandas
  • ARROW-5695 - [C#][Release] Run sourcelink test in verify-release-candidate.sh
  • ARROW-5696 - [C++][Gandiva] Introduce castVarcharVarchar
  • ARROW-5699 - [C++] Optimize decimal128 parsing
  • ARROW-5701 - [C++][Gandiva] Build expr with specific sv
  • ARROW-5702 - [C++] parquet::arrow::FileReader::GetSchema()
  • ARROW-5704 - [C++] Stop using ARROW_TEMPLATE_EXPORT for SparseTensorImpl
  • ARROW-5705 - [Java] Optimize BaseValueVector#computeCombinedBufferSize logic
  • ARROW-5706 - [Java] Remove type conversion in getValidityBufferValueCapacity
  • ARROW-5707 - [Java] Improve the performance and code structure for ArrowRecordBatch
  • ARROW-5710 - [C++] Allow compiling Gandiva with Ninja on Windows
  • ARROW-5715 - [Release] Verify Ubuntu 19.04 APT repository
  • ARROW-5718 - [R] auto splice data frames in record_batch() and table()
  • ARROW-5720 - [C++] Create benchmarks for decimal related classes.
  • ARROW-5721 - [Rust] Move array related code into a separate module
  • ARROW-5724 - [R][CI] AppVeyor build should use ccache
  • ARROW-5725 - [Crossbow] Port conda recipes to azure pipelines
  • ARROW-5726 - [Java] Implement a common interface for int vectors
  • ARROW-5727 - [Python][CI] Install pytest-faulthandler before running tests
  • ARROW-5748 - [Packaging][deb] Add support for Debian GNU/Linux buster
  • ARROW-5749 - [Python] Added python binding for Table::CombineChunks
  • ARROW-5751 - [Python][Packaging] Ensure that c-ares is linked statically in Python wheels
  • ARROW-5752 - [Java] Improve the performance of ArrowBuf#setZero
  • ARROW-5755 - [Rust][Parquet] Derive clone for Type.
  • ARROW-5768 - [Release] Remove needless empty lines at the end of CHANGELOG.md
  • ARROW-5773 - [R] Clean up documentation before release
  • ARROW-5780 - [C++] Add benchmark for Decimal operations
  • ARROW-5782 - [Release] Setup test data for Flight in dev/release/01-perform.sh
  • ARROW-5783 - [Release][C#] Exclude dummy.git from RAT check
  • ARROW-5785 - [Rust] Rust datafusion implementation should not depend on rustyline
  • ARROW-5787 - [Release][Rust] Use local modules to verify RC
  • ARROW-5793 - [Release] Avoid duplicate known host SSH error in dev/release/03-binary.sh
  • ARROW-5794 - [Release] Skip uploading already uploaded binaries
  • ARROW-5795 - [Release] Add missing waits on uploading binaries
  • ARROW-5796 - [Release][APT] Update expected package list
  • ARROW-5797 - [Release][APT] Update supported distributions
  • ARROW-5818 - [Java][Gandiva] support varlen output vectors
  • ARROW-5820 - [Release] Remove undefined variable check from verify script
  • ARROW-5826 - [Website] Blog post for 0.14.0 release announcement
  • PARQUET-1243 - [C++] Throw more informative exception when reading a length-0 Parquet file
  • PARQUET-1411 - [C++] Add parameterized logical annotations to Parquet metadata
  • PARQUET-1422 - [C++] Use common Arrow IO interfaces throughout codebase
  • PARQUET-1517 - [C++] Crypto package updates to match the final spec
  • PARQUET-1523 - [C++] Vectorize Comparator interface, remove virtual calls on inner loop. Refactor Statistics to not require PARQUET_EXTERN_TEMPLATE
  • PARQUET-1569 - [C++] Consolidate shared unit testing header files
  • PARQUET-1582 - [C++] Add ToString method to ColumnDescriptor
  • PARQUET-1583 - [C++] Remove superfluous parquet::Vector class
  • PARQUET-1586 - [C++] Add --dump options to parquet-reader tool to dump def/rep levels
  • PARQUET-1603 - [C++] rename parquet::LogicalType to parquet::ConvertedType

Bug Fixes

  • ARROW-61 - [Java] Method can return a value bigger than Long.MAX_VALUE
  • ARROW-352 - [Format] Interval(DAY_TIME) has no unit
  • ARROW-1837 - [Java][Integration] Fix unsigned round trip integration tests
  • ARROW-2119 - [IntegrationTest] Add test case with a stream having no record batches
  • ARROW-2136 - [Python] Check null counts for non-nullable fields when converting from pandas.DataFrame with supplied schema
  • ARROW-2256 - [C++] Fix libfuzzer builds for clang-7
  • ARROW-2461 - [Python] Build manylinux2010 wheels
  • ARROW-2590 - [Python] Pyspark python_udf serialization error on grouped map (Amazon EMR)
  • ARROW-3344 - [Python] Disable flaky Plasma test
  • ARROW-3399 - [Python] Implementing numpy matrix serialization
  • ARROW-3650 - [Python] warn on converting DataFrame with mixed type column names
  • ARROW-3801 - [Python] Pandas-Arrow roundtrip makes pd categorical index not writeable
  • ARROW-4021 - [Ruby] Error building red-arrow on msys2
  • ARROW-4076 - [Python] Validate ParquetDataset schema after filtering
  • ARROW-4139 - [Python][Parquet] Wrap new parquet::LogicalType, cast min/max statistics based on LogicalType
  • ARROW-4301 - [Java] use arrow-jni profile for both gandiva/orc
  • ARROW-4301 - [Java][Gandiva] Update version manually
  • ARROW-4324 - [Python] Triage broken type inference logic in presence of a mix of NumPy dtype-having objects and other scalar values
  • ARROW-4350 - [Python] Fix conversion from Python to Arrow with nested lists and NumPy dtype=object items
  • ARROW-4433 - [R] Segmentation fault when instantiating arrow::table from data frame
  • ARROW-4447 - [C++] Investigate dynamic linking for libthrift
  • ARROW-4516 - [Python] Error while creating a ParquetDataset on a path without `_common_dataset` but with an empty `_tempfile`
  • ARROW-4523 - [JS] Add row proxy generation benchmark
  • ARROW-4651 - [Flight] Use URIs instead of host/port pair
  • ARROW-4665 - [C++] With glog activated, DCHECK macros are redefined
  • ARROW-4675 - [Python] Fix pyarrow.deserialize failure when reading payload in Python 3 payload generated in Python 2
  • ARROW-4694 - [CI] Improve detect-changes.py on Travis PRs
  • ARROW-4723 - [Python] Ignore "hidden" files that start with an underscore
  • ARROW-4725 - [C++] Enable dictionary builder tests with MinGW build
  • ARROW-4823 - [C++][Python] Do not close raw file handle in ReadaheadSpooler, check that file handles passed to read_csv are not closed
  • ARROW-4832 - [Python] pandas Index metadata for RangeIndex is incorrect
  • ARROW-4845 - [R] Compiler warnings on Windows MingW64
  • ARROW-4851 - [Java] BoundsChecking.java defaulting behavior for old drill parameter seems off
  • ARROW-4877 - [Plasma] CI failure in test_plasma_list
  • ARROW-4884 - [C++] conda-forge thrift-cpp package not available via pkg-config or cmake
  • ARROW-4885 - [C++/Python] Enable Decimal parsing in CSV
  • ARROW-4886 - [Rust] Cast to list with offset
  • ARROW-4923 - [Java] Add methods to set long value at given index in DecimalVector
  • ARROW-4934 - [Python] Address deprecation notice that will be a bug in Python 3.8
  • ARROW-5019 - [C#] ArrowStreamWriter doesn't work on a non-seekable stream
  • ARROW-5049 - [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark
  • ARROW-5051 - [GLib][Gandiva] Don't return temporary memory
  • ARROW-5055 - [Ruby][MSYS2] libparquet needs to be installed in MSYS2 for ruby
  • ARROW-5058 - [Release] Fix typos in vote e-mail template
  • ARROW-5059 - [C++][Gandiva] cbrt_* floating point tests can fail due to exact comparisons
  • ARROW-5065 - [Rust] cast kernel does not support casting from Int64
  • ARROW-5068 - [Gandiva][Packaging] Fix gandiva nightly builds after the CMake refactor
  • ARROW-5090 - Parquet linking fails on MacOS due to @rpath in dylib
  • ARROW-5092 - [C#] Create a dummy .git directory to download the source files from GitHub with Source Link
  • ARROW-5095 - [Flight][C++] Expose server error message in DoGet
  • ARROW-5096 - [Packaging][deb] Add missing plasma-store-server packages
  • ARROW-5097 - [Packaging][CentOS6] Remove needless dependencies
  • ARROW-5098 - [Website] Update how to install .deb by APT
  • ARROW-5100 - [JS] Remove swap while collapsing contiguous buffers
  • ARROW-5117 - [Go] fix panic when nil or empty slices are appended to builders
  • ARROW-5119 - [Go] fix Boolean stringer implementation
  • ARROW-5122 - [Python] pyarrow.parquet.read_table raises non-file path error when given a windows path to a directory
  • ARROW-5128 - [Packaging][CentOS][Conda] Numpy not found in nightly builds
  • ARROW-5129 - [Rust] Column writer bug: check dictionary encoder when adding a new data page
  • ARROW-5130 - [C++][Python] Limit exporting of std::* symbols
  • ARROW-5132 - [Java] Errors on building gandiva_jni.dll on Windows with Visual Studio 2017
  • ARROW-5138 - [Python] Add documentation about pandas preserve_index option
  • ARROW-5140 - [Bug?][Parquet] Can write a jagged array column of strings to disk, but hit `ArrowNotImplementedError` on read
  • ARROW-5142, ARROW-5732, ARROW-5735 - [CI] Emergency fixes
  • ARROW-5144 - [Python] ParquetDataset and ParquetPiece not serializable
  • ARROW-5146 - [Dev] Fix project name inference in merge script
  • ARROW-5147 - [C++] Add missing dependencies to Brewfile
  • ARROW-5148 - [Gandiva] Allow linking with RTTI-disabled LLVM builds
  • ARROW-5149 - [Packaging][Wheel] Pin LLVM to version 7 in windows builds
  • ARROW-5152 - [Python] Fix CMake warnings
  • ARROW-5159 - [Rust] Unable to build benches in arrow crate.
  • ARROW-5160 - [C++] Don't evaluate expression twice in ABORT_NOT_OK
  • ARROW-5166 - [Python][Parquet] Statistics for uint64 columns may overflow
  • ARROW-5167 - [C++] Upgrade string-view-light to latest
  • ARROW-5169 - [Python] preserve field nullability of specified schema in Table.from_pandas
  • ARROW-5173 - [Go] handle multiple concatenated record batches
  • ARROW-5174 - [Go] implement Stringer for DataTypes
  • ARROW-5177 - [C++/Python] Check column index when reading Parquet column
  • ARROW-5183 - [CI] Fix AppVeyor failure
  • ARROW-5184 - [Rust] Broken links and other documentation warnings
  • ARROW-5186 - [Plasma] Fix crash caused by improper free on CUDA memory
  • ARROW-5194 - [C++][Plasma] TEST(PlasmaSerialization, GetReply) is failing
  • ARROW-5195 - [C++] Detect null strings in CSV string columns
  • ARROW-5201 - [Python] handle collections.abc deprecation warnings
  • ARROW-5208 - [Python] Add mask argument to pyarrow.infer_type, do not look at masked values when inferring output type in pyarrow.array
  • ARROW-5214 - [C++] Fix thirdparty download script
  • ARROW-5217 - [Rust][DataFusion] Fix failing tests
  • ARROW-5232 - [Java] Avoid runaway doubling of vector size
  • ARROW-5233 - [Go] Migrate to flatbuffers-v1.11.0
  • ARROW-5237 - [Python] populate pandasapi.version
  • ARROW-5240 - [C++][CI] pin cmake_format
  • ARROW-5242 - [C++] Update vendored HowardHinnant/date to master
  • ARROW-5243 - [Java][Gandiva] Add decimal compare tests
  • ARROW-5245 - [CI][C++] Unpin cmake format (current version is 5.1)
  • ARROW-5246 - [Go] use Go-1.12.x in CI
  • ARROW-5249 - [Java] Add auth capability to Flight async operations (#4238)
  • ARROW-5253 - [C++] Fix snappy external build
  • ARROW-5254 - [Flight][Java] Change Flight doAction to allow multiple responses in Java
  • ARROW-5255 - [Java] Proof-of-concept of Java extension types
  • ARROW-5260 - [Python] Fix crash when deserializing from components in another process
  • ARROW-5274 - [JavaScript] Wrong array type for countBy
  • ARROW-5283 - [C++][Plasma] Erase object id in client when aborting an object
  • ARROW-5285 - [C++][Plasma] Implement releasing GpuProcessHandle
  • ARROW-5293 - [C++] Take kernel on DictionaryArray does not preserve ordered flag
  • ARROW-5294 - [Python][CI] Fix manylinux1 build
  • ARROW-5296 - [Java] Ignore timeout-based Flight tests for now
  • ARROW-5301 - [Python] update parquet docs on multithreading
  • ARROW-5304 - [C++] Fix thread safety of CudaDeviceManager::GetInstance
  • ARROW-5306 - [CI][GLib] Disable GTK-Doc
  • ARROW-5308 - [Go] remove deprecated Feather format
  • ARROW-5314 - [Go] fix bug for String Arrays with offset
  • ARROW-5314 - [Go] Fix bug for FixedSizeBinary with offset
  • ARROW-5318 - [Python] pyarrow hdfs reader overrequests
  • ARROW-5325 - [Archery][Benchmark] Output properly formatted jsonlines from benchmark diff cli command
  • ARROW-5330 - [CI][skip appveyor]
  • ARROW-5332 - [R] Update R package README with richer installation instructions
  • ARROW-5348 - [Java][CI] Add missing gandiva javadoc
  • ARROW-5360 - [Rust] Update rustyline to fix build
  • ARROW-5362 - [C++] Fix compression test memory usage
  • ARROW-5371 - [Release] Add tests for dev/release/00-prepare.sh
  • ARROW-5373 - [Java] Add missing details for Gandiva Java Build
  • ARROW-5376 - [C++] Workaround for gcc 5.4.0 bug
  • ARROW-5383 - [Go] Update flatbuf for new Duration type
  • ARROW-5387 - [Go] properly handle sub-slice of List
  • ARROW-5388 - [Go] use arrow.TypeEquals in array.NewChunked
  • ARROW-5390 - [CI][skip appveyor]
  • ARROW-5397 - [FlightRPC] Add TLS certificates for testing Flight
  • ARROW-5398 - [Python] Fix Flight tests
  • ARROW-5403 - [C++] Use GTest shared libraries with BUNDLED build, always use BUNDLED with MSVC
  • ARROW-5411 - [C++][Python] Build error building on Mac OS Mojave
  • ARROW-5412 - [Integration] Add Java option for netty reflection
  • ARROW-5419 - [C++] Allow recognizing empty strings as null strings in CSV files
  • ARROW-5421 - [Packaging][Crossbow] Duplicated key in nightly test configuration
  • ARROW-5422 - [CI] [C++] Build failure with Google Benchmark
  • ARROW-5430 - [Python] Raise ArrowInvalid for pyints larger than int64
  • ARROW-5435 - [Java] Add test for IntervalYearVector#getAsStringBuilder
  • ARROW-5437 - [Python] Missing pandas pytest marker from parquet tests
  • ARROW-5446 - [C++][CMake] Install arrow/util/config.h into CMAKE_INSTALL_INCLUDEDIR
  • ARROW-5448 - [C++][CI][MinGW][skip travis]
  • ARROW-5453 - [C++] Update to cmake-format=0.5.2 and pin again
  • ARROW-5455 - [Rust] Build broken by 2019-05-30 Rust nightly
  • ARROW-5456 - [GLib][Plasma] Fix dependency order on building document
  • ARROW-5457 - [GLib][Plasma] Fix environment variable name for test
  • ARROW-5459 - [Go] implement Stringer for float16 DataType
  • ARROW-5462 - [Go] support writing zero-length List arrays
  • ARROW-5479 - [Rust][DataFusion] Use ARROW_TEST_DATA instead of relative path for testing
  • ARROW-5487 - [Docs] Fix Sphinx failure
  • ARROW-5493 - [Go][Integration] add Go support for IPC integration tests
  • ARROW-5507 - [Plasma][CUDA] Fix compile error
  • ARROW-5514 - [C++] Fix pretty-printing uint64 values
  • ARROW-5517 - [C++] Only check header basename for 'internal' when collecting public headers
  • ARROW-5520 - [Packaging][deb] Add support for building on arm64
  • ARROW-5521 - [Packaging] Use Apache RAT 0.13
  • ARROW-5528 - [C++] Fix a bug in Concatenate() with arrays that have no value buffers
  • ARROW-5532 - [JS] Field Metadata Not Read
  • ARROW-5551 - [Go] implement FixedSizeArrays with 2-buffers layout
  • ARROW-5553 - [Ruby] Use the official packages to install Apache Arrow
  • ARROW-5576 - [C++] Query ASF mirror system for URL and use when downloading Thrift
  • ARROW-5577 - [C++][Alpine] Correct googletest shared library paths on non-Windows to fix Alpine build
  • ARROW-5583 - [Java] When the isSet of a NullableValueHolder is 0, the buffer field should not be used
  • ARROW-5584 - [Java] Add import for link reference in FieldReader javadoc
  • ARROW-5589 - [C++] Add missing nullptr check during flatbuffer decoding
  • ARROW-5592 - [Go] implement Duration array
  • ARROW-5596 - [Python] Fix Python-3 syntax only in test_flight.py
  • ARROW-5601 - [C++][Gandiva] fail if the output type is not supported
  • ARROW-5603 - [Python] Register custom pytest markers to avoid warnings
  • ARROW-5605 - [C++] Verify Flatbuffer messages in more places to prevent crashes due to bad inputs
  • ARROW-5606 - [Python] deal with deprecated RangeIndex.start/stop/_step
  • ARROW-5608 - [C++][parquet] Fix invalid memory access when using parquet::arrow::ColumnReader
  • ARROW-5615 - [C++] gcc 5.4.0 doesn't want to parse inline C++11 string R literal
  • ARROW-5616 - [C++][Python] Fix -Wwrite-strings warning when building against Python 2.7 headers
  • ARROW-5617 - [C++] thrift_ep 0.12.0 fails to build when using ARROW_BOOST_VENDORED=ON
  • ARROW-5619 - [C++] Make get_apache_mirror.py workable with Python 3.5
  • ARROW-5623 - [GLib][CI] Use system Meson on macOS
  • ARROW-5624 - [C++] Fix typo causing build failure when -Duriparser_SOURCE=BUNDLED
  • ARROW-5626 - [C++] Fix caching of expressions with decimals
  • ARROW-5629 - [C++] Fix Coverity issues
  • ARROW-5631 - [C++] Fix FindBoost targets with cmake3.2
  • ARROW-5644 - [Python] test_flight.py::test_tls_do_get appears to hang
  • ARROW-5647 - [Python] Accessing a file from Databricks using pandas read_parquet using the pyarrow engine fails with : Passed non-file path: /mnt/aa/example.parquet
  • ARROW-5648 - [C++] Avoid using codecvt
  • ARROW-5654 - [C++][Python] Add ChunkedArray::Validate method that checks chunk types for consistency, invoke in Python
  • ARROW-5657 - [C++] "docker-compose run cpp" broken in master
  • ARROW-5674 - [Python] Missing pandas pytest markers from test_parquet.py
  • ARROW-5675 - [Doc] Fix typo in Xcode workflow documentation
  • ARROW-5678 - [R][Lint] Fix hadolint docker linting error
  • ARROW-5693 - [Go] skip IPC integration tests for Decimal128
  • ARROW-5697 - [GLib] Use system pkg-config in c_glib/Dockerfile to correctly find system libraries such as libglib
  • ARROW-5698 - [R] Fix docker-compose build
  • ARROW-5709 - [C++] Fix gandiva-date_time_test failure on Windows
  • ARROW-5714 - [JS] Inconsistent behavior in Int64Builder with/without BigNum
  • ARROW-5723 - [C++][Arrow] Fix crossbow failure
  • ARROW-5728 - [Python] Pin jpype1 version to 0.6.3 due to CI breakage from 0.7.0
  • ARROW-5729 - [Python][Java] ArrowType.Int object has no attribute 'isSigned'
  • ARROW-5730 - [Python][CI] Selectively skip test cases in the dask integration test
  • ARROW-5732 - [C++] macOS builds failing idiosyncratically on master with warnings from pmmintrin.h
  • ARROW-5735 - [C++] Appveyor builds failing persistently in thrift_ep build
  • ARROW-5737 - [Crossbow] Use Python version 2.7 in the gandiva tasks
  • ARROW-5738 - [Crossbow][Conda] OSX package builds are failing with missing intrinsics
  • ARROW-5739 - [CI] Fix python docker image
  • ARROW-5750 - [Java] Fix java compilation errors
  • ARROW-5754 - [C++] Add override mark for ~GrpcStreamWriter
  • ARROW-5765 - [C++] Fix TestDictionary.Validate in release mode, add docker-compose job for testing C++ release build
  • ARROW-5769 - [Release] Ensure setting up test data in dev/release/00-prepare.sh
  • ARROW-5770 - [C++] Fix -Wpessimizing-move in result.h
  • ARROW-5771 - [Python] Add pytz to conda_env_python.yml to fix python-nopandas build
  • ARROW-5774 - [Java][Documentation] Document the need to checkout git submodules for flight
  • ARROW-5781 - [Archery] Ensure benchmark clone accepts remote in revision
  • ARROW-5791 - [Python] pyarrow.csv.read_csv hangs + eats all RAM
  • ARROW-5816 - [Release] Parallel curl does not work reliably in verify-release-candidate-sh
  • ARROW-5922 - [Python] Unable to connect to HDFS from a worker/data node on a Kerberized cluster using pyarrow's hdfs API
  • PARQUET-1402 - [C++] Parquet files with dictionary page offset as 0 is not readable
  • PARQUET-1405 - Fix writing statistics into DataPageHeader
  • PARQUET-1565 - [C++] Add default case to catch all unhandled physical types
  • PARQUET-1571 - [C++] Fix BufferedInputStream when buffer exactly exhausted
  • PARQUET-1574 - [C++] fix parquet-encoding-test
  • PARQUET-1581 - [C++] Fix undefined behavior in encoding.cc

Apache Arrow 0.13.0 (2019-04-01)

Bug Fixes

  • ARROW-295 - [Documentation] Add DOAP file
  • ARROW-1171 - [C++] Segmentation faults on Fedora 24 with pyarrow-manylinux1 and self-compiled turbodbc
  • ARROW-2392 - [C++] Check schema compatibility when writing a RecordBatch
  • ARROW-2399 - [Rust] Builder<T> should not provide a set() method
  • ARROW-2598 - [Python] table.to_pandas segfault
  • ARROW-3086 - [GLib] GISCAN fails due to conda-shipped openblas
  • ARROW-3096 - [Python] Update Python source build instructions given Anaconda/conda-forge toolchain migration
  • ARROW-3133 - [C++] Remove allocation from Binary Boolean Kernels.
  • ARROW-3133 - [C++] Remove allocations from InvertKernel
  • ARROW-3208 - [C++] Fix Cast dictionary to numeric segfault
  • ARROW-3426 - [CI] Java integration test very verbose
  • ARROW-3564 - [C++] Fix dictionary encoding logic for Parquet 2.0
  • ARROW-3578 - [Release] Resolve all hard and symbolic links in tar.gz
  • ARROW-3593 - [R] CI builds failing due to GitHub API rate limits
  • ARROW-3606 - [Crossbow] Fix flake8 crossbow warnings
  • ARROW-3669 - [Python] Raise error on Numpy byte-swapped array
  • ARROW-3843 - [C++][Python] Allow a "degenerate" Parquet file with no columns
  • ARROW-3923 - [Java] JDBC Time Fetches Without Timezone
  • ARROW-4007 - [Java][Plasma] Plasma JNI tests failing
  • ARROW-4050 - [Python][Parquet] core dump on reading parquet file
  • ARROW-4081 - [Go] Sum methods panic when the array is empty
  • ARROW-4104 - [Java] race in AllocationManager during release
  • ARROW-4108 - [Python/Java] Spark integration tests do not work
  • ARROW-4117 - [Python] "asv dev" command fails with latest revision
  • ARROW-4140 - [C++][Gandiva] Compiled LLVM bitcode file path may result in libraries being non-relocatable
  • ARROW-4145 - [C++] Find Windows-compatible strptime implementation
  • ARROW-4181 - [Python] Fixes for Numpy struct array conversion
  • ARROW-4192 - [CI] Fix broken dev/run_docker_compose.sh script
  • ARROW-4213 - [Flight] Fix incompatibilities between C++ and Java
  • ARROW-4244 - [Format] Clarify padding/alignment rationale/recommendation.
  • ARROW-4250 - [C++] adding explicit epsilon for ApproxEquals and corresponding assert macro
  • ARROW-4252 - [C++] Fix missing Status code and newline
  • ARROW-4253 - [GLib] Cannot use non-system Boost specified with $BOOST_ROOT
  • ARROW-4254 - [C++][Gandiva] Build with Boost from Ubuntu Trusty apt
  • ARROW-4255 - [C++] Eagerly initialize name_to_index_ to avoid race
  • ARROW-4261 - [C++] Make CMake paths for IPC, Flight, Thrift, and Plasma subproject compatible
  • ARROW-4264 - [C++] Clarify use of DCHECKs in Kernels
  • ARROW-4267 - [C++/Parquet] Handle duplicate and struct columns in RowGroup reads
  • ARROW-4274 - [C++][Gandiva] split decimal into two parts
  • ARROW-4275 - [C++][Gandiva] Fix slow decimal test
  • ARROW-4280 - Update README.md to reflect parquet deps
  • ARROW-4282 - [Rust] builder benchmark is broken
  • ARROW-4284 - [C#] File / Stream serialization fails due to type mismatch / missing footer
  • ARROW-4295 - [C++][Plasma] Fix incorrect log message
  • ARROW-4296 - [Plasma] Use one mmap file by default, prevent crash with -f
  • ARROW-4308 - [Python] pyarrow has a hard dependency on pandas
  • ARROW-4311 - [Python] Regression on pq.ParquetWriter incorrectly handling source string
  • ARROW-4312 - [C++] Only run 2 * os.cpu_count() clang-format instances at once
  • ARROW-4319 - [C++][Plasma] plasma/store.h pulls in flatbuffer dependency
  • ARROW-4320 - [C++] Add tests for non-contiguous tensors
  • ARROW-4322 - [C++] Don't use _GLIBCXX_USE_CXX11_ABI=0 anymore in docker scripts
  • ARROW-4323 - [Packaging] Fix failing OSX clang conda forge builds
  • ARROW-4326 - [C++] Development instructions in python/development.rst will not work for many Linux distros with new conda-forge toolchain
  • ARROW-4327 - [Python] Add requirements-build.txt convenience file
  • ARROW-4328 - Add an ARROW_USE_OLD_CXXABI configure var to R
  • ARROW-4329 - Python should include the parquet headers
  • ARROW-4342 - [Gandiva][Java] Ignore flaky test.
  • ARROW-4347 - [CI][Python] Also run Python builds when Java affected.
  • ARROW-4349 - [C++] Add static linking option for benchmarks, fix Windows benchmark build failures
  • ARROW-4351 - [C++] Fix CMake errors when neither building shared libraries nor tests
  • ARROW-4355 - [C++] Reorder testing code into src/arrow/testing
  • ARROW-4360 - [C++] Query homebrew for Thrift
  • ARROW-4364 - [C++] Fix CHECKIN warnings
  • ARROW-4366 - [Docs] Change extension from format/README.md to format/README.rst
  • ARROW-4367 - [C++] StringDictionaryBuilder segfaults on Finish with only null entries
  • ARROW-4368 - [Docs] Fix install document for Ubuntu 16.04 or earlier
  • ARROW-4370 - [Python][Bool] to pandas
  • ARROW-4374 - [C++] DictionaryBuilder does not correctly report length and null_count
  • ARROW-4381 - [CI] Update linter container build instructions
  • ARROW-4382 - [C++] Improve new cpplint output readability
  • ARROW-4384 - [C++] Running "format" target on new Windows 10 install opens "how do you want to open this file" dialog
  • ARROW-4385 - [Packaging] Fix PyArrow version update pattern on release
  • ARROW-4389 - [R] Don't install clang-tools in test job
  • ARROW-4395 - [JS] Fix ts-node error running bin/arrow2csv
  • ARROW-4400 - [CI] Switch to https repo for llvm
  • ARROW-4403 - [Rust] Fix format errors
  • ARROW-4404 - [CI] AppVeyor toolchain build does not build anything
  • ARROW-4407 - [C++] Cache compiler for CMake external projects
  • ARROW-4410 - [C++] Fix edge cases in InvertKernel
  • ARROW-4413 - [Python] Fix pa.hdfs.connect() on Python 2
  • ARROW-4414 - [C++] Stop using cmake COMMAND_EXPAND_LISTS because it breaks package builds for older distros
  • ARROW-4417 - [C++] Fix doxygen build
  • ARROW-4420 - [INTEGRATION] Make spark integration test pass and test against spark's master branch
  • ARROW-4421 - [C++][Flight] Handle large RPC messages in Flight
  • ARROW-4434 - [Python] Allow creating trivial StructArray
  • ARROW-4440 - [C++] Revert recent changes to flatbuffers EP causing flakiness
  • ARROW-4457 - [Python] Allow creating Decimal array from Python ints
  • ARROW-4469 - [CI] Pin conda-forge binutils version to 2.31 for now
  • ARROW-4471 - [C++] Pass AR and RANLIB to all external projects
  • ARROW-4474 - Use signed integers in FlightInfo payload size fields
  • ARROW-4480 - [Python] Drive letter removed when writing parquet file
  • ARROW-4487 - [C++] Appveyor toolchain build does not actually build the project
  • ARROW-4494 - [Java] arrow-jdbc JAR is not uploaded on release
  • ARROW-4496 - [Python] Pin to gfortran<4
  • ARROW-4498 - [Plasma] Fix building Plasma with CUDA enabled
  • ARROW-4500 - [C++] Remove pthread / librt hacks causing linking issues in some Linux environments
  • ARROW-4501 - Fix out-of-bounds read in DoubleCrcHash
  • ARROW-4525 - [Rust][Parquet] Enable conversion of ArrowError to ParquetError
  • ARROW-4527 - [Packaging][Linux] Use LLVM 7
  • ARROW-4532 - [Java] fix bug causing very large varchar value buffers
  • ARROW-4533 - [Python] Document how to run hypothesis tests
  • ARROW-4535 - [C++] Fix MakeBuilder to preserve ListType's field name
  • ARROW-4536 - [GLib] Add data_type argument in garrow_list_array_new
  • ARROW-4538 - [Python] Remove index column from subschema in write_to_dataframe
  • ARROW-4549 - [C++] Can't build benchmark code on CUDA enabled build
  • ARROW-4550 - [JS] Fix AMD pattern
  • ARROW-4559 - [Python] Allow Parquet files with special characters in their names
  • ARROW-4563 - [Python] Validate decimal128() precision input
  • ARROW-4571 - [Format] Tensor.fbs file has multiple root_type declarations
  • ARROW-4573 - [Python] Add Flight unit tests
  • ARROW-4576 - [Python] Fix error during benchmarks
  • ARROW-4577 - [C++] Don't set interface link libs on arrow_shared where there are none
  • ARROW-4581 - [C++] Do not require googletest_ep or gbenchmark_ep for library targets
  • ARROW-4582 - [Python/C++] Acquire the GIL on Py_INCREF
  • ARROW-4584 - [Python] Add built wheel to manylinux1 dockerignore
  • ARROW-4585 - [C++] Add protoc dependency to flight_testing
  • ARROW-4587 - [C++] Fix segfaults around DoPut implementation
  • ARROW-4597 - [C++] Targets for system Google Mock shared library are missing
  • ARROW-4601 - [Python] Add license header to dockerignore
  • ARROW-4606 - [Rust] [DataFusion] FilterRelation created RecordBatch with empty schema
  • ARROW-4608 - [C++] cmake script assumes that double-conversion installs static libs
  • ARROW-4617 - [C++] Support double-conversion<3.1
  • ARROW-4624 - [C++] Fix building benchmarks
  • ARROW-4629 - [Python] Pandas arrow conversion slowed down by imports
  • ARROW-4635 - [Java] allocateNew to use last capacity
  • ARROW-4639 - [CI] Switch off GFLAGS_SHARED for osx
  • ARROW-4641 - [C++][Flight] Suppress strict aliasing warnings from "unsafe" casts in client.cc
  • ARROW-4642 - [R] change f to file in read_parquet_file()
  • ARROW-4653 - [C++] Fix bug in decimal multiply
  • ARROW-4654 - [C++] Explicit flight.cc source dependencies
  • ARROW-4657 - Don't build benchmarks in release verify script
  • ARROW-4658 - [C++] Shared gflags is also a run-time conda requirement
  • ARROW-4659 - [CI] ubuntu/debian nightlies fail because of missing gandiva files
  • ARROW-4660 - [C++] Use set_target_properties for defining GFLAGS_IS_A_DLL
  • ARROW-4664 - [C++] Do not execute expressions inside DCHECK macros in release builds
  • ARROW-4669 - [Java] Add validity checks to slice
  • ARROW-4672 - [CI] Fix clang-7 build entry
  • ARROW-4680 - [CI][Rust] Travis CI builds fail with latest Rust 1.34.0…
  • ARROW-4684 - [Python] CI failures in test_cython.py
  • ARROW-4687 - [Python] Stop Flight server on incoming signals
  • ARROW-4688 - [C++][Parquet] Chunk binary column reads at 2^31 - 1 byte boundaries to avoid splitting chunk inside nested string cell
  • ARROW-4696 - Better CUDA detection in release verification script
  • ARROW-4699 - [C++] remove json chunker's requirement of null terminated buffers
  • ARROW-4704 - [GLib][CI] Ensure killing plasma_store_server
  • ARROW-4710 - [C++][R] New linting script skips files with "cpp" extension
  • ARROW-4712 - [C++][CI] Fix build warnings in sum.cc with clang
  • ARROW-4721 - [Rust][DataFusion] Propagate schema in filter
  • ARROW-4724 - [C++][CI] Enable Python build and test in MinGW build
  • ARROW-4728 - [JS] Fix Table#assign when passed zero-length RecordBatches
  • ARROW-4737 - run C# tests in CI
  • ARROW-4744 - [C++][CI] Change mingw builds back to debug. Clean up some version warnings
  • ARROW-4750 - [C++] RapidJSON triggers Wclass-memaccess on GCC 8+
  • ARROW-4760 - [C++] protobuf 3.7 defines EXPECT_OK that clashes with Arrow's macro
  • ARROW-4766 - [C++] Fix empty array cast segfault
  • ARROW-4767 - [C#] ArrowStreamReader crashes while reading the end of a stream
  • ARROW-4768 - [C++][CI] Don't run flaky tests in MinGW build
  • ARROW-4774 - [C++] Fix FileWriter::WriteTable segfault
  • ARROW-4775 - [Site] Site navbar cannot be expanded
  • ARROW-4783 - [C++][CI] Disable arrow thread-pool test on mingw to avoid appveyor timeouts
  • ARROW-4793 - [Ruby] Suppress unused variable warning
  • ARROW-4796 - [Flight/Python] Keep underlying Python object alive in FlightServerBase.do_get
  • ARROW-4802 - [Python] Follow symlinks when deriving Hadoop classpath for HDFS
  • ARROW-4807 - [Rust] Fix csv_writer benchmark
  • ARROW-4811 - [C++] Fix misbehaving CMake dependency on flight_grpc_gen
  • ARROW-4813 - [Ruby] Add tests for == and !=
  • ARROW-4820 - [Python] hadoop class path derived not correct
  • ARROW-4822 - [C++/Python] Check for None on calls to equals
  • ARROW-4828 - [Python] manylinux1 docker-compose context should be python/manylinux1
  • ARROW-4850 - [CI] Ensure integration_test.py returns non-zero on failures
  • ARROW-4853 - [Rust] Array slice doesn't work on ListArray and StructArray
  • ARROW-4857 - [C++/Python/CI] docker-compose in manylinux1 crossbow jobs too old
  • ARROW-4866 - [C++] Fix zstd_ep build for Debug, static CRT builds. Add separate CMake variable for propagating compiler toolchain to ExternalProjects
  • ARROW-4867 - [Python] Respect ordering of columns argument passed to Table.from_pandas
  • ARROW-4869 - [C++] Fix gmock usage in compute/kernels/util-internal-test.cc
  • ARROW-4870 - [Ruby] Fix mys2_mingw_dependencies
  • ARROW-4871 - [Java/Flight] Handle large Flight messages
  • ARROW-4872 - [Python] Keep backward compatibility for ParquetDatasetPiece
  • ARROW-4879 - [C++] cmake can't use conda's flatbuffers
  • ARROW-4881 - [C++] remove references to ARROW_BUILD_TOOLCHAIN
  • ARROW-4900 - [C++] polyfill __cpuidex on mingw-w64
  • ARROW-4903 - [C++] Fix static/shared-only builds
  • ARROW-4906 - [Format] Document that the SparseMatrixIndexCSR format is sorted
  • ARROW-4918 - [C++] Add cmake-format to pre-commit
  • ARROW-4928 - [Python] Fix Hypothesis test failures
  • ARROW-4931 - [C++] CMake fails on gRPC ExternalProject
  • ARROW-4938 - [GLib] Undefined symbols error occurred when GIR file is being generated.
  • ARROW-4942 - [Ruby] Remove needless omits in tests
  • ARROW-4948 - [JS] Nightly test failure
  • ARROW-4950 - [C++] Fix CMake 3.2 build
  • ARROW-4952 - [C++] Floating-point comparisons should consider NaNs unequal
  • ARROW-4953 - [Ruby] Not loading libarrow-glib
  • ARROW-4954 - [Python] Fix test failure with Flight enabled
  • ARROW-4958 - [C++] Parquet benchmarks depend on its static test libs
  • ARROW-4961 - [C++] Add documentation note that GTest_SOURCE=BUNDLED is currently required on Windows
  • ARROW-4962 - [C++] Warning level to CHECKIN can't compile on modern GCC
  • ARROW-4976 - [JS] Invalidate RecordBatchReader node/dom streams on reset()
  • ARROW-4982 - [GLib][CI] Run tests on AppVeyor
  • ARROW-4984 - Check if Flight gRPC server starts properly
  • ARROW-4986 - [CI] Travis fails to install llvm@7
  • ARROW-4989 - [C++] Find re2 on Ubuntu if asked to
  • ARROW-4991 - [CI] Bump travis node version to 11.12
  • ARROW-4997 - [C#] ArrowStreamReader doesn't consume whole stream and doesn't implement sync read.
  • ARROW-5009 - [C++] Remove using std::.* where I could find them
  • ARROW-5010 - [Release] Fix source release docker
  • ARROW-5012 - [C++] Install testing headers
  • ARROW-5023 - [Release] Fix default value syntax in 02-source.sh
  • ARROW-5024 - [Release] Fix missing variable with --arrow-version
  • ARROW-5025 - [Python][Packaging] Fix gandiva.dll detection
  • ARROW-5026 - [Python][Packaging] Fix gandiva.dll detection on non Windows
  • ARROW-5029 - [C++] Fix compilation warnings in release mode
  • ARROW-5031 - [Dev] Run CUDA Python tests in release verification script
  • ARROW-5042 - [Release] Use the correct dependency source in verification script
  • ARROW-5043 - [Release][Ruby] Fix dependency error in verification script
  • ARROW-5044 - [Release][Rust] Use stable toolchain for format check in verification script
  • ARROW-5046 - [Release][C++] Exclude fragile Plasma test from verification script
  • ARROW-5047 - [Release] Always set up parquet-testing in verification script
  • ARROW-5048 - [Release][Rust] Set up arrow-testing in verification script
  • ARROW-5050 - [C++] cares_ep should build before grpc_ep
  • ARROW-5087 - [Debian] APT repository no longer contains libarrow-dev
  • ARROW-5658 - [JAVA] Provide ability to resync VectorSchemaRoot if types change
  • PARQUET-1482 - [C++] Add branch to TypedRecordReader::ReadNewPage for …
  • PARQUET-1494 - [C++] Recognize statistics built with UNSIGNED sort order by parquet-mr 1.10.0 onwards
  • PARQUET-1532 - [C++] Fix build error with MinGW

New Features and Improvements

  • ARROW-47 - [C++] Preliminary arrow::Scalar object model
  • ARROW-331 - [Doc] Add statement about Python 2.7 compatibility
  • ARROW-549 - [C++] Add arrow::Concatenate function to combine multiple arrays into a single Array
  • ARROW-572 - [C++] Apply visitor pattern in IPC metadata
  • ARROW-585 - [C++] Experimental public API for user-defined extension types and arrays
  • ARROW-694 - [C++] Initial parser interface for reading JSON into RecordBatches
  • ARROW-1425 - [Python][Documentation] Examples of converting Timestamps to/from pandas via arrow
  • ARROW-1572 - [C++] Implement "value counts" kernels for tabulating value frequencies
  • ARROW-1639 - [Python] Serialize RangeIndex as metadata via Table.from_pandas instead of converting to a column of integers
  • ARROW-1642 - [GLib] Build GLib using Meson in Appveyor
  • ARROW-1807 - [JAVA] Reduce Heap Usage (Phase 3): consolidate buffers
  • ARROW-1896 - [C++] Do not allocate memory inside CastKernel. Clean up template instantiation to not generate dead identity cast code
  • ARROW-2015 - [Java] Replace Joda time with Java 8 time
  • ARROW-2022 - [Format] Add metadata to message
  • ARROW-2112 - [C++] Enable cpplint to be run on Windows
  • ARROW-2243 - [C++] Enable IPO/LTO
  • ARROW-2409 - [Rust] Deny warnings in CI.
  • ARROW-2460 - [Rust] Schema and DataType::Struct should use Vec<Rc<Field>>
  • ARROW-2487 - [C++] Provide a variant of AppendValues that takes bytemaps for the nullability
  • ARROW-2523 - [Rust] Implement CAST operations for arrays
  • ARROW-2620 - [Rust] Integrate memory pool abstraction with rest of codebase
  • ARROW-2627 - [Python] Add option to pass memory_map argument to ParquetDataset
  • ARROW-2904 - [C++] Use FirstTimeBitmapWriter instead of SetBit functions in builder.h/cc
  • ARROW-3066 - [Wiki] Add "How to contribute" to developer wiki
  • ARROW-3084 - [Python] Do we need to build both unicode variants of pyarrow wheels?
  • ARROW-3107 - [C++] arrow::PrettyPrint for Column instances
  • ARROW-3121 - [C++] Mean aggregate kernel
  • ARROW-3123 - [C++] Implement Count aggregate kernel
  • ARROW-3135 - [C++] Add helper functions for validity bitmap propagation in kernel context
  • ARROW-3149 - [C++] Use gRPC (when it exists) from conda-forge for CI builds
  • ARROW-3162 - [Python][Flight] Enable implementing Flight servers in Python
  • ARROW-3162 - Flight Python bindings
  • ARROW-3239 - [C++] Implement simple random array generation
  • ARROW-3255 - [C++/Python] Migrate Travis CI jobs off Xcode 6.4
  • ARROW-3289 - [C++] Implement Flight DoPut
  • ARROW-3292 - [C++] Test Flight RPC in Travis CI
  • ARROW-3295 - [Packaging] Package gRPC libraries in conda-forge for use in builds, packaging
  • ARROW-3297 - [Python] Python bindings for Flight C++ client
  • ARROW-3311 - [R] Functions for deserializing IPC components from arrow::Buffer or from IO interface
  • ARROW-3328 - [Flight] Allow for optional unique flight identifier to be sent with FlightGetInfo
  • ARROW-3361 - [R] Also run cpplint on Rcpp source files
  • ARROW-3364 - [Docs] Add docker-compose integration documentation
  • ARROW-3367 - [INTEGRATION] Port Spark integration test to the docker-compose setup
  • ARROW-3422 - [C++] Uniformly add ExternalProject builds to the "toolchain" target. Fix gRPC EP build on Linux
  • ARROW-3434 - [Packaging] Add Apache ORC C++ library to conda-forge
  • ARROW-3435 - [C++] Add option to use dynamic linking with re2
  • ARROW-3511 - [Gandiva] Link filter and project operations
  • ARROW-3532 - [Python] Emit warning when looking up for duplicate struct or schema fields
  • ARROW-3550 - [C++] use kUnknownNullCount for the default null_count argument
  • ARROW-3554 - [C++] Reverse traits for C++
  • ARROW-3594 - [Packaging] Build "cares" library in conda-forge
  • ARROW-3595 - [Packaging] Build boringssl in conda-forge
  • ARROW-3596 - [Packaging] Build gRPC in conda-forge
  • ARROW-3619 - [R] Expose global thread pool options
  • ARROW-3631 - [C#] Add Appveyor configuration
  • ARROW-3653 - [C++][Python] Support data copying between different GPU devices
  • ARROW-3735 - [Python] Add test for calling cast() with None
  • ARROW-3761 - [R] Bindings for CompressedInputStream, CompressedOutputStream
  • ARROW-3763 - [C++] Write Parquet ByteArray / FixedLenByteArray reader batches directly into arrow::BinaryBuilder
  • ARROW-3769 - [C++] Add support for reading non-dictionary encoded binary Parquet columns directly as DictionaryArray
  • ARROW-3770 - [C++] Validate schema for each table written with parquet::arrow::FileWriter
  • ARROW-3816 - [R] nrow.RecordBatch method
  • ARROW-3824 - [R] Add basic build and test documentation
  • ARROW-3838 - [Rust] CSV Writer
  • ARROW-3846 - [Gandiva][C++] Build Gandiva C++ libraries and get unit tests passing on Windows
  • ARROW-3882 - [Rust] Cast Kernel for most types
  • ARROW-3903 - [Python] Random array generator for Arrow conversion and Parquet testing
  • ARROW-3926 - [Python] Add Gandiva bindings to Python manylinux1 wheels
  • ARROW-3951 - [Go] implement a CSV writer
  • ARROW-3954 - [Rust] Add Slice to Array and ArrayData
  • ARROW-3965 - [Java] JDBC-To-Arrow Configuration
  • ARROW-3966 - [Java] JDBC Column Metadata in Arrow Field Metadata
  • ARROW-3972 - [C++] Migrate to LLVM 7. Add option to disable using ld.gold
  • ARROW-3981 - [C++] Rename json.h
  • ARROW-3985 - [C++] Let ccache preserve comments
  • ARROW-4012 - [Website] Add documentation how to install Apache Arrow on MSYS2
  • ARROW-4014 - [C++] Fix "LIBCMT" warnings on MSVC
  • ARROW-4023 - [Gandiva] Address long CI times in macOS builds
  • ARROW-4024 - [Python] Raise minimal Cython version to 0.29
  • ARROW-4031 - [C++] Refactor bitmap building
  • ARROW-4040 - [Rust] Add array_ops method for filtering an array
  • ARROW-4056 - [C++] Unpin boost-cpp in conda_env_cpp.yml
  • ARROW-4061 - [Rust][Parquet] Implement spaced version for non-diction…
  • ARROW-4068 - [Gandiva] Support building with Xcode 6.4
  • ARROW-4071 - [Rust] Add rustfmt as a pre-commit hook
  • ARROW-4072 - [Rust] Set default value for PARQUET_TEST_DATA
  • ARROW-4092 - [Rust] Implement common Reader / DataSource trait for CSV and Parquet
  • ARROW-4094 - [Python] Store RangeIndex in Parquet files as metadata rather than a physical data column
  • ARROW-4110 - [C++] Do not generate distinct cast kernels when input and output type are the same
  • ARROW-4123 - [C++] Enable linting tools to be run on Windows
  • ARROW-4124 - [C++] Draft Aggregate and Sum kernels
  • ARROW-4142 - [Java] JDBC Array -> Arrow ListVector
  • ARROW-4165 - [C++] Port cpp/apidoc/Windows.md and other files to Sphinx / rst
  • ARROW-4180 - [Java] Make CI tests use logback.xml
  • ARROW-4196 - [Rust] Add explicit SIMD vectorization for arithmetic ops in "array_ops"
  • ARROW-4198 - [Gandiva] Added support to cast timestamp
  • ARROW-4204 - [Gandiva] add support for decimal subtract
  • ARROW-4205 - [Gandiva] Support for decimal multiply
  • ARROW-4206 - [Gandiva] support decimal divide and mod
  • ARROW-4212 - [C++][Python] CudaBuffer view of arbitrary device memory object
  • ARROW-4230 - [C++] Fix Flight builds with gRPC/Protobuf/c-ares
  • ARROW-4232 - [C++] Follow conda-forge compiler ABI migration
  • ARROW-4234 - [C++] Improve memory bandwidth test
  • ARROW-4235 - [GLib] Use "column_builder" in GArrowRecordBatchBuilder
  • ARROW-4236 - [Java] Distinct Plasma client create exceptions
  • ARROW-4245 - [Rust] Add Rustdoc header to source files
  • ARROW-4247 - [Packaging] Update verify script for 0.12.0
  • ARROW-4251 - [C++][Release] Add option to set ARROW_BOOST_VENDORED environment variable in verify-release-candidate.sh
  • ARROW-4262 - [Website] Preview to Spark with Arrow and R improvements
  • ARROW-4263 - [Rust] Donate DataFusion
  • ARROW-4265 - [C++] Automatic conversion between Table and std::vector<std::tuple<..>>
  • ARROW-4268 - [C++] Native C type TypeTraits
  • ARROW-4271 - [Rust] Move Parquet specific info to Parquet Readme
  • ARROW-4273 - [Release] Fix verification script to use cf201901 conda-forge label
  • ARROW-4277 - [C++] Add gmock to the toolchain
  • ARROW-4281 - [CI] Use Ubuntu Xenial VMs on Travis-CI
  • ARROW-4285 - [Python] Use proper builder interface for serialization
  • ARROW-4287 - [C++] Ensure minimal bison version on OSX for Thrift
  • ARROW-4289 - [C++] Forward AR and RANLIB to thirdparty builds
  • ARROW-4290 - [C++/Gandiva] Support detecting correct LLVM version in Homebrew
  • ARROW-4291 - [Dev] Support selecting features in release verification scripts
  • ARROW-4294 - [C++][Plasma] Add support for evicting Plasma objects to external store
  • ARROW-4297 - [C++] Fix build error with MinGW-w64 32-bit
  • ARROW-4298 - [Java] Add javax.annotation-api dependency for JDK >= 9
  • ARROW-4299 - [Ruby] Depend on the same version as Red Arrow
  • ARROW-4300 - [C++] Restore apache-arrow Homebrew recipe and define process for maintaining and updating for releases
  • ARROW-4303 - [Gandiva/Python] Build LLVM with RTTI in manylinux1 container
  • ARROW-4305 - [Rust] Fix parquet version number in README
  • ARROW-4307 - [C++] Fix Doxygen warnings
  • ARROW-4310 - [Website] Update install document for 0.12.0
  • ARROW-4313 - Define general benchmark database schema
  • ARROW-4315 - [Website] Add Go and Rust to list of supported languages
  • ARROW-4318 - [C++] Add Tensor::CountNonZero
  • ARROW-4321 - [CI] Setup conda-forge channel globally in docker containers
  • ARROW-4330 - [C++] More robust discovery of pthreads
  • ARROW-4331 - [C++] Extend Scalar Datum to support more types
  • ARROW-4332 - [Website] Improve documentation for publishing site
  • ARROW-4334 - [CI] Setup conda-forge channel globally in travis builds
  • ARROW-4335 - [C++] Better document sparse tensor support
  • ARROW-4336 - [C++] Change default build type to RELEASE
  • ARROW-4339 - [C++][Python] Developer documentation overhaul for 0.13 release
  • ARROW-4340 - [C++][CI] Build IWYU for LLVM 7 in iwyu docker-compose job
  • ARROW-4341 - [C++] Refactor Primitive builders and BooleanBuilder to use TypedBufferBuilder<T>
  • ARROW-4344 - [Java] Further cleanup mvn output, upgrade rat plugin
  • ARROW-4345 - [C++] Add Apache 2.0 license file to the Parquet-testing repository
  • ARROW-4346 - [C++] Fix class-memaccess warning on gcc 8.x
  • ARROW-4352 - [C++] Add support for system Google Test
  • ARROW-4353 - [CI] Add MinGW builds
  • ARROW-4358 - [CI] Restore support for trusty in CI
  • ARROW-4361 - [Website] Update committers list
  • ARROW-4362 - [Java] Test OpenJDK 11 in CI
  • ARROW-4363 - [CI][C++] Add CMake format checks
  • ARROW-4372 - [C++] Embed precompiled bitcode in the gandiva library
  • ARROW-4373 - [Packaging] Travis fails to deploy conda packages on OSX
  • ARROW-4375 - [CI] Sphinx dependencies were removed from docs conda environment
  • ARROW-4376 - [Rust] Implement from_buf_reader for csv::Reader
  • ARROW-4377 - [Rust] Implement std::fmt::Debug for PrimitiveArrays
  • ARROW-4379 - [Python] Register serializers for collections.Counter and collections.deque.
  • ARROW-4383 - [C++] Use CMake's standard find features
  • ARROW-4386 - [Rust] Temporal array support
  • ARROW-4388 - [Go] add DimNames() method to tensor Interface
  • ARROW-4393 - [Rust] coding style: apply 90 characters per line limit
  • ARROW-4396 - [JS] Update Typedoc for TypeScript 3.2
  • ARROW-4397 - [C++] Add dim_names in Tensor and SparseTensor
  • ARROW-4399 - [C++] Do not use extern template class with NumericArray<T> and NumericTensor<T>
  • ARROW-4401 - [Python] Alpine dockerfile fails to build because pandas requires numpy as build dependency
  • ARROW-4406 - [Python] Exclude HDFS directories in S3 from ParquetManifest
  • ARROW-4408 - [CPP/Doc] Remove outdated Parquet documentation
  • ARROW-4422 - [Plasma] Enforce memory limit in plasma, rather than relying on dlmalloc_set_footprint_limit
  • ARROW-4423 - [C++] Upgrade vendored gmock/gtest to 1.8.1
  • ARROW-4424 - [Python] Install tensorflow and keras-preprocessing in manylinux1 container
  • ARROW-4425 - Add link to 'Contributing' page in the top-level Arrow README
  • ARROW-4430 - [C++] Fix untested TypedByteBuffer<T>::Append method
  • ARROW-4431 - [C++] Fixes for gRPC vendored builds
  • ARROW-4435 - Minor fixups to csharp .sln and .csproj file
  • ARROW-4436 - [Documentation] Update building.rst to reflect pyarrow req
  • ARROW-4442 - [JS] Add explicit type annotation to Chunked typeId getter
  • ARROW-4444 - [Testing] Add DataFusion test files to arrow-testing repo
  • ARROW-4445 - [C++][Gandiva] Run Gandiva-LLVM tests in Appveyor
  • ARROW-4446 - [C++][Python] Run Gandiva C++ unit tests in Appveyor, get build and tests working in Python
  • ARROW-4448 - [Java][Flight] Disable flaky TestBackPressure
  • ARROW-4449 - [Rust] Convert File to T: Read + Seek for schema inference
  • ARROW-4454 - [C++] fix unused parameter warnings
  • ARROW-4455 - [Plasma] Suppress class-memaccess warnings
  • ARROW-4459 - [Testing] Add arrow-testing repo as submodule
  • ARROW-4460 - [Website] DataFusion Blog Post
  • ARROW-4461 - [C++] Expose bit map operations that work with raw pointers
  • ARROW-4462 - [C++] Upgrade LZ4 v1.7.5 to v1.8.3 to compile with VS2017
  • ARROW-4464 - [Rust][DataFusion] Add support for LIMIT
  • ARROW-4466 - [Rust][DataFusion] Add support for Parquet data source
  • ARROW-4468 - [Rust] Implement BitAnd/BitOr for &Buffer (with SIMD) (#3571)
  • ARROW-4472 - [Website][Python] Blog post about string memory use work in Arrow 0.12
  • ARROW-4475 - [Python] Fix recursive serialization of self-containing objects
  • ARROW-4476 - [Rust][DataFusion] Update README to cover DataFusion and new testing git submodule
  • ARROW-4481 - [Website] Remove generated specification docs from site after docs migration
  • ARROW-4483 - [Website] Add myself to contributors.yaml to fix broken link in blog post
  • ARROW-4485 - [CI] Determine maintenance approach to pinned conda-forge binutils package
  • ARROW-4486 - [Python][CUDA] Add base argument to foreign_buffer
  • ARROW-4488 - [Rust][u8] > for Buffer does not ensure correct padding
  • ARROW-4489 - [Rust] PrimitiveArray.value_slice performs bounds checking when it should not
  • ARROW-4490 - [Rust] Add explicit SIMD vectorization for boolean ops in "array_ops"
  • ARROW-4491 - [Python] Use StringConverter and stringstream instead of std::stoi and std::to_string
  • ARROW-4499 - [CI] Unpin flake8 in lint script, fix warnings in dev/
  • ARROW-4502 - [C#] Add support for zero-copy reads
  • ARROW-4506 - [Ruby] Add Arrow::RecordBatch#raw_records
  • ARROW-4513 - [Rust] Implement BitAnd/BitOr for &Bitmap
  • ARROW-4517 - [JS] remove version number as it is not used
  • ARROW-4518 - [JS] add jsdelivr to package.json
  • ARROW-4528 - [C++] Update lint docker container to LLVM-7
  • ARROW-4529 - [C++] Add test for BitUtil::RoundDown
  • ARROW-4531 - [C++] Support slices for SumKernel
  • ARROW-4537 - [CI] Suppress shell warning on travis-ci
  • ARROW-4539 - [Java] Fix child vector count for lists. (#3625)
  • ARROW-4540 - [Rust] Basic JSON reader
  • ARROW-4543 - [C#] Update Flat Buffers code to latest version
  • ARROW-4546 - Update LICENSE.txt with parquet-cpp licenses
  • ARROW-4547 - [Python][Documentation] Update python/development.rst with instructions for CUDA-enabled builds
  • ARROW-4556 - [Rust] Preserve JSON field order when inferring schema
  • ARROW-4558 - [C++][Flight] Implement gRPC customizations without UB
  • ARROW-4560 - [R] array() needs to take single input, not ...
  • ARROW-4562 - [C++] Avoid copies when serializing Flight data
  • ARROW-4564 - [C++] IWYU docker image silently fails
  • ARROW-4565 - [R] Fix decimal record batches with no null values
  • ARROW-4568 - [C++] Add version macros to headers
  • ARROW-4572 - [C++] Remove memory zeroing from PrimitiveAllocatingUnaryKernel
  • ARROW-4583 - [Plasma] Fix some small bugs reported by code scan tool
  • ARROW-4586 - [Rust] Remove arrow/mod.rs as it is not needed
  • ARROW-4589 - [Rust] Projection push down query optimizer rule
  • ARROW-4590 - [Rust] Add explicit SIMD vectorization for comparison ops in "array_ops"
  • ARROW-4592 - [GLib] Stop configure immediately when GLib isn't available
  • ARROW-4593 - [Ruby][out_of_range] returns nil
  • ARROW-4594 - [Ruby] returns Arrow::Struct instead of Arrow::Array
  • ARROW-4595 - [Rust] Implement Table API (a.k.a DataFrame)
  • ARROW-4598 - [CI] Remove needless LLVM_DIR for macOS
  • ARROW-4599 - [C++] Add support for system GFlags
  • ARROW-4602 - [Rust][DataFusion] Integrate query optimizer with ExecutionContext
  • ARROW-4603 - [Rust] [DataFusion] Execution context should allow in-memory data sources to be registered
  • ARROW-4604 - [Rust] [DataFusion] Add benchmarks for SQL query execution
  • ARROW-4605 - [Rust] Move filter and limit code from DataFusion into compute module
  • ARROW-4609 - [C++] Use google benchmark from toolchain
  • ARROW-4610 - [Plasma] Avoid Crash in Plasma Java Client
  • ARROW-4611 - [C++] Rework CMake logic
  • ARROW-4612 - [Python] Use cython from PyPI for windows wheels build
  • ARROW-4613 - [C++] Set CMAKE_INSTALL_LIBDIR in gtest thirdparty build
  • ARROW-4614 - [C++/CI] Activate flight build in ci/docker_build_cpp.sh
  • ARROW-4615 - [C++] Add checked_pointer_cast
  • ARROW-4616 - [C++] Log message in BuildUtils as STATUS
  • ARROW-4618 - [Docker] Makefile to build dependent docker images
  • ARROW-4619 - [R] Fix the autobrew script
  • ARROW-4620 - [C#] Add unit tests for "Types" in arrow/csharp
  • ARROW-4623 - [R] update Rcpp version
  • ARROW-4628 - [Rust][DataFusion] Implement type coercion query optimizer rule
  • ARROW-4632 - [Ruby] Add BigDecimal#to_arrow
  • ARROW-4634 - [Rust][Parquet] Reorganize test_common
  • ARROW-4637 - [Python] Conditionally import pandas symbols if they are used. Do not require pandas as a test dependency
  • ARROW-4638 - [R] install instructions using brew
  • ARROW-4640 - [Python] Add docker-compose configuration to build and test the project without pandas installed
  • ARROW-4643 - [C++] Force compiler diagnostic colors
  • ARROW-4644 - [C++/Docker] Build Gandiva in the docker containers
  • ARROW-4645 - [C++/Packaging] Ship Gandiva with OSX and Windows wheels
  • ARROW-4646 - [C++/Packaging] Ship gandiva with the conda-forge packages
  • ARROW-4655 - [Packaging] Parallelize binary upload
  • ARROW-4662 - [Python] Add support of type_codes in UnionType
  • ARROW-4667 - [C++] Suppress unused function warnings with MinGW
  • ARROW-4670 - [Rust] array_ops::sum performance optimizations
  • ARROW-4671 - [C++] MakeBuilder doesn't support Type::DICTIONARY
  • ARROW-4673 - [C++] Implement Scalar::Equals and Datum::Equals
  • ARROW-4676 - [C++] Add support for debug build with MinGW
  • ARROW-4678 - [Rust] Minimize unstable feature usage
  • ARROW-4679 - [Rust] Implement in-memory data source for DataFusion
  • ARROW-4681 - [Rust][DataFusion] Partition aware data sources
  • ARROW-4686 - [Dev] Only accept 'y' or 'n' in merge_arrow_pr.py prompts
  • ARROW-4689 - [Go] Add support for wasm
  • ARROW-4690 - Building TensorFlow compatible wheels for Arrow
  • ARROW-4692 - [Flight] Explain sidecar in a bit more detail
  • ARROW-4693 - [CI] Build boost with multiprecision
  • ARROW-4697 - [C++] Add URI parsing facility
  • ARROW-4703 - [C++] Upgrade dependency versions
  • ARROW-4705 - [Rust] Improve error handling in csv reader
  • ARROW-4707 - [C++] moving BitsetStack to BitUtil::
  • ARROW-4718 - [C#] Add ArrowStreamReader/Writer ctor with bool leaveOpen
  • ARROW-4727 - [Rust] Add equality check for schemas
  • ARROW-4730 - [C++] Add docker-compose entry for testing Fedora build with system packages
  • ARROW-4731 - [C++] Add docker-compose entry for testing Ubuntu Xenial build with system packages
  • ARROW-4732 - [C++] Add docker-compose entry for testing Debian Testing build with system packages
  • ARROW-4733 - [C++] Add CI entry that builds without the conda-forge toolchain but with system packages
  • ARROW-4734 - [Go] Add option to write a header for CSV writer
  • ARROW-4735 - [Go] Optimize CSV writer CPU/Mem performances
  • ARROW-4739 - [Rust] LogicalPlan can now be passed to threads
  • ARROW-4740 - [Java] Upgrade to JUnit 5.
  • ARROW-4743 - [Java] Add javadoc missing in classes and methods in java…
  • ARROW-4745 - [C++][Documentation] Document notes from replicating Static_Crt_Build on windows
  • ARROW-4749 - [Rust] Return Result for RecordBatch::new()
  • ARROW-4751 - [C++] Add pkg-config to conda_env_cpp.yml now that it's available on Windows
  • ARROW-4754 - [Java] Randomize port and retry binding server when bind fails
  • ARROW-4756 - Update readme for triggering docker builds
  • ARROW-4758 - [C++][Flight] Fix intermittent build failure
  • ARROW-4769 - [Rust] Improve array limit fn where max_records >= len
  • ARROW-4772 - [C++] new ORC adapter interface for stripe and row iteration
  • ARROW-4776 - [C++] Add DictionaryBuilder constructor which takes a dictionary array
  • ARROW-4777 - [C++/Python] manylinux1: Update lz4 to 1.8.3
  • ARROW-4778 - [C++/Python] manylinux1: Update Thrift to 0.12.0
  • ARROW-4782 - [C++] Prototype array and scalar expression types to help with building a deferred compute graph
  • ARROW-4786 - [C++/Python] Support better parallelisation in manylinux1 base build
  • ARROW-4789 - [C++] Deprecate and later remove arrow::io::ReadableFileInterface
  • ARROW-4790 - [Python/Packaging] Update manylinux docker image in crossbow task
  • ARROW-4791 - [Rust] Remove unused dependencies
  • ARROW-4794 - [Python] Make pandas an optional test dependency
  • ARROW-4797 - [Plasma] Allow client to check store capacity and avoid server crash
  • ARROW-4801 - [GLib] Suppress Meson warnings
  • ARROW-4808 - [Java][Vector] More util methods to set decimal vector.
  • ARROW-4812 - [Rust] [DataFusion] Table.scan() should return one iterator per partition
  • ARROW-4817 - [Rust] [DataFusion] Small re-org of modules
  • ARROW-4818 - [Rust] [DataFusion] Parquet data source does not support null values
  • ARROW-4826 - [Go] export Flush method for CSV writer
  • ARROW-4831 - [C++] CMAKE_AR is not passed to ZSTD thirdparty dependency
  • ARROW-4833 - [Release] Document how to update the brew formula in the release management guide
  • ARROW-4834 - [R] Feature flag when building parquet
  • ARROW-4835 - [GLib] Add boolean operations
  • ARROW-4837 - [C++] Support c++filt on a custom path in the run-test.sh script
  • ARROW-4839 - [C#] Add NuGet package metadata and instructions.
  • ARROW-4843 - [Rust] [DataFusion] Parquet data source should support DATE
  • ARROW-4846 - [Java] Upgrade to jackson 2.9.8
  • ARROW-4849 - [C++] Add docker-compose entry for testing Ubuntu Bionic build with system packages
  • ARROW-4854 - [Rust] Use zero-copy slice for limit kernel
  • ARROW-4855 - [Packaging] Generate default package version based on cpp tags in crossbow.py
  • ARROW-4858 - [Flight/Python] enable FlightDataStream to be implemented in Python
  • ARROW-4859 - [GLib] Add garrow_numeric_array_mean()
  • ARROW-4862 - [C++] Fix gcc warnings in CHECKIN
  • ARROW-4862 - [GLib] Add GArrowCastOptions::allow-invalid-utf8 property
  • ARROW-4865 - [Rust] Support list casts
  • ARROW-4873 - [C++] Clarify documentation about how to use external ARROW_PACKAGE_PREFIX while also using CONDA dependency resolution
  • ARROW-4878 - [C++] Append \Library to CONDA_PREFIX when using ARROW_DEPENDENCY_SOURCE=CONDA
  • ARROW-4882 - [GLib] Add sum functions
  • ARROW-4887 - [GLib] Add garrow_array_count()
  • ARROW-4889 - [C++] Add STATUS messages for Protobuf in CMake
  • ARROW-4891 - [C++] Add zlib headers to include directories
  • ARROW-4892 - [Rust][DataFusion] Move SQL parser and planner into SQL module
  • ARROW-4893 - [C++] conda packages should use inside of conda-build
  • ARROW-4894 - [Rust][DataFusion] Remove all uses of panic! from aggregate.rs
  • ARROW-4895 - [Rust][DataFusion] Move error.rs to root of crate
  • ARROW-4896 - [Rust][DataFusion] Remove all uses of panic! from DataFusion tests
  • ARROW-4897 - [Rust][DataFusion] Improve rustdocs
  • ARROW-4898 - [C++] Old versions of FindProtobuf.cmake use ALL-CAPS for variables
  • ARROW-4899 - [Rust][DataFusion] Remove panic from expression.rs
  • ARROW-4901 - [Go] add AppVeyor CI
  • ARROW-4905 - [C++][Plasma] Remove dlmalloc symbols from client library
  • ARROW-4907 - [CI] Add docker container to inspect docker context
  • ARROW-4908 - [Rust][DataFusion] Add support for date/time parquet types encoded as INT32/INT64
  • ARROW-4909 - [CI] Use hadolint to lint Dockerfiles
  • ARROW-4910 - [Rust][DataFusion] Remove all uses of unimplemented!
  • ARROW-4915 - [GLib][C++] Add arrow::NullBuilder support for GLib
  • ARROW-4922 - [Packaging] Use system libraries for .deb and .rpm
  • ARROW-4924 - [Ruby] Add Decimal128#to_s(scale=nil)
  • ARROW-4925 - [Rust] [DataFusion] Remove duplicate implementations of collect_expr
  • ARROW-4926 - [Rust][DataFusion] Update README for 0.13.0
  • ARROW-4929 - [GLib] Add garrow_array_count_values()
  • ARROW-4932 - [GLib] Use G_DECLARE_DERIVABLE_TYPE macro
  • ARROW-4933 - [R] Autodetect Parquet support using pkg-config
  • ARROW-4937 - [R] Clean pkg-config related logic
  • ARROW-4939 - [Python] Add wrapper for "sum" kernel
  • ARROW-4940 - [Rust] Enable warnings for missing docs, add docs in datafusion
  • ARROW-4944 - [C++] Raise minimal required thrift-cpp to 0.11 in conda environment
  • ARROW-4946 - [C++] Support detection of flatbuffers without FlatbuffersConfig.cmake
  • ARROW-4947 - [Flight/C++] Remove redundant schema parameter to Flight client DoGet
  • ARROW-4951 - [C++] Turn off cpp benchmarks in cpp docker images
  • ARROW-4955 - [GLib] Add garrow_file_is_closed()
  • ARROW-4964 - [Ruby] Add closed check if available on auto close
  • ARROW-4969 - [C++] Set RPATH in correct order for test executables on OSX
  • ARROW-4977 - [Ruby] Add support for building on Windows
  • ARROW-4978 - [Ruby] Fix wrong internal variable name for table data
  • ARROW-4979 - [GLib] Add missing lock to garrow::GIOInputStream
  • ARROW-4980 - [GLib] Use GInputStream as the parent of GArrowInputStream
  • ARROW-4981 - [Ruby] Add support for CSV data encoding conversion
  • ARROW-4983 - [Plasma] Unmap memory upon destruction of the PlasmaClient
  • ARROW-4994 - [Website] Update details for ptgoetz
  • ARROW-4995 - [R] Support for winbuilder for CRAN checks
  • ARROW-4996 - [Plasma] Enable uninstalling of signal handler and fix log_dir
  • ARROW-5003 - [R] remove dependency on withr
  • ARROW-5006 - [R] parquet.cpp does not include enough Rcpp
  • ARROW-5011 - [Release] Add support in source release script for custom git hash
  • ARROW-5013 - [Rust][DataFusion] Refactor runtime expression support
  • ARROW-5014 - [Java] Fix typos in Flight module
  • ARROW-5018 - [Release] Include JavaScript implementation
  • ARROW-5032 - [C++] Install headers in vendored/datetime directory
  • ARROW-5041 - [C++] add GTest_SOURCE=BUNDLED to verify-release-candidate.bat
  • ARROW-5075 - [Release] Add 0.13.0 release note
  • ARROW-5084 - [Website] Blog post / release announcement for 0.13.0
  • PARQUET-1477 - [C++] sync thrift to final crypto spec
  • PARQUET-1508 - [C++] Read ByteArray data directly into arrow::BinaryBuilder and BinaryDictionaryBuilder. Refactor encoders/decoders to use cleaner virtual interfaces
  • PARQUET-1519 - [C++] Hide TypedColumnReader implementation behind virtual interfaces, remove use of "extern template class"
  • PARQUET-1521 - [C++] Use pure virtual interfaces for parquet::TypedColumnWriter, remove use of 'extern template class'
  • PARQUET-1525 - [C++] remove dependency on getopt in parquet tools

Apache Arrow 0.12.1 (2019-02-25)

Bug Fixes

  • ARROW-3564 - [C++] Fix dictionary encoding logic for Parquet 2.0
  • ARROW-4255 - [C++] Eagerly initialize name_to_index_ to avoid race
  • ARROW-4267 - [C++/Parquet] Handle duplicate and struct columns in RowGroup reads
  • ARROW-4323 - [Packaging] Fix failing OSX clang conda forge builds
  • ARROW-4367 - [C++] StringDictionaryBuilder segfaults on Finish with only null entries
  • ARROW-4374 - [C++] DictionaryBuilder does not correctly report length and null_count
  • ARROW-4492 - [Python] Failure reading Parquet column as pandas Categorical in 0.12
  • ARROW-4501 - Fix out-of-bounds read in DoubleCrcHash
  • ARROW-4582 - [Python/C++] Acquire the GIL on Py_INCREF
  • ARROW-4629 - [Python] Pandas arrow conversion slowed down by imports
  • ARROW-4636 - [Python/Packaging] Crossbow builds for conda-osx fail on upload with Ruby linkage errors
  • ARROW-4647 - [Packaging] dev/release/00-prepare.sh fails for minor version changes

New Features and Improvements

  • ARROW-4291 - [Dev] Support selecting features in release verification scripts
  • ARROW-4298 - [Java] Add javax.annotation-api dependency for JDK >= 9
  • ARROW-4373 - [Packaging] Travis fails to deploy conda packages on OSX

Apache Arrow 0.12.0 (2019-01-20)

New Features and Improvements

  • ARROW-45 - [Python] Add unnest/flatten function for List types
  • ARROW-536 - [C++] Provide non-SSE4 versions of functions that use CPU intrinsics for older processors
  • ARROW-554 - [C++] Add functions to unify dictionary types and arrays
  • ARROW-766 - [C++] Introduce zero-copy "StringPiece" type
  • ARROW-854 - [Format] Add tentative SparseTensor format
  • ARROW-912 - [Python] Recommend that Python developers use -DCMAKE_INSTALL_LIBDIR=lib when building Arrow C++ libraries
  • ARROW-1019 - [C++] Implement compressed streams
  • ARROW-1055 - [C++] GPU support library development
  • ARROW-1262 - [Packaging] Packaging automation in arrow-dist
  • ARROW-1423 - [C++] Create non-owned CudaContext from context handle provided by thirdparty user
  • ARROW-1492 - [C++] Type casting function kernel suite
  • ARROW-1688 - [Java] Fail build on checkstyle warnings
  • ARROW-1696 - [C++] Add (de)compression benchmarks
  • ARROW-1822 - [C++] Add SSE4.2-accelerated hash kernels and use if host CPU supports
  • ARROW-1993 - [Python] Add function for determining implied Arrow schema from pandas.DataFrame
  • ARROW-1994 - [Python] Test against Pandas master
  • ARROW-2183 - [C++] Add helper CMake function for globbing the right header files
  • ARROW-2211 - [C++] Use simpler hash functions for integers
  • ARROW-2216 - [CI] CI descriptions and envars are misleading
  • ARROW-2337 - Use Boost shared libraries in Windows release verification script. Parquet fixes
  • ARROW-2374 - [Rust] Add support for array of List<T>
  • ARROW-2475 - [Format] Confusing array length description
  • ARROW-2476 - [Python/Question] Maximum length of an Array created from ndarray
  • ARROW-2483 - [Rust] use bit-packing for boolean vectors
  • ARROW-2504 - [Website] Add ApacheCon NA link
  • ARROW-2535 - [Python] Provide pre-commit hooks that check flake8
  • ARROW-2560 - [Rust] The Rust README should include Rust-specific information on contributing
  • ARROW-2624 - [Python] Random schema generator for Arrow conversion and Parquet testing
  • ARROW-2637 - [C++/Python] Build support and instructions for development on Alpine Linux
  • ARROW-2648 - [Packaging] Follow up packaging tasks
  • ARROW-2653 - [C++] Refactor hash table support
  • ARROW-2670 - [C++/Python] Add Ubuntu 18.04 / gcc7 as a nightly build
  • ARROW-2673 - [Python] Add documentation + docstring for ARROW-2661
  • ARROW-2684 - [Python] Various documentation improvements
  • ARROW-2712 - [C#] Initial C# .NET library
  • ARROW-2720 - [C++] Defer setting of -std=c++11 compiler option to CMAKE_CXX_STANDARD, use CMake option for -fPIC
  • ARROW-2759 - [Plasma] Export plasma notification socket
  • ARROW-2803 - [C++] Put hashing function into src/arrow/util
  • ARROW-2807 - [Python][Parquet] Add memory_map= option to parquet.read_table, read_pandas, read_schema
  • ARROW-2808 - [Python] Add MemoryPool tests
  • ARROW-2919 - [C++/Python] Improve HdfsFile error messages, fix Python unit test suite
  • ARROW-2968 - [R] Multi-threaded conversion from Arrow table to R data.frame
  • ARROW-2995 - [CI] Build Python libraries in same run when running C++ unit tests so project does not need to be rebuilt again right away
  • ARROW-3020 - [C++/Python] Allow empty arrow::Table objects to be written as empty Parquet row groups
  • ARROW-3038 - [Go] implement String array
  • ARROW-3063 - [Go] remove list of TODOs from go/README
  • ARROW-3070 - [Packaging] Use Bintray
  • ARROW-3108 - [C++] arrow::PrettyPrint for Table instances
  • ARROW-3126 - [Python] Make Buffered* IO classes available to Python, incorporate into input_stream, output_stream factory functions
  • ARROW-3131 - [Go] add Go1.11 to the build matrix
  • ARROW-3161 - [Packaging] Ensure to run pyarrow unit tests in conda and wheel builds
  • ARROW-3169 - [C++] Break up array-test into multiple compilation units
  • ARROW-3184 - [C++] Enable modular builds and installs with ARROW_OPTIONAL_INSTALL option. Remove ARROW_GANDIVA_BUILD_TESTS
  • ARROW-3194 - [JAVA] Use split length in splitAndTransfer to set value count
  • ARROW-3199 - [Plasma] File descriptor send and receive retries
  • ARROW-3209 - [C++] Rename libarrow_gpu to libarrow_cuda
  • ARROW-3230 - [Python] Missing comparisons on ChunkedArray, Table
  • ARROW-3233 - [Python] Add prose documentation for CUDA support
  • ARROW-3248 - [C++] Add "arrow" prefix to Arrow core unit tests, use PREFIX instead of file name for csv, io, ipc tests. Modular target cleanup
  • ARROW-3254 - [C++] Add option to ADD_ARROW_TEST to compose a test executable from multiple .cc files containing unit tests
  • ARROW-3260 - [CI][skip appveyor]
  • ARROW-3272 - [Java][Docs] Add documentation about Java code style
  • ARROW-3273 - [Java] Fix checkstyle for Javadocs
  • ARROW-3278 - [Python] Retrieve StructType's and StructArray's field by name
  • ARROW-3291 - [C++] Add string_view-based constructor for BufferReader
  • ARROW-3293 - [C++] Test Flight RPC in Travis CI
  • ARROW-3296 - [Python] Add Flight support to manylinux1 wheels
  • ARROW-3303 - [C++] API for creating arrays from simple JSON string
  • ARROW-3306 - [R] Objects and support functions different kinds of arrow::Buffer
  • ARROW-3307 - [R] Convert chunked arrow::Column to R vector
  • ARROW-3310 - [R] Create wrapper classes for various Arrow IO interfaces
  • ARROW-3312 - [R] Use same .clang-format file for both R binding C++ code and main C++ codebase
  • ARROW-3315 - [R] Support for multi-threaded conversions from RecordBatch, Table to R data.frame
  • ARROW-3318 - [C++] Push down read-all-batches operation on RecordBatchReader into C++
  • ARROW-3323 - [Java] Fix checkstyle naming
  • ARROW-3331 - [Gandiva][C++] Add re2 to toolchain
  • ARROW-3340 - [R] support for dates and time classes
  • ARROW-3347 - [Rust] Implement PrimitiveArrayBuilder
  • ARROW-3353 - [Packaging] Build python 3.7 wheels
  • ARROW-3355 - [R] Support for factors
  • ARROW-3358 - [Gandiva][C++] Deprecate Gandiva Status.
  • ARROW-3362 - [R] Guard against null buffers
  • ARROW-3366 - [R] Dockerfile for docker-compose setup
  • ARROW-3368 - [Integration/CI/Python] Add dask integration test to docker-compose setup
  • ARROW-3380 - [Python] Support reading gzipped CSV files
  • ARROW-3381 - [C++] Add bz2 codec
  • ARROW-3383 - [Gandiva][Java] Fix java build
  • ARROW-3384 - [Gandiva] Sync remaining commits from gandiva repo
  • ARROW-3385 - [Gandiva][C++][Java] Crossbow support for deploying gandiva jars
  • ARROW-3387 - [C++] Implement Binary to String cast
  • ARROW-3398 - [Rust] Update existing Builder to use MutableBuffer internally
  • ARROW-3402 - [Gandiva][C++] Utilize common bitmap operation implementations in precompiled IR routines
  • ARROW-3407 - [C++] Add UTF8 handling to CSV conversion
  • ARROW-3409 - [C++] Streaming compression and decompression interfaces
  • ARROW-3421 - [C++] Add include-what-you-use setup to primary docker-compose.yml
  • ARROW-3427 - [C++] Add Windows support, Unix static libs for double-conversion package in conda-forge
  • ARROW-3429 - [Packaging] Add binary upload script
  • ARROW-3430 - [Packaging] Add workaround to verify 0.11.0
  • ARROW-3431 - [GLib] Include Gemfile to archive
  • ARROW-3432 - [Packaging] Expand variables in commit message
  • ARROW-3433 - [C++] Validate re2 with Windows toolchain, EP
  • ARROW-3439 - [R] R language bindings for Feather format
  • ARROW-3440 - [Gandiva] fix readme for builds
  • ARROW-3441 - [Gandiva] Use common unit test creation facilities, do not produce multiple executables for the same unit tests
  • ARROW-3442 - [C++] Allow dynamic linking of (most) unit tests
  • ARROW-3450 - [R] Wrap MemoryMappedFile class
  • ARROW-3451 - [C++/Python] pyarrow and numba CUDA interop
  • ARROW-3455 - [Gandiva][C++] Support pkg-config for Gandiva
  • ARROW-3456 - [CI] Reuse docker images and optimize docker-compose containers
  • ARROW-3460 - [Packaging] Add a script to rebase master on local release branch
  • ARROW-3461 - [Packaging] Add a script to upload RC artifacts as the official release
  • ARROW-3462 - [Packaging] Update CHANGELOG for 0.11.0
  • ARROW-3463 - [Website] Update for 0.11.0
  • ARROW-3464 - [Packaging] Build shared libraries for gandiva fat JAR via crossbow
  • ARROW-3465 - [Documentation] Fix gen_apidocs' docker image
  • ARROW-3469 - [Gandiva] Add gandiva travis OSX entry
  • ARROW-3472 - [Gandiva] remove gandiva_helpers library
  • ARROW-3473 - [Format] Clarify that 64-bit lengths and null counts are permitted, but not recommended
  • ARROW-3474 - [GLib] Extend gparquet API with get_schema and read_column
  • ARROW-3479 - [R] Support to write record_batch as stream
  • ARROW-3482 - [C++] Build with JEMALLOC by default
  • ARROW-3487 - [Gandiva] simplify fns that return errors
  • ARROW-3488 - [Packaging] Separate crossbow task definition files for packaging and tests
  • ARROW-3489 - [Gandiva][C++] Added support for IN expressions
  • ARROW-3490 - [R] streaming of arrow objects to streams
  • ARROW-3492 - [C++] Build jemalloc in parallel
  • ARROW-3493 - [Java] Make sure bound checks are off
  • ARROW-3499 - [R] Expose arrow::ipc::Message type
  • ARROW-3501 - [Gandiva] Enable building with gcc 4.8.x on Ubuntu Trusty, similar distros
  • ARROW-3504 - [Plasma] Add support for Plasma Client to put/get raw bytes without pyarrow serialization.
  • ARROW-3505 - [R] Read record batch and table
  • ARROW-3506 - [Packaging] Nightly tests for docker-compose images
  • ARROW-3508 - [C++] Build against double-conversion from conda-forge
  • ARROW-3515 - [C++] Introduce NumericTensor class
  • ARROW-3518 - Detect HOMEBREW_PREFIX automatically
  • ARROW-3519 - [Gandiva] Arena for varlen output fns
  • ARROW-3521 - [GLib] Run Python using find_program in meson.build
  • ARROW-3529 - [Ruby] Import Red Parquet
  • ARROW-3530 - [Java/Python] Add conversion for pyarrow.Schema from org.apache…pojo.Schema
  • ARROW-3533 - [Python/Documentation] Use sphinx_rtd_theme instead of Bootstrap
  • ARROW-3536 - [C++] Add UTF8 validation functions
  • ARROW-3537 - [Rust] Implement Tensor Type
  • ARROW-3539 - [CI/Packaging] Update scripts to build against vendored jemalloc
  • ARROW-3540 - [Rust] Incorporate BooleanArray into PrimitiveArray
  • ARROW-3542 - [C++] Use unsafe appends when building array from CSV
  • ARROW-3545 - [C++/Python] Use "field" terminology with StructType, specify behavior with duplicate field names
  • ARROW-3547 - [R] Protect against Null crash when reading from RecordBatch
  • ARROW-3548 - [Plasma] Add CreateAndSeal object store method for faster puts for small objects.
  • ARROW-3551 - Update MapD to OmniSci on Powered By page
  • ARROW-3553 - [R] Error when losing data on int64, uint64 conversions to double
  • ARROW-3555 - [Plasma] Unify plasma client get function using metadata.
  • ARROW-3556 - [CI] Disable optimizations on Windows
  • ARROW-3557 - [Python] Set Cython language level
  • ARROW-3558 - [Plasma] Remove fatal error when calling get on unsealed object.
  • ARROW-3559 - [Plasma] Static linking for plasma_store_server.
  • ARROW-3562 - [R] Disallow creation of objects with shared_ptr<T>(nullptr), use bit64::integer64
  • ARROW-3563 - [C++] Declare public link dependencies so arrow_static, plasma_static automatically pull in transitive dependencies
  • ARROW-3566 - [Format] Clarify the type of dictionary encoded field
  • ARROW-3567 - [Gandiva][GLib] Add GLib bindings of Gandiva
  • ARROW-3568 - [Packaging] Run pyarrow unittests for windows wheels
  • ARROW-3569 - [Packaging] Run pyarrow unittests when building conda package
  • ARROW-3574 - [Plasma] Use static libraries in plasma library.
  • ARROW-3575 - [Python] New documentation page for CSV reader
  • ARROW-3576 - [Python] Implemented compressed streams
  • ARROW-3577 - [Go] implement Chunked array
  • ARROW-3581 - [Gandiva][C++] Use protobuf as shared library when -DARROW_PROTOBUF_USE_SHARED=ON
  • ARROW-3582 - [CI] fix incantation for C++/Java detection tool
  • ARROW-3583 - [Python/Java] Create RecordBatch from VectorSchemaRoot
  • ARROW-3584 - [Go] Implement Table, Schema and Column
  • ARROW-3587 - [Python] Efficient serialization for Arrow Objects (array, table, tensor, etc)
  • ARROW-3588 - [Java] Fix checkstyle for header license
  • ARROW-3589 - [Gandiva] Make gandiva JNI wrappers optional
  • ARROW-3591 - [R] Support for collecting decimal types
  • ARROW-3592 - [Python] Allow getting view of a binary scalar
  • ARROW-3597 - [Gandiva] gandiva should integrate with ADD_ARROW_TEST for tests
  • ARROW-3600 - [CI/Packaging] Add Ubuntu 18.10
  • ARROW-3601 - [Rust] Add instructions for publishing to crates.io
  • ARROW-3602 - [Gandiva][Python] Initial Gandiva Cython bindings
  • ARROW-3603 - [Gandiva][C++] Support building with ARROW_BOOST_VENDORED=ON
  • ARROW-3605 - [Plasma] Remove dependence of plasma/events.h on ae.h.
  • ARROW-3607 - [Java] delete() method via JNI for plasma
  • ARROW-3608 - [R] Support for time32 and time64 array types
  • ARROW-3609 - [Gandiva] Convert Gandiva benchmark tests as gbenchmark t…
  • ARROW-3610 - [C++] Add interface to turn stl_allocator into arrow::MemoryPool
  • ARROW-3611 - [Python] Give better error message when type_id has wrong type.
  • ARROW-3612 - [Go] implement RecordBatch and RecordBatchReader
  • ARROW-3615 - [R] Support for NaN
  • ARROW-3616 - [Java] Fix remaining checkstyle issues
  • ARROW-3618 - [Packaging/Documentation] Add -c conda-forge option to avoid PackagesNotFoundError
  • ARROW-3620 - [Python] Document pa.cpu_count() in Sphinx API docs
  • ARROW-3621 - [Go] implement Table, Record, RecordReader and TableReader
  • ARROW-3622 - [Go] implement Schema.Equal
  • ARROW-3623 - [Go] implement Field.Equal
  • ARROW-3624 - [Python/C++] Support for zero-sized device buffers and device-to-device copying
  • ARROW-3625 - [Go] add examples for Table, Record and {Table,Record}Reader
  • ARROW-3626 - [Go] implement CSV reader
  • ARROW-3627 - [Go] add RecordBatchBuilder
  • ARROW-3629 - [Python] Add write_to_dataset to Python Sphinx API listing
  • ARROW-3630 - [Plasma][GLib] Add GLib bindings of Plasma
  • ARROW-3632 - [Packaging] Update deb names in dev/tasks/tasks.yml in release process
  • ARROW-3633 - [Packaging] Update deb names in dev/tasks/tasks.yml for 0.12.0
  • ARROW-3636 - [C++/Python] Update arrow/python/pyarrow_api.h
  • ARROW-3638 - [C++][Python] Move reading from Feather as Table feature to C++ from Python
  • ARROW-3639 - [Packaging] Run gandiva nightly packaging tasks
  • ARROW-3640 - [Go] implement Tensors
  • ARROW-3641 - [Python] Remove unneeded public keyword from pyarrow public C APIs
  • ARROW-3642 - [C++] Add arrowConfig.cmake generation
  • ARROW-3644 - [Rust] Implement ListArrayBuilder
  • ARROW-3645 - [Python] Document compression support in Sphinx
  • ARROW-3646 - [Python] High-level IO API
  • ARROW-3647 - [R] Fix R bit64 crash and formatting
  • ARROW-3648 - [Plasma][Java] Add API to get metadata and data at the same time
  • ARROW-3649 - [Rust] Refactor MutableBuffer's resize
  • ARROW-3656 - [C++] Allow whitespace in numeric CSV fields
  • ARROW-3657 - [R] there is no package called bit64
  • ARROW-3659 - [CI] Fix Travis matrix entry 2 documentation to use gcc
  • ARROW-3660 - [C++] Don't unnecessarily lock MemoryMappedFile for resizing in readonly files
  • ARROW-3661 - [Gandiva][GLib] Use "_" as word separator in constant name
  • ARROW-3662 - [C++] Add a const overload to MemoryMappedFile::GetSize
  • ARROW-3664 - [Rust] Add benchmark for PrimitiveArrayBuilder
  • ARROW-3665 - [Rust] Implement StructArrayBuilder
  • ARROW-3666 - [C++] Improve C++ parser performance
  • ARROW-3672 - [Go] add support for time32 and time64 array (together with ARROW-3673)
  • ARROW-3673 - [Go] implement Time64 array
  • ARROW-3674 - [Go] Implement Date32 and Date64 array types
  • ARROW-3675 - [Go] implement Date64 array
  • ARROW-3677 - [Go] Add fixed-length binary builder and array
  • ARROW-3681 - [Go] Add benchmarks for CSV reader
  • ARROW-3682 - [Go] unexport encoding/csv.Reader from CSV reader
  • ARROW-3683 - [Go] add functional-option style to configure the CSV reader
  • ARROW-3684 - [Go] Add chunking ability to CSV reader
  • ARROW-3692 - [Gandiva][Ruby] Add Ruby bindings of Gandiva
  • ARROW-3693 - [R] Invalid buffer for empty characters with null data
  • ARROW-3694 - [Java] Avoid superfluous string creation when logging level is disabled
  • ARROW-3695 - [Gandiva] Use add_arrow_lib()
  • ARROW-3696 - [C++] Add feather::TableWriter::Write(table)
  • ARROW-3697 - [Ruby]
  • ARROW-3701 - [Gandiva] add op for decimal 128
  • ARROW-3708 - [Packaging] Support CMake files in Linux packages
  • ARROW-3713 - [Rust] Implement BinaryArrayBuilder
  • ARROW-3718 - [Gandiva] Remove spurious gtest include
  • ARROW-3719 - [GLib] Support read/write table to/from Feather
  • ARROW-3720 - [GLib] Use "indices" instead of "indexes"
  • ARROW-3721 - [Gandiva][Python] Support all Gandiva literals
  • ARROW-3722 - [C++] Allow specifying types of CSV columns
  • ARROW-3723 - [Plasma][Ruby] Add Ruby bindings of Plasma
  • ARROW-3724 - [GLib] Update .gitignore
  • ARROW-3725 - [GLib] Add field readers to GArrowStructDataType
  • ARROW-3726 - [Rust] Add CSV reader with example
  • ARROW-3727 - [Python] Document use of foreign_buffer()
  • ARROW-3731 - MVP to read parquet in R library
  • ARROW-3733 - [GLib] Add to_string() to GArrowTable and GArrowColumn
  • ARROW-3736 - [CI/Docker] Ninja test in docker-compose run cpp hangs
  • ARROW-3738 - [C++] Parse ISO8601-like timestamps in CSV columns
  • ARROW-3741 - [R] Add support for arrow::compute::Cast to convert Arrow arrays from one type to another
  • ARROW-3743 - [Ruby] Add support for saving/loading Feather
  • ARROW-3744 - [Ruby] Use garrow_table_to_string() in Arrow::Table#to_s
  • ARROW-3746 - [Gandiva][Python] Print list of functions registered with gandiva
  • ARROW-3747 - [C++] Switch order of struct members in Decimal128
  • ARROW-3748 - [GLib] Add GArrowCSVReader
  • ARROW-3749 - [GLib] Fix typos
  • ARROW-3751 - [Gandiva][Python] Add more cython bindings for gandiva
  • ARROW-3752 - [C++] Remove unused status::ArrowError
  • ARROW-3753 - [Gandiva] Remove debug print
  • ARROW-3755 - [GLib] Add GArrowCompressedInputStream and GArrowCompressedOutputStream
  • ARROW-3760 - [R] Support Arrow CSV reader
  • ARROW-3773 - [C++] Remove redundant AssertArraysEqual function from before monorepo merge
  • ARROW-3778 - [C++] Compile parts of test-util.h that we can once, link with unit tests
  • ARROW-3781 - [C++] Implement BufferedOutputStream::SetBufferSize. Allocate buffer from MemoryPool
  • ARROW-3782 - [C++] Implement BufferedInputStream to pair with BufferedOutputStream
  • ARROW-3784 - [R] Array with type fails with x is not a vector
  • ARROW-3785 - [C++] Enable using double-conversion from $ARROW_BUILD_TOOLCHAIN
  • ARROW-3787 - [Rust] Implement From<ListArray> for BinaryArray
  • ARROW-3788 - [Ruby] Add support for CSV parser written in C++
  • ARROW-3795 - [R] Support for retrieving NAs from INT64 arrays
  • ARROW-3796 - [Rust] Add Example for PrimitiveArrayBuilder
  • ARROW-3798 - [GLib] Add support for column type CSV read option
  • ARROW-3800 - [C++] Vendor a string_view backport
  • ARROW-3803 - [C++/Python] Merge C++ builds and tests, run Python tests in separate CI entries
  • ARROW-3807 - [R] Missing Field API
  • ARROW-3819 - [Packaging] Update conda variant files to conform with feedstock after compiler migration
  • ARROW-3821 - [Format/Documentation] Fix typos and grammar issues in Flight.proto comments
  • ARROW-3823 - [R] + buffer.complex
  • ARROW-3825 - [Python] Document how to run the Python unit tests in python/README.md
  • ARROW-3826 - [C++] Determine if using ccache caching in Travis CI actually improves build times
  • ARROW-3830 - [GLib] Add GArrowCodec
  • ARROW-3834 - [Doc] Merge C++ and Python documentation
  • ARROW-3836 - [C++] Add PREFIX, EXTRA_LINK_LIBS, DEPENDENCIES to ADD_ARROW_BENCHMARK
  • ARROW-3839 - [Rust] Add ability to infer schema in CSV reader
  • ARROW-3841 - [C++] Suppress catching polymorphic type by value warning
  • ARROW-3842 - [R] RecordBatchStreamWriter api
  • ARROW-3844 - [C++] Remove ARROW_USE_SSE and ARROW_SSE3
  • ARROW-3845 - [Gandiva][GLib] Add GGandivaNode
  • ARROW-3847 - [GLib] Remove unnecessary ''
  • ARROW-3849 - [C++] Leverage Armv8 crc32 extension instructions to accelerate the hash computation for Arm64
  • ARROW-3851 - [C++] Run clang-format in parallel
  • ARROW-3852 - [C++] Suppress used uninitialized warning
  • ARROW-3853 - [C++] Implement string to timestamp cast
  • ARROW-3854 - [GLib] Deprecate garrow_gio_{input,output}_stream_get_raw()
  • ARROW-3855 - [Rust] Schema/Field/Datatype now have derived serde traits
  • ARROW-3856 - [Ruby] Support compressed CSV save/load
  • ARROW-3858 - [GLib] Use {class_name}_get_instance_private
  • ARROW-3859 - [Arrow][Java] Fixed backward incompatible change. (#3018)
  • ARROW-3860 - [C++] Add ARROW_GANDIVA_STATIC_LIBSTDCPP option to restore hard-coded behavior prior to ARROW-3437
  • ARROW-3862 - [C++] Improve third-party dependencies download script
  • ARROW-3863 - [GLib] Use travis_retry with brew bundle command
  • ARROW-3864 - [GLib] Add support for allow-float-truncate cast option
  • ARROW-3865 - [Packaging] Add double-conversion dependency to conda forge recipes and the windows wheel build
  • ARROW-3867 - [Documentation] Uploading binary release artifacts to Bintray
  • ARROW-3868 - [Rust] Switch to nightly Rust for required build, stable is now allowed to fail
  • ARROW-3870 - [C++] Add Peek to InputStream abstract interface
  • ARROW-3871 - [R] Replace usages of C++ GetValuesSafely with new methods on ArrayData
  • ARROW-3878 - [Rust] Improve primitive types
  • ARROW-3880 - [Rust] Implement simple math operations for numeric arrays
  • ARROW-3881 - [Rust] PrimitiveArray<T> should support comparison operators
  • ARROW-3883 - [Rust] Update README
  • ARROW-3884 - [Python] Add LLVM6 to manylinux1 base image
  • ARROW-3885 - [Rust] Release prepare step should increment Rust version
  • ARROW-3886 - [C++] Add support for decompressed buffer size check for Snappy
  • ARROW-3891 - [Java] Replace Long.bitCount with simple bitmap operations
  • ARROW-3893 - [C++] Improve adaptive int builder performance
  • ARROW-3895 - [Rust] csv::Reader now returns Result<Option> instead of Option<Result>
  • ARROW-3899 - [Python] Table.to_pandas converts Arrow date32[day] to pandas datetime64[ns]
  • ARROW-3900 - [GLib] Add garrow_mutable_buffer_set_data()
  • ARROW-3905 - [Ruby]
  • ARROW-3906 - [C++] Break out builder.cc into multiple compilation units
  • ARROW-3908 - [Rust] Update rust dockerfile to use nightly toolchain
  • ARROW-3910 - [Python] Set date_as_object=True as default in to_pandas methods
  • ARROW-3911 - [Python] Deduplicate datetime.date objects in Table.to_pandas internals
  • ARROW-3912 - [Plasma][GLib] Add support for creating and referring objects
  • ARROW-3913 - [Gandiva][GLib] Add GGandivaLiteralNode
  • ARROW-3914 - [C++/Python/Packaging] Docker-compose setup for Alpine linux
  • ARROW-3916 - [Python] Add support for filesystem kwarg in ParquetWriter
  • ARROW-3921 - [GLib][CI] Log Homebrew output
  • ARROW-3922 - [C++] Micro-optimizations to BitUtil::GetBit
  • ARROW-3924 - [Packaging][Plasma] Add support for Plasma deb/rpm packages
  • ARROW-3925 - [Python] Add autoconf to conda install instructions
  • ARROW-3928 - [Python] Deduplicate Python objects when converting binary, string, date, time types to object arrays
  • ARROW-3929 - [Go] improve CSV reader memory usage
  • ARROW-3930 - [C++] Avoid using Mersenne Twister for random test data
  • ARROW-3932 - [Python] Include Benchmarks.md in Sphinx docs
  • ARROW-3934 - [Gandiva] Only add precompiled tests if ARROW_GANDIVA_BUILD_TESTS
  • ARROW-3938 - [Packaging] Stop referring to java/pom.xml to get version information
  • ARROW-3939 - [Rust] Remove macro definition for ListArrayBuilder
  • ARROW-3945 - [Website] Update website for Gandiva donation
  • ARROW-3946 - [GLib] Add support for union
  • ARROW-3948 - [GLib][CI] Set timeout to Homebrew
  • ARROW-3950 - [Plasma] Make loading the TensorFlow op optional
  • ARROW-3952 - [Rust] Upgrade to Rust 2018 Edition
  • ARROW-3958 - [Plasma] Reduce number of IPCs
  • ARROW-3959 - [Rust] Add date/time data types
  • ARROW-3960 - [Rust] remove extern crate for Rust 2018
  • ARROW-3963 - [Packaging/Docker] Nightly test for building sphinx documentations
  • ARROW-3964 - [Go] Refactor examples of csv reader
  • ARROW-3967 - [Gandiva][C++] Make node.h public
  • ARROW-3970 - [Gandiva][C++] Remove unnecessary boost dependencies.
  • ARROW-3971 - [Python] Remove deprecations in 0.11 and prior
  • ARROW-3974 - [C++] Combine field_builders_ and children_ members in array/builder.h
  • ARROW-3982 - [C++] Allow "binary" input in simple JSON format
  • ARROW-3983 - [Gandiva][Crossbow] Link Boost statically in JAR packaging scripts
  • ARROW-3984 - [C++] Exit with error if user hits zstd ExternalProject path
  • ARROW-3986 - [C++] Document memory management and table APIs
  • ARROW-3986 - [C++] Write prose documentation
  • ARROW-3987 - [Java] Benchmark results for ARROW-1807
  • ARROW-3988 - [C++] Do not build unit tests by default, fix building Gandiva unit tests when ARROW_BUILD_TESTS=OFF
  • ARROW-3993 - [JS] CI Jobs Failing
  • ARROW-3994 - [C++] Remove ARROW_GANDIVA_BUILD_TESTS option
  • ARROW-3995 - [CI] Use understandable names on Travis
  • ARROW-3997 - [Documentation] Clarify dictionary index type
  • ARROW-4002 - [C++][Gandiva] Remove needless CMake version check
  • ARROW-4004 - [GLib] Replace GPU with CUDA
  • ARROW-4005 - [Plasma][GLib] Add gplasma_client_disconnect()
  • ARROW-4006 - Add CODE_OF_CONDUCT.md
  • ARROW-4009 - [CI] Run Valgrind and C++ code coverage in different builds
  • ARROW-4010 - [C++] Enable Travis CI scripts to build and install only certain targets
  • ARROW-4015 - [Plasma] remove unused interfaces for plasma manager
  • ARROW-4017 - [C++] Move vendored libraries in dedicated directory
  • ARROW-4026 - [C++] Add *-all, *-tests, *-benchmarks modular CMake targets. Use in Travis CI
  • ARROW-4028 - [Rust] Merge parquet-rs codebase
  • ARROW-4029 - [C++] Exclude headers with 'internal' from installation. Document header file conventions in README
  • ARROW-4030 - [CI] Use travis_terminate in more script commands to fail faster
  • ARROW-4035 - [Ruby] Support msys2 mingw dependencies
  • ARROW-4037 - [Packaging] Remove workaround to verify 0.11.0
  • ARROW-4038 - [Rust] Implement boolean AND, OR, NOT array ops
  • ARROW-4039 - [Python] Update link to 'development.rst' page from Python README.md
  • ARROW-4042 - [Rust] Rename BinaryArray::get_value to value
  • ARROW-4043 - [Packaging/Docker] Python tests on alpine miss pytest dependency
  • ARROW-4044 - [Packaging/Python] Add hypothesis test dependency to pyarrow conda recipe
  • ARROW-4045 - [Packaging/Python] Add hypothesis test dependency to wheel crossbow tests
  • ARROW-4048 - [GLib] Return ChunkedArray instead of Array in gparquet_arrow_file_reader_read_column
  • ARROW-4051 - [Gandiva][GLib] Add support for null literal
  • ARROW-4054 - [Python] Update gtest, flatbuffers and OpenSSL in manylinux1 base image
  • ARROW-4060 - [Rust] Add parquet arrow converter.
  • ARROW-4069 - [Python] Add tests for casting binary -> string/utf8. Add pyarrow.utf8() type factory alias for readability
  • ARROW-4075 - [Rust] Reuse array builder after calling finish()
  • ARROW-4079 - [C++] Add machine benchmark
  • ARROW-4080 - [Rust] Improving lengthy build times in Appveyor
  • ARROW-4082 - [C++] Allow RelWithDebInfo, improve FindClangTools
  • ARROW-4084 - [C++] Make Status static method support variadic arguments
  • ARROW-4085 - [GLib] Use "field" for struct data type
  • ARROW-4087 - [C++] Make CSV spellings of null values configurable
  • ARROW-4093 - [C++] Fix wrong suggested method name
  • ARROW-4098 - [Python] Deprecate open_file/open_stream top level APIs in favor of using ipc namespace
  • ARROW-4100 - [Gandiva][C++] Fix regex for special character dot.
  • ARROW-4102 - [C++] Return common IdentityCast when casting to equal type
  • ARROW-4103 - [Docs] Move documentation build instructions from source/python/development.rst to docs/README.md
  • ARROW-4105 - [Rust] Add rust-toolchain to enforce user to use nightly toolchain for building
  • ARROW-4107 - [Python] Use ninja in pyarrow manylinux1 build
  • ARROW-4112 - [Packaging] Add support for Gandiva .deb
  • ARROW-4116 - [Python] Add warning to development instructions to avoid virtualenv when using Anaconda/miniconda
  • ARROW-4122 - [C++] Initialize class members based on codebase static analysis
  • ARROW-4127 - [Documentation][Python] Add instructions to build with Docker
  • ARROW-4129 - [Python] Fix syntax problem in benchmark docs
  • ARROW-4132 - [GLib] Add more GArrowTable constructors
  • ARROW-4141 - [Ruby] Add support for creating schema from raw Ruby objects
  • ARROW-4148 - [CI/Python] Disable ORC on nightly Alpine builds
  • ARROW-4150 - [C++] Ensure allocated buffers have non-null data pointer
  • ARROW-4151 - [Rust] Restructure project directories
  • ARROW-4152 - [GLib] Remove an example to show Torch integration
  • ARROW-4153 - [GLib] Add builder_append_value() for consistency
  • ARROW-4154 - [GLib] Add GArrowDecimal128DataType
  • ARROW-4155 - [Rust] Implement array_ops::sum() for PrimitiveArray<T>
  • ARROW-4156 - [C++] Don't use object libs with Xcode
  • ARROW-4158 - Allow committers to set ARROW_GITHUB_API_TOKEN for merge script, better debugging output
  • ARROW-4160 - [Rust] Add README and executable files to parquet
  • ARROW-4161 - [GLib] Add PlasmaClientOptions
  • ARROW-4162 - [Ruby] Add support for creating data types from description
  • ARROW-4166 - [Ruby] Add support for saving to and loading from buffer
  • ARROW-4167 - [Gandiva] switch to arrow/util/variant
  • ARROW-4168 - [GLib] Use property to keep GArrowDataType passed in garrow_field_new()
  • ARROW-4172 - [Rust] more consistent naming in array builders
  • ARROW-4174 - [Ruby] Add support for building composite array from raw Ruby objects
  • ARROW-4175 - [GLib] Add support for decimal compare operators
  • ARROW-4177 - [C++] Add ThreadPool and TaskGroup microbenchmarks
  • ARROW-4183 - [Ruby] Add Arrow::Struct as an element of Arrow::StructArray
  • ARROW-4184 - [Ruby] Add Arrow::RecordBatch#to_table
  • ARROW-4191 - [C++] Use same CC and AR for jemalloc as for the main sources
  • ARROW-4199 - [GLib] Add garrow_seekable_input_stream_peek()
  • ARROW-4207 - [Gandiva][GLib] Add support for IfNode
  • ARROW-4210 - [Python] Mention boost-cpp directly in the conda meta.yaml for pyarrow
  • ARROW-4211 - [GLib] Add GArrowFixedSizeBinaryDataType
  • ARROW-4214 - [Ruby] Add support for building RecordBatch from raw Ruby objects
  • ARROW-4216 - [Python] Add CUDA API docs
  • ARROW-4228 - [GLib] Add garrow_list_data_type_get_field()
  • ARROW-4229 - [Packaging] Set crossbow target explicitly to enable building arbitrary arrow repo
  • ARROW-4233 - [Packaging] Use Docker to build source archive
  • ARROW-4239 - [Packaging] Fix version update for the next version
  • ARROW-4240 - [Packaging] Add missing Plasma GLib and Gandiva GLib documents to source archive
  • ARROW-4241 - [Packaging] Disable crossbow conda OSX clang builds
  • ARROW-4243 - [Python] Fix test failures with pandas 0.24.0rc1
  • ARROW-4249 - [Plasma] Clean up client namespace
  • ARROW-4257 - [Release] Update release verification script to check binaries on Bintray
  • ARROW-4266 - [Python][CI] Disable ORC tests in dask integration test
  • ARROW-4269 - [Python] Fix serialization in pandas 0.22
  • ARROW-4270 - [Packaging][Conda] Update xcode version and remove toolchain builds
  • ARROW-4276 - [Release] Remove needless Bintray authentication from binaries verify script
  • ARROW-4306 - [Release] Update website and add blog post announcing 0.12.0 release
  • PARQUET-690 - [C++] Reuse Thrift resources when serializing metadata structures
  • PARQUET-1271 - [C++] Rename parquet_reader tool to parquet-reader for consistency
  • PARQUET-1439 - Remove PARQUET_ARROW_LINKAGE option, clean up overall library linking configuration
  • PARQUET-1449 - [C++] Support building with ARROW_BOOST_VENDORED=ON
  • PARQUET-1463 - [C++] Utilize common hashing machinery for dictionary encoding
  • PARQUET-1467 - [C++] Remove defunct ChunkedAllocator code
  • PARQUET-1473 - [C++] Add helper function that converts ParquetVersion to human-friendly string
  • PARQUET-1484 - [C++] Improve memory usage of FileMetaDataBuilder

Bug Fixes

  • ARROW-1847 - [Doc] Document the difference between RecordBatch and Table in an FAQ fashion
  • ARROW-2026 - [C++] Enforce use_deprecated_int96_timestamps to all time…
  • ARROW-2038 - [Python] Strip s3:// scheme in S3FSWrapper isdir() and isfile()
  • ARROW-2113 - [Python] set classpath to all hadoop jars when HADOOP_HOME present (together with ARROW-3768)
  • ARROW-2591 - [Python] Add Parquet test case writing list-typed column with empty lists that caused segfault on 0.9.0
  • ARROW-2592 - [Python] Add "ignore_metadata" option to Table.to_pandas
  • ARROW-2654 - [Python] Error with errno 22 when loading 3.6 GB Parquet file
  • ARROW-2708 - [C++] Internal GetValues function in arrow::compute should check for nullptr
  • ARROW-2831 - [Plasma] MemoryError in teardown
  • ARROW-2970 - [Python] Support conversions of NumPy string arrays requiring chunked binary output
  • ARROW-2987 - [Python] test_cython_api can fail if run in an environment where vsvarsall.bat has been run more than once
  • ARROW-3048 - [Python] Import pyarrow fails if scikit-learn is installed from conda (boost-cpp / libboost issue)
  • ARROW-3058 - [Python] Raise a more helpful error message when writing a pandas.DataFrame to Feather format that requires a chunked layout
  • ARROW-3186 - [GLib][CI] Use the latest Meson again
  • ARROW-3202 - [C++] Fix compilation on Alpine Linux by using ARROW_WITH_BACKTRACE define
  • ARROW-3225 - [C++/Python] Pandas object conversion of ListType<DateType> and ListType<TimeType>
  • ARROW-3324 - [Python] Destroy temporary metadata builder classes more eagerly when building files to reduce memory usage
  • ARROW-3343 - [Java] Disable flaky tests
  • ARROW-3405 - [Python] Document CSV reader
  • ARROW-3428 - [Python] Fix from_pandas conversion from float to bool
  • ARROW-3436 - [C++] Boost version required by Gandiva is too new for Ubuntu 14.04
  • ARROW-3437 - [C++] Use older API for boost::optional, remove gtest include from prod code, remove -static-libstdc++ flags
  • ARROW-3438 - [Packaging] Fix too much Markdown escape in CHANGELOG
  • ARROW-3445 - [GLib] Fix libarrow-glib link for libparquet-glib
  • ARROW-3449 - [C++] Fixes to build with CMake 3.2. Document what requires newer CMake
  • ARROW-3466 - [C++] Avoid leaking protobuf symbols
  • ARROW-3467 - [C++] Fix building against external double-conversion
  • ARROW-3470 - [C++] Fix row-wise example
  • ARROW-3477 - [C++] fixes for 32 bit architectures
  • ARROW-3480 - [Website] Fix broken install document for Ubuntu
  • ARROW-3483 - [CI] Python 3.6 build failure on Travis-CI
  • ARROW-3485 - [C++] Examples fail with Protobuf error
  • ARROW-3494 - [Gandiva][C++] fix re2 error in cmake
  • ARROW-3498 - [R] Make IPC APIs consistent
  • ARROW-3516 - [C++] Use unsigned type for difference of pointers in parallel_memcpy
  • ARROW-3517 - [C++] Add a workaround for MinGW-w64 32bit crash
  • ARROW-3524 - [C++] Fix compiler warnings from ARROW-3409 on clang-6
  • ARROW-3527 - [R] remove unused variables
  • ARROW-3528 - [R] Fixed typo in R package documentation
  • ARROW-3535 - [Python] pip install tensorflow installs too new a numpy in manylinux1 build
  • ARROW-3541 - [Rust] Update BufferBuilder to allow for new bit-packed BooleanArray
  • ARROW-3544 - [Gandiva][C++] Create function registry in multiple compilation units to reduce build times
  • ARROW-3549 - [Rust] Replace i64 with usize for some bit utility functions
  • ARROW-3573 - [Rust] with_bitset does not set valid bits correctly
  • ARROW-3580 - [Gandiva][C++] Fix build error with g++ 8.2.0
  • ARROW-3586 - [Python] Add test ensuring no segfault
  • ARROW-3598 - [Plasma] Fix Plasma GPU linking error.
  • ARROW-3613 - [Go] Fix builder downsize
  • ARROW-3613 - [Go] fix builder resize
  • ARROW-3614 - [R] Support for timestamps
  • ARROW-3634 - [GLib] Follow CudaDeviceManager::AllocateHost() API change
  • ARROW-3637 - [Go] implement Stringer for arrays
  • ARROW-3658 - [Rust] Incorrect List<T> tests
  • ARROW-3670 - [C++] Use FindBacktrace to find execinfo.h support
  • ARROW-3687 - [Rust] Anything measuring array slots should be usize
  • ARROW-3698 - [Gandiva] Segmentation fault when using a large table in Gandiva
  • ARROW-3700 - [C++] Ignore empty lines in CSV files
  • ARROW-3703 - [Python] DataFrame.to_parquet crashes if datetime column has time zones
  • ARROW-3704 - [Gandiva][C++] Add missing include
  • ARROW-3707 - [C++] Fix test regression with zstd 1.3.7
  • ARROW-3711 - [C++] Don't pass CXX_FLAGS to C_FLAGS
  • ARROW-3712 - [CI] Quick fix for RAT failure
  • ARROW-3715 - [C++] Fix typo in gflags_ep CMake config
  • ARROW-3716 - [R] Missing cases for ChunkedArray conversion
  • ARROW-3728 - [Python] Ignore differences in schema custom metadata when writing table to ParquetWriter
  • ARROW-3734 - [C++] Linking static zstd library fails on Arch x86-64
  • ARROW-3740 - [C++] Builder should not downsize
  • ARROW-3742 - Fix pyarrow.types & gandiva cython bindings
  • ARROW-3745 - [C++] CMake passes static libraries multiple times to linker
  • ARROW-3754 - [C++] Enable Zstandard by default only when CMake is 3.7 or later
  • ARROW-3756 - [CI/Docker/Java] Java tests are failing in docker-compose setup
  • ARROW-3765 - [Gandiva] Segfault when the validity bitmap has not been allocated
  • ARROW-3766 - [Python] pa.Table.from_pandas doesn't use schema ordering
  • ARROW-3768 - [Python] set classpath to hdfs not hadoop executable
  • ARROW-3775 - [C++] Handling Parquet Arrow reads that overflow a BinaryArray capacity
  • ARROW-3790 - [C++] Fix erroneous safe casting
  • ARROW-3792 - [C++] Writing a list-type chunked column to Parquet fails if any chunk is 0-length
  • ARROW-3793 - [C++] TestScalarAppendUnsafe is not testing unsafe appends
  • ARROW-3797 - [Rust] BinaryArray::value_offset incorrect in offset case
  • ARROW-3805 - [Gandiva] Handle null validity bit-map in if-else
  • ARROW-3831 - [C++] Add support for returning decompressed size
  • ARROW-3835 - [C++] Add missing arrow::io::CompressedOutputStream::raw() implementation
  • ARROW-3837 - [C++] Add GFLAGS_IS_A_DLL define to fix Windows build
  • ARROW-3866 - [Python] Column metadata is not transferred to tables in pyarrow
  • ARROW-3869 - [Rust] "invalid fastbin errors" since Rust nightly-2018-11-03
  • ARROW-3874 - [C++] Add LLVM_DIR to find_package in FindLLVM.cmake
  • ARROW-3879 - [C++] Fix uninitialized member in CudaBufferWriter
  • ARROW-3888 - [C++] Fix various compiler warnings
  • ARROW-3889 - [Python] Crash when creating schema from invalid args
  • ARROW-3890 - [Python] Handle NumPy binary arrays with UTF-8 validation when converting to StringArray
  • ARROW-3894 - [C++] Ensure that IPC file is properly initialized even if no record batches are written
  • ARROW-3898 - [Example] parquet-arrow example has compilation errors
  • ARROW-3909 - [Python] Table.from_pandas call that seemingly should zero copy does not
  • ARROW-3918 - [Python] ParquetWriter.write_table doesn't support coerce_timestamps or allow_truncated_timestamps
  • ARROW-3920 - [Plasma] Fix reference counting in custom tensorflow plasma operator.
  • ARROW-3931 - [C++] Make it possible to build regardless of LANG
  • ARROW-3936 - [C++] Add _O_NOINHERIT to the file open flags on Windows
  • ARROW-3937 - [Rust] Fix Rust nightly build (formatting rules changed)
  • ARROW-3940 - [Python/Documentation] Add required packages to the development instruction
  • ARROW-3941 - [R] RecordBatchStreamReader$schema
  • ARROW-3942 - [R] Feather api fixes
  • ARROW-3953 - [Python] Compat with pandas 0.24 rename of MultiIndex labels -> codes
  • ARROW-3955 - [GLib] Add (transfer full) to free when no longer needed
  • ARROW-3957 - [Python] Better error message when user connects to HDFS cluster with wrong port
  • ARROW-3961 - [Python/Documentation] Fix wrong path in the pyarrow README
  • ARROW-3969 - [Rust] Format using stable rustfmt
  • ARROW-3976 - [Ruby] Try to upgrade git to avoid errors caused by Homebrew on older git
  • ARROW-3977 - [Gandiva] fix label during ctest invocation
  • ARROW-3979 - [Gandiva] fix all valgrind reported errors
  • ARROW-3980 - [C++] Fix CRTP use in json-simple.cc
  • ARROW-3989 - [Rust][CSV] Cast bool string to lower case in reader
  • ARROW-3996 - [C++] Add missing packages on Linux
  • ARROW-4008 - [C++] Restore ARROW_BUILD_UTILITIES to fix integration tests
  • ARROW-4011 - [Gandiva] Install irhelpers.bc and use it
  • ARROW-4019 - [C++] Fix Coverity issues
  • ARROW-4033 - [C++] Use readlink -f instead of realpath in dependency download script
  • ARROW-4034 - [Ruby] Add support for :append option to FileOutputStream
  • ARROW-4041 - [CI] Python 2.7 run uses Python 3.6
  • ARROW-4049 - [C++] Arrow never uses glog even though glog is linked.
  • ARROW-4052 - [C++] Linker errors with glog and gflags
  • ARROW-4053 - [Python/Integration] HDFS Tests failing with I/O operation on closed file
  • ARROW-4055 - [Python] Fails to convert pytz.utc with versions 2018.3 and earlier
  • ARROW-4058 - [C++] arrow-io-hdfs-test fails when run against HDFS cluster from docker-compose
  • ARROW-4065 - [C++] arrowTargets.cmake is broken
  • ARROW-4066 - [Doc] Instructions to create Sphinx documentation
  • ARROW-4070 - [C++] Enable use of ARROW_BOOST_VENDORED with ninja-build
  • ARROW-4073 - [Python] Fix URI parsing on Windows. Also fix test for get_library_dirs when using ARROW_HOME to develop
  • ARROW-4074 - [Python] test_get_library_dirs_win32 fails if libraries installed someplace different from conda or wheel packages
  • ARROW-4078 - [CI] Detect changes in docs/ directory and build the Linux Python entry if so
  • ARROW-4088 - [Python] Table.from_batches() fails when passed a schema with metadata
  • ARROW-4089 - [Plasma] The tutorial is wrong regarding the parameter type of PlasmaClient.Create
  • ARROW-4101 - [C++] Identity BinaryType cast
  • ARROW-4106 - [Python] Tests fail to run because hypothesis update broke its API
  • ARROW-4109 - [Packaging] Missing glog dependency from arrow-cpp conda recipe
  • ARROW-4113 - [R] Fix version number
  • ARROW-4114 - [C++] Add python to requirements list for running on ubuntu
  • ARROW-4115 - [Gandiva] zero-init boolean data bufs
  • ARROW-4118 - [Python] Fix benchmark setup for "asv run"
  • ARROW-4125 - [Python] Don't fail ASV if Plasma extension is not built (e.g. on Windows)
  • ARROW-4126 - [Go] offset not used when accessing boolean array
  • ARROW-4128 - [C++] Update style guide to reflect NULLPTR and doxygen
  • ARROW-4130 - [Go] offset not used when accessing binary array
  • ARROW-4134 - [Packaging] Properly setup timezone in docker tests to prevent ORC adapter's abort
  • ARROW-4135 - [Python] Can't reload a pandas dataframe containing a list of datetime.time
  • ARROW-4137 - [Rust] Move parquet code into a separate crate
  • ARROW-4138 - [Python] Fix setuptools_scm version customization on Windows
  • ARROW-4147 - [Java] reduce heap usage for varwidth vectors (#3298)
  • ARROW-4149 - [CI/C++] Parquet test misses ZSTD compression codec in CMake 3.2 nightly builds
  • ARROW-4157 - [C++] Fix clang documentation warnings on Ubuntu 18.04
  • ARROW-4171 - [Rust] fix parquet crate release version
  • ARROW-4173 - Fix JIRA library name in error message
  • ARROW-4178 - [C++] Fix TSan and UBSan errors
  • ARROW-4179 - [Python] Use more public API to determine whether a test has a pytest mark or not
  • ARROW-4182 - [Python][CI] SEGV frequency
  • ARROW-4185 - [Rust] Change directory before running Rust examples on Windows
  • ARROW-4186 - [C++] BitmapWriter shouldn't clobber data when length == 0
  • ARROW-4188 - [Rust] Move Rust README to top level rust directory
  • ARROW-4197 - [C++] Better Emscripten support
  • ARROW-4200 - [C++/Python] Enable conda_env_python.yml to work on Windows, simplify python/development.rst
  • ARROW-4209 - [Gandiva] Avoid struct return param in IR
  • ARROW-4215 - [GLib] Fix typos in documentation
  • ARROW-4227 - [GLib] Fix wrong data type in field of composite data type
  • ARROW-4237 - [Packaging] Fix CMAKE_INSTALL_LIBDIR in release verification script
  • ARROW-4238 - [Packaging] Fix RC version conflict between crossbow and rake
  • ARROW-4246 - [Plasma][Python][Follow-up] Ensure plasma::ObjectTableEntry always has the same size regardless of whether built with CUDA support
  • ARROW-4246 - [Plasma][Python] PlasmaClient.list returns wrong information with CUDA enabled Plasma
  • ARROW-4256 - [Release] Fix Windows verification script for 0.12 release
  • ARROW-4258 - [Python] Safe cast fails from numpy float64 array with nans to integer
  • ARROW-4260 - [Python] NumPy buffer protocol failure
  • PARQUET-1426 - [C++] parquet-dump-schema has poor usability
  • PARQUET-1458 - [C++] parquet::CompressionToString not recognizing brotli compression
  • PARQUET-1469 - [C++] Fix data corruption bug in parquet::internal::DefinitionLevelsToBitmap that was triggered through random data
  • PARQUET-1471 - [C++] TypedStatistics<T>::UpdateSpaced reads out of bounds value when there are more definition levels than spaced values
  • PARQUET-1481 - [C++] Throw exception when encountering bad Thrift metadata in RecordReader

Apache Arrow 0.11.1 (2018-10-23)

New Features and Improvements

  • ARROW-3353 - [Packaging] Build python 3.7 wheels
  • ARROW-3534 - [Python][skip appveyor]
  • ARROW-3546 - [Python] Provide testing setup to verify wheel binaries work in one or more common Linux distributions
  • ARROW-3565 - [Python] Pin tensorflow to 1.11.0 in manylinux1 container

Bug Fixes

  • ARROW-3514 - [C++] Work around insufficient output size estimate on old zlibs
  • ARROW-3907 - [Python] from_pandas errors when schemas are used with lower resolution timestamps

Apache Arrow 0.11.0 (2018-10-08)

New Features and Improvements

  • ARROW-25 - [C++] Implement CSV reader
  • ARROW-249 - [JAVA] Flight GRPC Implementation
  • ARROW-614 - [C++] Use glog (or some other tool) to print stack traces in debug builds on errors
  • ARROW-1325 - [R] Initial R package that builds against the arrow C++ library
  • ARROW-1424 - [Python] Add CUDA support to pyarrow
  • ARROW-1491 - [C++] Add casting from strings to numbers and booleans
  • ARROW-1521 - [C++] Add BufferOutputStream::Reset method
  • ARROW-1563 - [C++][FOLLOWUP] Use std::function instead of declaring auxiliary helper classes
  • ARROW-1563 - [C++] Implement logical unary and binary kernels for boolean arrays
  • ARROW-1860 - [C++] Add data structure to "stage" a sequence of IPC messages from in-memory data
  • ARROW-1949 - [Python/C++] Add option to Array.from_pandas and pyarrow.array to perform unsafe casts
  • ARROW-1963 - [C++/Python] Create Array from sequence of numpy.datetime64
  • ARROW-1968 - [C++/Python] Add basic unit tests for ORC reader
  • ARROW-2165 - [JAVA] enhance AllocationListener with onChildAdded()/onChildRemoved() calls (#2697)
  • ARROW-2338 - [Scripts] Windows release verification script should create a conda environment
  • ARROW-2352 - [C++/Python] Test OSX packaging in Travis matrix
  • ARROW-2519 - [Rust] Implement min/max for primitive arrays
  • ARROW-2520 - [Rust] CI should also build against nightly Rust
  • ARROW-2555 - [C++/Python] Allow Parquet-Arrow writer to truncate timestamps instead of failing
  • ARROW-2583 - [Rust] Buffer should be typeless
  • ARROW-2617 - [Rust] Schema should contain fields not columns
  • ARROW-2687 - [JS] Example usage in README is outdated
  • ARROW-2734 - [Python] Cython api example doesn't work by default on macOS
  • ARROW-2750 - [MATLAB] Initial MATLAB interface, support for reading numeric types from Feather files
  • ARROW-2799 - [Python] Add safe option to Table.from_pandas to avoid unsafe casts
  • ARROW-2813 - [CI][Followup] Disable gcov output in Travis-CI logs
  • ARROW-2813 - [CI] Mute uninformative lcov warnings
  • ARROW-2817 - [C++] Enable libraries to be installed in msys2 on Windows
  • ARROW-2840 - [C++] See if stream alignment logic can be simplified
  • ARROW-2865 - [C++/Python] Reduce some duplicated code in python/builtin_convert.cc
  • ARROW-2889 - [C++] Add optional argument to ADD_ARROW_TEST CMake function to add unit test prefix
  • ARROW-2900 - [Python] Improve performance of appending nested NumPy arrays in builtin_convert.cc
  • ARROW-2936 - [Python] Implement Table.cast for casting from one schema to another (if possible)
  • ARROW-2948 - [Packaging] Generate changelog with crossbow
  • ARROW-2950 - [C++] Clean up util/bit-util.h
  • ARROW-2952 - [C++] Dockerized include-what-you-use
  • ARROW-2958 - [C++] Bump Flatbuffers EP version to master to build on gcc 8.1
  • ARROW-2960 - [Packaging] Fix verify-release-candidate for binary packages and fix release cutting script for lib64 cmake issue
  • ARROW-2964 - [Go] wire all primitive arrays into array.MakeFromArray
  • ARROW-2971 - [Python] Give some modules in arrow/python more descriptive names
  • ARROW-2972 - [Python] Implement inference logic for uint64 conversions in builtin_convert.cc
  • ARROW-2975 - [Plasma] Fix TensorFlow operator compilation with pip package
  • ARROW-2976 - [Python] Fix pyarrow.get_library_dirs
  • ARROW-2979 - [GLib] Add operator functions in GArrowDecimal128
  • ARROW-2983 - [Packaging] Verify source release and binary artifacts in different scripts
  • ARROW-2989 - [C++/Python] Remove API deprecations in 0.10
  • ARROW-2991 - [CI] Cut down number of AppVeyor jobs
  • ARROW-2994 - [Python] Only include Python and NumPy include directories for libarrow_python targets
  • ARROW-2996 - [C++] Fix typo in cpp/.clang-tidy
  • ARROW-2998 - [C++][Resizable] Buffer
  • ARROW-2999 - [Python] Disable ASV runs in Travis CI for now
  • ARROW-3000 - [C++] Add option to label test groups then only build those unit tests
  • ARROW-3001 - [Packaging] Don't modify PATH during rust release verification
  • ARROW-3002 - [Python] Hash more parts of pyarrow.Field
  • ARROW-3003 - [Doc] Enable Java doc generation
  • ARROW-3005 - [Release] Update website, draft simple release blog post for 0.10.0
  • ARROW-3008 - [Packaging] Verify GPU related modules if available
  • ARROW-3009 - [Python] Fix pyarrow ORC reader
  • ARROW-3010 - [GLib] Update README to use Bundler
  • ARROW-3017 - [C++] Don't throw exception in arrow/util/thread-pool.h
  • ARROW-3018 - [Plasma][FOLLOWUP] Update plasma documentation
  • ARROW-3018 - [Plasma] Remove Mersenne twister
  • ARROW-3019 - [Packaging] Use Bundler to verify Arrow GLib
  • ARROW-3021 - [Go] add support for List arrays
  • ARROW-3022 - [Go] add support for Struct arrays
  • ARROW-3023 - [C++] Add gold linker enabling logic from Apache Kudu
  • ARROW-3024 - [C++] Remove mutex in MemoryPool implementations
  • ARROW-3025 - [C++] Add option to switch between dynamic and static linking in unit test executables
  • ARROW-3026 - [Python][Plasma] Only run Plasma unit tests with valgrind under Python 3.6
  • ARROW-3027 - [Ruby] Stop "git tag" by "rake release"
  • ARROW-3028 - [Python] Do less work to test Python documentation build
  • ARROW-3029 - [Python] Generate version file when building
  • ARROW-3031 - [Go] streamline Release of Arrays and Builders
  • ARROW-3033 - [Dev] docker-compose test tooling does not seem to cache built Docker images
  • ARROW-3034 - [Packaging] Resolve symbolic link in tar.gz
  • ARROW-3035 - [Rust] Examples in README.md do not run
  • ARROW-3036 - [Go] implement array.NewSlice
  • ARROW-3037 - [Go] implement Null array
  • ARROW-3042 - [Go] add godoc badge to README
  • ARROW-3043 - [C++] pthread doesn't exist on MinGW
  • ARROW-3044 - [Python] Remove all occurrences of cython's legacy property definition syntax
  • ARROW-3045 - [Python] Remove nullcheck from ipc Message and MessageReader
  • ARROW-3046 - [GLib] Use rubyish method
  • ARROW-3050 - [C++] Adopt HiveServer2 client codebase from
  • ARROW-3051 - [C++] Status performance optimization from Impala/Kudu
  • ARROW-3057 - [INTEGRATION] Fix spark and hdfs dockerfiles
  • ARROW-3059 - [C++] Remove namespace arrow::test
  • ARROW-3060 - [C++] Factor out string-to-X conversion routines
  • ARROW-3062 - [Python] Fix python package finder to also work in Python 2.7
  • ARROW-3064 - [C++] Add option to ADD_ARROW_TEST to indicate additional dependencies for particular unit test executables
  • ARROW-3067 - [Packaging] Support dev/rc/release .deb/.rpm builds
  • ARROW-3068 - [Packaging] Bump version to 0.11.0-SNAPSHOT
  • ARROW-3069 - [Release] Stop using SHA1 checksums per ASF policy
  • ARROW-3072 - [C++] Add RETURN_NOT_OK linting rule, use ARROW_RETURN_NOT_OK in header files
  • ARROW-3075 - [C++] Incorporate parquet-cpp codebase into Arrow C++ build
  • ARROW-3076 - [Website] Add Google Analytics scripts to Sphinx, Doxygen API docs
  • ARROW-3088 - [Rust] Use internal Result<T> type instead of Result<T, ArrowError>
  • ARROW-3090 - [Rust] Accompany error messages with assertions
  • ARROW-3094 - [Python] Easier construction of schemas and struct types
  • ARROW-3099 - [C++] Add benchmark for number parsing
  • ARROW-3105 - [Plasma] Improve flushing error message
  • ARROW-3106 - [Website] Update committers and PMC roster on website
  • ARROW-3109 - [Python] Add Python 3.7 virtualenvs to manylinux1 container
  • ARROW-3110 - [C++] Fix warnings with gcc 7.3.0
  • ARROW-3111 - [Java] Adding logback config file to allow running tests with different log level
  • ARROW-3114 - [Website] Add information about user@ mailing list to website / Community page
  • ARROW-3115 - [JAVA] Style checks - fix import ordering
  • ARROW-3116 - [Plasma] Add "ls" to object store
  • ARROW-3117 - [GLib] Add garrow_chunked_array_to_string()
  • ARROW-3119 - [Packaging] Nightly packaging script fails
  • ARROW-3127 - [Doc] Add Tutorial for Sending Tensor from C++ to Python
  • ARROW-3128 - [C++] Support system shared zlib
  • ARROW-3129 - [Packaging] Stop using deprecated BuildRoot and Group in .spec
  • ARROW-3130 - [Go] add initial support for Go modules
  • ARROW-3136 - [C++] Clean up public API
  • ARROW-3142 - [C++] Fetch all libs from toolchain environment
  • ARROW-3143 - [C++] CopyBitmap into existing memory
  • ARROW-3146 - [C++] Prototype Flight RPC client and server implementations
  • ARROW-3147 - [C++] Improve MSVC version detection
  • ARROW-3148 - [C++] Remove needless U+00A0 NO-BREAK SPACE (#2500)
  • ARROW-3152 - [Packaging] Add zlib to runtime dependencies for arrow-cpp conda package
  • ARROW-3153 - [Packaging] Fix broken nightly package builds introduced with recent cmake changes and orc tests
  • ARROW-3157 - [C++] Add Buffer::Wrap, MutableBuffer::Wrap convenience methods for wrapping typed memory, std::vector<T>
  • ARROW-3158 - [C++] Handle float truncation during casting
  • ARROW-3160 - [Python] Improve pathlib.Path support in parquet and filesystem modules
  • ARROW-3163 - [Python] Add missing Cython dependency to source package
  • ARROW-3167 - [CI] Limit clcache cache size
  • ARROW-3168 - [C++] Restore pkgconfig for Parquet C++ libraries
  • ARROW-3170 - [C++] Experimental readahead spooler
  • ARROW-3171 - [Java] Enable checkstyle for line length and indentation
  • ARROW-3172 - [Rust] Update documentation for datatypes.rs
  • ARROW-3174 - [Rust] run examples as part of CI
  • ARROW-3177 - [Rust] Update expected error messages for tests that 'should panic'
  • ARROW-3180 - [C++] Add docker-compose setup to simulate Travis CI run locally
  • ARROW-3181 - [Packaging] Adjust conda package scripts to account for Parquet codebase migration
  • ARROW-3182 - [Gandiva] Integrate gandiva to arrow build. Update licenses to apache license.
  • ARROW-3187 - [C++] Add support for using glog (Google logging library)
  • ARROW-3195 - [C++] Add missing error check for NumPy initialization in test
  • ARROW-3196 - Add support for merging both ARROW and PARQUET patches
  • ARROW-3197 - [C++] Add instructions for building Parquet libraries and running the unit tests
  • ARROW-3198 - [Website] Blog post for 0.11 release
  • ARROW-3211 - [C++] Disable gold linker with MinGW-w64
  • ARROW-3212 - [C++] Make IPC metadata deterministic, regardless of current stream position. Clean up stream / tensor alignment logic
  • ARROW-3213 - [C++] Use CMake to build vendored Snappy on Windows
  • ARROW-3214 - [C++] Disable insecure warnings in MinGW build
  • ARROW-3215 - [C++] Add support for finding libpython on MSYS2
  • ARROW-3216 - [C++] Add missing libpython link to libarrow_python in MinGW build
  • ARROW-3217 - [C++] Add missing ARROW_STATIC definition in MinGW build
  • ARROW-3218 - [C++] Remove needless links to utilities in MinGW build
  • ARROW-3219 - [C++] Use Win32 API in MinGW build
  • ARROW-3223 - [GLib] Use the same shared object versioning rule in C++
  • ARROW-3229 - [Packaging] Adjust wheel package scripts to account for Parquet codebase migration
  • ARROW-3234 - [C++] Fix libprotobuf shared library link order
  • ARROW-3235 - [Packaging] Update deb names
  • ARROW-3236 - [C++] Fix stream accounting bug causing garbled schema message when writing IPC file format
  • ARROW-3240 - [GLib] Add build instructions using meson
  • ARROW-3242 - [C++] Make CpuInfo a singleton, use coarser-grained dispatch to SSE4 in Parquet dictionary encoding
  • ARROW-3249 - [Python] Run flake8 on integration_test.py and crossbow.py
  • ARROW-3250 - [C++] Buffer implementation which owns memory from a std::string
  • ARROW-3252 - [C++] Do not hard code the "v" part of versions in thirdparty toolchain
  • ARROW-3257 - [C++] Stop using IMPORTED_LINK_INTERFACE_LIBRARIES
  • ARROW-3258 - [GLib] Fix CI failure on macOS
  • ARROW-3259 - [GLib] Rename "writeable" to "writable"
  • ARROW-3261 - [Python] Add "field" method to select fields from StructArray
  • ARROW-3262 - [Python] Implement getitem with integers on pyarrow.Column
  • ARROW-3264 - [Java] Checkstyle fix whitespace
  • ARROW-3267 - [Python] Create empty table from schema
  • ARROW-3268 - [CI][skip travis]
  • ARROW-3269 - [Python] Fix warnings in unit test suite
  • ARROW-3270 - [Release] Adjust release verification scripts to recent parquet migration
  • ARROW-3274 - [Packaging] Missing glog dependency from conda-forge recipes
  • ARROW-3276 - [Packaging] Add support for Parquet deb/rpm packages
  • ARROW-3281 - [Java] Make sure that WritableByteChannel in WriteChannel writes
  • ARROW-3282 - [R] initial R functionality
  • ARROW-3284 - [R][C++] Status code R error
  • ARROW-3285 - [GLib] Add arrow_cpp_build_type and arrow_cpp_build_dir options
  • ARROW-3286 - [C++] Add missing ARROW_EXPORT to RecordBatchBuilder
  • ARROW-3287 - [C++] Suppress "redeclared without dllimport attribute" warning from MinGW
  • ARROW-3288 - [GLib] Add missing new API index for 0.11.0
  • ARROW-3300 - [Release] Update deb package names in preparation
  • ARROW-3301 - [Website] Update Jekyll and Bootstrap 4
  • ARROW-3305 - [JS] Incorrect development documentation link in javascript readme
  • ARROW-3309 - [JS] Missing links from DEVELOP.md
  • ARROW-3313 - [R] Follow-up: install clang-format in R CI entry
  • ARROW-3313 - [R] Move .clang-format to top level. Add r/lint.sh script for linting R C++ files in Travis CI
  • ARROW-3319 - [GLib] Add align() to GArrowInputStream and GArrowOutputStream
  • ARROW-3320 - [C++] Improve float parsing performance
  • ARROW-3321 - [C++] Improve integer parsing performance
  • ARROW-3334 - [Python] Update conda packages to new numpy requirement
  • ARROW-3335 - [Python] Add ccache to manylinux1 container
  • ARROW-3339 - [R] Support for character vectors
  • ARROW-3341 - [R] Support for logical vector
  • ARROW-3349 - [C++] Use aligned_* API in MinGW
  • ARROW-3350 - [Website] Fix powered by links
  • ARROW-3352 - [Packaging] Fix recently failing wheel builds
  • ARROW-3356 - [Python] Document parameters of Table.to_pandas method
  • ARROW-3357 - [Rust] Add a mutable buffer implementation
  • ARROW-3360 - [GLib] Import Parquet GLib
  • ARROW-3363 - [C++/Python] Add helper functions to detect scalar Python types
  • ARROW-3371 - [Python] Remove check_metadata argument for Field.equals docstring
  • ARROW-3375 - [Rust] remove unused mempool
  • ARROW-3376 - [C++] Add double-conversion to cpp/thirdparty/download_dependencies.sh
  • ARROW-3377 - [Gandiva][C++] Replace If statement with bit operations for bitmap
  • ARROW-3382 - [C++] Run Gandiva tests in Travis CI
  • ARROW-3392 - [Python] Support filters in disjunctive normal form in ParquetDataset
  • ARROW-3395 - [C++/Python] Add docker container for linting
  • ARROW-3397 - [C++] Change a CMake relative path for modules
  • ARROW-3400 - [Packaging] Add support for Parquet GLib deb/rpm
  • ARROW-3404 - [C++] Make CSV chunker faster
  • ARROW-3411 - [Packaging] Make dev/release/01-perform.sh executable
  • ARROW-3412 - [Packaging] Update rat exclude files
  • ARROW-3413 - [Packaging] Include Parquet GLib document to source archive
  • ARROW-3415 - [Packaging] Fix "conda activate" failure
  • ARROW-3416 - [Packaging] Use SHA512 instead of SHA1
  • ARROW-3417 - [Packaging] Fix Parquet C++ test failure
  • ARROW-3418 - [C++] Update parquet-cpp version to 1.5.1-SNAPSHOT
  • ARROW-3423 - [Packaging] Remove RC information from deb/rpm packages
  • ARROW-3443 - [Java] Flight reports memory leaks in TestBasicOperation
  • PARQUET-169 - Implement support for bulk reading and writing rep/def levels
  • PARQUET-267 - Detach thirdparty code from build configuration.
  • PARQUET-416 - C++11 compilation, code reorg, libparquet and installation targets
  • PARQUET-418 - Refactored parquet_reader utility for printing file contents.
  • PARQUET-428 - Support INT96 and FIXED_LEN_BYTE_ARRAY types
  • PARQUET-434 - Add a ParquetFileReader class
  • PARQUET-435 - Change column reader methods to be array-oriented rather than scalar
  • PARQUET-436 - Implement basic Write Support
  • PARQUET-437 - Add googletest setup and ADD_PARQUET_TEST helper
  • PARQUET-438 - Update RLE encoding tools and add unit tests from Impala
  • PARQUET-439 - Conform copyright headers to ASF requirements
  • PARQUET-442 - Nested schema conversion, Thrift struct decoupling, dump-schema utility
  • PARQUET-448 - Add cmake options to not build tests and/or executables
  • PARQUET-449 - updated to latest parquet.thrift
  • PARQUET-451 - Add RowGroupReader helper class and refactor parquet_reader.cc into DebugPrint
  • PARQUET-456 - Finish gzip implementation and unit test all compressors
  • PARQUET-463 - Add local DCHECK macros, fix some dcheck bugs exposed
  • PARQUET-468 - Use thirdparty Thrift compiler to compile parquet.thrift at make time
  • PARQUET-477 - Add clang-format / clang-tidy checks to toolchain
  • PARQUET-482 - Organize public API headers
  • PARQUET-485 - Decouple page deserialization from column reader to facilitate unit testing
  • PARQUET-488 - Add SSE cmake toggle, fix build on systems without SSE
  • PARQUET-489 - Shared library symbol visibility
  • PARQUET-494 - Implement DictionaryEncoder and test dictionary decoding
  • PARQUET-496 - Fix cpplint configuration to catch more style errors
  • PARQUET-497 - Decouple serialized file internals from the ParquetFileReader public API
  • PARQUET-499 - Complete PlainEncoder implementation for all primitive types and test end to end
  • PARQUET-501 - Add OutputStream abstract interface, refactor encoding code paths
  • PARQUET-503 - Reenable parquet 2.0 encoding implementations.
  • PARQUET-508 - Add ParquetFilePrinter
  • PARQUET-512 - Add Google benchmark for performance testing
  • PARQUET-515 - Add "SetData" to LevelDecoder
  • PARQUET-518 - Remove -Wno-sign-compare and scrub integer signedness
  • PARQUET-519 - Remove last of suppressed compiler warnings
  • PARQUET-520 - Add MemoryMapSource and add unit tests for both it and LocalFileSource
  • PARQUET-533 - Add a Buffer abstraction, refactor input/output classes to be simpler using Buffers
  • PARQUET-538 - Improve ColumnReader Tests
  • PARQUET-542 - Support custom memory allocators
  • PARQUET-545 - Improve API to support decimal type
  • PARQUET-547 - Refactor templates to all be based on DataType structs
  • PARQUET-551 - Handle compiler warnings due to disabled DCHECKs in relea…
  • PARQUET-556 - Extend RowGroupStatistics to include "min" "max" statistics
  • PARQUET-559 - Enable external RandomAccessSource as input to the ParquetFileReader
  • PARQUET-564 - Add cmake option to run valgrind on each unit test executable
  • PARQUET-566 - Add method to retrieve the full column path
  • PARQUET-568 - Enable top-level column selection.
  • PARQUET-572 - Rename public namespace to parquet from parquet_cpp
  • PARQUET-573 - Create a public API for reading and writing file metadata
  • PARQUET-582 - Conversion functions for Parquet enums to Thrift enums
  • PARQUET-583 - Parquet to Thrift schema conversion
  • PARQUET-587 - Implement BufferReader::Read(int64_t,uint8_t*)
  • PARQUET-589 - Implement BufferedInputStream for better memory usage
  • PARQUET-592 - Support compressed writes
  • PARQUET-593 - Add API for writing Page statistics
  • PARQUET-595 - API for KeyValue metadata
  • PARQUET-597 - Add data rates to benchmark output
  • PARQUET-598 - Test writing all primitive data types
  • PARQUET-600 - Add benchmarks for RLE-Level encoding
  • PARQUET-603 - Implement missing information in schema descriptor
  • PARQUET-605 - Expose schema node in ColumnDescriptor
  • PARQUET-607 - Public writer header
  • PARQUET-610 - Print additional ColumnMetaData for each RowGroup
  • PARQUET-616 - WriteBatch should accept const arrays
  • PARQUET-619 - Add OutputStream for local files
  • PARQUET-625 - Improve RLE read performance
  • PARQUET-633 - Add version to WriterProperties
  • PARQUET-634 - Consistent private linking of dependencies
  • PARQUET-636 - Expose selection for different encodings
  • PARQUET-641 - Instantiate stringstream only if needed in SerializedPageReader::NextPage
  • PARQUET-646 - Add options to make developing with clang and 3rd-party gcc easier
  • PARQUET-666 - Add support for writing dictionaries
  • PARQUET-671 - performance improvements for rle/bit-packed decoding
  • PARQUET-679 - Local Windows build and Appveyor support
  • PARQUET-679 - Fix debug asserts in tests (msvc/debug build)
  • PARQUET-679 - [C++] Resolve unit tests issues on Windows; Run unit tes…
  • PARQUET-681 - Add tool to scan a parquet file
  • PARQUET-687 - C++: Switch to PLAIN encoding if dictionary grows too large
  • PARQUET-689 - C++: Compress DataPages eagerly
  • PARQUET-699 - Update parquet.thrift from https://github.com/apache/parquet-format
  • PARQUET-712 - Add library to read into Arrow memory
  • PARQUET-721 - benchmarks for reading into Arrow
  • PARQUET-724 - Test more advanced properties setting
  • PARQUET-728 - Incorporate upstream Arrow API changes
  • PARQUET-731 - API to return metadata size and Skip reading values
  • PARQUET-737 - Use absolute namespace in macros
  • PARQUET-752 - Account for upstream Arrow API changes
  • PARQUET-762 - C++: Use optimistic allocation instead of Arrow Builders
  • PARQUET-763 - C++: Expose ParquetFileReader through Arrow reader
  • PARQUET-769 - Add support for Brotli compression
  • PARQUET-778 - Standardize the schema output to match the parquet-mr format
  • PARQUET-782 - Support writing to Arrow sinks
  • PARQUET-785 - LIST schema conversion for Arrow lists
  • PARQUET-805 - Read Int96 into Arrow Timestamp(ns)
  • PARQUET-807 - Allow user to retain ownership of parquet::FileMetaData.
  • PARQUET-809 - Add SchemaDescriptor::Equals method
  • PARQUET-813 - Build thirdparty dependencies using ExternalProject
  • PARQUET-820 - Decoders should directly emit arrays with spacing for null entries
  • PARQUET-829 - Make use of ARROW-469
  • PARQUET-830 - Add parquet::arrow::OpenFile with additional properties and metadata args
  • PARQUET-833 - C++: Provide API to write spaced arrays
  • PARQUET-834 - Support I/O of arrow::ListArray
  • PARQUET-835 - Read Arrow columns in parallel with thread pool
  • PARQUET-836 - Bugfix + testcase for column subsetting in arrow::FileReader::ReadFlatTable
  • PARQUET-844 - Schema, compression consolidation / flattening
  • PARQUET-848 - Build Thrift bits as part of main parquet_objlib component
  • PARQUET-857 - Flatten parquet/encodings directory, consolidate code
  • PARQUET-858 - Flatten column directory, minor code consolidation
  • PARQUET-859 - Flatten parquet/file directory, consolidate file reader, file writer code
  • PARQUET-862 - Provide default cache size values
  • PARQUET-866 - API fixes for ARROW-33 patch
  • PARQUET-867 - Support writing sliced Arrow arrays
  • PARQUET-874 - Use default memory allocator from Arrow
  • PARQUET-877 - Update Arrow Hash, update Version in metadata.
  • PARQUET-882 - Improve Application Version parsing
  • PARQUET-890 - Support I/O of DATE columns in parquet_arrow
  • PARQUET-894 - Fix compilation warnings
  • PARQUET-894 - Fix compilation warning
  • PARQUET-897 - Only use designated public headers from libarrow
  • PARQUET-903 - Add option to set RPATH to origin
  • PARQUET-909 - Reduce buffer allocations (mallocs) on critical path
  • PARQUET-911 - [C++] Support nested structs in parquet_arrow
  • PARQUET-928 - Support pkg-config
  • PARQUET-929 - Handle arrow::DictionaryArray when writing Arrow data
  • PARQUET-930 - Add timestamp[us] to schema test
  • PARQUET-934 - Support multiarch on Debian
  • PARQUET-935 - Set version to shared library
  • PARQUET-946 - Add ReadRowGroup and num_row_group methods to arrow::FileReader
  • PARQUET-953 - Add static constructors to arrow::FileWriter for initializing from schema, add WriteTable method
  • PARQUET-967 - Combine libparquet, libparquet_arrow libraries
  • PARQUET-970 - Add Lz4 and Zstd compression codecs
  • PARQUET-978 - [C++] Minimizing footer reads for small(ish) metadata
  • PARQUET-984 - Add abi and so version to pkg-config
  • PARQUET-991 - Resolve msvc warnings; Appveyor treats msvc warnings as …
  • PARQUET-991 - Fix msvc warning C4100: '<id>': unreferenced formal parameter
  • PARQUET-999 - Improve MSVC build - Enable PARQUET_BUILD_BENCHMARKS
  • PARQUET-1008 - [C++] TypedColumnReader::ReadBatch method updated to ac…
  • PARQUET-1035 - Write Int96 from Arrow timestamp(ns)
  • PARQUET-1037 - allow arbitrary size row-groups
  • PARQUET-1041 - Support Arrow's NullArray
  • PARQUET-1043 - Raise minimum CMake version to 3.2, delete cruft.
  • PARQUET-1044 - Use compression libraries from Apache Arrow
  • PARQUET-1045 - Remove code that's being moved to Apache Arrow in ARROW-1154
  • PARQUET-1053 - Fix unused result warnings due to unchecked Statuses
  • PARQUET-1068 - Modify .clang-format to use straight Google format with 90-character line width
  • PARQUET-1072 - Build with ARROW_NO_DEPRECATED_API in Travis CI
  • PARQUET-1078 - Add option to coerce Arrow timestamps to a particular unit
  • PARQUET-1079 - Remove Arrow offset shift unneeded after ARROW-1335
  • PARQUET-1083 - Factor logic in parquet-scan.cc into a library function to help with perf testing
  • PARQUET-1086 - [C++] Remove usage of arrow/util/compiler-util.h
  • PARQUET-1087 - Add ScanContents function to arrow::FileReader that catches Parquet exceptions
  • PARQUET-1092 - Support writing chunked arrow::Table columns
  • PARQUET-1093 - Improve Arrow level generation error message
  • PARQUET-1094 - Add benchmark for boolean Arrow column I/O
  • PARQUET-1095 - [C++] Read and write Arrow decimal values
  • PARQUET-1104 - Upgrade to Apache Arrow 0.7.0 RC0
  • PARQUET-1150 - Hide statically linked boost symbols
  • PARQUET-1160 - [C++] Implement BYTE_ARRAY-backed Decimal reads
  • PARQUET-1164 - [C++] Account for API changes in ARROW-1808
  • PARQUET-1165 - Pin clang-format version to 4.0
  • PARQUET-1166 - Add GetRecordBatchReader in parquet/arrow/reader
  • PARQUET-1177 - Add PARQUET_BUILD_WARNING_LEVEL option and more rigorous Clang warnings
  • PARQUET-1196 - Example parquet_arrow project
  • PARQUET-1200 - Support reading a single Arrow column from a Parquet file
  • PARQUET-1218 - More informative error message on too short pages
  • PARQUET-1225 - NaN values may lead to incorrect filtering under certai…
  • PARQUET-1227 - Thrift crypto metadata structures
  • PARQUET-1256 - Add --print-key-value-metadata option to parquet_reader tool
  • PARQUET-1267 - [C++] replace "unsafe" std::equal by std::memcmp
  • PARQUET-1276 - [C++] Reduce the amount of memory used for writing null decimal values
  • PARQUET-1279 - [C++] Adding use of ASSERT_NO_FATAL_FAILURE in unit tests when calling helper functions that call ASSERT_ macros
  • PARQUET-1301 - [C++] Crypto package in parquet-cpp
  • PARQUET-1308 - [C++] Use Arrow thread pool, not Arrow ParallelFor, fix deprecated APIs, upgrade clang-format version. Fix record delimiting bug
  • PARQUET-1323 - Fix compiler warnings on clang-6
  • PARQUET-1332 - Add bloom filter for parquet
  • PARQUET-1340 - Fix Travis CI valgrind errors related to std::random_de…
  • PARQUET-1346 - [C++] Protect against empty Arrow arrays with null values
  • PARQUET-1348 - Add ability to write FileMetaData in arrow FileWriter
  • PARQUET-1350 - [C++] Use abstract ResizableBuffer instead of concrete PoolBuffer
  • PARQUET-1360 - Use conforming API style, variable names in WriteFileMetaData functions
  • PARQUET-1366 - [C++] Streamline use of Arrow's bit-util.h APIs
  • PARQUET-1372 - Add an API to allow writing RowGroups based on size
  • PARQUET-1378 - Allow RowGroups with zero rows to be written
  • PARQUET-1382 - [C++] Prepare for arrow::test namespace removal
  • PARQUET-1392 - Read multiple RowGroups at once into an Arrow table
  • PARQUET-1398 - [C++] move iv_prefix to Algorithms
  • PARQUET-1401 - [C++] optional RowGroup fields for handling hidden columns
  • PARQUET-1427 - [C++] Incorporate with build system, parquet target. Fix
  • PARQUET-1431 - [C++] Automatically set thrift to use boost for thrift versions before 0.11

Bug Fixes

  • ARROW-1380 - [Plasma] Fix "still reachable" valgrind warnings when PLASMA_VALGRIND=1
  • ARROW-1661 - [Python] Build Python 3.7 in manylinux container
  • ARROW-1799 - [Plasma C++] Make unittest does not create plasma store executable
  • ARROW-1996 - [Python] pyarrow.read_serialized cannot read concatenated records
  • ARROW-2027 - [C++] ipc::Message::SerializeTo does not pad the message body
  • ARROW-2220 - Only suggest default fix version that is a mainline release in merge tool
  • ARROW-2310 - Source release scripts fail with Java8
  • ARROW-2646 - [C++/Python] Pandas roundtrip for date objects
  • ARROW-2775 - [Python] ccache error when building manylinux1 wheels
  • ARROW-2776 - [C++] Do not pass -Wno-noexcept-type for compilers that do not support it
  • ARROW-2782 - [Python] Ongoing Travis CI failures in Plasma unit tests
  • ARROW-2785 - [C++] Crash in json-integration-test
  • ARROW-2814 - [Python] Unify conversion paths for sequences of Python objects
  • ARROW-2854 - [C++/Python] Casting float NaN to int should raise an error on safe cast
  • ARROW-2925 - [JS] Documentation failing in docker container
  • ARROW-2965 - [Python] Guard against overflow when serializing Numpy uint64 scalar
  • ARROW-2966 - [Python] Data type conversion error
  • ARROW-2973 - [Python] pitrou/asv.git@customize_commands does not work with the "new" way of activating conda
  • ARROW-2974 - [Python] Replace usages of "source activate" with "conda activate" in CI scripts
  • ARROW-2986 - [C++] Use /EHsc flag for exception handling on MSVC, disable C4772 compiler warning in arrow/util/logging.h
  • ARROW-2992 - [Python] Fix Parquet benchmark
  • ARROW-2992 - [CI] Remove some AppVeyor build configurations
  • ARROW-3006 - [GLib] Fix a bug that .gir/.typelib for GPU aren't installed
  • ARROW-3007 - [Packaging] Remove needless dependencies
  • ARROW-3011 - [CI] Remove Slack notification
  • ARROW-3012 - [Python] Fix setuptools_scm usage
  • ARROW-3013 - [Website] Fix download links on website for tarballs, checksums
  • ARROW-3015 - [Python] Fix typo in uint8() docstring
  • ARROW-3047 - [C++/Python] Better build instructions with ORC
  • ARROW-3049 - [C++/Python] Fix reading empty ORC file
  • ARROW-3053 - [Python] Add unit test for strided object conversion that was failing in 0.9.0
  • ARROW-3056 - [Python] Add notes to NativeFile docstrings for BufferedIOBase methods that are not implemented
  • ARROW-3061 - [JAVA] Fix BufferAllocator#getHeadroom (#2434)
  • ARROW-3065 - [Python] concat_tables() failing from bad Pandas Metadata
  • ARROW-3083 - [CI][skip appveyor]
  • ARROW-3093 - [C++] Linking errors with ORC enabled
  • ARROW-3095 - [Plasma] Move plasma store
  • ARROW-3098 - [C++/Python] Allow seeking at end of BufferReader and FixedSizeBufferWriter
  • ARROW-3100 - [GLib] Follow Homebrew change that lua splits luarocks
  • ARROW-3125 - [C++] Add support for finding libpython on MSYS2
  • ARROW-3125 - [Python] Update ASV instructions
  • ARROW-3132 - Regenerate 0.10.0 changelog given JIRA metadata updates
  • ARROW-3137 - [Python] pyarrow 0.10 requires newer version of numpy than specified in requirements
  • ARROW-3140 - [Plasma] Fix Plasma build with GPU support
  • ARROW-3141 - [Python] Raise numpy global requirement to 1.14
  • ARROW-3145 - [C++] Thrift compiler reruns in arrow/dbi/hiveserver2/thrift when using Ninja build
  • ARROW-3173 - [Rust] dynamic_types example does not run
  • ARROW-3175 - [Java] Switch to official flatbuffers Java artifact and com.github.icexelloss for flatc executable artifact
  • ARROW-3183 - [Python] Fix get_library_dirs on Windows
  • ARROW-3188 - [Python] Table.from_arrays segfaults if lists and schema are passed
  • ARROW-3190 - [C++] Rename Writeable references to Writable, add backwards compatibility, deprecations
  • ARROW-3206 - [C++] Fix CMake error when ARROW_HIVESERVER2=ON but tests disabled
  • ARROW-3227 - [Python] Require bytes-like input to NativeFile.write
  • ARROW-3228 - [Python] Do not allow PyObject_GetBuffer to obtain non-readonly Py_buffer when pyarrow Buffer is not mutable
  • ARROW-3231 - [Python] Sphinx's autodoc_default_flags is now deprecated
  • ARROW-3237 - [CI] Update linux packaging filenames in rat exclusion list
  • ARROW-3241 - [Plasma] test_plasma_list test failure on Ubuntu 14.04
  • ARROW-3251 - [C++] Fix conversion warnings in cast.cc
  • ARROW-3256, ARROW-3304 - [JS] Fix file footer inconsistency, yield all messages from the stream reader
  • ARROW-3271 - [Python] Manylinux1 builds timing out in Travis CI
  • ARROW-3279 - [C++] Allow linking Arrow tests dynamically on Windows
  • ARROW-3299 - [C++] Make RecordBatchBuilder non-copyable to appease MSVC
  • ARROW-3322 - [CI] Fix AppVeyor script to skip Rust job when no Rust changes
  • ARROW-3327 - [Python] Use local Arrow checkout instead of separate clone
  • ARROW-3338 - [Python] Crash when schema and columns do not match
  • ARROW-3342 - Appveyor builds have stopped triggering on GitHub
  • ARROW-3348 - [Plasma] Fix bug in which plasma store dies when object created by remo…
  • ARROW-3354 - [Python] Swap cuda.read_record_batch arguments
  • ARROW-3369 - [Packaging] Wheel builds are failing due to wheel 0.32 release
  • ARROW-3370 - [Packaging] Suppress BFD warnings on CentOS 6
  • ARROW-3373 - [Plasma] Fix bug when plasma client requests multiple objects and add test.
  • ARROW-3374 - [Python] Implicitly set from_pandas=True when passing pandas.Categorical to pyarrow.array. Preserve ordered categories
  • ARROW-3390 - [C++] cmake file under windows msys2 system doesn't work
  • ARROW-3393 - [C++] Add missing override on virtual dtor in task-group.cc
  • ARROW-3394 - [Java] Remove duplicate dependency in Flight for grpc-netty
  • ARROW-3403 - [Website] Source tarball link missing from install page
  • ARROW-3420 - [C++] Fix outstanding include-what-you-use issues in src/arrow, src/parquet codebases
  • PARQUET-232 - minor compilation issue
  • PARQUET-446 - Hide Thrift compiled headers and Boost from public API, #include scrubbing
  • PARQUET-454 - Fix inconsistencies with boolean PLAIN encoding
  • PARQUET-455 - Fix OS X / Clang compiler warnings
  • PARQUET-457 - Verify page deserialization for GZIP and SNAPPY codecs, related refactoring
  • PARQUET-469 - Roll back Thrift thirdparty and compiled sources to 0.9.0
  • PARQUET-472 - Changed the ownership of InputStream in ColumnReader.
  • PARQUET-505 - Column reader should automatically handle large data pages
  • PARQUET-507 - Reduce the runtime of rle-test
  • PARQUET-513 - Fail build if valgrind finds error during ctest, fix a core dump
  • PARQUET-525 - Add test coverage for failure modes in ParseMetaData
  • PARQUET-537 - Ensure that LocalFileSource is properly closed.
  • PARQUET-549 - Add column reader tests for dictionary pages
  • PARQUET-555 - Dictionary page metadata handling inconsistencies
  • PARQUET-561 - Add destructor to PIMPL
  • PARQUET-599 - Better size estimation for levels
  • PARQUET-604 - Add writer headers to installation
  • PARQUET-614 - Remove unneeded LZ4-related code
  • PARQUET-620 - Ensure metadata is written only once
  • PARQUET-621 - Add flag to indicate if decimalmetadata is set
  • PARQUET-629 - RowGroupSerializer should only close itself once
  • PARQUET-639 - Do not export DCHECK in public headers
  • PARQUET-643 - Add const modifier to schema pointer reference
  • PARQUET-657 - Do not define DISALLOW_COPY_AND_ASSIGN if already defined
  • PARQUET-658 - Add virtual destructor to ColumnReader
  • PARQUET-659 - Export extern templates for typed column reader/writer classes
  • PARQUET-662 - Compile ParquetException implementation and explicitly export
  • PARQUET-676 - Fix incorrect MaxBufferSize for small bit widths
  • PARQUET-691 - Write ColumnChunk metadata after chunk is complete
  • PARQUET-694 - Revert default data page size back to 1M
  • PARQUET-700 - Disable dictionary encoding for boolean columns
  • PARQUET-701 - Ensure that Close can be called multiple times
  • PARQUET-702 - Add a writer + reader example with detailed comments
  • PARQUET-703 - Validate that ColumnChunk metadata counts nulls in num_values
  • PARQUET-704 - Install scan-all.h
  • PARQUET-708 - account for "worst case scenario" in MaxBufferSize for bit_width > 1
  • PARQUET-710 - Remove unneeded private member variables from RowGroupReader ABI
  • PARQUET-711 - Use metadata builders in parquet writer
  • PARQUET-718 - Fix I/O of non-dictionary encoded pages
  • PARQUET-719 - Fix WriterBatch API to handle NULL values
  • PARQUET-720 - Mark ScanAllValues as inline to prevent link error
  • PARQUET-739 - Don't use a static buffer for data accessed by multiple threads
  • PARQUET-741 - Always allocate fresh buffers while compressing
  • PARQUET-742 - Add missing license headers
  • PARQUET-745 - TypedRowGroupStatistics fails to PlainDecode min and max in ByteArrayType
  • PARQUET-747 - Better hide TypedRowGroupStatistics in public API
  • PARQUET-759 - Fix handling of columns of empty strings
  • PARQUET-760 - Store correct encoding in fallback data pages
  • PARQUET-764 - Support batches for PLAIN boolean writes that aren't a multiple of 8
  • PARQUET-766 - Expose ParquetFileReader through Arrow reader as const
  • PARQUET-775 - Make TrackingAllocator thread-safe
  • PARQUET-779 - Export TypedRowGroupStatistics in libparquet
  • PARQUET-780 - WriterBatch API does not properly handle NULL values for byte array types
  • PARQUET-789 - Catch/translate ParquetExceptions in parquet::arrow::FileReader
  • PARQUET-793 - Do not return incorrect statistics
  • PARQUET-797 - Updates for ARROW-418 header API changes
  • PARQUET-799 - Fix bug in MemoryMapSource::CloseFile
  • PARQUET-812 - Read BYTE_ARRAY with no logical type as arrow::BinaryArray
  • PARQUET-816 - Workaround for incorrect column chunk metadata in parquet-mr <= 1.2.8
  • PARQUET-818 - Refactoring to utilize common IO, buffer, memory management abstractions and implementations
  • PARQUET-819 - Don't try to install no longer existing arrow/utils.h
  • PARQUET-827 - Account for arrow::MemoryPool API change and fix bug in reading Int96 timestamps
  • PARQUET-828 - Do not implicitly cast ParquetVersion enum to int
  • PARQUET-837 - Remove RandomAccessSource::Seek method which can be a source of thread safety problems
  • PARQUET-841 - Version number being incorrectly written for v1 files
  • PARQUET-842 - Do not set unnecessary fields in the parquet::SchemaElement
  • PARQUET-843 - Impala is thrown off by a REPEATED root schema node
  • PARQUET-846 - CpuInfo::Init() is not thread safe
  • PARQUET-880 - Prevent destructors from throwing
  • PARQUET-888 - Add missing virtual dtor.
  • PARQUET-889 - Fix compilation when SSE is enabled
  • PARQUET-892 - Specify public link targets for parquet_static so that transitive dependencies are linked in executables
  • PARQUET-895 - Fix broken reading of nested repeated columns
  • PARQUET-898 - Upgrade to googletest 1.8.0, move back to Xcode 6.4 in Travis CI
  • PARQUET-908 - Fix shared library visibility of some symbols in types.h
  • PARQUET-914 - Rewording exception message in column writer.
  • PARQUET-915 - Support additional Arrow date/time types and metadata
  • PARQUET-918 - Keep ordering in column indices when converting Parquet Schema
  • PARQUET-918 - FromParquetSchema API crashes on nested schemas
  • PARQUET-919 - Account for ARROW-683 changes, but make no functional changes. Set PARQUET_ARROW=on by default
  • PARQUET-923 - Account for Time type changes in Arrow
  • PARQUET-933 - Account for API changes in ARROW-728
  • PARQUET-936 - Return Invalid Status if chunk_size <= 0 when WriteTable in parquet-arrow
  • PARQUET-943 - Fix build error on x86
  • PARQUET-947 - Account for Arrow library consolidation in ARROW-795, API changes in ARROW-782
  • PARQUET-958 - [C++] Print Parquet metadata in JSON format
  • PARQUET-963 - Return NotImplemented when attempting to read a struct field
  • PARQUET-965 - Add FIXED_LEN_BYTE_ARRAY read and write support in parquet-arrow
  • PARQUET-979 - Limit size of min, max or disable stats for long binary types
  • PARQUET-992 - Do not transitively include zlib.h in public API
  • PARQUET-995 - Use sizeof(Int96) instead of Int96Type
  • PARQUET-997 - Fix override compiler warnings
  • PARQUET-1002 - Compute statistics based on Sort Order
  • PARQUET-1003 - Modify DEFAULT_CREATED_BY value for every new release v…
  • PARQUET-1007 - Update parquet.thrift
  • PARQUET-1029 - [C++] Some extern template symbols not being exported in gcc
  • PARQUET-1033 - Improve documentation about WriteBatchSpaced
  • PARQUET-1038 - Key value metadata should be nullptr if not set
  • PARQUET-1040 - Add missing writer methods
  • PARQUET-1042 - Fix Compilation breaks on GCC 4.8
  • PARQUET-1048 - Apache Arrow static transitive dependencies
  • PARQUET-1054 - Fixes for Arrow API changes in ARROW-1199
  • PARQUET-1071 - Check that arrow::FileWriter::Close() is idempotent
  • PARQUET-1085 - [C++] Use namespaced macros from arrow/util/macros.h, work around UNUSED rename
  • PARQUET-1088 - Remove parquet_version.h from version control since it gets auto generated
  • PARQUET-1090 - Add max row group length option, fix int32 overflow
  • PARQUET-1098 - Install util/comparison.h
  • PARQUET-1100 - Introduce RecordReader interface to better support nested data, refactor parquet/arrow/reader
  • PARQUET-1108 - Fix Int96 comparators
  • PARQUET-1114 - Apply changes for ARROW-1601 ARROW-1611, change shared l…
  • PARQUET-1121 - Handle Dictionary[Null] arrays on writing Arrow tables
  • PARQUET-1123 - [C++] Update parquet-cpp to use Arrow's AssertArraysEqual
  • PARQUET-1138 - Fix Arrow 0.7.1 build
  • PARQUET-1167 - [C++] FieldToNode function should return a status when throwing an exception
  • PARQUET-1175 - Fix arrow::ArrayData method rename from ShallowCopy to Copy
  • PARQUET-1179 - Upgrade to Thrift 0.11, use std::shared_ptr instead of boost::shared_ptr
  • PARQUET-1180 - Fix behaviour of num_children element of primitive nodes
  • PARQUET-1193 - [CPP] Implement ColumnOrder to support min_value and max_value
  • PARQUET-1226 - Fixes for CHECKIN compiler warning level with clang 5.0
  • PARQUET-1233 - Enable option to switch between stl classes and boost c…
  • PARQUET-1245 - Fix creating Arrow table with duplicate column names
  • PARQUET-1255 - Fix error message when PARQUET_TEST_DATA isn't defined
  • PARQUET-1265 - Segfault on static ApplicationVersion initialization
  • PARQUET-1268 - Fix conversion of null list Arrow arrays
  • PARQUET-1270 - Install executable tools
  • PARQUET-1272 - Return correct row count for nested columns in ScanFileContents
  • PARQUET-1273 - Properly write dictionary values when writing in chunks
  • PARQUET-1274 - Prevent segfault that was occurring when writing a nanosecond timestamp with arrow writer properties set to coerce timestamps and support deprecated int96 timestamps.
  • PARQUET-1283 - [C++] Remove trailing space for string and int96 statis…
  • PARQUET-1307 - Fix memory-test for newer Arrow
  • PARQUET-1315 - ColumnChunkMetaData.has_dictionary_page() should return…
  • PARQUET-1333 - [C++] Reading of files with dictionary size 0 fails on Windows with bad_alloc
  • PARQUET-1334 - [C++] memory_map parameter seems misleading in parquet file opener
  • PARQUET-1357 - FormatStatValue truncates binary statistics on zero character
  • PARQUET-1358 - index_page_offset should be unset as it is not supported
  • PARQUET-1369 - Disregard column sort order if statistics max/min are equal
  • PARQUET-1384 - fix clang build error for bloom_filter-test.cc

Apache Arrow 0.10.0 (2018-08-06)

Bug Fixes

  • ARROW-198 - [Java] OutOfMemoryError for vector test case
  • ARROW-640 - [Python] Implement hash and equality for Array scalar values
  • ARROW-2020 - [Python] Parquet segfaults if coercing ns timestamps and writing 96-bit timestamps
  • ARROW-2059 - [Python] Possible performance regression in Feather read/write path
  • ARROW-2101 - [Python/C++] Correctly convert numpy arrays of bytes to arrow arrays of strings when user specifies arrow type of string
  • ARROW-2122 - [Python] Pyarrow fails to serialize dataframe with timestamp.
  • ARROW-2182 - [Python] Build C++ libraries in benchmarks build step
  • ARROW-2189 - [C++] Seg. fault on make_shared<PoolBuffer>
  • ARROW-2193 - [Plasma] plasma_store has runtime dependency on Boost shared libraries when ARROW_BOOST_USE_SHARED=on
  • ARROW-2195 - [Plasma] Return auto-releasing buffers
  • ARROW-2247 - [Python] Statically-linking boost_regex in both libarrow and libparquet results in segfault
  • ARROW-2273 - [Python] Raise NotImplementedError when pandas Sparse types serializing
  • ARROW-2300 - [C++/Python] Integration test for HDFS
  • ARROW-2305 - [Python] Bump Cython requirement to 0.27+
  • ARROW-2314 - [C++/Python] Fix union array slicing
  • ARROW-2326 - [Python] Use @loader_path/ as rpath instead of @loader_path when bundling C++ libraries in wheels on macOS
  • ARROW-2328 - [C++] Fixed and unit tested feather writing with slice
  • ARROW-2331 - [Python] Fix indexing for negative or out-of-bounds indices
  • ARROW-2333 - [Python] Fix bundling boost with default namespace
  • ARROW-2342 - [Python] Allow pickling more types
  • ARROW-2346 - [Python] Fix PYARROW_CXX_FLAGS with multiple options
  • ARROW-2349 - [Python] Opt in to bundling Boost shared libraries separately
  • ARROW-2351 - [C++] StringBuilder::append(vector<string>...) not impleme…
  • ARROW-2354 - [C++] Make PyDecimal_Check() faster
  • ARROW-2355 - [Python] Unable to import pyarrow [0.9.0] OSX
  • ARROW-2357 - [Python] Add microbenchmark for PandasObjectIsNull()
  • ARROW-2368 - [JAVA] Correctly pad negative values in DecimalVector#setBigEndian (#1809)
  • ARROW-2369 - [Python] Fix reading large Parquet files (> 4 GB)
  • ARROW-2370 - [GLib] Fix include path in .pc on Meson build
  • ARROW-2371 - [GLib] Update "Requires" in .pc on GNU Autotools build
  • ARROW-2372 - [Python] ArrowIOError: Invalid argument when reading Parquet file
  • ARROW-2375 - [Rust] Implement Drop for Buffer so memory is released
  • ARROW-2377 - [GLib] Support old GObject Introspection
  • ARROW-2380 - [Python] Streamline conversions
  • ARROW-2382 - [Rust] Bug fix: List was not using aligned mem
  • ARROW-2383 - [deb] Use system Protocol Buffers
  • ARROW-2387 - [Python] Flip test for rescale loss if value < 0
  • ARROW-2391 - [C++/Python] Segmentation fault from PyArrow when mapping Pandas datetime column to pyarrow.date64
  • ARROW-2393 - [C++][PREPEND] macros from status.h into util/logging.h since they use the logging infrastructure and shouldn't be in the public API.
  • ARROW-2403 - [C++] arrow::CpuInfo::model_name_ destructed twice on exit
  • ARROW-2405 - [C++] <function> is required for std::function
  • ARROW-2418 - [Rust] BUG FIX: reserve memory when building list
  • ARROW-2419 - [Site] Hard-code timezone
  • ARROW-2420 - [Rust] Fix major memory bug and add benches
  • ARROW-2421 - [C++] Update LLVM version in cpp README
  • ARROW-2423 - [Python] Enable DataType, Field and plasma ObjectID equality checks against no…
  • ARROW-2424 - [Rust] Fix build - add missing import
  • ARROW-2425 - [Rust] BUG FIX: Add u8 mappings for Array::from
  • ARROW-2426 - [GLib] Follow python -> python@3 change in Homebrew
  • ARROW-2432 - [Python] Fix Pandas decimal type conversion with None values
  • ARROW-2437 - [C++] Add ReadMessage without aligned argument.
  • ARROW-2438 - [Rust] memory_pool.rs misses license header
  • ARROW-2441 - [Rust] Builder<T>::slice_mut assertions are too strict
  • ARROW-2443 - [Python] Allow creation of empty Dictionary indices
  • ARROW-2450 - [Python] Test for Parquet roundtrip of null lists
  • ARROW-2452 - [TEST] Spark integration test fails with permission error
  • ARROW-2454 - [C++] Allow zero-array chunked arrays
  • ARROW-2455 - [C++] Initialize the atomic bytes_allocated_ properly
  • ARROW-2457 - [GLib] Support large is_valids in builder's append_values()
  • ARROW-2459 - pyarrow: Segfault with pyarrow.deserialize_pandas
  • ARROW-2462 - [C++] Fix Segfault in UnpackBinaryDictionary
  • ARROW-2465 - [Plasma/GPU] Preserve plasma_store rpath
  • ARROW-2466 - [C++] Fix "append" flag to FileOutputStream
  • ARROW-2468 - [Rust] Builder::slice_mut() should take mut self.
  • ARROW-2471 - [Rust] Builder zero capacity fix
  • ARROW-2473 - [Rust] List empty slice assertion
  • ARROW-2474 - [Rust] Add windows support for memory pool abstraction
  • ARROW-2489 - [Plasma] Fix PlasmaClient ABI variation
  • ARROW-2491 - [Python] raise NotImplementedError on from_buffers with nested types
  • ARROW-2492 - [Python] Prevent segfault on accidental call of pyarrow.Array
  • ARROW-2500 - [Java] IPC Writers/readers are not always setting validity bits correctly
  • ARROW-2502 - [Rust] Restore Windows Compatibility
  • ARROW-2503 - [Python] Prevent trailing space character for string statistics
  • ARROW-2509 - Build for node 9.8
  • ARROW-2510 - [Python] Segmentation fault when converting empty column as categorical
  • ARROW-2511 - [Java] Fix BaseVariableWidthVector.allocateNew to not swallow exception (#1947)
  • ARROW-2514 - [Python] Speed up inferring nested Numpy array
  • ARROW-2515 - [Python] Add DictionaryValue class, fixing bugs with nested dictionaries
  • ARROW-2518 - [Java] Re-instate JDK tests in matrix, but with JDK 8 instead of JDK 7
  • ARROW-2530 - [GLib] Support out-of-source directory build again
  • ARROW-2534 - [C++] Hide all zlib symbols from libarrow.so
  • ARROW-2545 - [Python] Link against required system libraries
  • ARROW-2554 - [Python] fix timestamp unit detection from python lists
  • ARROW-2557 - [Rust] Add badge for code coverage in README
  • ARROW-2561 - [C++] Fix double free in cuda-test under code coverage
  • ARROW-2564 - [C++] Replace deprecated method in documentation
  • ARROW-2565 - [Plasma] new subscriber cannot receive notifications about existing objects
  • ARROW-2570 - [Python] Add support for writing parquet files with LZ4 compression
  • ARROW-2571 - [C++] Lz4Codec doesn't properly handle empty data
  • ARROW-2575 - [Python] Exclude hidden files starting with . in ParquetManifest
  • ARROW-2578 - [Plasma] Use mersenne twister to generate random number
  • ARROW-2589 - [Python] Workaround regression in Pandas 0.23.0
  • ARROW-2593 - [Python] TypeError: data type "mixed-integer" not understood
  • ARROW-2594 - [Java] When realloc Vectors, zero out all unfilled bytes of new buffer
  • ARROW-2599 - [Python] pip install is not working without Arrow C++ being installed
  • ARROW-2601 - [Python] Prevent user from calling *MemoryPool constructors directly
  • ARROW-2603 - [Python] Allow date and datetime subclassing
  • ARROW-2615 - [Rust] Post refactor cleanup
  • ARROW-2622 - [C++] Array methods IsNull and IsValid are not complementary
  • ARROW-2629 - [Plasma] Iterator invalidation for pending_notifications_
  • ARROW-2630 - [JAVA] typo fix
  • ARROW-2632 - [Java] ArrowStreamWriter accumulates ArrowBlock but does not use them
  • ARROW-2640 - [JS] Write schema metadata
  • ARROW-2642 - [Python] Fail building parquet binding on Windows
  • ARROW-2643 - [C++] Travis-CI build failure with cpp toolchain enabled
  • ARROW-2644 - [Python] Fix prototype declaration in Parquet binding
  • ARROW-2655 - [C++] Fix compiler warnings with gcc 7
  • ARROW-2657 - [Python] Import TensorFlow python extension before pyarrow to avoid segfault
  • ARROW-2668 - [C++] Suppress -Wnull-pointer-arithmetic when compiling plasma/malloc.cc on clang
  • ARROW-2669 - [C++] EP_CXX_FLAGS not passed on when building gbenchmark
  • ARROW-2675 - Fix build error with clang-10 (Apple Clang / LLVM)
  • ARROW-2683 - [Python] Resource Warning (Unclosed File) when using pyarrow.parquet.read_table()
  • ARROW-2690 - [Plasma] Use uniform function names in public APIs in Plasma. Add namespace around Flatbuffers
  • ARROW-2691 - [Rust] Update code formatting with latest Rust stable
  • ARROW-2693 - [Python] pa.chunked_array causes a segmentation fault on empty input
  • ARROW-2694 - [Python] ArrayValue string conversion returns the representation instead of the converted python object string
  • ARROW-2698 - [Python] Exception when passing a string to Table.column
  • ARROW-2711 - [Python] Fix inference from Pandas column with first empty list
  • ARROW-2715 - Address apt flakiness with launchpad.net
  • ARROW-2716 - [Python] Make manylinux1 base image independent of Python patch releases
  • ARROW-2721 - [C++] Fix ORC and Protocol Buffers link error
  • ARROW-2722 - [Python] Sanitize dtype number to handle edge cases
  • ARROW-2723 - [C++] Add .pc for arrow orc
  • ARROW-2726 - [C++] Fix the latest Boost version
  • ARROW-2727 - [Java] Fix POM file issue causing build failure in java/adapters/jdbc
  • ARROW-2741 - [Python][D] and type=pa.date64 produces invalid results
  • ARROW-2744 - [C++] Avoid creating list arrays with a null values buffer
  • ARROW-2745 - [C++] ORC ExternalProject needs to declare dependency on vendored protobuf
  • ARROW-2747 - [Python] Fix huge pages Plasma test
  • ARROW-2754 - [Python] Change Python setup.py to make release builds by default
  • ARROW-2770 - [Packaging] Account for conda-forge compiler migration in conda recipes
  • ARROW-2773 - [Python] corrected partition_cols parameter name
  • ARROW-2781 - [Python] Download boost using curl in manylinux1 image
  • ARROW-2787 - [Python] Fix Cython usage instructions
  • ARROW-2795 - [Python] Run TensorFlow import workaround only on Linux platforms
  • ARROW-2806 - [C++/Python] More consistent null/nan handling
  • ARROW-2810 - [Plasma] Remove flatbuffers from public API
  • ARROW-2812 - [Ruby] interface for Arrow::StructArray
  • ARROW-2820 - [Python] Check that array lengths in RecordBatch.from_arrays are all the same
  • ARROW-2823 - [C++] Search for flatbuffers in <root>/lib64
  • ARROW-2841 - [Go] support building in forks
  • ARROW-2850 - [C++/Python] Correctly set RPATHs on all binaries
  • ARROW-2851 - [C++] Update RAT excludes for new install file names
  • ARROW-2852 - [Rust] Make Array sync and send
  • ARROW-2856 - [Python/C++] Array constructor should not truncate floats when casting to int
  • ARROW-2862 - [C++] Ensure thirdparty download directory has been created in thirdparty/download_thirdparty.sh
  • ARROW-2867 - [Python] Incorrect example for Cython usage
  • ARROW-2871 - [Python] Raise when calling to_numpy() on boolean array
  • ARROW-2872 - [Python] Add tensorflow mark to opt-in to TF-related unit tests
  • ARROW-2876 - [Packaging] Replace ssh-URLs with https://
  • ARROW-2877 - [Packaging] crossbow submit results in duplicate Travis CI build
  • ARROW-2878 - [Packaging] README.md does not mention setting GitHub API token in user's crossbow repo settings
  • ARROW-2883 - [C++] Fix Clang warnings in code built with -DARROW_GPU=ON
  • ARROW-2891 - [Python] Preserve schema in write_to_dataset
  • ARROW-2894 - [Glib] Adjust tests to format refactor
  • ARROW-2895 - [CI] Add missing Ruby dependency on C++
  • ARROW-2896 - [GLib] Add missing exports
  • ARROW-2901 - [Java] Build is failing on Java9
  • ARROW-2902 - [Python] Clean up after build artifacts created by root docker user in HDFS integration test
  • ARROW-2903 - [C++] Setting -DARROW_HDFS=OFF breaks arrow build when linking against boost libraries
  • ARROW-2911 - [Python] Parquet binary statistics that end in '\0' truncate last byte
  • ARROW-2917 - [Python] Use detach() to avoid PyTorch gradient errors
  • ARROW-2920 - [Python] Fix pytorch segfault
  • ARROW-2926 - [Python] Do not attempt to write tables with invalid schemas in ParquetWriter.write_table
  • ARROW-2930 - [C++] migrated MacOS specific code for shared library target
  • ARROW-2940 - [Python] Fix OSError when trying to load libcaffe2.so in pytorch 0.3.0
  • ARROW-2945 - [Packaging] Update argument check
  • ARROW-2955 - Fix typo in pyarrow's HDFS API result
  • ARROW-2963 - [C++] Make thread pool fork-safe
  • ARROW-2978 - [Rust] Travis CI build is failing
  • ARROW-2982 - The "--show-progress" option is only supported in wget 1.16 and higher
  • ARROW-3210 - [Python] Creating ParquetDataset creates partitioned ParquetFiles with mismatched Parquet schemas

New Features and Improvements

  • ARROW-530 - [C++/Python] Provide subpools for better memory allocation …
  • ARROW-564 - [Python] Add Array.to_numpy()
  • ARROW-665 - C++: Move zeroing logic for (re)allocations to the Allocator
  • ARROW-889 - [Python/C++] Unify PrettyPrints between Python and C++
  • ARROW-902 - [C++] Script for downloading all thirdparty build dependencies and configuration for offline builds
  • ARROW-906 - [C++/Python] Read and write field metadata in IPC
  • ARROW-1018 - [C++] Create FileOutputStream, ReadableFile from file descriptor
  • ARROW-1163 - [Java] Java client support for plasma
  • ARROW-1388 - [Python] Add Table.drop method for removing columns
  • ARROW-1454 - [Python] Also match ArrowNotImplementedError in unsupported type conversions from pandas
  • ARROW-1715 - [Python] Implement pickling for Column, ChunkedArray, RecordBatch, Table
  • ARROW-1722 - [C++] Add linting script to find C++/CLI incompatibilities
  • ARROW-1731 - [Python] Add columns selector in Table.from_array
  • ARROW-1744 - [Plasma] Provide TensorFlow operator to transfer Tensors between Plasma and TensorFlow
  • ARROW-1780 - JDBC Adapter to convert Relational Data objects to Arrow Data Format Vector Objects (#1759)
  • ARROW-1858 - [Python] Added documentation for pq.write_dataset
  • ARROW-1868 - [Java] Change vector getMinorType to use MinorType instead of Types.MinorType
  • ARROW-1886 - [C++/Python] Flatten struct columns in table
  • ARROW-1913 - [Java] Disable Javadoc doclint with Java 8
  • ARROW-1928 - [C++] Add BitmapReader/BitmapWriter benchmarks
  • ARROW-1954 - [Python] Add metadata accessor to pyarrow.Field
  • ARROW-1964 - [Python] Expose StringBuilder to Python
  • ARROW-2014 - [Python] Document read_pandas method in pyarrow.parquet
  • ARROW-2055 - [Java] Upgrade to Java 8
  • ARROW-2060 - [Python] Documentation for creating StructArray using from_arrays or a sequence of dicts
  • ARROW-2061 - [C++] Run ASAN builds in Travis CI
  • ARROW-2074 - [Python] Infer lists of dicts as struct arrays
  • ARROW-2097 - [CI, Python] Reduce Travis-CI verbosity
  • ARROW-2100 - [Python] Drop Python 3.4 support
  • ARROW-2140 - [Python] Improve float16 support
  • ARROW-2141 - [Python] Support variable length binary conversion from Pandas
  • ARROW-2147 - [Python] Fix type inference of numpy arrays
  • ARROW-2207 - [GLib] Support GArrowDecimal128
  • ARROW-2222 - handle untrusted inputs
  • ARROW-2224 - [C++] Remove boost-regex dependency
  • ARROW-2241 - [Python] Simple script for running all current ASV benchmarks at a commit or tag
  • ARROW-2264 - [Python] Efficiently serialize numpy arrays with dtype of unicode fixed length string
  • ARROW-2267 - Rust bindings
  • ARROW-2276 - [Python] Expose buffer protocol on Tensor
  • ARROW-2281 - [Python] Add Array.from_buffers()
  • ARROW-2285 - [C++/Python] Can't convert Numpy string arrays
  • ARROW-2286 - [C++/Python] Allow subscripting pyarrow.lib.StructValue
  • ARROW-2287 - [Python] chunked array not iterable, not indexable
  • ARROW-2299 - [Go] Import Go arrow implementation from influxdata/arrow
  • ARROW-2301 - [Python] Build source distribution inside the manylinux1 docker
  • ARROW-2302 - [GLib] Unify GNU Autotools build and Meson build into one Travis CI job
  • ARROW-2308 - [Python] Make deserialized numpy arrays 64-byte aligned.
  • ARROW-2315 - [C++/Python] Flatten struct array
  • ARROW-2319 - [C++] Add BufferedOutputStream class
  • ARROW-2322 - [Java] Document dev environment requirements for publishing Java release artifacts
  • ARROW-2325 - [Python] Update setup.py to use Markdown project description
  • ARROW-2330 - [C++] Optimize delta buffer creation with partially finishable array builders
  • ARROW-2332 - Add Feather Dataset class
  • ARROW-2332 - Feather Reader option to return Table
  • ARROW-2334 - [C++] Update boost to 1.66.0
  • ARROW-2335 - [Go] move README one directory higher
  • ARROW-2340 - [Website] Add blog post about Go code donation
  • ARROW-2341 - [Python] Improve pa.union() mode argument behaviour
  • ARROW-2343 - [Java/Packaging] Run mvn clean in API doc builds
  • ARROW-2344 - [Go] Run Go unit tests in Travis CI
  • ARROW-2345 - [Documentation] Fix bundle exec and set sphinx nosidebar to True
  • ARROW-2348 - [GLib] Remove GLib + Go example
  • ARROW-2350 - Consolidated RUN step in spark_integration Dockerfile
  • ARROW-2353 - [CI] Check correctness of built wheel on AppVeyor
  • ARROW-2361 - [Rust] Starting point for a native Rust implementation of Arrow
  • ARROW-2364 - [Plasma] PlasmaClient::Get() could take vector of object ids
  • ARROW-2376 - [Rust] Travis builds the Rust library
  • ARROW-2378 - [Rust] Rustfmt
  • ARROW-2381 - [Rust] Adds iterator support to Buffer<T>
  • ARROW-2384 - [Rust] Additional test & Trait standardization
  • ARROW-2385 - [Rust] implement to_json for DataType and Field
  • ARROW-2388 - [C++] Use valid_bytes API for StringBuilder::Append
  • ARROW-2389 - [C++] Add CapacityError
  • ARROW-2390 - [C++/Python] Map Python exceptions to Arrow status codes
  • ARROW-2394 - [Python] Correct flake8 errors in benchmarks
  • ARROW-2395 - [Python] Fix flake8 warnings outside of pyarrow/ directory. Check in CI
  • ARROW-2396 - [Rust] Unify Rust Errors
  • ARROW-2397 - [Documentation] Update format documentation to describe tensor alignment.
  • ARROW-2398 - [Rust] Create Builder<T> for building buffers directly in aligned memory
  • ARROW-2400 - [C++] Fix Status destructor performance
  • ARROW-2401 - Support filters on Hive partitioned Parquet files
  • ARROW-2402 - [C++] Avoid spurious copies with FixedSizeBinaryBuilder
  • ARROW-2404 - [C++] Fix "declaration of 'type_id' hides class member" w…
  • ARROW-2407 - [GLib] Add garrow_string_array_builder_append_values()
  • ARROW-2408 - [Rust][T] fromBuffer<T>`
  • ARROW-2408 - [Rust] Remove build warnings
  • ARROW-2411 - [C++] Add StringBuilder::Append(const char **values)
  • ARROW-2413 - [Rust] Remove useless calls to format!().
  • ARROW-2414 - Fix a variety of typos.
  • ARROW-2415 - [Rust] Fix clippy ref-match-pats warnings.
  • ARROW-2416 - [C++] Support system libprotobuf
  • ARROW-2417 - [Rust] Fix API safety issues
  • ARROW-2422 - Support more operators for partition filtering
  • ARROW-2427 - [C++] Implement ReadAt properly
  • ARROW-2430 - [Packaging] MVP for branch based packaging automation
  • ARROW-2433 - [Rust][T] )
  • ARROW-2434 - [Rust] Add windows support
  • ARROW-2435 - [Rust] Add memory pool abstraction.
  • ARROW-2436 - [Rust] Add windows CI
  • ARROW-2439 - [Rust] Run license header checks also in Rust CI entry
  • ARROW-2440 - [Rust] Implement ListBuilder<T>
  • ARROW-2442 - [C++] Disambiguate builder Append() overloads
  • ARROW-2445 - [Rust] Add documentation and make some fields private
  • ARROW-2448 - [Plasma] Reference counting for PlasmaClient::Impl
  • ARROW-2451 - [Python] Handle non-object arrays more efficiently in custom serializer.
  • ARROW-2453 - [Python] Improve Table column access
  • ARROW-2458 - [Plasma] Use one thread pool per PlasmaClient
  • ARROW-2463 - [C++] Update flatbuffers to 1.9.0
  • ARROW-2464 - [Python] Use a python_version marker instead of a condition
  • ARROW-2469 - [C++] Make out arguments last in ReadMessage.
  • ARROW-2470 - [C++] Avoid seeking in GetFileSize
  • ARROW-2472 - [Rust] Remove public attributes from Schema and Field and add accessors
  • ARROW-2477 - [Rust] Set up code coverage in CI
  • ARROW-2478 - [C++] Introduce a checked_cast function that performs a dynamic_cast in debug mode
  • ARROW-2479 - [C++] Add ThreadPool class
  • ARROW-2480 - [C++] Enable casting the value of a decimal to int32_t or int64_t
  • ARROW-2481 - [Rust] Move all calls to free() into memory.rs
  • ARROW-2482 - [Format] Clarify struct field alignment
  • ARROW-2484 - [C++] Document ABI compliance checking
  • ARROW-2485 - Re-write of run_clang_format.py, such that it outputs the diffs of th…
  • ARROW-2486 - [C++/Python] Provide a Docker image that contains all dependencies for development
  • ARROW-2488 - [C++] Add Boost 1.67 and 1.68 as recognized versions
  • ARROW-2493 - [Python] Add support for pickling to buffers and arrays
  • ARROW-2494 - [C++] Return status codes from PlasmaClient::Seal instead of crashing
  • ARROW-2498 - [Java] Use java 1.8 instead of java 1.7
  • ARROW-2499 - [C++] Factor out Python iteration routines
  • ARROW-2505 - [C++] Disable MSVC warning C4800
  • ARROW-2506 - [Plasma] Build error on macOS
  • ARROW-2507 - [Rust] Don't take a reference when not needed.
  • ARROW-2508 - [Python] Fix pytest.raises msg to message
  • ARROW-2513 - [Python] DictionaryType should give access to index type and dictionary array
  • ARROW-2516 - [CI] Filter changes in AppVeyor builds
  • ARROW-2521 - [Rust] Refactor Rust API to use traits and generic to represent Array instead of enum
  • ARROW-2522 - [C++] Version shared library files
  • ARROW-2525 - [GLib] Add garrow_struct_array_flatten()
  • ARROW-2526 - [GLib] Update .gitignore
  • ARROW-2527 - [GLib] Enable GPU document
  • ARROW-2528 - [Rust] Add trait bounds for T in Buffer/List
  • ARROW-2529 - [C++] Update mention of clang-format to 5.0 in the docs
  • ARROW-2531 - [C++] Update clang bits to 6.0
  • ARROW-2533 - [CI] Fast finish failing AppVeyor builds
  • ARROW-2536 - [Rust] optimize capacity allocation for ListBuilder
  • ARROW-2537 - [Ruby] Import
  • ARROW-2539 - [Plasma] Use unique_ptr instead of raw pointer
  • ARROW-2540 - [Plasma] Create constructors & destructors for ObjectTableEntry
  • ARROW-2541 - [Plasma] Replace macros with constexpr
  • ARROW-2543 - [Rust] Cache dependencies when building our rust library
  • ARROW-2544 - [CI] Run the C++ tests with two jobs
  • ARROW-2547 - Fix off-by-one in List<List<byte>> example
  • ARROW-2548 - Clarify List<Char> Array example
  • ARROW-2549 - [GLib] Apply arrow::StatusCode changes to GArrowError
  • ARROW-2550 - [C++] Add missing status codes into arrow::Status::CodeAsString()
  • ARROW-2551 - [Plasma] Improve notification logic
  • ARROW-2552 - [Plasma] Fix memory error
  • ARROW-2553 - [Python] Set MACOSX_DEPLOYMENT_TARGET in wheel build
  • ARROW-2558 - [Plasma] avoid walk through all the objects when a client disconnects
  • ARROW-2562 - [CI] C++ and Rust code coverage using codecov.io
  • ARROW-2563 - [Rust] Poor caching in Travis-CI
  • ARROW-2566 - [CI] Add codecov.io badge
  • ARROW-2567 - [C++] Not only compare type ids on Array equality
  • ARROW-2568 - [Python] Expose thread pool size setting to Python, and deprecate "nthreads" where possible
  • ARROW-2569 - [C++] Improve thread pool size heuristic
  • ARROW-2574 - [Python] Add Cython and Python code coverage
  • ARROW-2576 - [GLib] Add abs functions for Decimal128
  • ARROW-2577 - [Plasma] Add asv benchmarks for plasma
  • ARROW-2580 - [GLib] Fix abs functions for Decimal128
  • ARROW-2582 - [GLib] Add negate functions for Decimal128
  • ARROW-2585 - [C++] Add Decimal::FromBigEndian, which was formerly a static method in parquet-cpp/src/parquet/arrow/reader.cc
  • ARROW-2586 - [C++] Changing the type of ListBuilder's and StructBuilder's children from unique_ptr to shared_ptr so that it can support deserialization from Parquet to Arrow with arbitrary nesting
  • ARROW-2595 - [Plasma] to avoid producing garbage data
  • ARROW-2596 - [GLib] Use the default value of GTK-Doc
  • ARROW-2597 - [Plasma] remove UniqueIDHasher
  • ARROW-2604 - [Java] Add convenience method to VarCharVector to set Text
  • ARROW-2608 - [Java/Python] Add pyarrow.{Array,Field}.from_jvm / jvm_buffer
  • ARROW-2611 - [Python] Fix Python 2 integer serialization
  • ARROW-2612 - [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY
  • ARROW-2613 - [Docs] Update the gen_apidocs docker script
  • ARROW-2614 - Remove 'group: deprecated' in Travis
  • ARROW-2626 - [Python] Add column name to exception message when writing pandas df fails
  • ARROW-2634 - [Go] Add Go license details to LICENSE.txt
  • ARROW-2635 - [Ruby] Add LICENSE.txt and NOTICE.txt for Apache Arrow Ruby
  • ARROW-2636 - [Ruby] Add missing "unofficial" notes
  • ARROW-2638 - [Python] Prevent calling extension class constructors directly
  • ARROW-2639 - [Python] Remove unnecessary _check_nullptr methods
  • ARROW-2641 - [C++] Avoid spurious memset() calls, improve bitmap write performance
  • ARROW-2645 - [Java] Refactor ArrowWriter to remove all ArrowFileWriter specific logic
  • ARROW-2649 - [C++] Add GenerateBits() function to improve bitmap writing performance
  • ARROW-2656 - [Python] Improve creation time of ParquetManifest for partitioned datasets using thread pool
  • ARROW-2660 - [Python] Experimental zero-copy pickling
  • ARROW-2661 - [Python] Adding the ability to programmatically pass hdfs configuration key/value pairs via pyarrow
  • ARROW-2662 - [Python] Add to_pandas to ChunkedArray
  • ARROW-2663 - [Python] Make dictionary_encode and unique accessible on Column / ChunkedArray
  • ARROW-2664 - [Python] Implement getitem / slicing on Buffer
  • ARROW-2666 - [Python] numpy.asarray should trigger to_pandas on Array/ChunkedArray
  • ARROW-2672 - [Python] Build ORC extension in manylinux1 wheels
  • ARROW-2674 - [Packaging] Start building nightlies
  • ARROW-2676 - [Packaging] Deploy build artifacts to github releases
  • ARROW-2677 - [Python] Expose Parquet ZSTD compression
  • ARROW-2678 - [GLib] Add more common problems compiling c_glib on OSX
  • ARROW-2680 - [Python] Add documentation about type inference in Table.from_pandas
  • ARROW-2682 - [CI] Notify in Slack about broken builds
  • ARROW-2689 - [Python] Remove parameter timestamps_to_ms
  • ARROW-2692 - [Python] Add test for writing dictionary encoded columns to chunked Parquet files
  • ARROW-2695 - [Python] Prevent calling scalar constructors directly
  • ARROW-2696 - [JAVA] enhance AllocationListener with an onFailedAllocation() call (#2133)
  • ARROW-2699 - [C++/Python] Add Table method that replaces a column with a new supplied column
  • ARROW-2700 - [Python] Add simple examples to Array.cast docstring
  • ARROW-2701 - [C++] Make MemoryMappedFile resizable redux
  • ARROW-2704 - [Java] Change MessageReader API to improve custom message handling for streams
  • ARROW-2713 - [Packaging] Fix linux package builds
  • ARROW-2717 - [Packaging] Postfix conda artifacts with target arch
  • ARROW-2718 - [Packaging] GPG sign downloaded artifacts
  • ARROW-2724 - [Packaging] Determine whether all the expected artifacts are uploaded
  • ARROW-2725 - [Java] make Accountant.AllocationOutcome publicly visible (#2149)
  • ARROW-2729 - [GLib] Add decimal128 array builder
  • ARROW-2731 - Add external Orc capability
  • ARROW-2732 - [GLib] Update brew packages for macOS
  • ARROW-2733 - [GLib] Cast garrow_decimal128 to gint64
  • ARROW-2738 - [GLib] Use Brewfile on installation process
  • ARROW-2739 - [GLib] Use G_DECLARE_DERIVABLE_TYPE
  • ARROW-2740 - [Python] Add address property to Buffer
  • ARROW-2742 - [Python] Allow Table.from_batches to use iterator of record batches
  • ARROW-2748 - [GLib] Add garrow_decimal_data_type_get_scale() (and _precision())
  • ARROW-2749 - [GLib] Rename *garrow_decimal128_array_get_value to *garrow_decimal128_array_format_value
  • ARROW-2751 - [GLib] Add garrow_table_replace_column()
  • ARROW-2752 - [GLib] Document garrow_decimal_data_type_new()
  • ARROW-2753 - [GLib] Add garrow_schema_*_field()
  • ARROW-2755 - [Python] Allow using Ninja to build extension
  • ARROW-2756 - [Python] Remove redundant imports and minor fixes in parquet tests
  • ARROW-2758 - [Plasma] Use Scope enum in Plasma
  • ARROW-2760 - [Python] Remove legacy property definition syntax from parquet module and test them
  • ARROW-2761 - [Python] Add support for set operations in hive partition filtering
  • ARROW-2763 - [Python] Make _metadata file accessible in ParquetDataset
  • ARROW-2780 - [Go] Run code coverage analysis
  • ARROW-2784 - [C++] MemoryMappedFile::WriteAt allow writing past the end
  • ARROW-2790 - [C++] Minor style changes from the review
  • ARROW-2790 - [C++] Buffers can contain uninitialized memory
  • ARROW-2791 - [Packaging] Build Ubuntu 18.04 packages
  • ARROW-2792 - [Packaging] Consider uploading tarballs to avoid naming conflicts
  • ARROW-2794 - [Plasma] Add the RPC of a list of Delete Objects in Plasma
  • ARROW-2798 - [Plasma] Use hashing function that takes into account all UniqueID bytes
  • ARROW-2802 - [Docs] Move all release management instructions to Confluence
  • ARROW-2804 - [Website] Link to Developer wiki (Confluence) from front page
  • ARROW-2805 - [Python] Use official way to find TensorFlow module
  • ARROW-2809 - [C++] Only print cpplint and clang-format output for failures by default
  • ARROW-2811 - [Python] Test serialization for determinism
  • ARROW-2815 - [CI] Suppress DEBUG logging when building Java library in C++ CI entries
  • ARROW-2816 - [Python] Make NativeFile BufferedIOBase-compliant
  • ARROW-2821 - [C++] Remove redundant memsets in BooleanBuilder
  • ARROW-2822 - [C++] Remove the unneeded const qualifier and clarify the comments
  • ARROW-2822 - [C++] Zero padding bytes in PoolBuffer
  • ARROW-2824 - [GLib] Add garrow_decimal128_array_get_value()
  • ARROW-2825 - [C++] Add AllocateBuffer / AllocateResizableBuffer variants with default memory pool
  • ARROW-2826 - [C++] Remove ArrayBuilder::Init method, clean up Resize, remove PoolBuffer from public API
  • ARROW-2827 - [C++] Stop using -jN in sub make
  • ARROW-2829 - [GLib] Add GArrowORCFileReader
  • ARROW-2830 - [deb] Enable parallel build again
  • ARROW-2832 - [Python] Pretty-print schema metadata in Schema.__repr__
  • ARROW-2833 - [Python] Column.__repr__ will lock up Jupyter with large datasets
  • ARROW-2834 - [GLib] Remove "enable_" prefix from Meson options
  • ARROW-2836 - [Packaging] Expand build matrices to multiple tasks
  • ARROW-2837 - [C++] ArrayBuilder::null_bitmap returns PoolBuffer
  • ARROW-2838 - [Python] Speed up PandasObjectIsNull
  • ARROW-2844 - [Packaging] Test OSX wheels after build
  • ARROW-2845 - [Packaging] Upload additional debian artifacts
  • ARROW-2846 - [Packaging] Update nightly build in crossbow as well as the sample configuration
  • ARROW-2847 - [Packaging] Fix artifact name matching for conda forge packages
  • ARROW-2848 - [Packaging] Use lib10.deb instead of lib0.deb
  • ARROW-2849 - [Ruby] Arrow::Table#load supports ORC
  • ARROW-2855 - [C++] Blog post that outlines the benefits of using jemalloc
  • ARROW-2859 - [Python] Accept buffer-like objects as sources in open_file, open_stream APIs
  • ARROW-2861 - [Python] Add note about how to not write DataFrame index to Parquet
  • ARROW-2864 - [Plasma] Add deletion cache to delete objects later when they are not in use.
  • ARROW-2868 - [Packaging] Fix Apache Arrow ORC GLib related problems
  • ARROW-2869 - [Python] Add documentation for Array.to_numpy
  • ARROW-2874 - [Packaging] Pass job prefix when putting on Queue
  • ARROW-2875 - [Packaging] Don't attempt to download arrow archive in linux builds
  • ARROW-2881 - [Website] Add community tab to header, add link and callout to dev wiki
  • ARROW-2884 - [Packaging] Support RC
  • ARROW-2886 - [Release] Remove an unused variable
  • ARROW-2890 - [Plasma] Make python client release method private
  • ARROW-2893 - [C++] Remove PoolBuffer class from public API and hide implementation details behind factory functions
  • ARROW-2897 - [Packaging] Organize supported Ubuntu versions
  • ARROW-2898 - [Packaging] Setuptools_scm just shipped a new version which fails to parse `apache-arrow-<version>` tag
  • ARROW-2906 - [Website] Remove the link to slack channel
  • ARROW-2907 - [GitHub] Improve the first paragraph of "How to contribute patches"
  • ARROW-2908 - [Rust] Update version to 0.10.0
  • ARROW-2914 - [Integration] Add WindowPandasUDFTests to Spark integration script
  • ARROW-2915 - [Packaging] Remove artifact from ubuntu-trusty build
  • ARROW-2918 - [C++] Improve formatting of Struct pretty prints
  • ARROW-2921 - [Release] Update .deb/.rpm changelogs in preparation
  • ARROW-2922 - [Release] Make python command name customizable
  • ARROW-2923 - [DOC] Adding Apache Spark integration test instructions
  • ARROW-2924 - [Java] mvn release fails when an older maven javadoc plugin is installed
  • ARROW-2927 - [Packaging] AppVeyor wheel task is failing on initial checkout
  • ARROW-2928 - [Packaging] AppVeyor crossbow conda builds are picking up boost 1.63.0 instead of the installed version
  • ARROW-2929 - [C++] ARROW-2826 Breaks parquet-cpp 1.4.0 builds
  • ARROW-2934 - [Packaging] Add checksums creation to sign subcommand
  • ARROW-2935 - [Packaging] Add verify_binary_artifacts function to verify-release-candidate.sh
  • ARROW-2937 - [Java] Followup to ARROW-2704. Make MessageReader classes immutable and clarify docs
  • ARROW-2943 - [C++] Implement BufferedOutputStream::Flush
  • ARROW-2944 - [Format] Synchronize some metadata changes to columnar format Markdown documents
  • ARROW-2946 - [Packaging] Stop using $PWD
  • ARROW-2947 - [Packaging] Remove Ubuntu Artful
  • ARROW-2949 - [CI] Add retry logic when downloading miniconda to reduce flakiness
  • ARROW-2951 - [CI] Changes in format/ should cause Appveyor builds to run
  • ARROW-2953 - [Plasma] Reduce plasma memory usage
  • ARROW-2954 - [Plasma] Reduce plasma store memory usage
  • ARROW-2962 - [Packaging] Bintray descriptor files are no longer needed
  • ARROW-2977 - [Packaging] Release verification script should check rust too
  • ARROW-2985 - [Ruby] Run unit tests in verify-release-candidate.sh
  • ARROW-2988 - [Release] More automated release verification on Windows
  • ARROW-2990 - [GLib] Fail to build with rpath-ed Arrow C++ on macOS

Apache Arrow 0.9.0 (2018-03-19)

New Features and Improvements

  • ARROW-232 - [Python] Add unit test for writing Parquet file from chunked table
  • ARROW-633 - [Java] Add FixedSizeBinary support in Java and integration tests (Updated; also covers ARROW-634)
  • ARROW-634 - Add integration tests for FixedSizeBinary
  • ARROW-760 - [Python] document differences w.r.t. fastparquet
  • ARROW-764 - [C++] Improves performance of CopyBitmap and adds benchmarks
  • ARROW-969 - [C++] Add add/remove field functions for RecordBatch
  • ARROW-1021 - [Python] Add documentation for C++ pyarrow API
  • ARROW-1035 - [Python] Add streaming dataframe reconstruction benchmark
  • ARROW-1394 - [Plasma] Add optional extension for allocating memory on GPUs
  • ARROW-1463 - [JAVA] Restructure ValueVector hierarchy to minimize compile-time generated code
  • ARROW-1579 - [Java] Adding containerized Spark Integration tests
  • ARROW-1580 - [Python] Instructions for setting up nightly builds on Linux
  • ARROW-1621 - [JAVA] Reduce Heap Usage per Vector
  • ARROW-1623 - [C++] Add convenience method to construct Buffer from a string that owns its memory
  • ARROW-1632 - [Python] Permit categorical conversions in Table.to_pandas on a per-column basis
  • ARROW-1643 - [Python] Accept hdfs:// prefixes in parquet.read_table and attempt to connect to HDFS
  • ARROW-1705 - [Python] allow building array from dicts
  • ARROW-1706 - [Python] Coerce array inputs to StructArray.from_arrays. Flip order of arguments
  • ARROW-1712 - [C++] Add method to BinaryBuilder to reserve space for value data
  • ARROW-1757 - [C++] Add DictionaryArray::FromArrays alternate ctor that can check or sanitize "untrusted" indices
  • ARROW-1815 - [Java] Rename MapVector to StructVector
  • ARROW-1832 - [JS] Implement JSON reader for integration tests
  • ARROW-1835 - [C++] Create Arrow schema from std::tuple types
  • ARROW-1861 - [Python][skip ci]
  • ARROW-1872 - [Website] Minor edits and addition of YAML for versions
  • ARROW-1899 - [Python] Refactor handling of null sentinels in python/numpy_to_arrow.cc
  • ARROW-1920 - [C++/Python] Add experimental reader for Apache ORC files
  • ARROW-1926 - [GLib] Add garrow_timestamp_data_type_get_unit()
  • ARROW-1927 - [Plasma] Add delete function
  • ARROW-1929 - [C++] Copy over testing utility code from PARQUET-1092
  • ARROW-1930 - [C++] Adds Slice operation to ChunkedArray and Column
  • ARROW-1931 - [C++] Suppress C4996 deprecation warning in MSVC builds for now
  • ARROW-1937 - [Python] Document nested array initialization
  • ARROW-1942 - [C++] Hash table specializations for small integers
  • ARROW-1947 - [Plasma] Change Client Create and Get to use Buffers
  • ARROW-1951 - [Python] Add memcopy threads argument to PlasmaClient put.
  • ARROW-1962 - [Java] Adding reset to ValueVector interface
  • ARROW-1965 - [GLib] Add garrow_array_builder_get_value_data_type()
  • ARROW-1969 - [C++] Don't build ORC extension by default
  • ARROW-1970 - [GLib] Add garrow_chunked_array_get_value_data_type() and garrow_chunked_array_get_value_type()
  • ARROW-1977 - [C++] Update windows dev docs
  • ARROW-1978 - [Website] Consolidate Powered By project list, add more visibly to front page
  • ARROW-2004 - [C++] Add shrink_to_fit parameter to BufferBuilder::Resize, add Reserve method
  • ARROW-2007 - [Python] Implement float32 conversions, use NumPy dtype when possible for inner arrays
  • ARROW-2011 - [Python] Allow setting the pickler in the serialization context.
  • ARROW-2012 - [GLib] Support "make distclean"
  • ARROW-2018 - [C++] Fix build instructions on macOS and Homebrew
  • ARROW-2019 - [JAVA] Control the memory allocated for inner vector in LIST (#1497)
  • ARROW-2024 - [Python] Remove torch serialization from default serialization context.
  • ARROW-2028 - [Python] extra_cmake_args needs to be passed through shlex.split
  • ARROW-2031 - [Python] HadoopFileSystem is pickleable
  • ARROW-2035 - [C++] Update vendored cpplint.py to a Py3-compatible one
  • ARROW-2036 - [Python] Support standard IOBase methods on NativeFile
  • ARROW-2042 - [Plasma] Revert API change of plasma::Create to output a MutableBuffer
  • ARROW-2043 - [C++] change description from OS X to macOS
  • ARROW-2046 - [Python] Support path-like objects
  • ARROW-2048 - [Python/C++] Update Thrift pin to 0.11
  • ARROW-2050 - [Python] Support setup.py pytest
  • ARROW-2052 - [C++ / Python] Rework OwnedRef, remove ScopedRef
  • ARROW-2053 - [C++] Build instructions are incomplete
  • ARROW-2054 - [C++] Fix compilation warnings
  • ARROW-2064 - [GLib] Add common build problems link to the install section
  • ARROW-2065 - [Python] Fix bug in SerializationContext.clone().
  • ARROW-2066 - [Python] Document using pyarrow with Azure Blob Store
  • ARROW-2068 - [Python] Expose array's buffers
  • ARROW-2069 - [Python] Add note that Plasma is not supported on Windows
  • ARROW-2071 - [Python] Fix test slowness on Travis-CI
  • ARROW-2071 - [Python] Lighten serialization tests
  • ARROW-2073 - [Python] Create struct array from sequence of tuples
  • ARROW-2076 - [Python] Display slowest test durations
  • ARROW-2083 - [CI] Detect changed components on Travis-CI
  • ARROW-2084 - [C++] Support newer Brotli static library names
  • ARROW-2086 - [Python] Shrink size of arrow_manylinux1_x86_64_base docker image
  • ARROW-2087 - [Python] Binaries of 3rdparty are not stripped in manylinux1 base image
  • ARROW-2088 - [GLib] Add GArrowNumericArray
  • ARROW-2089 - [GLib] Rename to GARROW_TYPE_BOOLEAN for consistency
  • ARROW-2090 - [Python] Add context methods to ParquetWriter
  • ARROW-2093 - [Python] Do not install PyTorch in Travis CI
  • ARROW-2094 - [C++] Install libprotobuf and set PROTOBUF_HOME when using toolchain
  • ARROW-2095 - [C++] Less verbose building 3rd party deps
  • ARROW-2096 - [C++] Turn off Boost_DEBUG to trim build output
  • ARROW-2099 - [Python] Add safe option to DictionaryArray.from_arrays to do bounds checking of indices by default
  • ARROW-2107 - [GLib] Follow arrow::gpu::CudaIpcMemHandle API change
  • ARROW-2108 - [Python] Update instructions for ASV
  • ARROW-2110 - [Python] Only require pytest-runner on test commands
  • ARROW-2111 - [C++] Lint in parallel
  • ARROW-2114 - [Python][skip appveyor]
  • ARROW-2117 - [C++] Update codebase / CI toolchain for clang 5.0
  • ARROW-2118 - [C++] Fix misleading error when memory mapping a zero-length file
  • ARROW-2120 - [C++] Add possibility to use empty MSVCSTATIC_LIB_SUFFIX for Thirdparties
  • ARROW-2121 - [Python] Handle object arrays directly in pandas serializer.
  • ARROW-2123 - [JS] Upgrade to TS 2.7.1
  • ARROW-2132 - Add link to Plasma in main README
  • ARROW-2134 - [CI] Make Travis-CI commit inspection more robust
  • ARROW-2137 - [Python] Don't print paths that are ignored when reading Parquet files
  • ARROW-2138 - [C++] abort on failed debug check
  • ARROW-2142 - [Python] Allow conversion from Numpy struct array
  • ARROW-2143 - [Python] Provide a manylinux1 wheel for cp27m
  • ARROW-2146 - [GLib] Add Slice api to ChunkedArray
  • ARROW-2149 - [Python] Reorganize test_convert_pandas.py
  • ARROW-2154 - [Python] Implement equality on buffers
  • ARROW-2155 - [Python] frombuffer() should respect mutability of argument
  • ARROW-2156 - [CI] Isolate Sphinx dependencies
  • ARROW-2163 - [CI] Make apt installs explicit
  • ARROW-2166 - [GLib] Add Slice api to Column
  • ARROW-2168 - [C++] Build toolchain on CI with jemalloc
  • ARROW-2169 - [C++] MSVC is complaining about uncaptured variables
  • ARROW-2174 - [JS] export arrow format and schema enums
  • ARROW-2176 - [C++] Extend DictionaryBuilder to support delta dictionaries
  • ARROW-2177 - [C++] Remove support for specifying negative scale values in DecimalType
  • ARROW-2180 - [C++] Remove deprecated APIs from 0.8.0 cycle
  • ARROW-2181 - [PYTHON][DOC] Add doc on usage of concat_tables
  • ARROW-2184 - [C++] Add static constructor for FileOutputStream returning shared_ptr to OutputStream
  • ARROW-2185 - Strip CI directives from commit messages
  • ARROW-2190 - [GLib] Add add/remove field functions for RecordBatch
  • ARROW-2191 - [C++] Only use specific version of jemalloc
  • ARROW-2197 - Document C++ ABI issue and workaround
  • ARROW-2198 - [Python] correct docstring for parquet.read_table
  • ARROW-2199 - [JAVA] Control the memory allocated for inner vectors in containers. (#1646)
  • ARROW-2203 - [C++] StderrStream class
  • ARROW-2204 - Fix TLS errors in manylinux1 build
  • ARROW-2205 - [Python] Option for integer object nulls
  • ARROW-2206 - [JS] Document Perspective project
  • ARROW-2218 - [Python] PythonFile should infer mode when not given
  • ARROW-2231 - [CI] Use clcache on AppVeyor for faster builds
  • ARROW-2238 - [C++] Detect and use clcache in cmake configuration
  • ARROW-2239 - [C++] Update Windows build docs
  • ARROW-2250 - [Python] Do not create a subprocess for plasma but just use existing process
  • ARROW-2252 - [Python] Create buffer from address, size and base
  • ARROW-2253 - [Python] Support eq on scalar values
  • ARROW-2257 - [C++] Add high-level option to toggle CXX11 ABI
  • ARROW-2261 - [GLib] Improve memory management for GArrowBuffer data
  • ARROW-2262 - [Python] Support slicing on pyarrow.ChunkedArray
  • ARROW-2279 - [Python] Better error message if lib cannot be found
  • ARROW-2282 - [Python] Create StringArray from buffers
  • ARROW-2283 - [C++] Support Arrow C++ installed in /usr detection by pkg-config
  • ARROW-2289 - [GLib] Add Numeric, Integer, FloatingPoint data types
  • ARROW-2291 - [C++] Add additional libboost-regex-dev to build instructions in README
  • ARROW-2292 - [Python] Rename frombuffer() to py_buffer()
  • ARROW-2309 - [C++] Use std::make_unsigned
  • ARROW-2321 - [C++] Release verification script fails if CMAKE_INSTALL_LIBDIR is not $ARROW_HOME/lib
  • ARROW-2329 - [Website]: 0.9.0 release update
  • ARROW-2336 - [Website] Blog post for 0.9.0 release
  • ARROW-2768 - [Packaging] Support Ubuntu 18.04
  • ARROW-2783 - Importing conda-forge pyarrow fails

Bug Fixes

  • ARROW-1345 - [Python] Test conversion from nested NumPy arrays with smaller int, float types
  • ARROW-1589 - [C++] Fuzzing for certain input formats
  • ARROW-1646 - [Python] Handle NumPy scalar types
  • ARROW-1856 - [Python] Auto-detect Parquet ABI version when using PARQUET_HOME
  • ARROW-1909 - [C++] Enables building with benchmarks on windows
  • ARROW-1912 - [Website] Add committer affiliations and roles to website
  • ARROW-1919 - [Plasma] Test that object ids are 20 bytes
  • ARROW-1924 - [Python] Bring back pickle=True option for serialization
  • ARROW-1933 - [GLib] Fix build error with --with-arrow-cpp-build-dir
  • ARROW-1940 - [Python] Extra metadata gets added after multiple conversions between pd.DataFrame and pa.Table
  • ARROW-1941 - [Python] Fix empty list roundtrip in to_pandas
  • ARROW-1943 - [JAVA] handle setInitialCapacity for deeply nested lists
  • ARROW-1944 - [C++] Fix ARROW_STATIC_LIB in FindArrow
  • ARROW-1945 - [C++] Fix doxygen documentation of array.h
  • ARROW-1946 - [JAVA] Add APIs to decimal vector for writing big endian data
  • ARROW-1948 - [Java] Load ListVector validity buffer with BitVectorHelper to handle all non-null
  • ARROW-1950 - [Python] pandas_type in pandas metadata incorrect for List types
  • ARROW-1953 - [JS] Fix JS build
  • ARROW-1955 - MSVC generates "attempting to reference a deleted function" during build.
  • ARROW-1958 - [Python] Error in pandas conversion for datetimetz row index
  • ARROW-1961 - [Python] Preserve pre-existing schema metadata in Parquet files when passing flavor='spark'
  • ARROW-1966 - [C++] Accommodate JAVA_HOME on Linux that includes the jre/ directory, or is the full path to directory with libjvm
  • ARROW-1967 - Python: AssertionError w.r.t Pandas conversion on Parquet files in 0.8.0 dev version
  • ARROW-1971 - [Python] Add pandas serialization to the default
  • ARROW-1972 - [Python] Import pyarrow in DeserializeObject.
  • ARROW-1973 - [Python] Memory leak when converting Arrow tables with array columns to Pandas dataframes.
  • ARROW-1976 - [Python] Handling unicode pandas columns on parquet.read_table
  • ARROW-1979 - [JS] Fix JS builds hanging in es2015
  • ARROW-1980 - [Python] Fix race condition in write_to_dataset
  • ARROW-1982 - [Python] Coerce Parquet statistics as bytes to more useful Python scalar types
  • ARROW-1986 - [Python] HadoopFileSystem is not picklable and cannot currently be used with multiprocessing
  • ARROW-1991 - [Website] Fix Docker documentation build
  • ARROW-1992 - [C++/Python] Fix segfault when string to categorical empty string array
  • ARROW-1997 - [C++/Python] Ignore zero-copy-option in to_pandas when strings_to_categorical is True
  • ARROW-1998 - [Python] fix crash on empty Numpy arrays
  • ARROW-1999 - [Python] Type checking in from_numpy_dtype
  • ARROW-2000 - [Plasma] Deduplicate file descriptors when replying to GetRequest.
  • ARROW-2002 - [Python] Check that write_queue is not full and writer_thread is alive before enqueuing a new record when downloading a file
  • ARROW-2003 - [Python] Remove use of fastpath parameter to pandas.core.internals.make_block
  • ARROW-2005 - [Python] Fix incorrect flake8 config path to Cython lint config
  • ARROW-2008 - [Python] Type inference for int32 NumPy arrays (expecting list<int32>) returns int64 and then conversion fails
  • ARROW-2010 - [C++] Do not suppress shorten-64-to-32 warnings from clang, fix warnings in ORC adapter
  • ARROW-2017 - [Python] Use unsigned PyLong API for uint64 values over int64 range
  • ARROW-2023 - [C++] Fix ASAN failure on malformed / empty stream input, enable ASAN builds, add more dev docs
  • ARROW-2025 - [C++] Creating multiple equivalent HadoopFileSystems works fine
  • ARROW-2029 - [Python] NativeFile.tell errors after close
  • ARROW-2032 - [C++] ORC ep installs on each call to ninja build
  • ARROW-2033 - [Python] Fix pa.array() with iterator input
  • ARROW-2039 - [Python] Avoid crashing on uninitialized Buffer
  • ARROW-2040 - [Python] Deserialized Numpy array must keep ref to underlying tensor
  • ARROW-2047 - [Python] Use sys.executable instead of one in the search path.
  • ARROW-2049 - [Python] Use python -m cython to run Cython, instead of CYTHON_EXECUTABLE
  • ARROW-2062 - [Python] Do not use memory maps in test_serialization.py to try to improve Travis CI flakiness
  • ARROW-2070 - [Python] Fix chdir logic in setup.py
  • ARROW-2072 - [Python] Fix crash in decimal128.byte_width
  • ARROW-2080 - [Python] Update documentation about pandas serialization context.
  • ARROW-2085 - [Python] HadoopFileSystem.isdir/.isfile return False on missing paths
  • ARROW-2106 - [Python] Add conversion for a series of datetime objects
  • ARROW-2109 - [C++] Completely disable boost autolink on MSVC build
  • ARROW-2124 - [Python] Add test for empty item in array
  • ARROW-2128 - [Python] Support arrays of empty lists
  • ARROW-2129 - [Python] Handle conversion of empty tables to Pandas
  • ARROW-2131 - [Python] Prepend module path to PYTHONPATH when spawning subprocess
  • ARROW-2133 - [Python] Fix segfault on conversion of empty nested array to Pandas
  • ARROW-2135 - [Python] Fix NaN conversion when casting from Numpy array
  • ARROW-2139 - [Python] Address Sphinx deprecation warning when building docs
  • ARROW-2145 - [Python] Decimal conversion not working for NaN values (also ARROW-2153, ARROW-2157, ARROW-2160, ARROW-2177)
  • ARROW-2150 - [Python] Raise NotImplementedError when comparing with pyarrow.Array for now
  • ARROW-2151 - [Python] Fix conversion from np.uint64 scalars
  • ARROW-2153 - [C++/Python] Decimal conversion not working for exponential notation
  • ARROW-2157 - [Python] Decimal arrays cannot be constructed from Python lists
  • ARROW-2158 - [Python] Construction of Decimal array with None or np.nan fails
  • ARROW-2160 - [C++/Python] Fix decimal precision inference
  • ARROW-2161 - [Python] Skip test_cython_api if ARROW_HOME isn't defined
  • ARROW-2162 - [Python/C++] Decimal Values with too-high precision are multiplied by 100
  • ARROW-2167 - [C++] Building Orc extensions fails with the default BUILD_WARNING_LEVEL=Production
  • ARROW-2170 - [Python] construct_metadata fails on reading files where no index was preserved
  • ARROW-2171 - [C++/Python] Make OwnedRef safer
  • ARROW-2172 - [C++/Python] Fix converting from Numpy array with non-natural stride
  • ARROW-2173 - [C++/Python] Hold the GIL in NumPyBuffer destructor
  • ARROW-2175 - [Python] Install Arrow libraries in Travis CI builds when only Python directory is affected
  • ARROW-2178 - [JS] Fix JS html FileReader example
  • ARROW-2179 - [C++] Install omitted headers in arrow/util
  • ARROW-2192 - [CI] Always build on master branch and repository
  • ARROW-2194 - [Python] Pandas columns metadata incorrect for empty string columns
  • ARROW-2208 - [Python] install issues with jemalloc
  • ARROW-2209 - [Python] Partition columns are not correctly loaded in schema of ParquetDataset
  • ARROW-2210 - [C++] Reset ptr on failed memory allocation
  • ARROW-2212 - [C++/Python] Build Protobuf in base manylinux 1 docker image
  • ARROW-2223 - [JS] compile src/bin as es5-cjs to all output targets
  • ARROW-2227 - [Python] Fix off-by-one error in chunked binary conversions
  • ARROW-2228 - [Python] Unsigned int type for arrow Table not supported
  • ARROW-2230 - [Python] Strip catch-all tag matching from git-describe
  • ARROW-2232 - [Python] pyarrow.Tensor constructor segfaults
  • ARROW-2234 - [JS] Read timestamp low bits as Uint32s
  • ARROW-2240 - [Python] Array initialization with leading numpy nan fails with exception
  • ARROW-2244 - [C++] Add unit test to explicitly check that NullArray internal data set correctly in Slice operations
  • ARROW-2245 - [Python] Revert static linkage of parquet-cpp in manylinux1 wheel (also ARROW-2246)
  • ARROW-2246 - [Python] Use namespaced boost in manylinux1 package
  • ARROW-2251 - [GLib] Keep GArrowBuffer alive while GArrowTensor for the buffer is live
  • ARROW-2254 - [Python] Ignore JS tags in local dev versions
  • ARROW-2258 - [Python] Add additional information to find Boost on windows
  • ARROW-2263 - [Python] Prepend local pyarrow/ path to PYTHONPATH in test_cython.py
  • ARROW-2265 - [Python] Use CheckExact when serializing lists and numpy arrays.
  • ARROW-2268 - Drop usage of md5 checksums for source releases, verification scripts
  • ARROW-2269 - [Python] Make boost namespace selectable in wheels
  • ARROW-2270 - [Python] Fix lifetime of ForeignBuffer base object
  • ARROW-2272 - [Python] Clean up leftovers in test_plasma.py
  • ARROW-2275 - [C++] Guard against bad use of Buffer.mutable_data()
  • ARROW-2280 - [Python] Return the offset for the buffers in pyarrow.Array
  • ARROW-2284 - [Python] Fix error display on test_plasma error
  • ARROW-2288 - [Python] Fix slicing logic
  • ARROW-2297 - [JS] babel-jest is not listed as a dev dependency
  • ARROW-2304 - [C++] Fix HDFS MultipleClients unit test
  • ARROW-2306 - [Python] Fix partitioned Parquet test against HDFS
  • ARROW-2307 - [Python] Allow reading record batch streams with zero record batches
  • ARROW-2311 - [Python/C++] Fix struct array slicing
  • ARROW-2312 - [JS] run test_js before test_integration
  • ARROW-2313 - [C++] Add -NDEBUG flag to arrow.pc
  • ARROW-2316 - [C++] Revert Buffer::mutable_data to inline so that linkers do not have to remember to define NDEBUG for release builds
  • ARROW-2318 - [Plasma] Run plasma store tests with unique socket
  • ARROW-2320 - [C++] Vendored Boost build does not build regex library
  • ARROW-2406 - [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided

Apache Arrow 0.8.0 (2017-12-18)

Bug Fixes

  • ARROW-226 - [C++] If opening an HDFS file fails and it does not exist, say so to help with debugging
  • ARROW-641 - [C++] Do not build io-hdfs-test if ARROW_HDFS is off
  • ARROW-1282 - Large memory reallocation by Arrow causes hang in jemalloc
  • ARROW-1298 - C++: Add prefix to jemalloc functions to guard against issues when using multiple allocators in the same process
  • ARROW-1341 - [C++] Deprecate arrow::MakeTable in favor of new ctor from ARROW-1334
  • ARROW-1347 - [JAVA] Return consistent child field name for List Vectors
  • ARROW-1398 - [Python] No support for reading columns of type decimal(19,4)
  • ARROW-1409 - [Format] Remove page id from Buffer metadata, increment metadata version number
  • ARROW-1431 - [Java] JsonFileReader doesn't initialize some vectors appropriately
  • ARROW-1436 - PyArrow Timestamps written to Parquet as INT96 appear in Spark as 'bigint'
  • ARROW-1540 - Add NO_VALGRIND option to ADD_ARROW_TEST and disable valgrind in a few problematic tests
  • ARROW-1541 - [C++] Fix race conditions in arrow_gpu with generated Flatbuffers files. Do not put generated files in source tree
  • ARROW-1543 - [C++] Correct C++ tutorial to use std::unique_ptr instead of std::shared_ptr
  • ARROW-1549 - [JS] Integrate auto-generated Arrow test files
  • ARROW-1555 - [Python] Implement Dask exists function
  • ARROW-1584 - [C++/Python] Support Null type in IPC round trips, fix serialize_pandas on empty DataFrame
  • ARROW-1585 - [PYTHON] serialize_pandas roundtrip loses columns name (also ARROW-1586)
  • ARROW-1586 - [PYTHON] serialize_pandas roundtrip loses columns name
  • ARROW-1609 - [Plasma] Xcode 9 compilation workaround
  • ARROW-1615 - Added BUILD_WARNING_LEVEL and BUILD_WARNING_FLAGS to Setup…
  • ARROW-1617 - [Python] Do not use symlinks in python/cmake_modules
  • ARROW-1620 - Python: Download Boost in manylinux1 build from bintray
  • ARROW-1622 - [Plasma] Plasma doesn't compile with XCode 9
  • ARROW-1624 - [C++] Fix build on LLVM 4.0, remove some clang warning suppressions
  • ARROW-1625 - [Serialization] Support OrderedDict and defaultdict serialization
  • ARROW-1629 - [C++] Add miscellaneous DCHECKs and minor changes based on infer tool output
  • ARROW-1633 - [Python] Support NumPy string and unicode types in pyarrow.array, Array.from_pandas
  • ARROW-1640 - Fix HTTPS failures in cmake / libcurl caused by ca-certificates clash
  • ARROW-1647 - [Plasma] Make sure to read length header as int64_t instead of size_t.
  • ARROW-1653 - [Plasma] Use static cast to avoid compiler warning.
  • ARROW-1655 - [Java] Add Scale and Precision to ValueVectorTypes.tdd for Decimals
  • ARROW-1656 - [C++] Endianness Macro is Incorrect on Windows And Mac
  • ARROW-1657 - [C++] Multithreaded Read Test Failing on Arch Linux
  • ARROW-1658 - [Python] Add boundschecking of dictionary indices when creating CategoricalBlock
  • ARROW-1663 - [Java] use consistent name for null and not-null in FixedSizeLis…
  • ARROW-1670 - [Serialization] Speed up deserialization by getting rid of smart pointer overhead
  • ARROW-1672 - [Python] Failure to write Feather bytes column
  • ARROW-1673 - [Python] Add support for numpy 'bool' type
  • ARROW-1676 - [C++] Only pad null bitmap up to a factor of 8 bytes in Feather format
  • ARROW-1678 - [Python] Implement numpy.float16 SerDe
  • ARROW-1680 - [Python] Timestamp unit change not done in from_pandas() conversion
  • ARROW-1681 - [Python] Error writing with nulls in lists
  • ARROW-1686 - [Docs] rsync contents of apidocs directory into site java directory
  • ARROW-1693 - [JS] Expand JavaScript implementation, build system, fix integration tests
  • ARROW-1694 - [Java] Unclosed VectorSchemaRoot in JsonFileReader#readDictionaryBatches()
  • ARROW-1695 - [Serialization] Fix reference counting of numpy arrays created in custom serializer
  • ARROW-1698 - [JS] File reader attempts to load the same dictionary batch more than once
  • ARROW-1704 - [GLib] Fix Go example failure
  • ARROW-1708 - [JS] Fix linter error
  • ARROW-1709 - [C++] Decimal.ToString is incorrect for negative scale
  • ARROW-1711 - [Python] Fix flake8 calls to lint the right directories
  • ARROW-1714 - [Python] Fix invalid serialization/deserialization None name Series
  • ARROW-1720 - [Python] Implement bounds check in chunk getter
  • ARROW-1723 - [C++] add ARROW_STATIC to mark static libs on Windows
  • ARROW-1730 - [Python] Fix wrong datetime conversion (also ARROW-1738)
  • ARROW-1732 - [Python] Permit creating record batches with no columns, test pandas roundtrips
  • ARROW-1735 - [C++] Test CastKernel writing into output array with non-zero offset
  • ARROW-1738 - [Python] Wrong datetime conversion when pa.array with unit
  • ARROW-1739 - [Python] Fix broken build due to using unittest.TestCase methods
  • ARROW-1742 - C++: clang-format is not detected correctly on OSX anymore
  • ARROW-1743 - [Python] Avoid non-array writeable-flag check
  • ARROW-1745 - [Plasma] Include gtest after plasma/compat.h in tests.
  • ARROW-1749 - [C++] Handle range of Decimal128 values that require 39 digits to be displayed
  • ARROW-1751 - [Python] Pandas 0.21.0 introduces a breaking API change for MultiIndex construction
  • ARROW-1754 - [Python] Fix buggy Parquet roundtrip when an index name is the same as a column name
  • ARROW-1756 - [Python] Fix large file read/write error
  • ARROW-1762 - [C++] Add note to readme about need to set LC_ALL on some Linux systems
  • ARROW-1764 - [Python] Add -c conda-forge for Windows dev installation instructions
  • ARROW-1766 - [GLib] Fix failing builds on OSX
  • ARROW-1768 - [Python] Fix suppressed exception in ParquetWriter.del
  • ARROW-1769 - Python: pyarrow.parquet.write_to_dataset creates cyclic references
  • ARROW-1770 - [GLib] Fix GLib compiler warning
  • ARROW-1771 - [C++] ARROW-1749 Breaks Public API test in parquet-cpp
  • ARROW-1776 - [C++] Define arrow::gpu::CudaContext::bytes_allocated()
  • ARROW-1778 - [Python] Link parquet-cpp statically, privately in manylinux1 wheels
  • ARROW-1781 - Don't use brew when using the toolchain
  • ARROW-1788 - Fix Plasma store abort bug on client disconnection
  • ARROW-1791 - Limit generated data range to physical limits for temporal types
  • ARROW-1793 - Fix a typo in README.md
  • ARROW-1800 - [C++] Fix and simplify random_decimals
  • ARROW-1805 - [Python] Ignore special private files when traversing ParquetDataset
  • ARROW-1811 - [C++/Python] Rename all Decimal based APIs to Decimal128
  • ARROW-1812 - [C++] Plasma store modifies hash table while iterating during client disconnect
  • ARROW-1813 - Enforce checkstyle failure in JAVA build and fix all checkstyle
  • ARROW-1821 - [INTEGRATION] Add integration test case for when Field has zero null count and optional validity buffer
  • ARROW-1829 - [Plasma] Fixes to eviction policy.
  • ARROW-1830 - [Python] Relax restriction that Parquet files in a dataset end in .parq or .parquet
  • ARROW-1831 - [Python] Docker-based documentation build does not properly set LD_LIBRARY_PATH
  • ARROW-1836 - [C++] Remove deprecated static_visitor struct to avoid msvc C4996 warning
  • ARROW-1839 - [C++/Python] Add Decimal Parquet Read/Write Tests (also ARROW-1871)
  • ARROW-1840 - [Website] The installation command failed on Windows10 anaconda envir…
  • ARROW-1845 - [Python] Expose Decimal128Type
  • ARROW-1852 - [C++] Make retrieval of Plasma manager fd a const operation
  • ARROW-1853 - [Plasma] Fix off-by-one error in retry processing
  • ARROW-1863 - [Python] PyObjectStringify could render bytes-like output for more types of objects
  • ARROW-1865 - [C++] Do not alter number of rows attribute when removing last column from Table
  • ARROW-1869 - [JAVA] Fix LowCostIdentityHashMap name
  • ARROW-1871 - [Python/C++] Appending Python Decimals with different scales requires rescaling
  • ARROW-1873 - [Python] Catch more possible Python/OOM errors in to_pandas conversion path
  • ARROW-1877 - [Java] Fix incorrect equals method in JsonStringArrayList
  • ARROW-1879 - [Python] Dask integration tests are not skipped if dask is not installed
  • ARROW-1881 - Ignore JS tags for Python packages
  • ARROW-1882 - [C++] Reintroduce DictionaryBuilder
  • ARROW-1883 - [Python] Fix handling of metadata in to_pandas when not all columns are present
  • ARROW-1889 - [Python] --exclude is not available in older git versions
  • ARROW-1890 - [Python] Fix mask handling for Date32 NumPy conversions
  • ARROW-1891 - [Python] Always use NumPy NaT sentinels to mark nulls when converting to array
  • ARROW-1892 - [Python] Support binaries in lists
  • ARROW-1893 - [Python] Convert memoryview to bytes when loading from pickle in Python 2.7
  • ARROW-1895 - [Python] Add field_name to pandas index metadata (also ARROW-1897)
  • ARROW-1897 - [Python] Incorrect numpy_type for pandas metadata of Categoricals
  • ARROW-1904 - [C++] Deprecate PrimitiveArray::raw_values
  • ARROW-1906 - [Python] Do not override user-supplied type in pyarrow.array when converting DatetimeTZ pandas data
  • ARROW-1908 - [Python] Construction of arrow table from pandas DataFrame with duplicate column names crashes
  • ARROW-1910 - [C++] Use c_glib Brewfile in README for installing dependencies on macOS (#1407)
  • ARROW-1914 - [C++] Fix build dependency for GPU support build
  • ARROW-1915 - [Python] Add missing parquet decorator to decimal tests
  • ARROW-1916 - [Java] Include java/dev/checkstyle in git archive for source releases
  • ARROW-1917 - Fixes to enable verify-release-candidate.sh to work for 0.8.0
  • ARROW-1935 - Download page must not link to snapshots / nightly builds
  • ARROW-1936 - Broken links to signatures/hashes etc
  • ARROW-1939 - Correct links in release 0.8 blog post

New Features and Improvements

  • ARROW-480 - [Python] Implement RowGroupMetaData.ColumnChunk
  • ARROW-504 - [Python] Add adapter to write pandas.DataFrame in user-selected chunk size to streaming format
  • ARROW-507 - [C++] Complete ListArray::FromArrays implementation, add unit tests
  • ARROW-541 - [JS] Implement JavaScript-compatible implementation
  • ARROW-571 - [Python] Add unit test for incremental Parquet file building, improve docs
  • ARROW-587 - Add fix version to PR merge tool
  • ARROW-609 - [C++] Function for casting from days since UNIX epoch to int64 date
  • ARROW-838 - [Python] Expand pyarrow.array to handle NumPy arrays not originating in pandas
  • ARROW-905 - [Docs] Dockerize document generation
  • ARROW-911 - [Python] Expand development.rst with build instructions without conda
  • ARROW-942 - Support running integration tests with both Python 2.7 and 3.6
  • ARROW-950 - [Website] Add Google Analytics tag to site
  • ARROW-972 - UnionArray in pyarrow
  • ARROW-1032 - [JS] Support custom_metadata
  • ARROW-1047 - [Java][FollowUp] Change ArrowMagic to be non-public class
  • ARROW-1047 - [Java] Add Generic Reader Interface for Stream Format
  • ARROW-1087 - [Python] Add pyarrow.get_include function. Bundle includes in all builds
  • ARROW-1114 - [C++] Add simple RecordBatchBuilder class
  • ARROW-1134 - [C++] Support for C++/CLI compilation, add NULLPTR define to avoid using nullptr in public headers
  • ARROW-1178 - [C++/Python] Add option to set chunksize in TableBatchReader, Table.to_batches method
  • ARROW-1226 - [C++] Docs cleaning in arrow/ipc. Doxyfile fixes, move ipc/metadata-internal.h symbols to internal NS
  • ARROW-1250 - [Python] Add pyarrow.types module with useful type checking functions
  • ARROW-1362 - [Integration] Validate vector type layout in IPC messages
  • ARROW-1367 - [Website] Divide CHANGELOG issues by component and add subheaders
  • ARROW-1369 - Support boolean types in the javascript arrow reader library
  • ARROW-1371 - [Website] Add "Powered By" page to the website
  • ARROW-1455 - [Python] Add Dockerfile for validating Dask integration
  • ARROW-1471 - [JAVA] Document requirements and non/requirements for ValueVector updates
  • ARROW-1472 - [JAVA] Design updated ValueVector Object Hierarchy
  • ARROW-1473 - ValueVector new hierarchy prototype (implementation phase 1)
  • ARROW-1474 - [JAVA] ValueVector hierarchy (Implementation Phase 2)
  • ARROW-1476 - [JAVA] Implement Final ValueVector Updates
  • ARROW-1482 - [C++] Implement casts between date32 and date64
  • ARROW-1483 - [C++] Implement casts between time32 and time64
  • ARROW-1484 - [C++/Python] Implement casts between date, time, timestamp units
  • ARROW-1485 - [C++] Implement union-like data type for accommodating kernel arguments which may be scalars or arrays
  • ARROW-1486 - [C++] Make Column, RecordBatch, and Table non-copyable
  • ARROW-1487 - [C++] Implement casts from List to List, where a cast function is defined from any A to B
  • ARROW-1488 - [C++] Implement ArrayBuilder::Finish in terms of FinishInternal based on ArrayData
  • ARROW-1498 - Add CONTRIBUTING.md to .github special directory
  • ARROW-1503 - [Python] Add default serialization context, callbacks for pandas.Series/DataFrame
  • ARROW-1522 - [Python] Zero copy buffer deserialization
  • ARROW-1523 - [C++] Add helper data struct with methods for reading a validity bitmap possibly having a non-zero offset
  • ARROW-1524 - [C++] More graceful solution for handling non-zero offsets on inputs and outputs in compute library
  • ARROW-1525 - [C++] New compare functions that return boolean instead of Status
  • ARROW-1526 - [Python] Add unit test for fix in PARQUET-1100
  • ARROW-1535 - [Python] Enable sdist tarballs to be installed
  • ARROW-1538 - [C++] Support Ubuntu 14.04 in .deb packaging automation
  • ARROW-1539 - [C++] Remove APIs deprecated as of 0.7.0 or prior releases
  • ARROW-1556 - [C++] Move verbose AssertArraysEqual function used in PARQUET-1100 into arrow/test-util.h
  • ARROW-1559 - [C++] Add Unique kernel and refactor DictionaryBuilder to be a stateful kernel
  • ARROW-1573 - [C++] Implement stateful kernel function that uses DictionaryBuilder to compute dictionary indices
  • ARROW-1575 - [Python] Add tests for pyarrow.column factory function
  • ARROW-1576 - [Python] Add utility functions (or a richer type hierarchy) for checking whether data type instances are members of various type classes
  • ARROW-1577 - [JS] add ASF release scripts
  • ARROW-1588 - [C++/Format] Harden Decimal Format
  • ARROW-1593 - [Python] Pass through preserve_index to RecordBatch.from_pandas in serialize_pandas
  • ARROW-1594 - [Python] Multithreaded conversions to Arrow in from_pandas
  • ARROW-1600 - [C++] Add Buffer constructor that wraps std::string
  • ARROW-1602 - [C++] Add IsValid method to pair with IsNull
  • ARROW-1603 - [C++] Add BinaryArray::GetString helper method
  • ARROW-1604 - [Python] Support common type aliases in cast(...) and various type= arguments
  • ARROW-1605 - [Python] pyarrow.array should be able to yield smaller integer types without an explicit cast
  • ARROW-1607 - [C++] Implement DictionaryBuilder for Decimals
  • ARROW-1613 - [Java] Alternative ArrowReader close to free resources but leave ReadChannel open
  • ARROW-1616 - [Python] Add unit test for RecordBatchWriter.write dispatching to write_table or write_batch
  • ARROW-1626 - Add make targets to run the inter-procedural static analys…
  • ARROW-1627 - New class to handle collection of BufferLedger(s) within …
  • ARROW-1630 - [Serialization] Support Python datetime objects
  • ARROW-1631 - [C++] Add GRPC to ThirdpartyToolchain
  • ARROW-1635 - Add release management guide
  • ARROW-1637 - [C++] IPC round-trip for null type
  • ARROW-1641 - [C++] Hide std::mutex from public headers
  • ARROW-1648 - C++: Add cast from Dictionary[NullType] to NullType
  • ARROW-1649 - C++: Print number of nulls in PrettyPrint for NullArray
  • ARROW-1651 - [JS] Lazy row accessor in Table
  • ARROW-1652 - [JS] housekeeping, vector cleanup
  • ARROW-1654 - [Python] Implement pickling for DataType, Field, Schema
  • ARROW-1662 - Move to using Homebrew/bundle and Brewfile
  • ARROW-1665 - [Serialization] Support more custom datatypes in the default serialization context
  • ARROW-1666 - [GLib] Enable gtk-doc on Travis CI Mac environment
  • ARROW-1667 - [GLib] Support Meson
  • ARROW-1671 - [C++] Deprecate arrow::MakeArray that returns Status, refactor existing code to new variant
  • ARROW-1675 - [Python] Use RecordBatch.from_pandas in Feather write path
  • ARROW-1677 - [Blog] Post on ray and arrow serialization
  • ARROW-1679 - [GLib] Add garrow_record_batch_reader_read_next()
  • ARROW-1683 - [Python] Restore TimestampType to pyarrow namespace
  • ARROW-1684 - [Python] Support selecting nested Parquet fields by any path prefix
  • ARROW-1685 - [GLib] Add GArrowTableBatchReader
  • ARROW-1687 - [Python] Expose UnionArray to pyarrow
  • ARROW-1689 - [Python] Implement zero-copy conversions for DictionaryArray
  • ARROW-1689 - [Python] Allow user to request no data copies
  • ARROW-1690 - [GLib] Add garrow_array_is_valid()
  • ARROW-1691 - [Java] Conform Java Decimal type implementation to format decisions in ARROW-1588
  • ARROW-1697 - [GitHub] Add ISSUE_TEMPLATE.md
  • ARROW-1701 - [Serialization] Support zero copy PyTorch Tensor serialization
  • ARROW-1702 - Update jemalloc in manylinux1 build
  • ARROW-1703 - [C++] Vendor exact version of jemalloc we depend on
  • ARROW-1707 - Update dev README after movement to GitBox
  • ARROW-1710 - [Java] Remove Non-Nullable Vectors
  • ARROW-1716 - [Format/JSON] Use string integer value for Decimals in JSON
  • ARROW-1717 - [Java] Refactor JsonReader for new class hierarchy and fix
  • ARROW-1718 - [C++/Python][D] -> date32
  • ARROW-1719 - [Java] Remove accessor and mutator interface
  • ARROW-1721 - [Python] Implement null-mask check in places where it isn't supported in numpy_to_arrow.cc
  • ARROW-1724 - [Packaging] Support Ubuntu 17.10
  • ARROW-1725 - [Packaging] Upload .deb for Ubuntu 17.10
  • ARROW-1726 - [GLib] Add setup description to verify C GLib build
  • ARROW-1727 - [Format] Expand Arrow streaming format to permit deltas / additions to existing dictionaries
  • ARROW-1728 - [C++] Run clang-format checks in Travis CI
  • ARROW-1734 - C++/Python: Add cast function on Column-level
  • ARROW-1736 - [GLib] Add GArrowCastOptions:allow-time-truncate
  • ARROW-1737 - [GLib] Use G_DECLARE_DERIVABLE_TYPE
  • ARROW-1740 - C++: Kernel to get unique values of an Array/Column
  • ARROW-1746 - [Python] Add build dependencies for Arch Linux
  • ARROW-1747 - [C++] Don't export symbols of statically linked libraries
  • ARROW-1748 - [GLib] Add GArrowRecordBatchBuilder
  • ARROW-1750 - [C++] Remove the need for arrow/util/random.h
  • ARROW-1752 - [Packaging] Add GPU packages for Debian and Ubuntu
  • ARROW-1753 - [Python] Provide for matching subclasses with register_type in serialization context
  • ARROW-1755 - [C++] CMake option to link msvc crt statically
  • ARROW-1758 - [Python] Remove pickle=True option for object serialization
  • ARROW-1759 - [Python] Add function / property to get implied Arrow schema from Parquet file
  • ARROW-1763 - [Python] Implement hash for DataType
  • ARROW-1765 - [Doc] Use dependencies from conda in C++ docker build
  • ARROW-1767 - [C++] Support file reads and writes over 2GB on Windows
  • ARROW-1772 - [C++] Add public-api-test module in style of parquet-cpp
  • ARROW-1773 - [C++] Add casts from date/time types to compatible signed integers
  • ARROW-1775 - Ability to abort created but unsealed Plasma objects
  • ARROW-1777 - [C++] Add ArrayData::Make static ctor for more convenient construction
  • ARROW-1779 - [Java] Integration test breaks without zeroing out validity vectors
  • ARROW-1782 - [Python] Add pyarrow.compress, decompress APIs
  • ARROW-1783 - [Python] Provide a "component" dict representation of a serialized Python object with minimal allocation
  • ARROW-1784 - [Python] Enable zero-copy serialization, deserialization of pandas.DataFrame via components
  • ARROW-1785 - [Format/C++/Java] Remove VectorLayout from serialized schemas
  • ARROW-1787 - [Python] Support reading parquet files into DataFrames in a backward compatible way
  • ARROW-1794 - [C++/Python] Rename DecimalArray to Decimal128Array
  • ARROW-1795 - [Plasma] Create flag to make Plasma store use a single memory-mapped file.
  • ARROW-1801 - [Docs] Update install instructions to use red-data-tools repos
  • ARROW-1802 - [GLib] Support arrow-gpu
  • ARROW-1806 - [GLib] Add garrow_record_batch_writer_write_table()
  • ARROW-1808 - [C++] Make RecordBatch, Table virtual interfaces for column access
  • ARROW-1809 - [GLib] Use .xml instead of .sgml for GTK-Doc main file
  • ARROW-1810 - [Plasma] Remove unused Plasma test shell scripts
  • ARROW-1816 - [Java] Resolve new vector classes structure for timestamp, date and maybe interval
  • ARROW-1817 - [Java] Configure JsonReader to read floating point NaN values
  • ARROW-1818 - Examine Java Dependencies
  • ARROW-1819 - [Java] Remove legacy vector classes
  • ARROW-1820 - [C++] Create arrow_compute shared library subcomponent
  • ARROW-1826 - [JAVA] Avoid branching in copyFrom for fixed width scalars
  • ARROW-1827 - [Java] Add checkstyle file and license template
  • ARROW-1828 - [C++] Hash kernel specialization for BooleanType
  • ARROW-1834 - [Doc] Build documentation in separate build folders
  • ARROW-1838 - [C++] Conform kernel API to use Datum for input and output
  • ARROW-1841 - [JS] Update text-encoding-utf-8 and tslib for node ESModules support
  • ARROW-1844 - [C++] Add initial Unique benchmarks for int64, variable-length strings
  • ARROW-1849 - [GLib] Add input checks to GArrowRecordBatch
  • ARROW-1850 - [C++] Use void* / const void* for buffers in file APIs
  • ARROW-1854 - [Python] Use pickle to serialize numpy arrays of objects.
  • ARROW-1855 - [GLib] Add workaround for build failure on macOS
  • ARROW-1857 - [Python] Add switch for boost linkage with static parquet in wheels
  • ARROW-1859 - [GLib] Add GArrowDictionaryDataType
  • ARROW-1862 - [GLib] Add GArrowDictionaryArray
  • ARROW-1864 - [Java] Upgrade Netty to 4.1.17
  • ARROW-1866 - [Java] Combine MapVector and NonNullableMapVector Classes
  • ARROW-1867 - [Java] Add missing methods to BitVector from legacy vector class
  • ARROW-1874 - [GLib] Add garrow_array_unique()
  • ARROW-1878 - [GLib] Add garrow_array_dictionary_encode()
  • ARROW-1884 - [C++] Exclude integration test JSON reader/writer classes from public API
  • ARROW-1885 - [Java] Restore MapVector class names prior to ARROW-1710
  • ARROW-1901 - [Python] Support recursive mkdir for DaskFilesystem
  • ARROW-1902 - [Python] Remove mkdir race condition from write_to_dataset
  • ARROW-1905 - [Python] Add more comprehensive list of exact type checking functions to pyarrow.types
  • ARROW-1911 - [JS] Add Graphistry to Arrow JS proof points
  • ARROW-1922 - Blog post on recent improvements/changes in JAVA Vectors
  • ARROW-1932 - [Website] Update site for 0.8.0
  • ARROW-1934 - [Website] Blog post summarizing highlights of 0.8.0 release

Apache Arrow 0.7.1 (2017-10-01)

New Features and Improvements

  • ARROW-559 - Add release verification script for Linux
  • ARROW-1464 - [GLib] Add "Common build problems" section into the README.md of c_glib
  • ARROW-1537 - [C++] Support building with full path install_name on macOS
  • ARROW-1546 - [GLib] Support GLib 2.40 again
  • ARROW-1548 - [GLib] Support bulk append in builder
  • ARROW-1578 - [C++] Run lint checks in Travis CI much earlier at before_script stage to fail faster
  • ARROW-1592 - [GLib] Add GArrowUIntArrayBuilder
  • ARROW-1608 - Support Release verification script on macOS
  • ARROW-1612 - [GLib] Update README for macOS
  • ARROW-1618 - [JAVA] Reduce Heap Usage(Phase 1): move release listener logic to Allocation Manager
  • ARROW-1634 - [Website] Updates for 0.7.1 release

Bug Fixes

  • ARROW-1497 - [Java] Fix JsonReader to initialize count correctly
  • ARROW-1500 - [C++] Do not ignore return value from truncate in MemoryMa…
  • ARROW-1529 - [GLib] Use Xcode 8.3 on Travis CI
  • ARROW-1533 - [JAVA] realloc should consider the existing buffer capacity for computing target memory requirement
  • ARROW-1536 - [C++] Do not transitively depend on libboost_system
  • ARROW-1542 - [C++] Install packages in temporary directory in MSVC build verification script
  • ARROW-1544 - [JS] Export Vector types
  • ARROW-1545 - Remove deprecated args of builder
  • ARROW-1547 - [JAVA] Fix 8x memory over-allocation in BitVector
  • ARROW-1550 - [Python] Followup: fix flake8 warning
  • ARROW-1550 - [Python] Explicitly close owned file handles in ParquetWriter.close to avoid Windows flakiness
  • ARROW-1553 - [JAVA] Implement setInitialCapacity for MapWriter
  • ARROW-1554 - [Python] Update Sphinx install page to note that VC14 runtime may need to be installed on Windows
  • ARROW-1557 - [Python] Validate names length in Table.from_arrays
  • ARROW-1590 - [JS] Flow TS Table method generics
  • ARROW-1591 - C++: Xcode 9 is not correctly detected
  • ARROW-1595 - [Python] Fix package dependency resolution issue causing broken builds
  • ARROW-1598 - [C++] Fix diverged code comment in plasma tutorial
  • ARROW-1601 - [C++] Do not read extra byte from validity bitmap, add internal::BitmapReader in lieu of macros
  • ARROW-1606 - [Python] Copy .lib files in addition to .dll when bundling libraries for Windows
  • ARROW-1610 - C++/Python: Only call python-prefix if the default PYTHON_LIBRARY is not present
  • ARROW-1611 - [C++] Add BitmapWriter, do not perform out of bounds read in BitmapReader when length is 0
  • ARROW-1619 - [Java] Correctly set "lastSet" for variable vectors in JsonReader

Apache Arrow 0.7.0 (2017-09-17)

Bug Fixes

  • ARROW-12 - Get Github activity mirrored to JIRA
  • ARROW-248 - UnionVector.close() should call clear()
  • ARROW-269 - UnionVector getBuffers method does not include typevector
  • ARROW-407 - BitVector.copyFromSafe() should re-allocate if necessary instead of returning false
  • ARROW-801 - Provide direct access to underlying buffer memory addresses
  • ARROW-1302 - C++: Set MAKE to make if not defined
  • ARROW-1332 - [Packaging] Building Windows wheels in Apache repos
  • ARROW-1354 - [Python] Segfault in Table.from_pandas with Mixed-Type Categories
  • ARROW-1357 - [Python] Account for chunked arrays when converting lists back to pandas form
  • ARROW-1363 - [C++] Use buffer layout from dictionary index type in IPC messages
  • ARROW-1365 - [Python] Remove outdated pyarrow.jemalloc_memory_pool example. Update API doc site build instructions
  • ARROW-1373 - Implement getBuffer() methods for ValueVector
  • ARROW-1375 - [C++] Remove dependency on msvc version for Snappy build
  • ARROW-1378 - [Python] whl is not a supported wheel on this platform on Debian/Jessie
  • ARROW-1379 - [Java] adding maven-dependency-plugin and fixing all reported dependency errors
  • ARROW-1390 - [Python] Add more serialization tests
  • ARROW-1407 - Fix bug where DictionaryEncoder can only encode vector le…
  • ARROW-1411 - [Python] Booleans in Float Columns cause Segfault
  • ARROW-1414 - [GLib] Cast after status check
  • ARROW-1421 - [Python] Extend Python serialization API to accept non-list types
  • ARROW-1426 - [Site] Fix the title of the top page.
  • ARROW-1429 - [Python] Open common Parquet metadata file using passed file system
  • ARROW-1430 - [Python] Python CI build outside of a bash function scope, enable flake8 to fail build
  • ARROW-1434 - [Python][D] numpy arrays
  • ARROW-1435 - [Python] Properly handle time zone metadata in Parquet round trips
  • ARROW-1437 - [Python] pa.Array.from_pandas segfaults when given a mixed-type array
  • ARROW-1439 - [Packaging] Automate updating RPM in RPM build
  • ARROW-1443 - [Java] Fixed a small bug on ArrowBuf.setBytes with unsliced ByteBuffers
  • ARROW-1444 - [JAVA] fix last byte copy in BitVector splitAndTransfer
  • ARROW-1446 - [Python] Add (very slow) large memory unit test for int32 overflow in PARQUET-1090
  • ARROW-1450 - [Python] Raise proper error if custom serialization handler fails
  • ARROW-1452 - [C++] Restore DISALLOW_COPY_AND_ASSIGN usages removed in ARROW-1452 patch
  • ARROW-1452 - [C++] Make macros in arrow/util/macros.h more unique
  • ARROW-1453 - [C++/Python] Support non-contiguous Tensors in WriteTensor
  • ARROW-1457 - [C++] Optimize strided WriteTensor
  • ARROW-1458 - [Python] Document that create_parents=False is unsupported in HadoopFileSystem
  • ARROW-1459 - [Python] Use list values length to advance offset when reconstructing array of ndarrays
  • ARROW-1461 - [C++] Restore LLVM apt usage
  • ARROW-1461 - [C++] Disable builds using LLVM apt repo until installation issues resolved
  • ARROW-1467 - [JAVA] Fix reset() and allocateNew() in Nullable Value Vectors t…
  • ARROW-1469 - Segfault when serializing Pandas series with mixed object type
  • ARROW-1490 - [Java] Allow failures for JDK9 for now
  • ARROW-1493 - [C++] Flush stream in PrettyPrint functions
  • ARROW-1495 - [C++] Store shared_ptr to boxed arrays in RecordBatch
  • ARROW-1507 - [C++] Include arrow/array.h for arrow::internal::ArrayData
  • ARROW-1512 - [C++] Fix API change in documentation
  • ARROW-1514 - [C++] Fix a typo in document
  • ARROW-1527 - Fix Travis CI JDK9 build
  • ARROW-1531 - [C++] Return ToBytes by value from Decimal128
  • ARROW-1532 - [Python] Referencing an Empty Schema causes a SegFault

New Features and Improvements

  • ARROW-34 - C++: establish a basic function evaluation model
  • ARROW-229 - [C++] Implement cast functions for numeric types, booleans
  • ARROW-592 - [C++] Provide .deb and .rpm packages
  • ARROW-594 - [C++/Python] Write arrow::Table to stream and file writers
  • ARROW-695 - Add decimal integration test.
  • ARROW-696 - [C++] Support decimals in IPC and JSON reader/writer to enable integration tests
  • ARROW-759 - [Python] Serializing large class of Python objects in Apache Arrow
  • ARROW-786 - [Format] In-memory format for 128-bit Decimals, handling of sign bit
  • ARROW-837 - [Python] Add public pyarrow.allocate_buffer API. Rename FixedSizeBufferOutputStream
  • ARROW-941 - Add "cold start" instructions for running integration tests
  • ARROW-989 - [Python] Write pyarrow.Table to FileWriter or StreamWriter
  • ARROW-1156 - [C++/Python] Expand casting API, add UnaryKernel callable. Use Cast in appropriate places when converting from pandas
  • ARROW-1238 - [Java] Adding Decimal type JSON read and write support
  • ARROW-1286 - PYTHON: support Categorical serialization to/from parquet
  • ARROW-1307 - [Python] Expand IPC section to include object serialization, Feather format. Add Feather functions to API listing
  • ARROW-1317 - [Python] Attempt to set Hadoop CLASSPATH when using JNI
  • ARROW-1331 - [JAVA] include package statement
  • ARROW-1331 - [JAVA] Refactor unit tests
  • ARROW-1339 - [C++] Use of boost::filesystem::path to handle file paths
  • ARROW-1344 - [C++] Do not permit writing to closed BufferOutputStream
  • ARROW-1348 - [C++/Python] Release verification script for Windows
  • ARROW-1351 - Update CHANGELOG.md in 00-prepare.sh when creating release candidate
  • ARROW-1352 - [Integration] Added specific formatting for producer consumer output
  • ARROW-1355 - [Java] Make Arrow buildable with jdk9
  • ARROW-1356 - [Website] Add new committers
  • ARROW-1358 - Update sha{1, 256, 512} checksums per latest ASF release policy
  • ARROW-1359 - [C++] Add flavor='spark' option to write_parquet that sanitizes schema field names
  • ARROW-1364 - [C++] IPC support machinery for record batch roundtrips to GPU device memory
  • ARROW-1366 - [Plasma] Define entry point for the plasma store
  • ARROW-1372 - [Plasma] enable HUGETLB support on Linux to improve plasma put performance
  • ARROW-1376 - [C++] RecordBatchStreamReader::Open API is inconsistent with writer
  • ARROW-1377 - [Python] Add ParquetFile.scan_contents function to use for benchmarking
  • ARROW-1381 - [Python] Use FixedSizeBufferWriter in SerializedPyObject.to_buffer
  • ARROW-1383 - [C++] Add vector append variant to primitive array builders that accepts std::vector<bool>
  • ARROW-1384 - [C++] Add SerializeRecordBatch API for writing a record batch as an IPC message to a new buffer
  • ARROW-1386 - [C++] Unpin CMake version in MSVC toolchain builds
  • ARROW-1387 - [C++] Set up GPU leaf library, add unit test module for CUDA tests
  • ARROW-1392 - [C++] Add GPU IO interfaces for CUDA
  • ARROW-1395 - [C++/Python] Remove APIs deprecated from 0.5.0 onward
  • ARROW-1396 - [C++] Add PrettyPrint for schemas that outputs dictionaries
  • ARROW-1397 - [Packaging] Use Docker instead of Vagrant
  • ARROW-1399 - [C++] Add CUDA build version defines in public headers
  • ARROW-1400 - [Python] Adding parquet.write_to_dataset() method for writing partitioned .parquet files
  • ARROW-1401 - [C++] Add note to readme about ARROW_EXTRA_ERROR_CONTEXT
  • ARROW-1401 - [C++] Add ARROW_EXTRA_ERROR_CONTEXT option
  • ARROW-1402 - [C++] Deprecate APIs which return std::shared_ptr<MutableBuffer> in favor of std::shared_ptr<Buffer>
  • ARROW-1404 - [Packaging] Build .deb and .rpm on Travis CI
  • ARROW-1405 - [Python] Expose LoggingMemoryPool in Python API
  • ARROW-1406 - [Python] Harden user API for generating serialized schema and record batch messages as memoryview-compatible objects
  • ARROW-1408 - [C++] IPC public API cleanup, refactoring. Add SerializeSchema, ReadSchema public APIs
  • ARROW-1410 - Remove MAP_POPULATE flag when mmapping files in Plasma store.
  • ARROW-1412 - [Plasma] Add higher level API for putting and getting Python objects
  • ARROW-1413 - [C++] Add include-what-you-use configuration
  • ARROW-1415 - [GLib] Support date32 and date64
  • ARROW-1416 - Clarify memory layout documentation
  • ARROW-1417 - [Python] Allow more generic filesystem objects to be passed to ParquetDataset
  • ARROW-1418 - [Python] Introduce SerializationContext to register custom serialization callbacks
  • ARROW-1419 - [GLib] Suppress sign-conversion warnings
  • ARROW-1427 - [GLib] Add arrow cpp link to readme
  • ARROW-1428 - [C++] Append steps to clone source code to README.md
  • ARROW-1432 - [C++] Build bundled jemalloc functions with private prefix
  • ARROW-1433 - [C++] Simplify Array::Slice to be non-virtual
  • ARROW-1438 - [Python] Pull serialization context through PlasmaClient put and get
  • ARROW-1441 - [Site] Add Ruby to Flexible section
  • ARROW-1442 - [Website] Add note about nightly builds to /install
  • ARROW-1447 - [C++] Fix many include-what-you-use warnings
  • ARROW-1448 - [Packaging] Support uploading built .deb and .rpm to Bintray
  • ARROW-1449 - Implement Decimal using only Int128
  • ARROW-1451 - [C++] Add public API file for IO section in arrow/io/api.h
  • ARROW-1460 - [C++] Pin clang-format at LLVM 4.0
  • ARROW-1462 - [GLib] Add GArrowTime32Array and GArrowTime64Array
  • ARROW-1466 - [C++] Implement PrettyPrint for DecimalArray
  • ARROW-1468 - [C++] Add primitive Append variants that accept std::vector<T>
  • ARROW-1479 - [JS] Expand JavaScript implementation
  • ARROW-1480 - [Python] Improve performance of serializing sets
  • ARROW-1481 - [C++] Expose type casts as generic callable object that can write into pre-allocated memory
  • ARROW-1494 - [C++] Improve doxygen comments in arrow/table.h, note that RecordBatch::column returns new object
  • ARROW-1499 - [Python] Consider adding option to parquet.write_table that sets options for maximum Spark compatibility
  • ARROW-1504 - [GLib] Add GArrowTimestampArray
  • ARROW-1505 - [GLib] Simplify arguments check
  • ARROW-1506 - [C++] Add .pc for compute modules
  • ARROW-1508 - C++: Add support for FixedSizeBinaryType in DictionaryBuilder
  • ARROW-1510 - [GLib] Support cast
  • ARROW-1511 - [C++] Promote ArrayData, MakeArray to public API, deprecate MakePrimitiveArray
  • ARROW-1513 - C++: Add cast from Dictionary to plain arrays
  • ARROW-1515 - [GLib] Detect version directly
  • ARROW-1516 - [GLib] Update document
  • ARROW-1517 - Remove unnecessary temporary in DecimalUtil::ToString function
  • ARROW-1519 - [C++] Move DecimalUtil functions to methods on the Int128 class
  • ARROW-1528 - [GLib] Resolve recursive include dependency
  • ARROW-1530 - [C++] Install arrow/util/parallel.h
  • ARROW-1551 - [Website] Updates for 0.7.0 release
  • ARROW-1597 - [Packaging] arrow-compute.pc is missing in .deb/.rpm file list

Apache Arrow 0.6.0 (2017-08-14)

Bug Fixes

  • ARROW-187 - [C++] Add development style notes to C++ README, note about esoteric exceptions in constructors
  • ARROW-276 - [JAVA] Nullable Vectors should extend BaseValueVector and not Bas…
  • ARROW-573 - [C++/Python] Implement IPC metadata handling for ordered dictionaries, pandas conversions
  • ARROW-884 - [C++] Exclude internal namespaces from generated Doxygen docs
  • ARROW-932 - [Python] Fix MSVC compiler warnings, build Python with /WX and -Werror in CI
  • ARROW-968 - [Python] Support slices in RecordBatch.getitem
  • ARROW-1192 - [JAVA] Use buffer slice for splitAndTransfer in List and Union Vectors.
  • ARROW-1195 - [C++] CpuInfo init with cores number, frequency and cache…
  • ARROW-1204 - [C++] Remove WholeProgramOptimization(/GL) compilation fl…
  • ARROW-1225 - [Python] Decode bytes to utf8 unicode if possible when passing explicit utf8 type to pyarrow.array
  • ARROW-1237 - [JAVA] expose the ability to set lastSet
  • ARROW-1239 - [JAVA] upgrading git-commit-id-plugin
  • ARROW-1240 - [JAVA] security: upgrade logback to address CVE-2017-5929 (take 2)
  • ARROW-1240 - [JAVA] security: upgrade slf4j to 1.7.25 and logback to 1.2.3
  • ARROW-1241 - [C++] Appveyor build matrix extended with Visual Studio 2…
  • ARROW-1242 - [JAVA] - upgrade jackson to mitigate security vulnerabilities (take 2)
  • ARROW-1242 - [JAVA] - upgrade jackson to mitigate security vulnerabilities
  • ARROW-1245 - [Integration] Enable JavaTester in Integration tests
  • ARROW-1248 - [Python] Suppress return-type-c-linkage warning in Cython clang builds
  • ARROW-1249 - [JAVA] expose fillEmpties from Nullable variable length vectors
  • ARROW-1263 - [C++] Get CPU info on Windows; Resolve patching whitespac…
  • ARROW-1265 - [Plasma] Clean up all resources on SIGTERM to keep valgrind output clean
  • ARROW-1267 - [Java] Handle zero length case in BitVector.splitAndTransfer
  • ARROW-1269 - [Packaging] Add Windows wheel build scripts from ARROW-1068 to arrow-dist
  • ARROW-1275 - [C++] Default Snappy static lib suffix updated to "_static"
  • ARROW-1276 - enable parquet serialization of empty DataFrames
  • ARROW-1283 - [JAVA] Allow VectorSchemaRoot to close more than once
  • ARROW-1285 - [Python] Delete any incomplete file when attempt to write single Parquet file fails
  • ARROW-1287 - [Python] Implement whence argument for pyarrow.NativeFile.seek
  • ARROW-1290 - [C++] Double buffer size when exceeding capacity in arrow::BufferBuilder as in array builders
  • ARROW-1291 - [Python] Cast non-string DataFrame columns to strings in RecordBatch/Table.from_pandas
  • ARROW-1294 - [C++] Pin cmake=3.8.0 in MSVC toolchain build
  • ARROW-1296 - [Java] Fix allocationSizeInBytes in FixedValueVectors.res…
  • ARROW-1300 - [JAVA] Fix Tests for ListVector
  • ARROW-1306 - [C++] Use UTF8 filenames in local file error messages
  • ARROW-1308 - [C++] Link utility executables to Arrow shared library if ARROW_BUILD_STATIC=off
  • ARROW-1309 - [Python] Handle nested lists with all None values in Array.from_pandas
  • ARROW-1310 - [JAVA] revert changes made in ARROW-886
  • ARROW-1311 - Python hangs after writing a few Parquet tables
  • ARROW-1312 - [Python] Follow-up: do not use jemalloc in manylinux1 builds
  • ARROW-1312 - [C++] Make ARROW_JEMALLOC OFF by default until ARROW-1282 is resolved
  • ARROW-1326 - [Python] Fix Sphinx Build in Travis CI, treat Sphinx warnings as errors
  • ARROW-1327 - [Python] Always release GIL before calling check_status in Cython
  • ARROW-1328 - [Python] Set correct Arrow type when coercing to milliseconds and passing explicit type
  • ARROW-1330 - [Plasma] Turn on plasma tests on manylinux1
  • ARROW-1335 - [C++] Add offset to PrimitiveArray::raw_values to make consistent with other raw_values
  • ARROW-1338 - [Python] Do not close RecordBatchWriter on dealloc in case sink is no longer valid
  • ARROW-1340 - [Java] Fix NullableMapVector field metadata
  • ARROW-1342 - [Python] Support strided ndarrays in pandas conversion from nested lists
  • ARROW-1343 - [Java] Aligning serialized schema, end of buffers in RecordBatches
  • ARROW-1350 - [C++] Do not exclude Plasma source tree from source release

New Features and Improvements

  • ARROW-439 - [Python] Add option in "to_pandas" conversions to yield Categorical from String/Binary arrays
  • ARROW-622 - [Python] Add coerce_timestamps option to parquet.write_table, deprecate timestamps_to_ms argument
  • ARROW-1076 - [Python] Handle nanosecond timestamps more gracefully when writing to Parquet format
  • ARROW-1093 - [Python] Run flake8 in Travis CI. Add note about development to README
  • ARROW-1104 - Integrate in-memory object store from Ray
  • ARROW-1116 - [Python] Create single external GitHub repo building for building wheels for all platforms in one shot
  • ARROW-1121 - [C++] Improve error message when opening OS file fails
  • ARROW-1140 - [C++] Allow optional build of plasma
  • ARROW-1149 - [Plasma] Create Cython client library for Plasma
  • ARROW-1173 - [Plasma] Add blog post describing Plasma object store
  • ARROW-1211 - [C++] Enable builder classes to automatically use the default memory pool
  • ARROW-1213 - [Python] Support s3fs filesystem for Amazon S3 in ParquetDataset
  • ARROW-1219 - [C++] Use Google C++ code formatting
  • ARROW-1224 - [Format] Clarify language around buffer padding and align…
  • ARROW-1230 - [Plasma] Install libraries and headers
  • ARROW-1243 - [JAVA] update all libs to latest versions
  • ARROW-1246 - [Format] Draft Flatbuffer metadata description for Map
  • ARROW-1251 - [C++] Update C++ README to account for toolchain evolution
  • ARROW-1253 - [C++/Python] Speed up C++ / Python builds by using conda-forge toolchain for thirdparty libraries
  • ARROW-1255 - [Plasma] Fix typo in plasma protocol; add DCHECK for ReadXXX in plasma protocol.
  • ARROW-1256 - [Plasma] Fix compile warnings on macOS
  • ARROW-1257 - Plasma documentation
  • ARROW-1258 - [C++] Suppress Clang dlmalloc compiler warnings
  • ARROW-1259 - [Plasma] Speed up plasma tests
  • ARROW-1260 - [Plasma] Use factory method to create Python PlasmaClient
  • ARROW-1264 - [Python] Raise exception in Python instead of aborting if cannot connect to Plasma store
  • ARROW-1268 - [WEBSITE] Added blog post for Spark integration toPandas()
  • ARROW-1270 - [Packaging] Add Python wheel build scripts for macOS to arrow-dist
  • ARROW-1272 - [Python] Add script to arrow-dist to generate and upload manylinux1 Python wheels
  • ARROW-1273 - [Python] Add Parquet read_metadata, read_schema convenience functions
  • ARROW-1274 - [C++] Fix CMake >= 3.3 warning. Also add option to suppress ExternalProject output
  • ARROW-1281 - [C++/Python] Add Docker setup for testing HDFS IO in C++ and Python
  • ARROW-1288 - Fix many license headers to use proper ASF one
  • ARROW-1289 - [Python] Add PYARROW_BUILD_PLASMA CMake option, follow semantics of --with-parquet
  • ARROW-1297 - 0.6.0 Release
  • ARROW-1301 - [C++/Python] More complete filesystem API for HDFS
  • ARROW-1303 - [C++] Support downloading Boost
  • ARROW-1304 - [Java] Fix Indentation, WhitespaceAround and EmptyLineSeparator checkstyle warnings in Java
  • ARROW-1305 - [GLib] Add GArrowIntArrayBuilder
  • ARROW-1315 - [GLib] Add missing status check for arrow::ArrayBuilder::Finish()
  • ARROW-1323 - [GLib] Add garrow_boolean_array_get_values()
  • ARROW-1333 - [Plasma] Example code for using Plasma to sort a DataFrame
  • ARROW-1334 - [C++] Add alternate Table constructor that takes vector of Array
  • ARROW-1336 - [C++] Add arrow::schema factory function, simplify some awkward constructors
  • ARROW-1353 - [Website] Updates + blog post for 0.6.0 release

Apache Arrow 0.5.0 (2017-07-23)

New Features and Improvements

  • ARROW-111 - [C++] Add static analyzer to tool chain to verify checking of Status returns
  • ARROW-195 - [C++] Upgrade clang bits to clang-3.8 and move back to trusty.
  • ARROW-460 - [C++] JSON read/write for dictionaries
  • ARROW-462 - [C++] Implement in-memory conversions between non-nested primitive types and DictionaryArray equivalent
  • ARROW-575 - Python: Auto-detect nested lists and nested numpy arrays in Pandas
  • ARROW-597 - [Python] Add read_pandas convenience to stream and file reader classes. Add some data type docstrings
  • ARROW-599 - [C++] Lz4 compression codec support
  • ARROW-599 - CMake support of LZ4 compression lib
  • ARROW-600 - ZSTD compression lib support
  • ARROW-692 - Integration test data generator for dictionary types
  • ARROW-693 - [Java] Add dictionary support to JSON reader and writer
  • ARROW-742 - [C++] Use gflags from toolchain; Resolve cmake FindGFlags …
  • ARROW-742 - [C++] std::wstring_convert exceptions handling
  • ARROW-834 - Python: Support creating from iterables
  • ARROW-915 - [Python] Struct Array reads limited support
  • ARROW-935 - [Java] Build Javadoc and site with OpenJDK8 in Java CI build
  • ARROW-960 - Add section on how to develop with pip
  • ARROW-962 - [Python] Add schema attribute to RecordBatchFileReader
  • ARROW-964 - [Python] Improve api docs
  • ARROW-966 - [Python] Also accept Field instance in pyarrow.list_
  • ARROW-978 - [Python] - Change python documentation sphinx theme to bootstrap
  • ARROW-1041 - [Python] Support read_pandas on a directory of Parquet files
  • ARROW-1048 - Use existing LD_LIBRARY_PATH in source release script to accommodate non-system toolchain libs
  • ARROW-1052 - Arrow 0.5.0 release
  • ARROW-1071 - [Python] RecordBatchFileReader does not have a schema property
  • ARROW-1073 - C++: Adaptive integer builder
  • ARROW-1095 - Add Arrow logo PNG to website img folder
  • ARROW-1100 - [Python] Add mode property to NativeFile
  • ARROW-1102 - Make MessageSerializer.serializeMessage() public
  • ARROW-1120 - Support for writing timestamp(ns) to Int96
  • ARROW-1122 - [Website] Change timestamp to yield correct Jekyll date
  • ARROW-1122 - [Website] Add turbodbc + arrow blog post
  • ARROW-1123 - Make jemalloc the default allocator
  • ARROW-1135 - [C++] Use clang 4.0 in one of the Linux builds
  • ARROW-1137 - Python: Ensure Pandas roundtrip of all-None column
  • ARROW-1142 - [C++] Port over compression toolchain and interfaces from parquet-cpp, use Arrow-style error handling
  • ARROW-1145 - [GLib] Add get_values()
  • ARROW-1146 - Add .gitignore for *_generated.h files in src/plasma/format
  • ARROW-1148 - [C++] Raise minimum CMake version to 3.2
  • ARROW-1151 - [C++] Add branch prediction to RETURN_NOT_OK
  • ARROW-1154 - [C++] Import miscellaneous computational utility code from parquet-cpp
  • ARROW-1160 - C++: Implement DictionaryBuilder
  • ARROW-1165 - [C++] Refactor PythonDecimalToArrowDecimal to not use templates
  • ARROW-1172 - [C++] Refactor to use unique_ptr for builders
  • ARROW-1183 - [Python] Implement pandas conversions between Time32, Time64 types and datetime.time
  • ARROW-1185 - [C++] Status class cleanup, warn_unused_result attribute and Clang warning fixes
  • ARROW-1187 - Python: Feather: Serialize a DataFrame with None column
  • ARROW-1193 - [C++] Support pkg-config for arrow_python.so
  • ARROW-1196 - [C++] Release, Debug, Toolchain, NMake Generator Appveyor…
  • ARROW-1198 - Python: Add public C++ API to unwrap PyArrow object
  • ARROW-1199 - [C++] Implement mutable POD struct for Array data
  • ARROW-1202 - [C++] Remove semicolons from status macros
  • ARROW-1212 - [GLib] Add garrow_binary_array_get_offsets_buffer()
  • ARROW-1214 - [Python/C++] Add C++ functionality to more easily handle encapsulated IPC messages, Python bindings
  • ARROW-1217 - [GLib] Add GInputStream based arrow::io::RandomAccessFile
  • ARROW-1220 - [C++] Cmake script errors out if lib is not found under *…
  • ARROW-1221 - [C++] Add run_clang_format.py script, exclusions file. Pin clang-format-3.9
  • ARROW-1227 - [GLib] Support GOutputStream
  • ARROW-1229 - [GLib] Use "read" instead of "get" for reading record batch
  • ARROW-1244 - Exclude C++ Plasma source tree when creating source release
  • ARROW-1252 - [Website] Update for 0.5.0 release, add blog post summarizing changes from 0.4.x

Bug Fixes

  • ARROW-288 - Implement Arrow adapter for Spark Datasets
  • ARROW-601 - Some logical types not supported when loading Parquet
  • ARROW-784 - Cleaning up thirdparty toolchain support in Arrow on Windows
  • ARROW-785 - possible issue on writing parquet via pyarrow, subsequently read in Hive
  • ARROW-924 - Setting GTEST_HOME Fails on CMake run
  • ARROW-992 - [Python] Try to set a version in in-place local builds
  • ARROW-1043 - [Python] Make sure pandas metadata created by arrow conforms to the pandas spec
  • ARROW-1074 - Support lists and arrays in pandas DataFrames without explicit schema
  • ARROW-1079 - [Python] Filter out private directories when building Parquet dataset manifest
  • ARROW-1081 - Fill null_bitmap correctly in TestBase
  • ARROW-1096 - [C++] CreateFileMapping maximum size calculation issue
  • ARROW-1097 - Reading tensor needs file to be opened in writeable mode
  • ARROW-1098 - [Format] Fix mistake in documentation
  • ARROW-1101 - Implement write(TypeHolder) methods in UnionListWriter
  • ARROW-1103 - [Python] Support read_pandas (with index metadata) on directory of Parquet files
  • ARROW-1107 - [JAVA] Fix getField() for NullableMapVector
  • ARROW-1108 - [JAVA] Check if ArrowBuf is empty buffer in getActualConsumedMemory() and getPossibleConsumedMemory()
  • ARROW-1109 - [JAVA] transferOwnership fails when readerIndex is not 0
  • ARROW-1110 - [JAVA] make union vector naming consistent
  • ARROW-1111 - [JAVA] Make aligning buffers optional, and allow -1 for unknown null count
  • ARROW-1112 - [JAVA] Set lastSet for VarLength and List vectors when loading
  • ARROW-1113 - [C++] Upgrade to gflags 2.2.0, use tarball instead of git tag
  • ARROW-1115 - [C++] use CCACHE_FOUND value for ccache path
  • ARROW-1117 - [Docs] Minor issues in GLib README
  • ARROW-1124 - Increase numpy dependency to >=1.10.x
  • ARROW-1125 - Python: Add public C++ API to unwrap PyArrow object
  • ARROW-1125 - partial schemas for Table.from_pandas
  • ARROW-1128 - [Docs] command to build a wheel is not properly rendered
  • ARROW-1129 - [C++] Fix gflags issue in Linux/macOS toolchain builds
  • ARROW-1130 - io-hdfs-test failure
  • ARROW-1131 - [Python] Enable the Parquet unit tests by default if the extension imports
  • ARROW-1132 - [Python] Unable to write pandas DataFrame w/MultiIndex containing duplicate values to parquet
  • ARROW-1136 - [C++] Add null checks for invalid streams
  • ARROW-1138 - Travis: Use OpenJDK7 instead of OracleJDK7
  • ARROW-1139 - Silence dlmalloc warning on clang-4.0
  • ARROW-1141 - on import get libjemalloc.so.2: cannot allocate memory in static TLS block
  • ARROW-1143 - C++: Fix comparison of NullArray
  • ARROW-1144 - [C++] Remove unused variable
  • ARROW-1147 - [C++] Allow optional vendoring of flatbuffers in plasma
  • ARROW-1150 - Silence AdaptiveIntBuilder compiler warning on MSVC
  • ARROW-1152 - [Cython] read_tensor should work with a readable file
  • ARROW-1153 - All non-Pandas column throws NotImplemented: unhandled type
  • ARROW-1155 - [Python] Add null check when user improperly instantiates ArrayValue instances
  • ARROW-1157 - C++/Python: Decimal templates are not correctly exported on OSX
  • ARROW-1159 - [C++] Use dllimport for visibility when not building Arrow library
  • ARROW-1162 - Empty data vector transfer between list vectors should no…
  • ARROW-1164 - C++: Templated functions need ARROW_EXPORT instead of ARROW_TEMPLATE_EXPORT
  • ARROW-1166 - Fix errors in example and missing reference in Layout.md
  • ARROW-1167 - [Python] Support chunking string columns in Table.from_pandas
  • ARROW-1168 - [Python] pandas metadata may contain "mixed" data types
  • ARROW-1169 - [C++] jemalloc externalproject doesn't build with CMake's ninja generator
  • ARROW-1170 - C++: Link to pthread on ARROW_JEMALLOC=OFF
  • ARROW-1174 - [GLib] Fix ListArray test failure
  • ARROW-1177 - [C++] Check for int32 offset overflow in ListBuilder, BinaryBuilder
  • ARROW-1179 - C++: Add missing virtual destructors
  • ARROW-1180 - [GLib] Fix a returning invalid address bug in garrow_tensor_get_dimension_name()
  • ARROW-1181 - [Python] Parquet multiindex test should be optional
  • ARROW-1182 - C++: Specify BUILD_BYPRODUCTS for zlib and zstd
  • ARROW-1186 - [C++] Add support to build only Parquet dependencies
  • ARROW-1188 - [Python] Handle Feather case where category values are null type
  • ARROW-1190 - [JAVA] Fixing VectorLoader for duplicate field names
  • ARROW-1191 - [JAVA] Implement getField() method for complex readers
  • ARROW-1194 - [Python] Expose MockOutputStream in pyarrow.
  • ARROW-1197 - [GLib] Fix a bug that record batch related functions for C++ aren't included
  • ARROW-1200 - C++: Switch DictionaryBuilder to signed integers
  • ARROW-1201 - [Python] Incomplete Python types cause a core dump when repr-ing
  • ARROW-1203 - [C++] Disallow BinaryBuilder to append byte strings larger than the maximum value of int32_t
  • ARROW-1205 - C++: Reference to type objects in ArrayLoader may cause segmentation faults
  • ARROW-1206 - [C++] Add finer grained control of compression library support, do not expose symbols which may not be built in compression.h
  • ARROW-1208 - [C++] Install zstd from conda for Toolchain Appveyor buil…
  • ARROW-1208 - [C++] Temporary remove conda's build of zstd from Toolcha…
  • ARROW-1215 - [Python] Generate documentation for class members in API Reference
  • ARROW-1216 - [Python] Fix creating numpy array from arrow buffers on python 2
  • ARROW-1218 - [C++] Fix arrow build if no compression library is used
  • ARROW-1222 - [Python] Raise exception when passing unsupported Python object type to pyarrow.array
  • ARROW-1223 - [GLib] Fix function name that returns wrapped object
  • ARROW-1228 - [GLib] Fix test file name
  • ARROW-1233 - [C++] Validate libs availability in conda toolchain
  • ARROW-1235 - [C++] Make operator<< for Array/Status and std::ostream inline
  • ARROW-1236 - Fix lib path in pkg-config file
  • ARROW-1284 - Windows can't install pyarrow 0.4.1 and 0.5.0

Apache Arrow 0.4.1 (2017-06-09)

Bug Fixes

  • ARROW-424 - [C++] Make ReadAt, Write HDFS functions threadsafe
  • ARROW-1039 - Python: pyarrow.Filesystem.read_parquet causing error if nthreads>1
  • ARROW-1050 - [C++] Export arrow::ValidateArray
  • ARROW-1051 - [Python] Opt in to Parquet unit tests to avoid accidental suppression of dynamic linking errors
  • ARROW-1056 - [Python] Ignore pandas index in parquet+hdfs test
  • ARROW-1057 - Fix cmake warning and msvc debug asserts
  • ARROW-1060 - [Python] Add unit tests for reference counts in memoryview interface
  • ARROW-1062 - [GLib] Follow API changes in examples
  • ARROW-1066 - [Python] pandas 0.20.1 deprecation of pd.lib causes a warning on import
  • ARROW-1070 - [C++] Use physical types for Feather date/time types
  • ARROW-1075 - [GLib] Fix build error on macOS
  • ARROW-1082 - [GLib] Add CI on macOS
  • ARROW-1085 - [java] Follow up on template cleanup. Missing method for …
  • ARROW-1086 - include additional pxd files during package build
  • ARROW-1088 - [Python] Only test unicode filenames if system supports them
  • ARROW-1090 - Improve build_ext usability with --bundle-arrow-cpp
  • ARROW-1091 - Decimal scale and precision are flipped
  • ARROW-1092 - More Decimal and scale flipped follow-up
  • ARROW-1094 - [C++] Always truncate buffer read in ReadableFile::Read if actual number of bytes less than request
  • ARROW-1127 - pyarrow 4.1 import failure on Travis

New Features and Improvements

  • ARROW-897 - [GLib] Extract CI configuration for GLib
  • ARROW-986 - [Format] Add brief explanation of dictionary batches in IPC.md
  • ARROW-990 - [JS] Add tslint support for linting TypeScript
  • ARROW-1020 - [Format] Revise language for Timestamp type in Schema.fbs to avoid possible confusion about tz-naive timestamps
  • ARROW-1034 - [PYTHON] Resolve wheel build issues on Windows
  • ARROW-1049 - [java] vector template cleanup
  • ARROW-1063 - [Website] Updates for 0.4.0 release, release posting
  • ARROW-1068 - [Python] Create external repo with appveyor.yml configured for building Python wheel installers
  • ARROW-1069 - Add instructions for publishing maven artifacts
  • ARROW-1078 - [Python] Account for Apache Parquet shared library consolidation
  • ARROW-1080 - C++: Add tutorial about converting to/from row-wise representation
  • ARROW-1084 - Implementations of BufferAllocator should handle Netty's OutOfDirectMemoryError
  • ARROW-1118 - [Website] Site updates for 0.4.1

Apache Arrow 0.4.0 (2017-05-22)

Bug Fixes

  • ARROW-813 - [Python] setup.py sdist must also bundle dependent cmake m…
  • ARROW-824 - Date and Time Vectors should reflect timezone-less semantics
  • ARROW-856 - Also read compiler info from stdout
  • ARROW-909 - Link jemalloc statically if build as external project
  • ARROW-939 - fix division by zero if one of the tensor dimensions is zero
  • ARROW-940 - [JS] Generate multiple artifacts
  • ARROW-944 - Python: Compat broken for pandas==0.18.1
  • ARROW-948 - [GLib] Update C++ header file list
  • ARROW-952 - fix regex include from C++ standard library
  • ARROW-958 - [Python] Fix conda source build instructions
  • ARROW-979 - [Python] Fix setuptools_scm version when release tag is not in the master timeline
  • ARROW-991 - [Python] Create new dtype when deserializing from Arrow to NumPy datetime64
  • ARROW-995 - [Website] Fix a typo
  • ARROW-998 - [Format] Clarify that the IPC file footer contains an additional copy of the schema
  • ARROW-1003 - [C++] Check flag WIN32 instead of _WIN32
  • ARROW-1004 - [Python] Add conversions for numpy object arrays with integers and floats
  • ARROW-1017 - [Python] Fix memory leaks in conversion to pandas.DataFrame
  • ARROW-1023 - Python: Fix bundling of arrow-cpp for macOS
  • ARROW-1033 - [Python] pytest discovers scripts/test_leak.py
  • ARROW-1045 - [JAVA] Add support for custom metadata in org.apache.arrow.vector.types.pojo.*
  • ARROW-1046 - [Python] Reconcile pandas metadata spec
  • ARROW-1053 - [Python] Remove unnecessary Py_INCREF in PyBuffer causing memory leak
  • ARROW-1054 - [Python] Test suite fails on pandas 0.19.2
  • ARROW-1061 - [C++] Harden decimal parsing against invalid strings
  • ARROW-1064 - ModuleNotFoundError: No module named 'pyarrow._parquet'

New Features and Improvements

  • ARROW-29 - [C++] FindRe2 cmake module
  • ARROW-182 - [C++] Factor out Array::Validate into a separate function
  • ARROW-376 - Python: Convert non-range Pandas indices (optionally) to Arrow
  • ARROW-446 - [Python] Expand Sphinx documentation for 0.3
  • ARROW-482 - [Java] Exposing custom field metadata
  • ARROW-532 - [Python] Expand pyarrow.parquet documentation for 0.3 release
  • ARROW-579 - Python: Provide redistributable pyarrow wheels on OSX
  • ARROW-596 - [Python] Add convenience function to convert pandas.DataFrame to pyarrow.Buffer containing a file or stream representation
  • ARROW-629 - [JS] Add unit test suite
  • ARROW-714 - [C++] Add import_pyarrow C API in the style of NumPy for thirdparty C++ users
  • ARROW-819 - Public Cython and C++ API in the style of lxml, arrow::py::import_pyarrow method
  • ARROW-872 - [JS] Read streaming format
  • ARROW-873 - [JS] Implement fixed width list type
  • ARROW-874 - [JS] Read dictionary-encoded vectors
  • ARROW-881 - [Python] Reconstruct Pandas DataFrame indexes using metadata
  • ARROW-891 - [Python] Expand Windows build instructions to not require looking at separate C++ docs
  • ARROW-899 - [Doc] Add 0.3.0 changelog
  • ARROW-901 - [Python] Add Parquet unit test for fixed size binary
  • ARROW-913 - [Python] Only link jemalloc to the Cython extension where it's needed
  • ARROW-923 - Changelog generation Python script, add 0.1.0 and 0.2.0 changelog
  • ARROW-929 - Remove KEYS file from git
  • ARROW-943 - [GLib] Support running unit tests with source archive
  • ARROW-945 - [GLib] Add a Lua example to show Torch integration
  • ARROW-946 - [GLib] Use "new" instead of "open" for constructor name
  • ARROW-947 - [Python] Improve execution time of manylinux1 build
  • ARROW-953 - Use conda-forge cmake, curl in CI toolchain
  • ARROW-954 - Flag for compiling Arrow with header-only boost
  • ARROW-956 - [Python] compat with pandas >= 0.20.0
  • ARROW-957 - [Doc] Add HDFS and Windows documents to doxygen output
  • ARROW-961 - [Python] Rename InMemoryOutputStream to BufferOutputStream
  • ARROW-963 - [GLib] Add equal
  • ARROW-967 - [GLib] Support initializing array with buffer
  • ARROW-970 - [Python] Nicer experience if user accidentally calls pyarrow.Table ctor directly
  • ARROW-977 - [java] Add Timezone aware timestamp vectors
  • ARROW-980 - Fix detection of "msvc" COMPILER_FAMILY
  • ARROW-982 - [Website] Improve website front copy to highlight serialization efficiency benefits
  • ARROW-984 - [GLib] Add Go examples
  • ARROW-985 - [GLib] Update package information
  • ARROW-988 - [JS] Add entry to Travis CI matrix
  • ARROW-993 - [GLib] Add missing error checks in Go examples
  • ARROW-996 - [Website] Add 0.3.0 release announce in Japanese
  • ARROW-997 - [Java] Implementing transferPair for FixedSizeListVector
  • ARROW-1000 - [GLib] Move install document to Website
  • ARROW-1001 - [GLib] Unify writer files
  • ARROW-1002 - [C++] Fix inconsistency with padding at start of IPC file format
  • ARROW-1008 - [C++] Add abstract stream writer and reader C++ APIs. Give clearer names to IPC reader/writer classes
  • ARROW-1010 - [Website] Provide for translations without repeating blog post in blogroll
  • ARROW-1011 - [FORMAT] fix typo and mistakes in Layout.md
  • ARROW-1014 - 0.4.0 release
  • ARROW-1015 - [Java] Schema-level metadata
  • ARROW-1016 - Python: Include C++ headers (optionally) in wheels
  • ARROW-1022 - [Python] Add multithreaded read option to read_feather
  • ARROW-1024 - Python: Update build time numpy version to 1.10.1
  • ARROW-1025 - [Website] Improved changelog for website, include git shortlog
  • ARROW-1027 - [Python] Allow negative indexing in fields/columns on pyarrow Table and Schema objects
  • ARROW-1028 - [Python] Fix IPC docs per API changes
  • ARROW-1029 - [Python] Fixes for building pyarrow with Parquet support on MSVC. Add to appveyor build
  • ARROW-1030 - Python: Account for library versioning in parquet-cpp
  • ARROW-1031 - [GLib] Support pretty print
  • ARROW-1037 - [GLib] Follow reader name change
  • ARROW-1038 - [GLib] Follow writer name change
  • ARROW-1040 - [GLib] Support tensor IO
  • ARROW-1044 - [GLib] Support Feather
  • ARROW-1126 - Python: Add function to convert NumPy/Pandas dtypes to Arrow DataTypes

Apache Arrow 0.3.0 (2017-05-05)

Bug Fixes

  • ARROW-109 - [C++] Add nesting stress tests up to 500 recursion depth
  • ARROW-208 - Add checkstyle policy to java project
  • ARROW-347 - Add method to pass CallBack when creating a transfer pair
  • ARROW-413 - DATE type is not specified clearly
  • ARROW-431 - [Python] Review GIL release and acquisition in to_pandas conversion
  • ARROW-443 - [Python] Support ingest of strided NumPy arrays from pandas
  • ARROW-451 - [C++] Implement DataType::Equals as TypeVisitor. Add default implementations for TypeVisitor, ArrayVisitor methods
  • ARROW-454 - pojo.Field doesn't implement hashCode()
  • ARROW-526 - [Format] Revise Format documents for evolution in IPC stream / file / tensor formats
  • ARROW-565 - [C++] Examine "Field::dictionary" member
  • ARROW-570 - Determine Java tools JAR location from project metadata
  • ARROW-584 - [C++] Fix compiler warnings exposed with -Wconversion
  • ARROW-586 - Problem with reading parquet files saved by Apache Spark
  • ARROW-588 - [C++] Fix some 32 bit compiler warnings
  • ARROW-595 - [Python] Set schema attribute on StreamReader
  • ARROW-604 - Python: boxed Field instances are missing the reference to their DataType
  • ARROW-611 - [Java] TimeVector TypeLayout is incorrectly specified as 64 bit width
  • ARROW-613 - WIP TypeScript Implementation
  • ARROW-617 - [Format] Add additional Time metadata and comments based on discussion in ARROW-617
  • ARROW-619 - [Python] Fixed remaining typo for LD_LIBRARY_PATH
  • ARROW-619 - Fix typos in setup.py args and LD_LIBRARY_PATH
  • ARROW-623 - Fix segfault in repr of empty field
  • ARROW-624 - [C++] Restore MakePrimitiveArray function, use in feather.cc
  • ARROW-627 - [C++] Add compatibility macros for exported extern templates
  • ARROW-628 - [Python] Install nomkl metapackage when building parquet-cpp in Travis CI
  • ARROW-630 - [C++] Create boolean batches for IPC testing, properly account for nonzero offset
  • ARROW-636 - [C++] Update README about Boost system requirement
  • ARROW-639 - [C++] Invalid offset in slices
  • ARROW-642 - [Java] Remove temporary file in java/tools
  • ARROW-644 - Python: Cython should be a setup-only requirement
  • ARROW-652 - Remove trailing f in merge script output
  • ARROW-654 - [C++] Serialize timezone in IPC metadata
  • ARROW-666 - [Python] Error in DictionaryArray __repr__
  • ARROW-667 - build of arrow-master/cpp fails with altivec error?
  • ARROW-668 - [Python] Box timestamp values as pandas.Timestamp if available, attach tzinfo
  • ARROW-671 - [GLib] Install missing license file
  • ARROW-673 - [Java] Support additional Time metadata
  • ARROW-677 - [java] Fix checkstyle jcl-over-slf4j conflict issue
  • ARROW-678 - [GLib] Fix dependencies
  • ARROW-680 - [C++] Support CMake 2 or older again
  • ARROW-682 - [Integration] Check implementations against themselves
  • ARROW-683 - [C++/Python] Refactor to make Date32 and Date64 types for new metadata. Test IPC roundtrip
  • ARROW-685 - [GLib] AX_CXX_COMPILE_STDCXX_11 error running ./configure
  • ARROW-686 - [C++] Account for time metadata changes, add Time32 and Time64 types
  • ARROW-689 - [GLib] Fix install directories
  • ARROW-691 - [Java] Encode dictionary type in message format
  • ARROW-697 - JAVA Throw exception for record batches > 2GB
  • ARROW-699 - [C++] Resolve Arrow and Arrow IPC build issues on Windows;
  • ARROW-702 - fix BitVector.copyFromSafe to reAllocate instead of returning false
  • ARROW-703 - Fix issue where setValueCount(0) doesn’t work in the case that we’ve shipped vectors across the wire
  • ARROW-704 - Fix bad import caused by conflicting changes
  • ARROW-709 - [C++] Restore type comparator for DecimalType
  • ARROW-713 - [C++] Fix cmake linking issue in new IPC benchmark
  • ARROW-715 - [Python] Make pandas not a hard requirement, flake8 fixes
  • ARROW-716 - [Python] Update README build instructions after moving libpyarrow to C++ tree
  • ARROW-720 - arrow should not have a dependency on slf4j bridges in com…
  • ARROW-723 - [Python] Ensure that passing chunk_size=0 when writing Parquet file does not enter infinite loop
  • ARROW-726 - [C++] Fix segfault caused when passing non-buffer object to arrow::py::PyBuffer
  • ARROW-732 - [C++] Schema comparison bugs in struct and union types
  • ARROW-736 - [Python] Mixed-type object DataFrame columns should not silently co…
  • ARROW-738 - Fix manylinux1 build
  • ARROW-739 - Don't install jemalloc in parallel
  • ARROW-740 - FileReader fails for large objects
  • ARROW-747 - [C++] Calling add_dependencies with dl causes spurious CMake warning
  • ARROW-749 - [Python] Delete partially-written Feather file when column write fails
  • ARROW-753 - [Python] Fix linker error for python-test on OS X
  • ARROW-756 - [C++] MSVC build fixes and cleanup, remove -fPIC flag from EP builds on Windows, Dev docs
  • ARROW-757 - [C++] MSVC build fails on googletest when using NMake
  • ARROW-762 - [Python] Start docs page about files and filesystems, adapt C++ docs about HDFS
  • ARROW-776 - [GLib] Fix wrong type name
  • ARROW-777 - restore getObject behavior on Date and Time
  • ARROW-778 - Port merge tool to work on Windows
  • ARROW-780 - PYTHON_EXECUTABLE Required to be set during build
  • ARROW-781 - [C++/Python] Increase reference count of the numpy base array?
  • ARROW-783 - [Java/C++] Fixes for 0-length record batches
  • ARROW-787 - [GLib] Fix compilation error caused by introducing BooleanBuilder::Append overload
  • ARROW-789 - Fix issue where setValueCount(0) doesn’t work in the case that we’ve shipped vectors across the wire
  • ARROW-793 - [GLib] Fix indent
  • ARROW-794 - [C++/Python] Disallow strided tensors in ipc::WriteTensor
  • ARROW-796 - [Java] Checkstyle additions causing build failure in some environments
  • ARROW-797 - [Python] Make more explicitly curated public API page, sphinx cleanup
  • ARROW-800 - [C++] Boost headers being transitively included in pyarrow
  • ARROW-805 - [C++] Don't throw IOError when listing empty HDFS dir
  • ARROW-809 - [C++] Do not write excess bytes in IPC writer after slicing arrays
  • ARROW-812 - Pip install pyarrow on mac failed.
  • ARROW-817 - [Python] Fix comment in date32 conversion
  • ARROW-821 - [Python] Extra file _table_api.h generated during Python build process
  • ARROW-822 - [Python] StreamWriter Wrapper for Socket and File-like Objects without tell()
  • ARROW-826 - [C++/Python] Fix compilation error on Mac with -DARROW_PYTHON=on
  • ARROW-829 - Don't deactivate Parquet dictionary encoding on column-wis…
  • ARROW-830 - [Python] Expose jemalloc memory pool and other memory pool functions in public pyarrow API
  • ARROW-836 - add test for pandas conversion of timedelta, currently unimplemented
  • ARROW-839 - [Python] Use mktime variant that is reliable on MSVC
  • ARROW-847 - Specify BUILD_BYPRODUCTS for gtest
  • ARROW-852 - Also search for ARROW libs when pkg-config provided the path
  • ARROW-853 - [Python] Only set RPATH when bundling the shared libraries
  • ARROW-858 - Remove boost_regex from arrow dependencies
  • ARROW-866 - [Python] Be robust to PyErr_Fetch returning a null exc value
  • ARROW-867 - [Python] pyarrow MSVC fixes
  • ARROW-875 - Avoid setting an extra empty in fillEmpties()
  • ARROW-879 - compat with pandas v0.20.0
  • ARROW-882 - [C++] Rename statically build library on Windows to avoid …
  • ARROW-883 - [JAVA] Introduction of new types has shifted Enumerations
  • ARROW-885 - [Python/C++] Decimal test failure on MSVC
  • ARROW-886 - [Java] Fixing reallocation of VariableLengthVector offsets
  • ARROW-887 - add default value to units for backward compatibility
  • ARROW-888 - Transfer ownership of buffer in BitVector transferTo()
  • ARROW-895 - Fix lastSet in fillEmpties() and copyFrom()
  • ARROW-900 - [Python] Fix UnboundLocalError in ParquetDatasetPiece.read
  • ARROW-903 - [GLib] Remove a needless "."
  • ARROW-914 - [C++/Python] Fix Decimal ToBytes
  • ARROW-922 - Allow Flatbuffers and RapidJSON to be used locally on Windows
  • ARROW-927 - C++/Python: Add manylinux1 builds to Travis matrix
  • ARROW-928 - [C++] Detect supported MSVC versions
  • ARROW-933 - [Python] Remove debug print statement
  • ARROW-934 - [GLib] Glib sources missing from result of 02-source.sh
  • ARROW-936 - add missing file; revert tag change
  • ARROW-936 - fix release README
  • ARROW-938 - Fix Rat license warnings

New Features and Improvements

  • ARROW-6 - Hope to add development document
  • ARROW-39 - C++: Logical chunked arrays / columns: conforming to fixed chunk sizes
  • ARROW-52 - Set up project blog
  • ARROW-95 - Add Jekyll-based website publishing toolchain, migrate existing arrow-site
  • ARROW-98 - Java: API documentation
  • ARROW-99 - C++: Explore if RapidCheck may be helpful for testing / worth adding to toolchain
  • ARROW-183 - C++: Add storage type to DecimalType
  • ARROW-231 - [C++] Add typed Resize to PoolBuffer
  • ARROW-281 - [C++] IPC/RPC support on Win32 platforms
  • ARROW-316 - [Format] Changes to Date metadata format per discussion in ARROW-316
  • ARROW-341 - [Python] Move pyarrow's C++ code to the main C++ source tree, install libarrow_python and headers
  • ARROW-452 - [C++/Python] Incorporate C++ and Python codebases for Feather file format
  • ARROW-459 - [C++] Dictionary IPC support in file and stream formats
  • ARROW-483 - [C++/Python] Provide access to "custom_metadata" Field attribute in IPC setting
  • ARROW-491 - [Format / C++] Add FixedWidthBinary type to format, C++ implementation
  • ARROW-492 - [C++] Add arrow/arrow.h public API
  • ARROW-493 - [C++] Permit large (length > INT32_MAX) arrays in memory
  • ARROW-502 - [C++/Python] Logging memory pool
  • ARROW-510 - ARROW-582 ARROW-663 ARROW-729: [Java] Added units for Time and Date types, and integration tests
  • ARROW-518 - C++: Make Status::OK method constexpr
  • ARROW-520 - [C++] STL-compliant allocator
  • ARROW-528 - [Python] Utilize improved Parquet writer C++ API, add write_metadata function, test _metadata files
  • ARROW-534 - [C++] Add IPC tests for date/time after ARROW-452, fix bugs
  • ARROW-539 - [Python] Add support for reading partitioned Parquet files with Hive-like directory schemes
  • ARROW-542 - Adding dictionary encoding to FileWriter
  • ARROW-550 - [Format] Draft experimental Tensor flatbuffer message type
  • ARROW-552 - [Python] Implement getitem for DictionaryArray by returning a value from the dictionary
  • ARROW-557 - [Python] Add option to explicitly opt in to HDFS tests, do not implicitly skip
  • ARROW-563 - Support non-standard gcc version strings
  • ARROW-566 - Bundle Arrow libraries in Python package
  • ARROW-568 - [C++] Add default implementations for TypeVisitor, ArrayVisitor methods that return NotImplemented
  • ARROW-569 - [C++] Set version for *.pc
  • ARROW-574 - Python: Add support for nested Python lists in Pandas conversion
  • ARROW-576 - [C++] Complete file/stream implementation for union types
  • ARROW-577 - [C++] Use private implementation pattern in ipc::StreamWriter and ipc::FileWriter
  • ARROW-578 - [C++] Add -DARROW_CXXFLAGS=... option to make CMake more consistent
  • ARROW-580 - C++: Also provide jemalloc_X targets if only a static or shared version is found
  • ARROW-582 - [Java] Added JSON reader/writer unit test for date, time, and timestamp
  • ARROW-589 - C++: Use system provided shared jemalloc if static is unavailable
  • ARROW-591 - [C++] Add round trip testing fixture for JSON format
  • ARROW-593 - [C++] Rename ReadableFileInterface to RandomAccessFile
  • ARROW-598 - [Python] Add support for converting pyarrow.Buffer to a memoryview with zero copy
  • ARROW-603 - [C++] Add RecordBatch::Validate method, call in RecordBatch ctor in debug builds
  • ARROW-605 - [C++] Refactor IPC adapter code into generic ArrayLoader class. Add Date32Type
  • ARROW-606 - [C++] upgrade flatbuffers version to 1.6.0
  • ARROW-608 - [Format] Days since epoch date type
  • ARROW-610 - [C++] Win32 compatibility in file.cc
  • ARROW-612 - [Java] Added not null to Field.toString output
  • ARROW-615 - [Java] Moved ByteArrayReadableSeekableByteChannel to src main o.a.a.vector.util
  • ARROW-616 - [C++] Do not include debug symbols in release builds by default
  • ARROW-618 - [Python/C++] Support timestamp+timezone conversion to pandas
  • ARROW-620 - [C++] Implement JSON integration test support for date, time, timestamp, fixed width binary
  • ARROW-621 - [C++] Start IPC benchmark suite for record batches, implement "inline" visitor. Code reorg
  • ARROW-625 - [C++] Add TimeUnit to TimeType::ToString. Add timezone to TimestampType::ToString if present
  • ARROW-626 - [Python] Replace PyBytesBuffer with zero-copy, memoryview-based PyBuffer
  • ARROW-631 - [GLib] Import
  • ARROW-632 - [Python] Add support for FixedWidthBinary type
  • ARROW-635 - [C++] Add JSON read/write support for FixedWidthBinary
  • ARROW-637 - [Format] Add timezone to Timestamp metadata, comments describing the semantics
  • ARROW-646 - [Python] Conda s3 robustness, set CONDA_PKGS_DIR env variable and add Travis CI caching
  • ARROW-647 - [C++] Use Boost shared libraries for tests and utilities
  • ARROW-648 - [C++] Support multiarch on Debian
  • ARROW-650 - [GLib] Follow ReadableFileInterface -> RandomAccessFile change
  • ARROW-651 - [C++] Set version to shared library
  • ARROW-655 - [C++/Python] Implement DecimalArray
  • ARROW-656 - [C++] Add random access writer for a mutable buffer. Rename WriteableFileInterface to WriteableFile for better consistency
  • ARROW-657 - [C++/Python] Expose Tensor IPC in Python. Add equals method. Add pyarrow.create_memory_map/memory_map functions
  • ARROW-658 - [C++] Implement a prototype in-memory arrow::Tensor type
  • ARROW-659 - [C++] Add multithreaded memcpy implementation
  • ARROW-660 - [C++] Restore function that can read a complete encapsulated record batch message
  • ARROW-661 - [C++] Add LargeRecordBatch metadata type, IPC support, associated refactoring
  • ARROW-662 - [Format] Move Schema flatbuffers into their own file that can be included
  • ARROW-663 - [Java] Support additional Time metadata + vector value accessors
  • ARROW-664 - [C++] Make C++ Arrow serialization deterministic
  • ARROW-669 - [Python] Attach proper tzinfo when computing boxed scalars for TimestampArray
  • ARROW-670 - Arrow 0.3 release
  • ARROW-672 - [Format] Add MetadataVersion::V3 for Arrow 0.3
  • ARROW-674 - [Java] Support additional Timestamp timezone metadata
  • ARROW-675 - [GLib] Update package metadata
  • ARROW-676 - move from MinorType to FieldType in ValueVectors to carry all the relevant type bits
  • ARROW-679 - [Format] Change FieldNode, RecordBatch lengths to long, remove LargeRecordBatch. Refactoring
  • ARROW-681 - [C++] Disable boost's autolinking if shared boost is used …
  • ARROW-684 - [Python] More helpful error message if libparquet_arrow not built
  • ARROW-687 - [C++] Build and run full test suite in Appveyor
  • ARROW-688 - [C++] Use CMAKE_INSTALL_INCLUDEDIR for consistency
  • ARROW-690 - Only send JIRA updates to issues@arrow.apache.org
  • ARROW-698 - Add flag to FileWriter::WriteRecordBatch for writing record batches with lengths over INT32_MAX
  • ARROW-700 - Add headroom interface for allocator
  • ARROW-701 - [Java] Support Additional Date Type Metadata
  • ARROW-706 - [GLib] Add package install document
  • ARROW-707 - [Python] Return NullArray for array of all None in Array.from_pandas. Revert from_numpy -> from_pandas
  • ARROW-708 - [C++] Simplify metadata APIs to all use the Message class, perf analysis
  • ARROW-710 - [Python] Read/write with file-like Python objects from read_feather/write_feather
  • ARROW-711 - [C++] Remove extern template declarations for NumericArray<T> types
  • ARROW-712 - [C++] Reimplement Array::Accept as inline visitor
  • ARROW-717 - [C++] Implement IPC zero-copy round trip for tensors
  • ARROW-718 - [Python] Implement pyarrow.Tensor container, zero-copy NumPy roundtrips
  • ARROW-719 - [GLib] Release source archive
  • ARROW-722 - [Python] Support additional date/time types and metadata, conversion to/from NumPy and pandas.DataFrame
  • ARROW-724 - Add How to Contribute section to README
  • ARROW-725 - [Formats/Java] FixedSizeList message and java implementation
  • ARROW-727 - [Python] Ensure that NativeFile.write accepts any bytes, unicode, or object providing buffer protocol. Rename build_arrow_buffer to pyarrow.frombuffer
  • ARROW-728 - [C++/Python] Add Table::RemoveColumn method, remove name member, some other code cleaning
  • ARROW-729 - [Java] Add vector type for 32-bit date as days since UNIX epoch
  • ARROW-731 - [C++] Add shared library related versions to .pc
  • ARROW-733 - [C++/Python] Rename FixedWidthBinary to FixedSizeBinary for consistency with FixedSizeList
  • ARROW-734 - [C++/Python] Support building PyArrow on MSVC
  • ARROW-735 - [C++] Developer instruction document for MSVC on Windows
  • ARROW-737 - [C++] Enable mutable buffer slices, SliceMutableBuffer function
  • ARROW-741 - [Python] Switch Travis CI to use Python 3.6 instead of 3.5
  • ARROW-743 - [C++] Consolidate all but decimal array tests into array-test, collect some tests in type-test.cc
  • ARROW-744 - [GLib] Re-add an assertion for garrow_table_new() test
  • ARROW-745 - [C++] Allow use of system cpplint
  • ARROW-746 - [GLib] Add garrow_array_get_data_type()
  • ARROW-748 - [Python] Pin runtime library versions in conda-forge packages to force upgrades
  • ARROW-751 - [Python] Make all Cython modules private. Some code tidying
  • ARROW-752 - [Python] Support boxed Arrow arrays as input to DictionaryArray.from_arrays
  • ARROW-754 - [GLib] Add garrow_array_is_null()
  • ARROW-755 - [GLib] Add garrow_array_get_value_type()
  • ARROW-758 - [C++] Build with /WX in Appveyor, fix MSVC compiler warnings
  • ARROW-761 - [C++/Python] Add GetTensorSize method, Python bindings
  • ARROW-763 - C++: Use to find libpythonX.X.dylib
  • ARROW-765 - [Python] Add more natural Exception type hierarchy for thirdparty users
  • ARROW-768 - [Java] Change the "boxed" object representation of date and time types
  • ARROW-769 - [GLib] Support building without installed Arrow C++
  • ARROW-770 - [C++] Move .clang* files back into cpp source tree
  • ARROW-771 - [Python] Add read_row_group / num_row_groups to ParquetFile
  • ARROW-773 - [CPP] Add Table::AddColumn API
  • ARROW-774 - [GLib] Remove needless LICENSE.txt copy
  • ARROW-775 - add simple constructors to value vectors
  • ARROW-779 - [C++] Check for old metadata and raise exception if found
  • ARROW-782 - [C++] API cleanup, change public member access in DataType classes to functions, use class instead of struct
  • ARROW-788 - [C++] Align WriteTensor message
  • ARROW-795 - [C++] Consolidate arrow/arrow_io/arrow_ipc into a single shared and static library
  • ARROW-798 - [Docs] Publish Format Markdown documents somehow on arrow.apache.org
  • ARROW-802 - [GLib] Add read examples
  • ARROW-803 - [GLib] Update package repository URL
  • ARROW-804 - [GLib] Update build document
  • ARROW-806 - [GLib] Support add/remove a column from table
  • ARROW-807 - [GLib] Update "Since" tag
  • ARROW-808 - [GLib] Remove needless ignore entries
  • ARROW-810 - [GLib] Remove io/ipc prefix
  • ARROW-811 - [GLib] Add GArrowBuffer
  • ARROW-815 - [Java] Exposing reAlloc for ValueVector
  • ARROW-816 - [C++] Travis CI script cleanup, add C++ toolchain env with Flatbuffers, RapidJSON
  • ARROW-818 - [Python] Expand Sphinx API docs, pyarrow.* namespace. Add factory functions for time32, time64
  • ARROW-820 - [C++] Build dependencies for Parquet library without arrow…
  • ARROW-825 - [Python] Rename pyarrow.from_pylist to pyarrow.array, test on tuples
  • ARROW-827 - [Python] Miscellaneous improvements to help with Dask support
  • ARROW-828 - [C++] Add new dependency to README
  • ARROW-831 - Switch from boost::regex to std::regex
  • ARROW-832 - [C++] Update to gtest 1.8.0, remove now unneeded test_main.cc
  • ARROW-833 - [Python] Add Developer quickstart for conda users
  • ARROW-841 - [Python] Add pyarrow build to Appveyor
  • ARROW-844 - [Format] Update README documents in format/
  • ARROW-845 - [Python] Sync changes from PARQUET-955; explicit ARROW_HOME will override pkgconfig
  • ARROW-846 - [GLib] Add GArrowTensor, GArrowInt8Tensor and GArrowUInt8Tensor
  • ARROW-848 - [Python] Another pass on conda dev guide
  • ARROW-849 - [C++] Support setting production build dependencies with ARROW_BUILD_TOOLCHAIN
  • ARROW-857 - [Python] Automate publishing Python documentation to arrow-site
  • ARROW-859 - [C++] Do not build unit tests by default?
  • ARROW-860 - [C++] Remove typed Tensor containers
  • ARROW-861 - [Python] Move DEVELOPMENT.md to Sphinx docs
  • ARROW-862 - [Python] Simplify README landing documentation to direct users and developers toward the documentation
  • ARROW-863 - [GLib] Use GBytes to implement zero-copy
  • ARROW-864 - [GLib] Unify Array files
  • ARROW-865 - [Python] Add unit tests validating Parquet date/time type roundtrips
  • ARROW-868 - [GLib] Use GBytes to reduce copy
  • ARROW-869 - [JS] Rename directory to js/
  • ARROW-871 - [GLib] Unify DataType files
  • ARROW-876 - [GLib] Unify ArrayBuilder files
  • ARROW-877 - [GLib] Add garrow_array_get_null_bitmap()
  • ARROW-878 - [GLib] Add garrow_binary_array_get_buffer()
  • ARROW-880 - [GLib] Support getting raw data of primitive arrays
  • ARROW-890 - [GLib] Add GArrowMutableBuffer
  • ARROW-892 - [GLib] Fix GArrowTensor document
  • ARROW-893 - Add GLib document to Web site
  • ARROW-894 - [GLib] Add GArrowResizableBuffer and GArrowPoolBuffer
  • ARROW-896 - Support Jupyter Notebook in Web site
  • ARROW-898 - [C++/Python] Use shared_ptr to avoid copying KeyValueMetadata, add to Field type also
  • ARROW-904 - [GLib] Simplify error check codes
  • ARROW-907 - C++: Construct Table from schema and arrays
  • ARROW-908 - [GLib] Unify OutputStream files
  • ARROW-910 - [C++] Write 0 length at EOS in StreamWriter
  • ARROW-916 - [GLib] Add GArrowBufferOutputStream
  • ARROW-917 - [GLib] Add GArrowBufferReader
  • ARROW-918 - [GLib] Use GArrowBuffer for read buffer
  • ARROW-919 - [GLib] Use "id" to get type enum value from GArrowDataType
  • ARROW-920 - [GLib] Add Lua examples
  • ARROW-925 - [GLib] Fix GArrowBufferReader test
  • ARROW-926 - Add wesm to KEYS
  • ARROW-930 - javadoc generation fails with java 8
  • ARROW-931 - [GLib] Reconstruct input stream
  • ARROW-965 - Website updates for 0.3.0 release

Apache Arrow 0.2.0 (2017-02-18)

Bug Fixes

  • ARROW-112 - Changed constexprs to kValue naming.
  • ARROW-202 - Integrate with appveyor ci for windows
  • ARROW-220 - [C++] Build conda artifacts in a build environment with better cross-linux ABI compatibility
  • ARROW-224 - [C++] Address static linking of boost dependencies
  • ARROW-230 - Python: Do not name modules like native ones (i.e. rename pyarrow.io)
  • ARROW-239 - Test reading remainder of file in HDFS with read() with no args
  • ARROW-261 - Refactor String/Binary code paths to reflect unnested (non-list-based) structure
  • ARROW-273 - Lists use unsigned offset vectors instead of signed (as defined in the spec)
  • ARROW-275 - Add tests for UnionVector in Arrow File
  • ARROW-294 - [C++] Do not use platform-dependent fopen/fclose functions for MemoryMappedFile
  • ARROW-322 - [C++] Remove ARROW_HDFS option, always build the module
  • ARROW-323 - [Python] Opt-in to pyarrow.parquet extension rather than attempting and failing silently
  • ARROW-334 - [Python] Remove INSTALL_RPATH_USE_LINK_PATH
  • ARROW-337 - UnionListWriter.list() is doing more than it should, this …
  • ARROW-339 - [Dev] Lingering Python 3 fixes
  • ARROW-339 - Python 3 compatibility in merge_arrow_pr.py
  • ARROW-340 - [C++] Opening a writeable file on disk that already exists does not truncate to zero
  • ARROW-342 - Set Python version on release
  • ARROW-345 - libhdfs integration doesn't work for Mac
  • ARROW-346 - Use conda environment to build API docs
  • ARROW-348 - [Python] Add build-type command line option to setup.py, build CMake extensions in a build type subdirectory
  • ARROW-349 - Add six as a requirement
  • ARROW-351 - Time type has no unit
  • ARROW-354 - Fix comparison of arrays of empty strings
  • ARROW-357 - Use a single RowGroup for Parquet files as default.
  • ARROW-358 - Add explicit environment variable to locate libhdfs in one's environment
  • ARROW-362 - Fix memory leak in zero-copy arrow to NumPy/pandas conversion
  • ARROW-371 - Handle pandas-nullable types correctly
  • ARROW-375 - Fix unicode Python 3 issue in columns argument of parquet.read_table
  • ARROW-384 - Align Java and C++ RecordBatch data and metadata layout
  • ARROW-386 - [Java] Respect case of struct / map field names
  • ARROW-387 - [C++] Verify zero-copy Buffer slices from BufferReader retain reference to parent Buffer
  • ARROW-390 - Only specify dependencies for json-integration-test on ARROW_BUILD_TESTS=ON
  • ARROW-392 - [C++/Java] String IPC integration testing / fixes. Add array / record batch pretty-printing
  • ARROW-393 - [JAVA] JSON file reader fails to set the buffer size on String data vector
  • ARROW-395 - Arrow file format writes record batches in reverse order.
  • ARROW-398 - Java file format requires bitmaps of all 1's to be written…
  • ARROW-399 - ListVector.loadFieldBuffers ignores the ArrowFieldNode len…
  • ARROW-400 - set struct length on load
  • ARROW-401 - Floating point vectors should do an approximate comparison…
  • ARROW-402 - Fix reference counting issue with empty buffers. Close #232
  • ARROW-403 - [Java] Create transfer pairs for internal vectors in UnionVector transfer impl
  • ARROW-404 - [Python] Fix segfault caused by HdfsClient getting closed before an HdfsFile
  • ARROW-405 - Use vendored hdfs.h if not found in include/ in $HADOOP_HOME
  • ARROW-406 - [C++] Set explicit 64K HDFS buffer size, test large reads
  • ARROW-408 - Remove defunct conda recipes
  • ARROW-414 - [Java] "Buffer too large to resize to ..." error
  • ARROW-420 - Align DATE type with Java implementation
  • ARROW-421 - [Python] Retain parent reference in PyBytesReader
  • ARROW-422 - IPC should depend on rapidjson_ep if RapidJSON is vendored
  • ARROW-429 - Revert ARROW-379 until git-archive issues are resolved
  • ARROW-433 - Correctly handle Arrow to Python date conversion for timezones west of London
  • ARROW-434 - [Python] Correctly handle Python file objects in Parquet read/write paths
  • ARROW-435 - Fix spelling of RAPIDJSON_VENDORED
  • ARROW-437 - [C++] Fix clang compiler warning
  • ARROW-445 - arrow_ipc_objlib depends on Flatbuffer generated files
  • ARROW-447 - Always return unicode objects for UTF-8 strings
  • ARROW-455 - [C++] Add dtor to BufferOutputStream that calls Close()
  • ARROW-469 - C++: Add option so that resize doesn't decrease the capacity
  • ARROW-481 - [Python] Fix 2.7 regression in Parquet path to open file code path
  • ARROW-486 - [C++] Use virtual inheritance for diamond inheritance
  • ARROW-487 - Python: ConvertTableToPandas segfaults if ObjectBlock::Write fails
  • ARROW-494 - [C++] Extend lifetime of memory mapped data if any buffers reference it
  • ARROW-499 - Update file serialization to use the streaming serialization format.
  • ARROW-505 - [C++] Fix compiler warning in gcc in release mode
  • ARROW-511 - Python: Implement List conversions for single arrays
  • ARROW-513 - [C++] Fixing Appveyor / MSVC build
  • ARROW-516 - Building pyarrow with parquet
  • ARROW-519 - [C++] Refactor array comparison code into a compare.h / compare.cc in part to resolve Xcode 6.1 linker issue
  • ARROW-523 - Python: Account for changes in PARQUET-834
  • ARROW-533 - [C++] arrow::TimestampArray / TimeArray has a broken constructor
  • ARROW-535 - [Python] Add type mapping for NPY_LONGLONG
  • ARROW-537 - [C++] Do not compare String/Binary data in null slots when comparing arrays
  • ARROW-540 - [C++] Build fixes after ARROW-33, PARQUET-866
  • ARROW-543 - C++: Lazily computed null_counts counts number of non-null entries
  • ARROW-544 - [C++] Test writing zero-length record batches, zero-length BinaryArray fixes
  • ARROW-545 - [Python] Ignore non .parq/.parquet files when reading directories as Parquet datasets
  • ARROW-548 - [Python] Add nthreads to Filesystem.read_parquet and pass through
  • ARROW-551 - C++: Construction of Column with nullptr Array segfaults
  • ARROW-556 - [Integration] Configure C++ integration test executable with a single environment variable. Update README
  • ARROW-561 - [JAVA][PYTHON] Update java & python dependencies to improve downstream packaging experience
  • ARROW-562 - Mockito should be in test scope

New Features and Improvements

  • ARROW-33 - [C++] Implement zero-copy array slicing, integrate with IPC code paths
  • ARROW-81 - [Format] Augment dictionary encoding metadata to accommodate additional use cases
  • ARROW-96 - Add C++ API documentation
  • ARROW-97 - API documentation via sphinx-apidoc
  • ARROW-108 - [C++] Add Union implementation and IPC/JSON serialization tests
  • ARROW-189 - Build 3rd party with ExternalProject.
  • ARROW-191 - Python: Provide infrastructure for manylinux1 wheels
  • ARROW-221 - Add switch for writing Parquet 1.0 compatible logical types
  • ARROW-227 - [C++/Python] Hook arrow_io generic reader / writer interface into arrow_parquet
  • ARROW-228 - [Python] Create an Arrow-cpp-compatible interface for reading bytes from Python file-like objects
  • ARROW-240 - Installation instructions for pyarrow
  • ARROW-243 - [C++] Add option to switch between libhdfs and libhdfs3 when creating HdfsClient
  • ARROW-268 - [C++] Flesh out union implementation to have all required methods for IPC
  • ARROW-303 - [C++] Also build static libraries for leaf libraries
  • ARROW-312 - [Java] IPC file round trip tool for integration testing
  • ARROW-312 - Read and write Arrow IPC file format from Python
  • ARROW-317 - Add Slice, Copy methods to Buffer
  • ARROW-327 - [Python] Remove conda builds from Travis CI setup
  • ARROW-328 - Return shared_ptr<T> by value instead of const-ref
  • ARROW-330 - CMake functions to simplify shared / static library configuration
  • ARROW-332 - Add RecordBatch.to_pandas method
  • ARROW-333 - Make writers update their internal schema even when no data is written
  • ARROW-335 - Improve Type apis and toString() by encapsulating flatbuffers better
  • ARROW-336 - Run Apache Rat in Travis builds
  • ARROW-338 - Implement visitor pattern for IPC loading/unloading
  • ARROW-344 - Instructions for building with conda
  • ARROW-350 - Added Kerberos to HDFS client
  • ARROW-353 - Arrow release 0.2
  • ARROW-355 - Add tests for serialising arrays of empty strings to Parquet
  • ARROW-356 - Add documentation about reading Parquet
  • ARROW-359 - Document ARROW_LIBHDFS_DIR
  • ARROW-360 - C++: Add method to shrink PoolBuffer using realloc
  • ARROW-361 - Python: Support reading a column-selection from Parquet files
  • ARROW-363 - [Java/C++] integration testing harness, initial integration tests
  • ARROW-365 - Python: Provide Array.to_pandas()
  • ARROW-366 - Java Dictionary Vector
  • ARROW-367 - converter json <=> Arrow file format for Integration tests
  • ARROW-368 - Added note for LD_LIBRARY_PATH in Python README
  • ARROW-369 - [Python] Convert multiple record batches at once to Pandas
  • ARROW-370 - Python: Pandas conversion from `datetime.date` columns
  • ARROW-372 - json vector serialization format
  • ARROW-373 - [C++] JSON serialization format for testing
  • ARROW-374 - More precise handling of bytes vs unicode in Python API
  • ARROW-377 - Python: Add support for conversion of Pandas.Categorical
  • ARROW-379 - Use setuptools_scm for Python versioning
  • ARROW-380 - [Java] optimize null count when serializing vectors
  • ARROW-381 - [C++] Simplify primitive array type builders to use a default type singleton
  • ARROW-382 - Extend Python API documentation
  • ARROW-383 - [C++] Integration testing CLI tool
  • ARROW-389 - Python: Write Parquet files to pyarrow.io.NativeFile objects
  • ARROW-394 - [Integration] Generate tests cases for numeric types, strings, lists, structs
  • ARROW-396 - [Python] Add pyarrow.schema.Schema.equals
  • ARROW-409 - [Python] Change record batches conversion to Table
  • ARROW-410 - [C++] Add virtual Writeable::Flush
  • ARROW-411 - [Java] Move compactor functions in Integration to a separate Validator module
  • ARROW-415 - C++: Add Equals implementation to compare Tables
  • ARROW-416 - C++: Add Equals implementation to compare Columns
  • ARROW-417 - Add Equals implementation to compare ChunkedArrays
  • ARROW-418 - [C++] Array / Builder class code reorganization, flattening
  • ARROW-419 - [C++] Promote util/{status.h, buffer.h, memory-pool.h} to top level of arrow/ source directory
  • ARROW-423 - Define BUILD_BYPRODUCTS for CMake 3.2+
  • ARROW-425 - Add private API to get python Table from a C++ object
  • ARROW-426 - Python: Conversion from pyarrow.Array to a Python list
  • ARROW-427 - [C++] Implement dictionary array type
  • ARROW-428 - [Python] Multithreaded conversion from Arrow table to pandas.DataFrame
  • ARROW-430 - Improved version handling
  • ARROW-432 - [Python] Construct precise pandas BlockManager structure for zero-copy DataFrame initialization
  • ARROW-438 - [C++/Python] Implement zero-data-copy record batch and table concatenation.
  • ARROW-440 - [C++] Support pkg-config
  • ARROW-441 - [Python] Expose Arrow's file and memory map classes as NativeFile subclasses
  • ARROW-442 - [Python] Inspect Parquet file metadata from Python
  • ARROW-444 - [Python] Native file reads into pre-allocated memory. Some IO API cleanup / niceness
  • ARROW-449 - Python: Conversion from pyarrow.{Table,RecordBatch} to a Python dict
  • ARROW-450 - Fixes for PARQUET-818
  • ARROW-456 - Add jemalloc based MemoryPool
  • ARROW-457 - Python: Better control over memory pool
  • ARROW-458 - [Python] Expose jemalloc MemoryPool
  • ARROW-461 - [Python] Add Python interfaces to DictionaryArray data, pandas interop
  • ARROW-463 - C++: Support jemalloc 4.x
  • ARROW-466 - Add ExternalProject for jemalloc
  • ARROW-467 - [Python] Run Python parquet-cpp unit tests in Travis CI
  • ARROW-468 - Python: Conversion of nested data in pd.DataFrames
  • ARROW-470 - [Python] Add "FileSystem" abstraction to access directories of files in a uniform way
  • ARROW-471 - [Python] Enable ParquetFile to pass down separately-obtained file metadata
  • ARROW-472 - [Python] Expose more C++ IO interfaces. Add equals methods to Parquet schemas. Pass Parquet metadata separately in reader
  • ARROW-474 - [Java] Add initial version of streaming serialized format.
  • ARROW-475 - [Python] Add support for reading multiple Parquet files as a single pyarrow.Table
  • ARROW-476 - Add binary integration test fixture, add Java support
  • ARROW-477 - [Java] Add support for second/microsecond/nanosecond timestamps in-memory and in IPC/JSON layer
  • ARROW-478 - Consolidate BytesReader and BufferReader to accept PyBytes or Buffer
  • ARROW-479 - Python: Test for expected schema in Pandas conversion
  • ARROW-484 - Revise README to include more detail about software components
  • ARROW-485 - [Java] Users are required to initialize VariableLengthVectors.offsetVector before calling VariableLengthVectors.mutator.getSafe
  • ARROW-490 - Python: Update manylinux1 build scripts
  • ARROW-495 - [C++] Implement streaming binary format, refactoring
  • ARROW-497 - Integration harness for streaming file format
  • ARROW-498 - [C++] Add command line utilities that convert between stream and file.
  • ARROW-503 - [Python] Implement Python interface to streaming file format
  • ARROW-506 - Java: Implement echo server for integration testing.
  • ARROW-508 - [C++] Add basic threadsafety to normal files and memory maps
  • ARROW-509 - [Python] Add support for multithreaded Parquet reads
  • ARROW-512 - C++: Add method to check for primitive types
  • ARROW-514 - [Python] Automatically wrap pyarrow.io.Buffer in BufferReader
  • ARROW-515 - [Python] Add read_all methods to FileReader, StreamReader
  • ARROW-521 - [C++] Track peak allocations in default memory pool
  • ARROW-524 - provide apis to access nested vectors and buffers
  • ARROW-525 - Python: Add more documentation to the package
  • ARROW-527 - Remove drill-module.conf file
  • ARROW-529 - Python: Add jemalloc and Python 3.6 to manylinux1 build
  • ARROW-531 - Python: Document jemalloc, extend Pandas section, add Getting Involved
  • ARROW-538 - [C++] Set up AddressSanitizer (ASAN) builds
  • ARROW-546 - Python: Account for changes in PARQUET-867
  • ARROW-547 - [Python] Add zero-copy slice methods to Array, RecordBatch
  • ARROW-553 - C++: Faster valid bitmap building
  • ARROW-558 - Add KEYS files

Apache Arrow 0.1.0 (2016-10-10)

New Features and Improvements

  • ARROW-1 - Initial Arrow Code Commit
  • ARROW-2 - Post Simple Website
  • ARROW-3 - This patch includes a WIP draft specification document for the physical Arrow memory layout produced over a series of discussions amongst the to-be Arrow committers during late 2015. There are also a few small PNG diagrams that illustrate some of the Arrow layout concepts.
  • ARROW-4 - This provides a partial C++11 implementation of the Apache Arrow data structures along with a CMake-based build system. The codebase generally follows the Google C++ style guide, but more cleanup is needed for full conformance. It uses googletest for unit testing.
  • ARROW-7 - Add barebones Python library build toolchain
  • ARROW-8 - Add .travis.yml and test script for Arrow C++. OS X build fixes
  • ARROW-9 - Rename some unchanged "Drill" to "Arrow" (follow-up)
  • ARROW-9 - Replace straggler references to Drill
  • ARROW-10 - Fix mismatch of javadoc names and method parameters
  • ARROW-11 - Mirror JIRA activity to dev@arrow.apache.org
  • ARROW-13 - Add PR merge tool from parquet-mr, suitably modified
  • ARROW-14 - Add JIRA components
  • ARROW-15 - Fix a naming typo for memory.AllocationManager.AllocationOutcome
  • ARROW-19 - Add an externalized MemoryPool interface for use in builder classes
  • ARROW-20 - Add null_count_ member to array containers, remove nullable_ member
  • ARROW-21 - Implement a simple in-memory Schema data structure
  • ARROW-22 - [C++] Convert flat Parquet schemas to Arrow schemas
  • ARROW-23 - Add a logical Column data structure
  • ARROW-24 - C++: Implement a logical Table container type
  • ARROW-26 - Add instructions for enabling Arrow C++ Parquet adapter build
  • ARROW-28 - Adding google's benchmark library to the toolchain
  • ARROW-30 - [Python] Routines for converting between arrow::Array/Table and pandas.DataFrame
  • ARROW-31 - Python: prototype user object model, add PyList conversion path with type inference
  • ARROW-35 - Add a short call-to-action in the top level README.md
  • ARROW-37 - [C++ / Python] Implement BooleanArray and BooleanBuilder. Handle Python built-in bool
  • ARROW-42 - Add Python tests to Travis CI build
  • ARROW-43 - Python: format array values in repr for interactive computing
  • ARROW-44 - Python: prototype object model for array slot values ("scalars")
  • ARROW-48 - Python: Add Schema object wrapper
  • ARROW-49 - [Python] Add Column and Table wrapper interface
  • ARROW-50 - C++: Enable library builds for 3rd-party users without having to build thirdparty googletest
  • ARROW-53 - Python: Fix RPATH and add source installation instructions
  • ARROW-54 - [Python] Rename package to "pyarrow"
  • ARROW-56 - Format: Specify LSB bit ordering in bit arrays
  • ARROW-57 - Format: Draft data headers IDL for data interchange
  • ARROW-58 - Format: Draft type metadata ("schemas") IDL
  • ARROW-59 - Python: Boolean data support for builtin data structures
  • ARROW-60 - [C++] Struct type builder API
  • ARROW-64 - Add zsh support to C++ build scripts
  • ARROW-66 - Maybe some missing steps in installation guide
  • ARROW-67 - C++ metadata flatbuffer serialization and data movement to memory maps
  • ARROW-68 - Better error handling for not fully setup systems
  • ARROW-70 - Adapt 'lite' DCHECK macros from Kudu, as also used in Parquet
  • ARROW-71 - [C++] Add clang-tidy and clang-format to the tool chain.
  • ARROW-73 - Support older CMake versions
  • ARROW-76 - Revise format document to include null count, defer non-nullable arrays to the domain of metadata
  • ARROW-78 - C++: Add constructor for DecimalType
  • ARROW-79 - [Python] Add benchmarks
  • ARROW-82 - Initial IPC support for ListArray
  • ARROW-85 - memcmp can be avoided in Equal when comparing with the same …
  • ARROW-86 - [Python] Implement zero-copy Arrow-to-Pandas conversion
  • ARROW-87 - [C++] Add all four possible ways to encode Decimals in Parquet to schema conversion
  • ARROW-89 - [Python] Add benchmarks for Arrow<->Pandas conversion
  • ARROW-90 - [C++] Check for SIMD instruction set support
  • ARROW-91 - Basic Parquet read support
  • ARROW-92 - Arrow to Parquet Schema conversion
  • ARROW-100 - [C++] Computing RowBatch size
  • ARROW-101 - Fix java compiler warnings
  • ARROW-102 - travis-ci support for java project
  • ARROW-106 - [C++] Add IPC to binary/string types
  • ARROW-107 - [C++] Implement IPC for structs
  • ARROW-190 - Python: Provide installable sdist builds
  • ARROW-196 - [C++] Add conda dev recipe for libarrow and libarrow_parquet
  • ARROW-197 - Working first draft of a conda recipe for pyarrow
  • ARROW-199 - [C++] Refine third party dependency
  • ARROW-201 - [C++] Initial ParquetWriter implementation
  • ARROW-203 - Python: Basic filename based Parquet read/write
  • ARROW-204 - Add Travis CI builds that post conda artifacts for Linux and OS X
  • ARROW-206 - Expose a C++ api to compare ranges of slots between two arrays
  • ARROW-207 - Extend BufferAllocator interface to allow decorators around BufferAllocator
  • ARROW-212 - Change contract of PrimitiveArray to reflect its abstractness
  • ARROW-213 - Exposing static arrow build
  • ARROW-214 - C++: Add String support to Parquet I/O
  • ARROW-215 - Support other integer types and strings in Parquet I/O
  • ARROW-218 - Add optional API token authentication option to PR merge tool
  • ARROW-222 - Prototyping an IO interface for Arrow, with initial HDFS target
  • ARROW-233 - Add visibility macros, add static build option
  • ARROW-234 - Build libhdfs IO extension in conda artifacts
  • ARROW-236 - Bridging IO interfaces under the hood in pyarrow
  • ARROW-237 - Implement parquet-cpp's abstract IO interfaces for memory allocation and file reading
  • ARROW-238 - Change InternalMemoryPool::Free() to return Status::Invalid when ther…
  • ARROW-242 - Support Timestamp Data Type
  • ARROW-245 - add endianness to RecordBatch
  • ARROW-251 - Expose APIs for getting code and message of the status
  • ARROW-252 - Add implementation guidelines to the documentation
  • ARROW-253 - restrict ints to 8, 16, 32, or 64 bits in V1
  • ARROW-254 - remove Bit type as it is redundant with Boolean
  • ARROW-255 - Finalize Dictionary representation
  • ARROW-256 - [Format] Add a version number to the IPC/RPC metadata
  • ARROW-257 - Add a typeids Vector to Union type
  • ARROW-262 - Start metadata specification document
  • ARROW-264 - File format
  • ARROW-267 - [C++] Implement file format layout for IPC/RPC
  • ARROW-270 - Define more generic Interval logical type
  • ARROW-271 - Update Field structure to be more explicit
  • ARROW-272 - Arrow release 0.1
  • ARROW-279 - rename vector module to arrow-vector
  • ARROW-280 - [C++] Refactor IPC / memory map IO to use common arrow_io interfaces. Create arrow_ipc leaf library
  • ARROW-282 - Make parquet-cpp an optional dependency of pyarrow
  • ARROW-285 - Optional flatc download
  • ARROW-286 - Build thirdparty dependencies in parallel
  • ARROW-289 - Install test-util.h
  • ARROW-290 - Specialize alloc() in ArrowBuf
  • ARROW-291 - [Python] Update NOTICE file for Python codebase
  • ARROW-292 - [Java] Upgrade Netty to 4.0.41
  • ARROW-293 - [C++] Implement Arrow IO interfaces for operating system files
  • ARROW-296 - [Python / C++] Remove arrow::parquet, make pyarrow link against parquet_arrow
  • ARROW-298 - create release scripts
  • ARROW-299 - Use absolute namespace in macros
  • ARROW-301 - Add user field metadata to IPC schemas
  • ARROW-302 - [C++/Python] Implement C++ IO interfaces for interacting with Python file and bytes objects
  • ARROW-305 - Add compression and use_dictionary options to Parquet
  • ARROW-306 - Add option to pass cmake arguments via environment variable
  • ARROW-315 - finalize timestamp
  • ARROW-318 - Revise python/README.md given recent changes in codebase
  • ARROW-319 - Add canonical Arrow Schema json representation
  • ARROW-324 - Update arrow metadata diagram
  • ARROW-325 - make TestArrowFile not dependent on timezone

Bug Fixes

  • ARROW-5 - Correct Apache Maven repo for maven plugin use
  • ARROW-5 - Update drill-fmpp-maven-plugin to 1.5.0
  • ARROW-16 - Building cpp issues on XCode 7.2.1
  • ARROW-17 - set some vector fields to package level access for Drill compatibility
  • ARROW-18 - Fix decimal precision and scale in MapWriters
  • ARROW-36 - Remove fixVersions from JIRA resolve code path
  • ARROW-46 - ListVector should initialize bits in allocateNew
  • ARROW-51 - Add simple ValueVector tests
  • ARROW-55 - [Python] Fix unit tests in 2.7
  • ARROW-62 - Clarify null bitmap interpretation, indicate bit-endianness, add null count, remove non-nullable physical distinction
  • ARROW-63 - [C++] Enable ctest to work on systems with Python 3 as the default Python
  • ARROW-65 - Be less restrictive on PYTHON_LIBRARY search paths
  • ARROW-69 - Change permissions for assignable users
  • ARROW-72 - Search for alternative parquet-cpp header
  • ARROW-75 - Fix handling of empty strings
  • ARROW-77 - [C++] Conform bitmap interpretation to ARROW-62; 1 for nulls, 0 for non-nulls
  • ARROW-80 - Handle len call for pre-init arrays
  • ARROW-83 - [C++] Add basic test infrastructure for DecimalType
  • ARROW-84 - C++: separate test codes
  • ARROW-88 - [C++] Refactor usages of parquet_cpp namespace
  • ARROW-93 - Fix builds when using XCode 7.3
  • ARROW-94 - [Format] Expand list example to clarify null vs empty list
  • ARROW-103 - Add files to gitignore
  • ARROW-104 - [FORMAT] Add alignment and padding requirements + union clarification
  • ARROW-105 - Unit tests fail if assertions are disabled
  • ARROW-113 - TestValueVector test fails if cannot allocate 2GB of memory
  • ARROW-185 - Make padding and alignment for all buffers be 64 bytes
  • ARROW-188 - Add numpy as install requirement
  • ARROW-193 - typos "int his" fix to "in this"
  • ARROW-194 - C++: Allow read-only memory mapped source
  • ARROW-200 - [C++/Python] Return error status on string initialization failure
  • ARROW-205 - builds failing on master branch with apt-get error
  • ARROW-209 - [C++] Triage builds due to unavailable LLVM apt repo
  • ARROW-210 - Cleanup of the string related types in C++ code base
  • ARROW-211 - [Format] Fixed typos in layout examples
  • ARROW-217 - Fix Travis w.r.t conda 4.1.0 changes
  • ARROW-219 - Preserve CMAKE_CXX_FLAGS, fix compiler warnings
  • ARROW-223 - Do not link against libpython
  • ARROW-225 - [C++/Python] master Travis CI build is broken
  • ARROW-244 - Some global APIs of IPC module should be visible to the outside
  • ARROW-246 - [Java] UnionVector doesn't call allocateNew() when creating its vectorType
  • ARROW-247 - Missing explicit destructor in RowBatchReader causes an incomplete type error
  • ARROW-250 - Fix for ARROW-246 may cause memory leaks
  • ARROW-259 - Use Flatbuffer Field type instead of MaterializedField
  • ARROW-260 - Fix flaky oversized tests
  • ARROW-265 - Fix few decimal bugs
  • ARROW-265 - Pad negative decimal values with 1
  • ARROW-266 - [C++] Fix broken build due to Flatbuffers namespace change
  • ARROW-274 - Add NullableMapVector to support nullable maps
  • ARROW-277 - Flatbuf serialization fails for Timestamp type
  • ARROW-278 - [Format] Rename Tuple to Struct_ in flatbuffers IDL
  • ARROW-283 - [C++] Account for upstream changes in parquet-cpp
  • ARROW-284 - Disable arrow_parquet module in Travis CI to triage builds
  • ARROW-287 - Make nullable vectors use a BitVector instead of UInt1Vector for bits
  • ARROW-297 - Fix Arrow pom for release
  • ARROW-304 - NullableMapReaderImpl.isSet() always returns true
  • ARROW-308 - UnionListWriter.setPosition() should not call startList()
  • ARROW-309 - Types.getMinorTypeForArrowType() does not work for Union type
  • ARROW-313 - Build on any version of XCode
  • ARROW-314 - JSONScalar is unnecessary and unused
  • ARROW-320 - ComplexCopier.copy(FieldReader, FieldWriter) should not st…
  • ARROW-321 - fix arrow licenses
  • ARROW-855 - Arrow Memory Leak