localapi Package

localapi Package

Modules dealing with the local index-space view of DistArrays.

In other words, the view from an engine.

construct Module

distarray.localapi.construct.init_base_comm(comm)

Sanitize an MPI.comm instance or create one.

distarray.localapi.construct.init_comm(base_comm, grid_shape)

Create an MPI communicator with a cartesian topology.

error Module

exception distarray.localapi.error.IncompatibleArrayError

Bases: distarray.error.DistArrayError

Exception class when arrays are incompatible.

exception distarray.localapi.error.InvalidBaseCommError

Bases: distarray.error.DistArrayError

Exception class when an object expected to be an MPI.Comm object is not one.

exception distarray.localapi.error.InvalidDimensionError

Bases: distarray.error.DistArrayError

Exception class when a specified dimension is invalid.

exception distarray.localapi.error.NullCommError

Bases: distarray.error.DistArrayError

Exception class when an MPI communicator is NULL.

format Module

Define a simple format for saving LocalArrays to disk with full information about them. This format, .dnpy, draws heavily from the .npy format specification from NumPy and from the data structure defined in the Distributed Array Protocol.

Version numbering

The version numbering of this format is independent of DistArray’s and the Distributed Array Protocol’s version numberings.

Format Version 1.0

The first 6 bytes are a magic string: exactly \x93DARRY.

The next 1 byte is an unsigned byte: the major version number of the file format, e.g. \x01.

The next 1 byte is an unsigned byte: the minor version number of the file format, e.g. \x00. Note: the version of the file format is not tied to the version of the DistArray package.

The next 2 bytes form a little-endian unsigned short int: the length of the header data HEADER_LEN.

The next HEADER_LEN bytes form the header data describing the distribution of this chunk of the LocalArray. It is an ASCII string which contains a Python literal expression of a dictionary. It is terminated by a newline (\n) and padded with spaces (\x20) to make the total length of magic string + 4 + HEADER_LEN be evenly divisible by 16 for alignment purposes.

The dictionary contains two keys, both described in the Distributed Array Protocol:

“__version__” : str
Version of the Distributed Array Protocol used in this header.
“dim_data” : tuple of dict
One dictionary per array dimension; see the Distributed Array Protocol for the details of this data structure.

For repeatability and readability, the dictionary keys are sorted in alphabetic order. This is for convenience only. A writer SHOULD implement this if possible. A reader MUST NOT depend on this.

Following this header is the output of numpy.save for the underlying data buffer. This contains the full output of save, beginning with the magic number for .npy files, followed by the .npy header and array data.

The .npy format, including reasons for creating it and a comparison of alternatives, is described fully in the “npy-format” NEP and in the module docstring for numpy.lib.format.

distarray.localapi.format.magic(major, minor, prefix=<MagicMock name='mock.asbytes()' id='140555425385680'>)

Return the magic string for the given file format version.

Parameters:
  • major (int in [0, 255]) –
  • minor (int in [0, 255]) –
Returns:

magic

Return type:

str

Raises:

ValueError – if the version cannot be formatted.

distarray.localapi.format.read_array_header_1_0(fp)

Read an array header from a filelike object using the 1.0 file format version.

This will leave the file object located just after the header.

Parameters:fp (filelike object) – A file object or something with a .read() method like a file.
Returns:
  • __version__ (str) – Version of the Distributed Array Protocol used.
  • dim_data (tuple) – A tuple containing a dictionary for each dimension of the underlying array, as described in the Distributed Array Protocol.
Raises:ValueError – If the data is invalid.
distarray.localapi.format.read_localarray(fp)

Read a LocalArray from an .dnpy file.

Parameters:fp (file_like object) – If this is not a real file object, then this may take extra memory and time.
Returns:distbuffer – The Distributed Array Protocol structure created from the data on disk.
Return type:dict
Raises:ValueError – If the data is invalid.
distarray.localapi.format.read_magic(fp)

Read the magic string to get the version of the file format.

Parameters:fp (filelike object) –
Returns:
  • major (int)
  • minor (int)
distarray.localapi.format.write_localarray(fp, arr, version=(1, 0))

Write a LocalArray to a .dnpy file, including a header.

The __version__ and dim_data keys from the Distributed Array Protocol are written to a header, then numpy.save is used to write the value of the buffer key.

Parameters:
  • fp (file_like object) – An open, writable file object, or similar object with a .write() method.
  • arr (LocalArray) – The array to write to disk.
  • version ((int, int), optional) – The version number of the file format. Default: (1, 0)
Raises:
  • ValueError – If the array cannot be persisted.
  • Various other errors – If the underlying numpy array contains Python objects as part of its dtype, the process of pickling them may raise various errors if the objects are not picklable.

localarray Module

The LocalArray data structure.

DistArray objects are proxies for collections of LocalArray objects (that usually reside on engines).

class distarray.localapi.localarray.GlobalIndex(distribution, ndarray)

Bases: object

Object which provides access to global indexing on LocalArrays.

checked_getitem(global_inds)
checked_setitem(global_inds, value)
get_slice(global_inds, new_distribution)
class distarray.localapi.localarray.GlobalIterator(arr)

Bases: distarray.externals.six.Iterator

class distarray.localapi.localarray.LocalArray(distribution, dtype=None, buf=None)

Bases: object

Distributed memory Python arrays.

__array_wrap__(obj, context=None)

Return a LocalArray based on obj.

This method constructs a new LocalArray object using the distribution from self and the buffer from obj.

This is used to construct return arrays for ufuncs.

__distarray__()

Returns the data structure required by the DAP.

DAP = Distributed Array Protocol

See the project’s documentation for the Protocol’s specification.

__getitem__(index)

Get a local item.

__setitem__(index, value)

Set a local item.

asdist_like(other)

Return a version of self that has shape, dist and grid_shape like other.

astype(newdtype)

Return a copy of this LocalArray with a new underlying dtype.

cart_coords
comm
comm_rank
comm_size
compatibility_hash()
coords_from_rank(rank)
copy()

Return a copy of this LocalArray.

dim_data
dist
dtype
fill(scalar)
classmethod from_distarray(comm, obj)

Make a LocalArray from Distributed Array Protocol data structure.

An object that supports the Distributed Array Protocol will have a __distarray__ method that returns the data structure described here:

https://github.com/enthought/distributed-array-protocol

Parameters:obj (an object with a __distarray__ method or a dict) – If a dict, it must conform to the structure defined by the distributed array protocol.
Returns:A LocalArray encapsulating the buffer of the original data. No copy is made.
Return type:LocalArray
global_from_local(local_ind)
global_limits(dim)
global_shape
global_size
grid_shape
itemsize
local_data
local_from_global(global_ind)
local_shape
local_size
local_view(dtype=None)
nbytes
ndarray
ndim
pack_index(inds)
rank_from_coords(coords)
sync()
unpack_index(packed_ind)
view(distribution, dtype)

Return a new LocalArray whose underlying ndarray is a view on self.ndarray.

class distarray.localapi.localarray.LocalArrayBinaryOperation(numpy_ufunc)

Bases: object

class distarray.localapi.localarray.LocalArrayUnaryOperation(numpy_ufunc)

Bases: object

distarray.localapi.localarray.arecompatible(a, b)

Do these arrays have the same compatibility hash?

distarray.localapi.localarray.compact_indices(dim_data)

Given a dim_data structure, return a tuple of compact indices.

For every dimension in dim_data, return a representation of the indices indicated by that dim_dict; return a slice if possible, else, return the list of global indices.

Parameters:dim_data (tuple of dict) – A dict for each dimension, with the data described here: https://github.com/enthought/distributed-array-protocol we use only the indexing related keys from this structure here.
Returns:index – Efficient structure usable for indexing into a numpy-array-like data structure.
Return type:tuple of slices and/or lists of int
distarray.localapi.localarray.empty(distribution, dtype=<type 'float'>)

Create an empty LocalArray.

distarray.localapi.localarray.empty_like(arr, dtype=None)

Create an empty LocalArray with a distribution like arr.

distarray.localapi.localarray.fromfunction(function, distribution, **kwargs)
distarray.localapi.localarray.fromndarray_like(ndarray, like_arr)

Create a new LocalArray like like_arr with buffer set to ndarray.

distarray.localapi.localarray.get_printoptions()
distarray.localapi.localarray.load_dnpy(comm, file)

Load a LocalArray from a .dnpy file.

Parameters:file (file-like object or str) – The file to read. It must support seek() and read() methods.
Returns:result – A LocalArray encapsulating the data loaded.
Return type:LocalArray
distarray.localapi.localarray.load_hdf5(comm, filename, dim_data, key='buffer')

Load a LocalArray from an .hdf5 file.

Parameters:
  • filename (str) – The filename to read.
  • dim_data (tuple of dict) – A dict for each dimension, with the data described here: https://github.com/enthought/distributed-array-protocol, describing which portions of the HDF5 file to load into this LocalArray, and with what metadata.
  • comm (MPI comm object) –
  • key (str, optional) – The identifier for the group to load the LocalArray from (the default is ‘buffer’).
Returns:

result – A LocalArray encapsulating the data loaded.

Return type:

LocalArray

Note

For dim_data dimension dictionaries containing unstructured (‘u’) distribution types, the indices selected by the ‘indices’ key must be in increasing order. This is a limitation of h5py / hdf5.

distarray.localapi.localarray.load_npy(comm, filename, dim_data)

Load a LocalArray from a .npy file.

Parameters:
  • filename (str) – The file to read.
  • dim_data (tuple of dict) – A dict for each dimension, with the data described here: https://github.com/enthought/distributed-array-protocol, describing which portions of the HDF5 file to load into this LocalArray, and with what metadata.
  • comm (MPI comm object) –
Returns:

result – A LocalArray encapsulating the data loaded.

Return type:

LocalArray

distarray.localapi.localarray.local_reduction(out_comm, reducer, larr, ddpr, dtype, axes)

Entry point for reductions on local arrays.

Parameters:
  • reducer (callable) – Performs the core reduction operation.
  • out_comm (MPI Comm instance.) – The MPI communicator for the result of the reduction. Is equal to MPI.COMM_NULL when this rank is not part of the output communicator.
  • larr (LocalArray) – Input. Defined for all ranks.
  • ddpr (sequence of dim-data dictionaries.) –
  • axes (Sequence of ints or None.) –
Returns:

When out_comm == MPI.COMM_NULL, returns None. Otherwise, returns the LocalArray section of the reduction result.

Return type:

LocalArray or None

distarray.localapi.localarray.max_reducer(reduce_comm, larr, out, axes, dtype)

Core reduction function for max.

distarray.localapi.localarray.mean_reducer(reduce_comm, larr, out, axes, dtype)

Core reduction function for mean.

distarray.localapi.localarray.min_reducer(reduce_comm, larr, out, axes, dtype)

Core reduction function for min.

distarray.localapi.localarray.ndenumerate(arr)
distarray.localapi.localarray.ones(distribution, dtype=<type 'float'>)

Create a LocalArray filled with ones.

distarray.localapi.localarray.save_dnpy(file, arr)

Save a LocalArray to a .dnpy file.

Parameters:
  • file (file-like object or str) – The file or filename to which the data is to be saved.
  • arr (LocalArray) – Array to save to a file.
distarray.localapi.localarray.save_hdf5(filename, arr, key='buffer', mode='a')

Save a LocalArray to a dataset in an .hdf5 file.

Parameters:
  • filename (str) – Name of file to write to.
  • arr (LocalArray) – Array to save to a file.
  • key (str, optional) – The identifier for the group to save the LocalArray to (the default is ‘buffer’).
  • mode (optional, {‘w’, ‘w-‘, ‘a’}, default ‘a’) –
    'w'
    Create file, truncate if exists
    'w-'
    Create file, fail if exists
    'a'
    Read/write if exists, create otherwise (default)
distarray.localapi.localarray.set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, suppress=None)
distarray.localapi.localarray.std_reducer(reduce_comm, larr, out, axes, dtype)

Core reduction function for std.

distarray.localapi.localarray.sum_reducer(reduce_comm, larr, out, axes, dtype)

Core reduction function for sum.

distarray.localapi.localarray.var_reducer(reduce_comm, larr, out, axes, dtype)

Core reduction function for var.

distarray.localapi.localarray.zeros(distribution, dtype=<type 'float'>)

Create a LocalArray filled with zeros.

distarray.localapi.localarray.zeros_like(arr, dtype=<type 'float'>)

Create a LocalArray of zeros with a distribution like arr.

maps Module

Classes to manage the distribution-specific aspects of a LocalArray.

The Distribution class is the main entry point and is meant to be used by LocalArrays to help translate between local and global index spaces. It manages ndim one-dimensional map objects.

The one-dimensional map classes BlockMap, CyclicMap, BlockCyclicMap, and UnstructuredMap all manage the mapping tasks for their particular dimension. All are subclasses of MapBase. The reason for the several subclasses is to allow more compact and efficient operations.

class distarray.localapi.maps.BlockCyclicMap(global_size, grid_size, grid_rank, start, block_size)

Bases: distarray.localapi.maps.MapBase

One-dimensional block cyclic map class.

dim_dict
dist = 'c'
global_from_local_index(lidx)
global_iter
global_slice

Return a slice representing the global index space of this dimension; only possible for block_size == 1.

local_from_global_index(gidx)
size
class distarray.localapi.maps.BlockMap(global_size, grid_size, grid_rank, start, stop)

Bases: distarray.localapi.maps.MapBase

One-dimensional block map class.

dim_dict
dist = 'b'
global_from_local_index(lidx)
global_from_local_slice(lidx)
global_iter
global_slice

Return a slice representing the global index space of this dimension.

local_from_global_index(gidx)
local_from_global_slice(gidx)
size
class distarray.localapi.maps.CyclicMap(global_size, grid_size, grid_rank, start)

Bases: distarray.localapi.maps.MapBase

One-dimensional cyclic map class.

dim_dict
dist = 'c'
global_from_local_index(lidx)
global_iter
global_slice

Return a slice representing the global index space of this dimension.

local_from_global_index(gidx)
size
class distarray.localapi.maps.Distribution(comm, dim_data)

Bases: object

Multi-dimensional Map class.

Manages one or more one-dimensional map classes.

cart_coords
comm_rank
comm_size
coords_from_rank(rank)
dim_data
dist
classmethod from_shape(comm, shape, dist=None, grid_shape=None)

Create a Distribution from a shape and optional arguments.

global_from_local(local_ind)

Given local_ind indices, translate into global indices.

global_shape
global_size
global_slice

Return a slice representing the global index space of this dimension.

grid_shape
local_from_global(global_ind)

Given global_ind indices, translate into local indices.

local_shape
local_size
ndim
rank_from_coords(coords)
class distarray.localapi.maps.MapBase

Bases: object

Base class for all one dimensional Map classes.

class distarray.localapi.maps.UnstructuredMap(global_size, grid_size, grid_rank, indices)

Bases: distarray.localapi.maps.MapBase

One-dimensional unstructured map class.

dim_dict
dist = 'u'
global_from_local_index(lidx)
global_iter
local_from_global_index(gidx)
size
distarray.localapi.maps.map_from_dim_dict(dd)

Factory function that returns a 1D map for a given dimension dictionary.

mpiutils Module

Entry point for MPI.

distarray.localapi.mpiutils.create_comm_of_size(size=4)

Create a subcommunicator of COMM_PRIVATE of given size.

distarray.localapi.mpiutils.create_comm_with_list(nodes, base_comm=None)

Create a subcommunicator of base_comm with a list of ranks.

If base_comm is not specified, defaults to COMM_PRIVATE.

distarray.localapi.mpiutils.get_comm_private()
distarray.localapi.mpiutils.mpi_type_for_ndarray(a)

proxyize Module

class distarray.localapi.proxyize.Proxy(name, obj, module_name)

Bases: object

cleanup()
dereference()

Callable only on the engines.

class distarray.localapi.proxyize.Proxyize

Bases: object

next_name()
set_state(state)
str_counter()

random Module

Pseudo-random number generation routines for local arrays.

This module provides a number of routines for generating random numbers, from a variety of probability distributions.

distarray.localapi.random.beta(a, b, distribution=None)

Return an array with random numbers from the beta probability distribution.

Parameters:
  • a (float) – Parameter that describes the beta probability distribution.
  • b (float) – Parameter that describes the beta probability distribution.
  • distribution (The desired distribution of the array.) – If None, then a normal NumPy array is returned. Otherwise, a LocalArray with this distribution is returned.
Returns:

Return type:

An array with random numbers.

distarray.localapi.random.label_state(comm)

Label/personalize the random generator state for the local rank.

This ensures that each separate engine, when using the same global seed, will generate a different sequence of pseudo-random numbers.

distarray.localapi.random.normal(loc=0.0, scale=1.0, distribution=None)

Return an array with random numbers from a normal (Gaussian) probability distribution.

Parameters:
  • loc (float) – The mean (or center) of the probability distribution.
  • scale (float) – The standard deviation (or width) of the probability distribution.
  • distribution (The desired distribution of the array.) – If None, then a normal NumPy array is returned. Otherwise, a LocalArray with this distribution is returned.
Returns:

Return type:

An array with random numbers.

distarray.localapi.random.rand(distribution=None)

Return an array with random numbers distributed over the interval [0, 1).

Parameters:distribution (The desired distribution of the array.) – If None, then a normal NumPy array is returned. Otherwise, a LocalArray with this distribution is returned.
Returns:
Return type:An array with random numbers.
distarray.localapi.random.randint(low, high=None, distribution=None)

Return random integers from low (inclusive) to high (exclusive).

Return random integers from the “discrete uniform” distribution in the “half-open” interval [low, high). If high is None (the default), then results are from [0, low).

Parameters:
  • low (int) – Lowest (signed) integer to be drawn from the distribution (unless high=None, in which case this parameter is the highest such integer).
  • high (int, optional) – If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if high=None).
  • distribution (The desired distribution of the array.) – If None, then a normal NumPy array is returned. Otherwise, a LocalArray with this distribution is returned.
Returns:

Return type:

An array with random numbers.

distarray.localapi.random.randn(distribution=None)

Return a sample (or samples) from the “standard normal” distribution.

Parameters:distribution (The desired distribution of the array.) – If None, then a normal NumPy array is returned. Otherwise, a LocalArray with this distribution is returned.
Returns:
Return type:An array with random numbers.