dist Package

cleanup Module

distarray.dist.cleanup.cleanup(view, module_name, prefix)

Delete the Context object with the given name from the given module.

distarray.dist.cleanup.cleanup_all(module_name, prefix)

Connects to all engines and runs cleanup() on them.

distarray.dist.cleanup.clear(view)

Removes all distarray-related modules from engines’ sys.modules.

distarray.dist.cleanup.clear_all()
distarray.dist.cleanup.get_local_keys(view, prefix)

Returns a dictionary of keyname -> target_list mapping for all names that start with prefix on engines in view.
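
The shape of the returned mapping can be pictured with a small stand-alone sketch. It does not call distarray or IPython.parallel; the function name collect_prefixed_keys and the sample engine namespaces are hypothetical, and the real function gathers the names from the engines in view.

```python
def collect_prefixed_keys(names_by_target, prefix):
    """Map each name starting with `prefix` to the sorted targets that hold it."""
    mapping = {}
    for target, names in names_by_target.items():
        for name in names:
            if name.startswith(prefix):
                mapping.setdefault(name, []).append(target)
    return {name: sorted(targets) for name, targets in mapping.items()}


# Hypothetical per-engine namespaces, keyed by engine target id.
engine_names = {
    0: ["_ctx_a1", "_ctx_b2", "numpy"],
    1: ["_ctx_a1", "os"],
}
result = collect_prefixed_keys(engine_names, "_ctx_")
print(result)
```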

context Module

Context objects contain the information required for DistArrays to communicate with LocalArrays.

class distarray.dist.context.Context(client=None, targets=None)

Bases: object

Context objects manage the setup and communication of the worker processes for DistArray objects. A DistArray object has a context, and contexts have an MPI intracommunicator that they use to communicate with worker processes.

Typically there is just one context object that uses all processes, although it is possible to have more than one context with a different selection of engines.

apply(func, args=None, kwargs=None, targets=None)

Analogous to IPython.parallel.view.apply_sync

Parameters:
  • func (function) –
  • args (tuple) – positional arguments to func
  • kwargs (dict) – keyword arguments to func
  • targets (sequence of integers) – engines func is to be run on.
Returns:

A list of the results from each engine.

cleanup()

Delete keys that this context created from all the engines.

close()
delete_key(key, targets=None)

Delete the specific key from all the engines.

empty(distribution, dtype=<type 'float'>)

Create an empty DistArray.

Parameters:distribution (Distribution object) –
Returns:A DistArray distributed as specified, with uninitialized values.
Return type:DistArray
fromarray(arr, distribution=None)

Create a DistArray from an ndarray.

Parameters:distribution (Distribution object, optional) – If a Distribution object is not provided, one is created with Distribution.from_shape(arr.shape).
Returns:A DistArray distributed as specified, using the values and dtype from arr.
Return type:DistArray
fromfunction(function, shape, **kwargs)

Create a DistArray from a function over global indices.

Unlike numpy’s fromfunction, the result of distarray’s fromfunction is restricted to the same Distribution as the index array generated from shape.

See numpy.fromfunction for more details.
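
As a rough illustration of evaluating a function over global indices, the sketch below lets each simulated engine compute only the rows it owns. It is plain Python, not distarray code: the helper names are hypothetical, and the contiguous row-block layout is an assumption chosen for the example.

```python
def block_bounds(size, nprocs, rank):
    """Half-open [start, stop) range of rows owned by `rank` (assumed block layout)."""
    block = -(-size // nprocs)  # ceiling division
    start = min(rank * block, size)
    stop = min(start + block, size)
    return start, stop


def local_fromfunction(function, shape, nprocs, rank):
    """Evaluate `function` on the global (i, j) indices owned by one engine."""
    rows, cols = shape
    start, stop = block_bounds(rows, nprocs, rank)
    return [[function(i, j) for j in range(cols)] for i in range(start, stop)]


def f(i, j):
    return 10 * i + j


shape = (5, 3)
nprocs = 2
pieces = [local_fromfunction(f, shape, nprocs, rank) for rank in range(nprocs)]

# Concatenating the local pieces reproduces the full global evaluation.
assembled = [row for piece in pieces for row in piece]
expected = [[f(i, j) for j in range(shape[1])] for i in range(shape[0])]
```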

fromndarray(arr, distribution=None)

Create a DistArray from an ndarray.

Parameters:distribution (Distribution object, optional) – If a Distribution object is not provided, one is created with Distribution.from_shape(arr.shape).
Returns:A DistArray distributed as specified, using the values and dtype from arr.
Return type:DistArray
load_dnpy(name)

Load a distributed array from .dnpy files.

The .dnpy file format is a binary format inspired by NumPy’s .npy format. The header of a particular .dnpy file contains information about which portion of a DistArray is saved in it (using the metadata outlined in the Distributed Array Protocol), and the data portion contains the output of NumPy’s save function for the local array data. See the module docstring for distarray.local.format for full details.

Parameters:name (str or list of str) – If a str, this is used as the prefix for the filename used by each engine. Each engine will load a file named <name>_<rank>.dnpy. If a list of str, each engine will use the name at the index corresponding to its rank. An exception is raised if the length of this list is not the same as the context’s communicator’s size.
Returns:result – A DistArray encapsulating the file loaded on each engine.
Return type:DistArray
Raises:TypeError – If name is an iterable whose length is different from the context’s communicator’s size.

See also

save_dnpy()
Saving files to load with load_dnpy.
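
The naming rule for the name parameter can be sketched as follows. This is an illustration only, not the library's code: resolve_dnpy_filenames is a hypothetical helper, and returning list entries unchanged is an assumption about how a list of names is used.

```python
def resolve_dnpy_filenames(name, comm_size):
    """Return one filename per rank, following the rule described above."""
    if isinstance(name, str):
        # A single string is a prefix: rank r uses <name>_<r>.dnpy.
        return ["{}_{}.dnpy".format(name, rank) for rank in range(comm_size)]
    names = list(name)
    if len(names) != comm_size:
        raise TypeError(
            "expected {} names, got {}".format(comm_size, len(names))
        )
    # Assumption: list entries are used as given, indexed by rank.
    return names


prefix_names = resolve_dnpy_filenames("output", 3)
print(prefix_names)
```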
load_hdf5(filename, distribution, key='buffer')

Load a DistArray from a dataset in an .hdf5 file.

Parameters:
  • filename (str) – Filename to load.
  • distribution (Distribution object) –
  • key (str, optional) – The identifier for the group to load the DistArray from (the default is ‘buffer’).
Returns:

result – A DistArray encapsulating the file loaded.

Return type:

DistArray

load_npy(filename, distribution)

Load a DistArray from a dataset in a .npy file.

Parameters:filename (str) – Filename to load.
Returns:result – A DistArray encapsulating the file loaded.
Return type:DistArray
ones(distribution, dtype=<type 'float'>)

Create a DistArray filled with ones.

Parameters:distribution (Distribution object) –
Returns:A DistArray distributed as specified, filled with ones.
Return type:DistArray
save_dnpy(name, da)

Save a distributed array to files in the .dnpy format.

The .dnpy file format is a binary format inspired by NumPy’s .npy format. The header of a particular .dnpy file contains information about which portion of a DistArray is saved in it (using the metadata outlined in the Distributed Array Protocol), and the data portion contains the output of NumPy’s save function for the local array data. See the module docstring for distarray.local.format for full details.

Parameters:
  • name (str or list of str) – If a str, this is used as the prefix for the filename used by each engine. Each engine will save a file named <name>_<rank>.dnpy. If a list of str, each engine will use the name at the index corresponding to its rank. An exception is raised if the length of this list is not the same as the context’s communicator’s size.
  • da (DistArray) – Array to save to files.
Raises:

TypeError – If name is a sequence whose length is different from the context’s communicator’s size.

See also

load_dnpy()
Loading files saved with save_dnpy.
save_hdf5(filename, da, key='buffer', mode='a')

Save a DistArray to a dataset in an .hdf5 file.

Parameters:
  • filename (str) – Name of file to write to.
  • da (DistArray) – Array to save to a file.
  • key – The identifier for the group to save the DistArray to (the default is ‘buffer’).
zeros(distribution, dtype=<type 'float'>)

Create a DistArray filled with zeros.

Parameters:distribution (Distribution object) –
Returns:A DistArray distributed as specified, filled with zeros.
Return type:DistArray

decorators Module

Decorators for defining functions that use DistArrays.

class distarray.dist.decorators.DecoratorBase(fn)

Bases: object

Base class for decorators, handles name wrapping and allows the decorator to take an optional kwarg.

determine_context(args, kwargs)

Determine a context from a function’s arguments.

key_and_push_args(args, kwargs, context=None, da_handler=None)

Push a tuple of args and a dict of kwargs to the engines. Return a tuple of the keys under which the args values are stored on the engines, and a dictionary with the same keys as kwargs whose values are the engine-side keys for the corresponding kwargs values.

This allows us to use the following interface to execute code on the engines:

>>> def foo(*args, **kwargs):
...     args, kwargs = _key_and_push_args(args, kwargs)
...     exec_str = "remote_foo(*%s, **%s)"
...     exec_str %= (args, kwargs)
...     context.execute(exec_str)
process_return_value(context, result_key)

Figure out what to return on the Client.

Parameters:result_key (string) – Key corresponding to wrapped function’s return value.
Returns:A DistArray if all local values are DistArrays, or None if all local values are None. Otherwise, the result is pulled back to the client and returned; if all but one of the pulled values are None, only the non-None value is returned.
Return type:Varied
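
The rule for values pulled back to the client can be sketched in isolation. The helper below is hypothetical and covers only the pulled-value cases; it does not model the case where a DistArray is returned, and it is not the library's implementation.

```python
def summarize_pulled_values(values):
    """Collapse per-engine results following the rule described above."""
    if all(value is None for value in values):
        return None
    non_none = [value for value in values if value is not None]
    if len(non_none) == 1:
        # All but one engine returned None: hand back the single real value.
        return non_none[0]
    return list(values)


print(summarize_pulled_values([None, None, None]))
print(summarize_pulled_values([None, 3.5, None]))
print(summarize_pulled_values([1, None, 2]))
```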
push_fn(context, fn_key, fn)

Push function to the engines.

class distarray.dist.decorators.local(fn)

Bases: distarray.dist.decorators.DecoratorBase

Decorator to run a function locally on the engines.

class distarray.dist.decorators.vectorize(fn)

Bases: distarray.dist.decorators.DecoratorBase

Analogous to numpy.vectorize. Input DistArrays must all have the same shape, which is also the shape of the output DistArray.
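
A minimal sketch of the same-shape requirement, seen from a single engine, is given below. It uses plain Python lists as stand-ins for local array blocks; vectorize_local is a hypothetical name and the sketch is not how the decorator is implemented.

```python
def vectorize_local(fn, *local_blocks):
    """Apply a scalar function elementwise across equally sized local blocks."""
    lengths = {len(block) for block in local_blocks}
    if len(lengths) > 1:
        raise ValueError("all inputs must have the same shape")
    return [fn(*items) for items in zip(*local_blocks)]


def scaled_sum(a, b):
    return a + 2 * b


# Hypothetical local blocks held by two engines for two input arrays.
blocks_a = [[1, 2, 3], [4, 5]]
blocks_b = [[10, 20, 30], [40, 50]]
outputs = [vectorize_local(scaled_sum, a, b) for a, b in zip(blocks_a, blocks_b)]
print(outputs)
```

Each output block has the same length as its input blocks, so the assembled result has the shape of the inputs.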

get_ndarray(da, arg_keys)

distarray Module

The DistArray data structure. DistArray objects are proxies for collections of LocalArray objects. They are meant to roughly emulate NumPy ndarrays.

class distarray.dist.distarray.DistArray(distribution, dtype=<type 'float'>)

Bases: object

context
dist
dtype
fill(value)
classmethod from_localarrays(key, context=None, targets=None, distribution=None, dtype=None)

The caller has already created the LocalArray objects. key is their name on the engines. This classmethod creates a DistArray that refers to these LocalArrays.

Either a context or a distribution must also be provided. If context is provided, a dim_data_per_rank will be pulled from the existing LocalArrays and a Distribution will be created from it. If distribution is provided, it should accurately reflect the distribution of the existing LocalArrays.

If dtype is not provided, it will be fetched from the engines.

get_dist_matrix()
get_localarrays()

Pull the LocalArray objects from the engines.

Returns:one localarray per process
Return type:list of localarrays
get_localshapes()
get_ndarrays()

Pull the local ndarrays from the engines.

Returns:one ndarray per process
Return type:list of ndarrays
global_size
grid_shape
itemsize
max(axis=None, dtype=None, out=None)

Return the maximum of array elements over the given axis.

mean(axis=None, dtype=<type 'float'>, out=None)

Return the mean of array elements over the given axis.

min(axis=None, dtype=None, out=None)

Return the minimum of array elements over the given axis.

nbytes
ndim
shape
std(axis=None, dtype=<type 'float'>, out=None)

Return the standard deviation of array elements over the given axis.

sum(axis=None, dtype=None, out=None)

Return the sum of array elements over the given axis.

targets
toarray()

Returns the distributed array as an ndarray.

tondarray()

Returns the distributed array as an ndarray.

var(axis=None, dtype=<type 'float'>, out=None)

Return the variance of array elements over the given axis.

functions Module

Distributed ufuncs for distributed arrays.

distarray.dist.functions.absolute(a, *args, **kwargs)
distarray.dist.functions.arccos(a, *args, **kwargs)
distarray.dist.functions.arccosh(a, *args, **kwargs)
distarray.dist.functions.arcsin(a, *args, **kwargs)
distarray.dist.functions.arcsinh(a, *args, **kwargs)
distarray.dist.functions.arctan(a, *args, **kwargs)
distarray.dist.functions.arctanh(a, *args, **kwargs)
distarray.dist.functions.conjugate(a, *args, **kwargs)
distarray.dist.functions.cos(a, *args, **kwargs)
distarray.dist.functions.cosh(a, *args, **kwargs)
distarray.dist.functions.exp(a, *args, **kwargs)
distarray.dist.functions.expm1(a, *args, **kwargs)
distarray.dist.functions.invert(a, *args, **kwargs)
distarray.dist.functions.log(a, *args, **kwargs)
distarray.dist.functions.log10(a, *args, **kwargs)
distarray.dist.functions.log1p(a, *args, **kwargs)
distarray.dist.functions.negative(a, *args, **kwargs)
distarray.dist.functions.reciprocal(a, *args, **kwargs)
distarray.dist.functions.rint(a, *args, **kwargs)
distarray.dist.functions.sign(a, *args, **kwargs)
distarray.dist.functions.sin(a, *args, **kwargs)
distarray.dist.functions.sinh(a, *args, **kwargs)
distarray.dist.functions.sqrt(a, *args, **kwargs)
distarray.dist.functions.square(a, *args, **kwargs)
distarray.dist.functions.tan(a, *args, **kwargs)
distarray.dist.functions.tanh(a, *args, **kwargs)
distarray.dist.functions.add(a, b, *args, **kwargs)
distarray.dist.functions.arctan2(a, b, *args, **kwargs)
distarray.dist.functions.bitwise_and(a, b, *args, **kwargs)
distarray.dist.functions.bitwise_or(a, b, *args, **kwargs)
distarray.dist.functions.bitwise_xor(a, b, *args, **kwargs)
distarray.dist.functions.divide(a, b, *args, **kwargs)
distarray.dist.functions.floor_divide(a, b, *args, **kwargs)
distarray.dist.functions.fmod(a, b, *args, **kwargs)
distarray.dist.functions.hypot(a, b, *args, **kwargs)
distarray.dist.functions.left_shift(a, b, *args, **kwargs)
distarray.dist.functions.mod(a, b, *args, **kwargs)
distarray.dist.functions.multiply(a, b, *args, **kwargs)
distarray.dist.functions.power(a, b, *args, **kwargs)
distarray.dist.functions.remainder(a, b, *args, **kwargs)
distarray.dist.functions.right_shift(a, b, *args, **kwargs)
distarray.dist.functions.subtract(a, b, *args, **kwargs)
distarray.dist.functions.true_divide(a, b, *args, **kwargs)
distarray.dist.functions.less(a, b, *args, **kwargs)
distarray.dist.functions.less_equal(a, b, *args, **kwargs)
distarray.dist.functions.equal(a, b, *args, **kwargs)
distarray.dist.functions.not_equal(a, b, *args, **kwargs)
distarray.dist.functions.greater(a, b, *args, **kwargs)
distarray.dist.functions.greater_equal(a, b, *args, **kwargs)

ipython_utils Module

The single IPython entry point.

maps Module

Distribution class and auxiliary ClientMap classes.

The Distribution is a multi-dimensional map class that manages the one-dimensional maps for each DistArray dimension. The Distribution class represents the distribution information for a distributed array, independent of the distributed array’s data. Distributions allow DistArrays to reduce overall communication when indexing and slicing by determining which processes own (or may possibly own) the indices in question. Two DistArray objects can share the same Distribution if they have the exact same distribution.

The one-dimensional ClientMap classes keep track of which process owns which index in that dimension. There are several subclasses for specific distribution types, including BlockMap, CyclicMap, NoDistMap, and UnstructuredMap.

class distarray.dist.maps.BlockCyclicMap(size, grid_size, block_size=1)

Bases: distarray.dist.maps.MapBase

dist = 'c'
classmethod from_axis_dim_dicts(axis_dim_dicts)
classmethod from_global_dim_dict(glb_dim_dict)
get_dimdicts()
owners(idx)
class distarray.dist.maps.BlockMap(size, grid_size)

Bases: distarray.dist.maps.MapBase

dist = 'b'
classmethod from_axis_dim_dicts(axis_dim_dicts)
classmethod from_global_dim_dict(glb_dim_dict)
get_dimdicts()
owners(idx)
class distarray.dist.maps.Distribution(context, global_dim_data, targets=None)

Bases: object

Governs the mapping between global indices and process ranks for multi-dimensional objects.

classmethod from_dim_data_per_rank(context, dim_data_per_rank, targets=None)

Create a Distribution from a sequence of dim_data tuples.

classmethod from_shape(context, shape, dist=None, grid_shape=None, targets=None)
get_dim_data_per_rank()
has_precise_index

Does the client-side Distribution know precisely who owns all indices?

This can be used to determine whether one needs to use the checked version of __getitem__ or __setitem__ on LocalArrays.

is_compatible(o)
owning_ranks(idxs)

Returns a list of ranks that may possibly own the location in the idxs tuple.

For many distribution types, the owning rank is precisely known; for others, it is only probably known. When the rank is precisely known, owning_ranks() returns a list of exactly one rank. Otherwise, it returns a list of more than one rank.

If the idxs tuple is out of bounds, raises IndexError.
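
One way to picture how per-dimension owners turn into candidate ranks is sketched below. The helper is hypothetical, and the row-major numbering of ranks over the process grid is an assumption made for the example, not something this reference states.

```python
from itertools import product


def ranks_from_owners(owners_per_dim, grid_shape):
    """Combine candidate owners in each dimension into candidate ranks.

    Assumption: ranks number the process grid in row-major order.
    """
    ranks = []
    for coords in product(*owners_per_dim):
        rank = 0
        for coord, extent in zip(coords, grid_shape):
            rank = rank * extent + coord
        ranks.append(rank)
    return ranks


grid_shape = (2, 3)
# Owner known precisely in both dimensions: exactly one rank.
precise = ranks_from_owners([[1], [2]], grid_shape)
# Owner uncertain in the first dimension: more than one candidate rank.
imprecise = ranks_from_owners([[0, 1], [2]], grid_shape)
print(precise, imprecise)
```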

owning_targets(idxs)

Like owning_ranks() but returns a list of targets rather than ranks.

Convenience method meant for IPython parallel usage.

reduce(axes)

Returns a new Distribution reduced along axis, i.e., the new distribution has one fewer dimension than self.

class distarray.dist.maps.MapBase

Bases: object

Base class for one-dimensional client-side maps.

Maps keep track of the relevant distribution information for a single dimension of a distributed array. Maps allow distributed arrays to keep track of which process to talk to when indexing and slicing.

Classes that inherit from MapBase must implement the owners() abstractmethod.

is_compatible(map)
owners(idx)

Returns a list of process IDs in this dimension that might possibly own idx.

Raises IndexError if idx is out of bounds.
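
The owners() contract can be illustrated with stand-alone functions for a few distribution types. These are sketches under stated assumptions, not the library's classes: block sizes are assumed to come from ceiling division, and the "unknown" case assumes the client has no index information, so every process is a candidate.

```python
def _check_bounds(idx, size):
    if not 0 <= idx < size:
        raise IndexError("index {} out of bounds for size {}".format(idx, size))


def block_owners(idx, size, grid_size):
    """Contiguous blocks; block length assumed to be ceil(size / grid_size)."""
    _check_bounds(idx, size)
    block = -(-size // grid_size)
    return [idx // block]


def cyclic_owners(idx, size, grid_size):
    """Indices dealt out round-robin, one at a time."""
    _check_bounds(idx, size)
    return [idx % grid_size]


def unknown_owners(idx, size, grid_size):
    """No client-side index information: any process might own idx."""
    _check_bounds(idx, size)
    return list(range(grid_size))


print(block_owners(7, 10, 4), cyclic_owners(7, 10, 4), unknown_owners(7, 10, 4))
```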

class distarray.dist.maps.NoDistMap(size, grid_size)

Bases: distarray.dist.maps.MapBase

dist = 'n'
classmethod from_axis_dim_dicts(axis_dim_dicts)
classmethod from_global_dim_dict(glb_dim_dict)
get_dimdicts()
owners(idx)
class distarray.dist.maps.UnstructuredMap(size, grid_size, indices=None)

Bases: distarray.dist.maps.MapBase

dist = 'u'
classmethod from_axis_dim_dicts(axis_dim_dicts)
classmethod from_global_dim_dict(glb_dim_dict)
get_dimdicts()
owners(idx)
distarray.dist.maps.choose_map(dist_type)

Choose a map class given one of the distribution types.

distarray.dist.maps.map_from_global_dim_dict(global_dim_dict)

Given a global_dim_dict, return the corresponding map.

distarray.dist.maps.map_from_sizes(size, dist_type, grid_size)

Returns an instance of the appropriate subclass of MapBase.

random Module

Emulate numpy.random

class distarray.dist.random.Random(context)

Bases: object

normal(distribution, loc=0.0, scale=1.0)

Draw random samples from a normal (Gaussian) distribution.

The probability density function of the normal distribution, first derived by De Moivre and 200 years later by both Gauss and Laplace independently [2], is often called the bell curve because of its characteristic shape.

The normal distribution occurs often in nature. For example, it describes the commonly occurring distribution of samples influenced by a large number of tiny, random disturbances, each with its own unique distribution [2].

Parameters:
  • loc (float) – Mean (“centre”) of the distribution.
  • scale (float) – Standard deviation (spread or “width”) of the distribution.

Notes

The probability density for the Gaussian distribution is

\[p(x) = \frac{1}{\sqrt{ 2 \pi \sigma^2 }} e^{ - \frac{ (x - \mu)^2 } {2 \sigma^2} },\]

where \(\mu\) is the mean and \(\sigma\) the standard deviation. The square of the standard deviation, \(\sigma^2\), is called the variance.

The function has its peak at the mean, and its “spread” increases with the standard deviation (the function reaches 0.607 times its maximum at \(\mu + \sigma\) and \(\mu - \sigma\) [2]). This implies that numpy.random.normal is more likely to return samples lying close to the mean, rather than those far away.
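
The 0.607 factor can be checked numerically from the density formula. The sketch below uses only the standard library; the values chosen for the mean and standard deviation are arbitrary examples.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Probability density of the normal distribution at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


mu, sigma = 1.5, 2.0
peak = normal_pdf(mu, mu, sigma)

# One standard deviation either side of the mean, relative to the peak.
ratio_right = normal_pdf(mu + sigma, mu, sigma) / peak
ratio_left = normal_pdf(mu - sigma, mu, sigma) / peak
print(round(ratio_right, 3), round(ratio_left, 3))  # both equal exp(-0.5), about 0.607
```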

References

[1] Wikipedia, “Normal distribution”, http://en.wikipedia.org/wiki/Normal_distribution
[2] P. R. Peebles Jr., “Central Limit Theorem” in “Probability, Random Variables and Random Signal Principles”, 4th ed., 2001, pp. 51, 51, 125.
rand(distribution)

Random values over a given distribution.

Create a DistArray with the given distribution and populate it with random samples from a uniform distribution over [0, 1).

Returns:out – Random values.
Return type:DistArray
randint(distribution, low, high=None)

Return random integers from low (inclusive) to high (exclusive).

Return random integers from the “discrete uniform” distribution in the “half-open” interval [low, high). If high is None (the default), then results are from [0, low).

Parameters:
  • distribution (Distribution object) –
  • low (int) – Lowest (signed) integer to be drawn from the distribution (unless high=None, in which case this parameter is the highest such integer).
  • high (int, optional) – If provided, one above the largest (signed) integer to be drawn from the distribution (see above for behavior if high=None).
Returns:

out – DistArray of random integers from the appropriate distribution.

Return type:

DistArray of ints
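
The half-open interval convention, including the high=None case, can be sketched with the standard library. The helper names are hypothetical, and the code draws plain Python integers rather than a DistArray.

```python
import random


def randint_interval(low, high=None):
    """Return the half-open interval [start, stop) implied by low and high."""
    if high is None:
        return 0, low
    return low, high


def draw_ints(count, low, high=None, rng=None):
    """Draw `count` integers uniformly from the half-open interval."""
    rng = rng or random.Random()
    start, stop = randint_interval(low, high)
    return [rng.randrange(start, stop) for _ in range(count)]


samples_two_sided = draw_ints(1000, 3, 7, rng=random.Random(0))
samples_low_only = draw_ints(1000, 5, rng=random.Random(0))
print(min(samples_two_sided), max(samples_two_sided))
```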

randn(distribution)

Return samples from the “standard normal” distribution.

Returns:out – A DistArray of floating-point samples from the standard normal distribution.
Return type:DistArray
seed(seed=None)

Seed the random number generators on each engine.

Parameters:seed (None, int, or array of integers) –

Base random number seed to use on each engine. If None, then a non-deterministic seed is obtained from the operating system. Otherwise, the seed is used as passed, and the sequence of random numbers will be deterministic.

Each individual engine has its state adjusted so that it is different from each other engine. Thus, each engine will compute a different sequence of random numbers.
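
The idea of a shared base seed with distinct per-engine streams can be sketched as follows. How distarray actually adjusts each engine's state is not specified here; offsetting the seed by the engine's rank is purely an assumption for illustration, and the sketch handles only None or a single integer.

```python
import random


def engine_generators(base_seed, n_engines):
    """Create one generator per engine from a base seed.

    Assumption: each engine's seed is base_seed + rank. With base_seed=None,
    every generator is seeded non-deterministically by the operating system.
    """
    if base_seed is None:
        return [random.Random() for _ in range(n_engines)]
    return [random.Random(base_seed + rank) for rank in range(n_engines)]


def first_draws(generators, count=5):
    return [[gen.random() for _ in range(count)] for gen in generators]


streams_a = first_draws(engine_generators(42, 3))
streams_b = first_draws(engine_generators(42, 3))
```

Re-running with the same base seed reproduces every engine's sequence, while the engines differ from one another.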