DistArray provides general multidimensional NumPy-like distributed arrays to Python. It intends to bring the strengths of NumPy to data-parallel high-performance computing. DistArray has a similar API to NumPy.
The project is currently under heavy development and things are changing quickly!
DistArray is for users who
- know and love Python and NumPy,
- want to scale NumPy to larger distributed datasets,
- want to explore distributed data interactively and also run batch-oriented distributed programs,
- want an easier way to drive and coordinate existing MPI-based codes,
- have a lot of data that may already be distributed,
- want a global view (“think globally”) with local control (“act locally”),
- need to tap into existing parallel libraries like Trilinos, PETSc, or Elemental,
- want the interactivity of IPython and the performance of MPI.
DistArray is designed to work with other packages that implement the Distributed Array Protocol.
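The "global view with local control" idea from the list above can be illustrated without DistArray itself. The following is a minimal sketch in plain Python, not DistArray's API: the helper names `block_bounds` and `global_to_local` are hypothetical, the "processes" are simulated in a single interpreter, and a simple block distribution is assumed purely for illustration. Plain lists stand in for array data so the sketch runs without any dependencies.

```python
def block_bounds(global_size, n_procs, rank):
    """Return the [start, stop) range of global indices owned by `rank`.

    Hypothetical helper: assumes a 1-D block distribution in which the
    first `global_size % n_procs` ranks hold one extra element.
    """
    base, extra = divmod(global_size, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def global_to_local(global_index, global_size, n_procs):
    """Map a global index to (owning rank, index within that rank's block)."""
    for rank in range(n_procs):
        start, stop = block_bounds(global_size, n_procs, rank)
        if start <= global_index < stop:
            return rank, global_index - start
    raise IndexError(global_index)


# Global view: one logical array of 10 elements.
global_data = list(range(10))
n_procs = 4  # number of simulated processes (an assumption for this sketch)

# Local control: each simulated process sees only its own block.
local_blocks = [
    global_data[slice(*block_bounds(len(global_data), n_procs, rank))]
    for rank in range(n_procs)
]

# "Act locally": every process reduces its own block.
local_sums = [sum(block) for block in local_blocks]

# "Think globally": combining the local results gives the global answer.
total = sum(local_sums)
```

In a real distributed setting each block would live in a separate process and the final combination would be a collective operation; here everything runs in one process only to show the index bookkeeping.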
Dependencies for DistArray:
- For HDF5 IO: h5py built against a parallel-enabled build of HDF5
- For plotting: matplotlib
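Before installing, it can be handy to check which of the two feature-specific dependencies listed above are available. The following is a small sketch using only the standard library; the `find_missing` helper and the `FEATURE_DEPENDENCIES` mapping are hypothetical names introduced for illustration. It only checks that a module can be located, so it cannot tell whether h5py was built against a parallel-enabled HDF5.

```python
import importlib.util

# Feature -> module name, taken from the dependency list above.
FEATURE_DEPENDENCIES = {
    "HDF5 IO": "h5py",
    "plotting": "matplotlib",
}


def find_missing(dependencies):
    """Return {feature: module} for every module that cannot be located."""
    return {
        feature: module
        for feature, module in dependencies.items()
        if importlib.util.find_spec(module) is None
    }


# Report which features would be unavailable in the current environment.
for feature, module in find_missing(FEATURE_DEPENDENCIES).items():
    print(f"{feature} unavailable: module '{module}' not found")
```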
Dependencies to build the documentation:
If you have the above, you should be able to install this package with:
python setup.py install
or, for a development install:
python setup.py develop
To run the tests, you will need to start an IPython.parallel cluster. You can use ipcluster, or you can use the dacluster command which comes with DistArray:
You should then be able to run all the tests with:
To build this documentation, navigate to the docs directory and use the Makefile there. For example, to build the HTML documentation, run:
make html
from the docs directory.
To see some initial examples of what DistArray can do, check out the examples directory and our tests. More usage examples will follow as the API stabilizes.
DistArray was started by Brian Granger in 2008 and is currently being developed at Enthought by a team led by Kurt Smith, in partnership with Bill Spotz from Sandia’s (Py)Trilinos project and Brian Granger and Min RK from the IPython project.