Getting Started

This page provides information on how to quickly get up and running with umap.

Dependencies

At a minimum, CMake 3.5.1 or greater is required to build umap.

To enable the network-based datastore, the following external libraries are required.

  1. The Mercury RPC library (https://github.com/mercury-hpc/mercury.git)

  2. The Argobots threading framework (https://github.com/pmodels/argobots.git)

  3. The Margo wrapper library (https://xgitlab.cels.anl.gov/sds/margo.git)

You can either install each of the above libraries manually or run setup.sh to set them up automatically. To install the libraries in a custom path, define UMAP_DEP_ROOT before running the script:

$ export UMAP_DEP_ROOT=<Place to install the Margo libraries>
$ ./setup.sh

UMAP Build and Installation

Clone the UMap git repository, check out the remote_region branch, and set MARGO_ROOT to where you installed the dependency libraries:

$ git clone https://github.com/LLNL/umap.git
$ cd umap
$ git checkout remote_region
$ mkdir build && cd build
$ cmake3 -DCMAKE_INSTALL_PREFIX="<Place to install umap>" -DMARGO_ROOT="$UMAP_DEP_ROOT" ..
$ make
$ make install

By default, umap builds a Release type build and uses the system-defined directories for installation. To specify a different build type or an alternate installation path, see the Advanced Configuration.

Umap installs files to the lib, include, and bin directories under CMAKE_INSTALL_PREFIX.

Basic Usage

The interface to umap mirrors that of mmap(2) as shown:

  void* base_addr = umap(NULL, totalbytes, PROT_READ|PROT_WRITE, UMAP_PRIVATE, fd, 0);
  if ( base_addr == UMAP_FAILED ) {
    int eno = errno;
    std::cerr << "Failed to umap " << fname << ": " << strerror(eno) << std::endl;
    return;
  }
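
For comparison, the sketch below shows the same call shape with plain mmap(2), so the correspondence between the two interfaces is easy to see. It does not use umap at all. The helper name map_and_touch is hypothetical, and the mapping of UMAP_PRIVATE and UMAP_FAILED onto MAP_PRIVATE and MAP_FAILED is inferred from the names rather than confirmed by this page.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <iostream>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not part of umap): map a file with plain mmap(2),
// write one byte through the mapping, and read it back.
bool map_and_touch(const char* fname, size_t totalbytes)
{
  assert(totalbytes > 0);

  int fd = open(fname, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
  if ( fd == -1 ) {
    std::cerr << "Failed to open " << fname << ": " << strerror(errno) << std::endl;
    return false;
  }

  // Size the file so that the whole mapping is backed by it.
  if ( ftruncate(fd, static_cast<off_t>(totalbytes)) != 0 ) {
    std::cerr << "Failed to size " << fname << ": " << strerror(errno) << std::endl;
    close(fd);
    return false;
  }

  // Same argument order as the umap call above. Assumption: MAP_PRIVATE and
  // MAP_FAILED are the mmap counterparts of UMAP_PRIVATE and UMAP_FAILED.
  void* base_addr = mmap(NULL, totalbytes, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
  if ( base_addr == MAP_FAILED ) {
    std::cerr << "Failed to mmap " << fname << ": " << strerror(errno) << std::endl;
    close(fd);
    return false;
  }

  // Touch the first byte through the mapping and read it back.
  char* bytes = static_cast<char*>(base_addr);
  bytes[0] = 'u';
  bool ok = (bytes[0] == 'u');

  if ( munmap(base_addr, totalbytes) != 0 )
    ok = false;
  close(fd);
  return ok;
}
```

Replacing mmap, munmap, MAP_PRIVATE, and MAP_FAILED with their umap counterparts gives the pattern used in the rest of this page.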

The following code is a simple example of how one may use umap:

//////////////////////////////////////////////////////////////////////////////
// Copyright 2017-2020 Lawrence Livermore National Security, LLC and other
// UMAP Project Developers. See the top-level LICENSE file for details.
//
// SPDX-License-Identifier: LGPL-2.1-only
//////////////////////////////////////////////////////////////////////////////

/*
 * A simple example showing how an application may map a file, initialize
 * the file with data, sort the data, then verify that the sort worked
 * correctly.
 */
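#include <iostream>
#include <parallel/algorithm>
#include <fcntl.h>
#include <unistd.h>     // unlink, close
#include <sys/stat.h>   // S_IRUSR, S_IWUSR
#include <omp.h>
#include <cerrno>
#include <cstdint>      // uint64_t
#include <cstdio>
#include <cstring>
#include <vector>
#include "umap/umap.h"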

void
initialize_and_sort_file( const char* fname, uint64_t arraysize, uint64_t totalbytes, uint64_t psize )
{
  if ( unlink(fname) ) {
    int eno = errno;
    if ( eno != ENOENT ) {
      std::cerr << "Failed to unlink " << fname << ": " 
        << strerror(eno) << " Errno=" << eno << std::endl;
    }
  }

  int fd = open(fname, O_RDWR | O_LARGEFILE | O_DIRECT | O_CREAT, S_IRUSR | S_IWUSR);
  if ( fd == -1 ) {
    int eno = errno;
    std::cerr << "Failed to create " << fname << ": " << strerror(eno) << std::endl;
    return;
  }

  // If we are initializing, attempt to pre-allocate disk space for the file.
  try {
    // posix_fallocate returns the error number directly and does not set errno.
    int x = posix_fallocate(fd, 0, totalbytes);
    if ( x != 0 ) {
      std::cerr << "Failed to pre-allocate " << fname << ": " << strerror(x) << std::endl;
      return;
    }
  } catch(const std::exception& e) {
    std::cerr << "posix_fallocate: " << e.what() << std::endl;
    return;
  } catch(...) {
    int eno = errno;
    std::cerr << "Failed to pre-allocate " << fname << ": " << strerror(eno) << std::endl;
    return;
  }

  void* base_addr = umap(NULL, totalbytes, PROT_READ|PROT_WRITE, UMAP_PRIVATE, fd, 0);
  if ( base_addr == UMAP_FAILED ) {
    int eno = errno;
    std::cerr << "Failed to umap " << fname << ": " << strerror(eno) << std::endl;
    return;
  }

  std::vector<umap_prefetch_item> pfi;
  char* base = (char*)base_addr;
  uint64_t PagesInTest = totalbytes / psize;

  std::cout << "Prefetching Pages\n";
  for ( uint64_t i{0}; i < PagesInTest; ++i) {
    umap_prefetch_item x = { .page_base_addr = &base[i * psize] };
    pfi.push_back(x);
  }
  umap_prefetch(PagesInTest, &pfi[0]);

  uint64_t *arr = (uint64_t *) base_addr;

  std::cout << "Initializing Array\n";

#pragma omp parallel for
  for(uint64_t i=0; i < arraysize; ++i)
    arr[i] = (uint64_t) (arraysize - i);

  std::cout << "Sorting Data\n";
  __gnu_parallel::sort(arr, &arr[arraysize], std::less<uint64_t>(), __gnu_parallel::quicksort_tag());


  if (uunmap(base_addr, totalbytes) < 0) {
    int eno = errno;
    std::cerr << "Failed to uunmap " << fname << ": " << strerror(eno) << std::endl;
    return;
  }
  close(fd);
}

void
verify_sortfile( const char* fname, uint64_t arraysize, uint64_t totalbytes )
{
  int fd = open(fname, O_RDWR | O_LARGEFILE | O_DIRECT, S_IRUSR | S_IWUSR);
  if ( fd == -1 ) {
    std::cerr << "Failed to open " << fname << std::endl;
    return;
  }

  void* base_addr = umap(NULL, totalbytes, PROT_READ|PROT_WRITE, UMAP_PRIVATE, fd, 0);
  if ( base_addr == UMAP_FAILED ) {
    std::cerr << "umap failed\n";
    return;
  }
  uint64_t *arr = (uint64_t *) base_addr;

  std::cout << "Verifying Data\n";

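  uint64_t miscompares = 0;

  // Count mismatches with a reduction; an OpenMP loop must not modify its
  // own loop variable to exit early.
#pragma omp parallel for reduction(+:miscompares)
  for(uint64_t i = 0; i < arraysize; ++i)
    if (arr[i] != (i+1))
      ++miscompares;

  if (miscompares != 0)
    std::cerr << "Data miscompare: " << miscompares << " elements\n";

  if (uunmap(base_addr, totalbytes) < 0) {
    std::cerr << "uunmap failed\n";
    return;
  }
  if (miscompares == 0)
    std::cout << "Data is verified. uunmap done.\n";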

  close(fd);
}

int
main(int argc, char **argv)
{
  if ( argc < 2 ) {
    std::cerr << "Usage: " << argv[0] << " <filename>\n";
    return 1;
  }
  const char* filename = argv[1];

  // Optional: Make umap's page size double the default system page size
  //
  // Use UMAP_PAGESIZE environment variable to set page size for umap
  //
  uint64_t psize = umapcfg_get_umap_page_size();

  const uint64_t pagesInTest = 64;
  const uint64_t elemPerPage = psize / sizeof(uint64_t);

  const uint64_t arraySize = elemPerPage * pagesInTest;
  const uint64_t totalBytes = arraySize * sizeof(uint64_t);

  // Optional: Set umap's buffer to half the number of pages we need so that
  //           we may simulate an out-of-core experience
  //
  // Use UMAP_BUFSIZE environment variable to set number of pages in buffer
  //
  initialize_and_sort_file(filename, arraySize, totalBytes, psize);
  verify_sortfile(filename, arraySize, totalBytes);
  return 0;
}
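
The comments in main mention UMAP_BUFSIZE as the way to shrink umap's buffer to half the pages the test needs. A minimal sketch of doing that from inside the program follows. The helper name request_half_buffer is hypothetical. It also assumes that umap reads the variable when it initializes, so the call would have to happen before the first umap call; this page does not confirm that timing. Exporting the variable in the shell before launching the program avoids that assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of umap): request a buffer of half the pages
// the test touches, so that the run has to evict pages (out-of-core behaviour).
// Returns the number of pages requested.
uint64_t request_half_buffer(uint64_t pagesInTest)
{
  assert(pagesInTest >= 2);
  const uint64_t bufPages = pagesInTest / 2;

  // UMAP_BUFSIZE is the variable named in the example's comments.
  // Assumption: umap reads it at initialization, so this must run before
  // the first call to umap().
  setenv("UMAP_BUFSIZE", std::to_string(bufPages).c_str(), 1);
  return bufPages;
}
```

With the example's pagesInTest of 64, this requests a 32-page buffer. It would be called at the top of main, before initialize_and_sort_file.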

Network-based Usage

The following code demonstrates how to create a network-based datastore on a server process:

Umap::Store* datastore  = new Umap::StoreNetworkServer("a", server_mem_addr, umap_region_length);

The following code demonstrates how to umap a network-based datastore on a client process:

void* base_addr   = umap_network("a", NULL, umap_region_length);
if ( base_addr == UMAP_FAILED ) {
  int eno = errno;
  std::cerr << "Failed to umap network" << ": " << strerror(eno) << std::endl;
  return 0;
}

To run the application on a cluster, first start the server processes.

srun --ntasks-per-node=$numServerProcPerNode -N $numServerNodes ${EXE}_server &

Then start the client processes after the server has published its connection information in serverfile.

srun --ntasks-per-node=$numClientProcPerNode -N $numClientNodes ${EXE}_client

Tests of the network-based handler can be found in the <root>/tests/remote_xx folders.