HCC
HCC is a single-source, C/C++ compiler for heterogeneous computing. It's optimized with HSA (http://www.hsafoundation.com/).
hc::accelerator_view Class Reference

Represents a logical (isolated) accelerator view of a compute accelerator.

#include <hc.hpp>


Public Member Functions

 accelerator_view (const accelerator_view &other)
 Copy-constructs an accelerator_view object.

accelerator_view & operator= (const accelerator_view &other)
 Assigns an accelerator_view object to "this" accelerator_view object and returns a reference to "this" object.

queuing_mode get_queuing_mode () const
 Returns the queuing mode that this accelerator_view was created with.

execute_order get_execute_order () const
 Returns the execution order of this accelerator_view.

bool get_is_auto_selection ()
 Returns a boolean value indicating whether the accelerator view when passed to a parallel_for_each would result in automatic selection of an appropriate execution target by the runtime.

unsigned int get_version () const
 Returns a 32-bit unsigned integer representing the version number of this accelerator view.

accelerator get_accelerator () const
 Returns the accelerator that this accelerator_view has been created on.

bool get_is_debug () const
 Returns a boolean value indicating whether the accelerator_view supports debugging through extensive error reporting.

void wait (hcWaitMode waitMode=hcWaitModeBlocked)
 Performs a blocking wait for completion of all commands submitted to the accelerator view prior to calling wait().

void flush ()
 Sends the queued up commands in the accelerator_view to the device for execution.

completion_future create_marker (memory_scope fence_scope=system_scope) const
 This command inserts a marker event into the accelerator_view's command queue.

completion_future create_blocking_marker (completion_future &dependent_future, memory_scope fence_scope=system_scope) const
 This command inserts a marker event into the accelerator_view's command queue with a prior dependent asynchronous event.

completion_future create_blocking_marker (std::initializer_list< completion_future > dependent_future_list, memory_scope fence_scope=system_scope) const
 This command inserts a marker event into the accelerator_view's command queue with an arbitrary number of dependent asynchronous events.

template<typename InputIterator >
completion_future create_blocking_marker (InputIterator first, InputIterator last, memory_scope scope) const
 This command inserts a marker event into the accelerator_view's command queue with an arbitrary number of dependent asynchronous events.

void copy (const void *src, void *dst, size_t size_bytes)
 Copies size_bytes bytes from src to dst.

void copy_ext (const void *src, void *dst, size_t size_bytes, hcCommandKind copyDir, const hc::AmPointerInfo &srcInfo, const hc::AmPointerInfo &dstInfo, const hc::accelerator *copyAcc, bool forceUnpinnedCopy)
 Copies size_bytes bytes from src to dst.
 
void copy_ext (const void *src, void *dst, size_t size_bytes, hcCommandKind copyDir, const hc::AmPointerInfo &srcInfo, const hc::AmPointerInfo &dstInfo, bool forceUnpinnedCopy)
 
completion_future copy_async (const void *src, void *dst, size_t size_bytes)
 Copies size_bytes bytes from src to dst.

completion_future copy_async_ext (const void *src, void *dst, size_t size_bytes, hcCommandKind copyDir, const hc::AmPointerInfo &srcInfo, const hc::AmPointerInfo &dstInfo, const hc::accelerator *copyAcc)
 Copies size_bytes bytes from src to dst.

bool operator== (const accelerator_view &other) const
 Compares "this" accelerator_view with the passed accelerator_view object to determine if they represent the same underlying object.

bool operator!= (const accelerator_view &other) const
 Compares "this" accelerator_view with the passed accelerator_view object to determine if they represent different underlying objects.

size_t get_max_tile_static_size ()
 Returns the maximum size of tile static area available on this accelerator view.

int get_pending_async_ops ()
 Returns the number of pending asynchronous operations on this accelerator view.

bool get_is_empty ()
 Returns true if the accelerator_view is currently empty.

void * get_hsa_queue ()
 Returns an opaque handle which points to the underlying HSA queue.

void * get_hsa_agent ()
 Returns an opaque handle which points to the underlying HSA agent.

void * get_hsa_am_region ()
 Returns an opaque handle which points to the AM region on the HSA agent.

void * get_hsa_am_system_region ()
 Returns an opaque handle which points to the AM system region on the HSA agent.

void * get_hsa_am_finegrained_system_region ()
 Returns an opaque handle which points to the AM fine-grained system region on the HSA agent.

void * get_hsa_kernarg_region ()
 Returns an opaque handle which points to the Kernarg region on the HSA agent.

bool is_hsa_accelerator ()
 Returns whether the accelerator view is based on HSA.

void dispatch_hsa_kernel (const hsa_kernel_dispatch_packet_t *aql, const void *args, size_t argsize, hc::completion_future *cf=nullptr, const char *kernel_name=nullptr)
 Dispatch a kernel into the accelerator_view.

bool set_cu_mask (const std::vector< bool > &cu_mask)
 Set a CU affinity to specific command queues.
 

Friends

class accelerator
 
template<typename Q , int K>
class array
 
template<typename Q , int K>
class array_view
 
template<typename Kernel >
void * Kalmar::mcw_cxxamp_get_kernel (const std::shared_ptr< Kalmar::KalmarQueue > &, const Kernel &)
 
template<typename Kernel , int dim_ext>
void Kalmar::mcw_cxxamp_execute_kernel_with_dynamic_group_memory (const std::shared_ptr< Kalmar::KalmarQueue > &, size_t *, size_t *, const Kernel &, void *, size_t)
 
template<typename Kernel , int dim_ext>
std::shared_ptr< Kalmar::KalmarAsyncOp > Kalmar::mcw_cxxamp_execute_kernel_with_dynamic_group_memory_async (const std::shared_ptr< Kalmar::KalmarQueue > &, size_t *, size_t *, const Kernel &, void *, size_t)
 
template<typename Kernel , int dim_ext>
void Kalmar::mcw_cxxamp_launch_kernel (const std::shared_ptr< Kalmar::KalmarQueue > &, size_t *, size_t *, const Kernel &)
 
template<typename Kernel , int dim_ext>
std::shared_ptr< Kalmar::KalmarAsyncOp > Kalmar::mcw_cxxamp_launch_kernel_async (const std::shared_ptr< Kalmar::KalmarQueue > &, size_t *, size_t *, const Kernel &)
 
template<int N, typename Kernel >
completion_future parallel_for_each (const accelerator_view &, const extent< N > &, const Kernel &)
 
template<typename Kernel >
completion_future parallel_for_each (const accelerator_view &, const extent< 1 > &, const Kernel &)
 
template<typename Kernel >
completion_future parallel_for_each (const accelerator_view &, const extent< 2 > &, const Kernel &)
 
template<typename Kernel >
completion_future parallel_for_each (const accelerator_view &, const extent< 3 > &, const Kernel &)
 
template<typename Kernel >
completion_future parallel_for_each (const accelerator_view &, const tiled_extent< 3 > &, const Kernel &)
 
template<typename Kernel >
completion_future parallel_for_each (const accelerator_view &, const tiled_extent< 2 > &, const Kernel &)
 
template<typename Kernel >
completion_future parallel_for_each (const accelerator_view &, const tiled_extent< 1 > &, const Kernel &)
 

Detailed Description

Represents a logical (isolated) accelerator view of a compute accelerator.

An object of this type can be obtained by calling the default_view property or create_view member functions on an accelerator object.

Constructor & Destructor Documentation

hc::accelerator_view::accelerator_view ( const accelerator_view &  other)
inline

Copy-constructs an accelerator_view object.

This function does a shallow copy with the newly created accelerator_view object pointing to the same underlying view as the "other" parameter.

Parameters
[in]  other  The accelerator_view object to be copied.

Member Function Documentation

void hc::accelerator_view::copy ( const void *  src,
void *  dst,
size_t  size_bytes 
)
inline

Copies size_bytes bytes from src to dst.

The src and dst ranges must not overlap. Note that src is the first parameter and dst is the second, following C++ convention. The copy command will execute after any commands already inserted into the accelerator_view finish. This is a synchronous copy command, and the copy operation completes before this call returns.
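
The no-overlap requirement can be checked on the caller's side before the copy is issued. The following is a minimal sketch in standard C++, not part of hc.hpp: the helper names ranges_overlap and check_copy_args are hypothetical, and the check assumes both pointers live in one flat address space so that comparing their numeric addresses is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of hc.hpp): returns true when the byte ranges
// [a, a + n) and [b, b + n) share at least one byte.
// Assumption: both pointers belong to the same flat address space.
inline bool ranges_overlap(const void* a, const void* b, std::size_t n) {
    if (n == 0) return false;  // empty ranges never overlap
    const auto pa = reinterpret_cast<std::uintptr_t>(a);
    const auto pb = reinterpret_cast<std::uintptr_t>(b);
    // Two equal-length ranges overlap exactly when their start addresses
    // are less than n bytes apart.
    const std::uintptr_t distance = pa > pb ? pa - pb : pb - pa;
    return distance < n;
}

// Hypothetical pre-check a caller might run before av.copy(src, dst, n),
// where av is an hc::accelerator_view.
inline void check_copy_args(const void* src, const void* dst, std::size_t n) {
    assert(!ranges_overlap(src, dst, n) && "src and dst must not overlap");
}
```

A debug build could call check_copy_args(src, dst, size_bytes) immediately before the copy; in release builds the assert compiles away.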

completion_future hc::accelerator_view::copy_async ( const void *  src,
void *  dst,
size_t  size_bytes 
)
inline

Copies size_bytes bytes from src to dst.

The src and dst ranges must not overlap. Note that src is the first parameter and dst is the second, following C++ convention. This is an asynchronous copy command, and this call may return before the copy operation completes. If the source or destination is host memory, the memory must be pinned or a runtime exception will be thrown. Pinned memory can be created with am_alloc using the amHostPinned flag.

The copy command will be implicitly ordered with respect to commands previously enqueued to this accelerator_view:

  • If the accelerator_view execute_order is execute_in_order (the default), then the copy will execute after all previously sent commands finish execution.
  • If the accelerator_view execute_order is execute_any_order, then the copy will start after all previously sent commands start but can execute in any order.
completion_future hc::accelerator_view::copy_async_ext ( const void *  src,
void *  dst,
size_t  size_bytes,
hcCommandKind  copyDir,
const hc::AmPointerInfo &  srcInfo,
const hc::AmPointerInfo &  dstInfo,
const hc::accelerator *  copyAcc 
)
inline

Copies size_bytes bytes from src to dst.

The src and dst ranges must not overlap. Note that src is the first parameter and dst is the second, following C++ convention. This is an asynchronous copy command, and this call may return before the copy operation completes. If the source or destination is host memory, the memory must be pinned or a runtime exception will be thrown. Pinned memory can be created with am_alloc using the amHostPinned flag.

The copy command will be implicitly ordered with respect to commands previously enqueued to this accelerator_view:

  • If the accelerator_view execute_order is execute_in_order (the default), then the copy will execute after all previously sent commands finish execution.
  • If the accelerator_view execute_order is execute_any_order, then the copy will start after all previously sent commands start but can execute in any order. The copyAcc determines where the copy is executed and does not affect the ordering.

The copy_async_ext flavor allows the caller to provide additional information about each pointer, which can improve performance by eliminating replicated lookups, and also allows control over which device performs the copy. This interface is intended for language runtimes such as HIP.

copyDir : Specifies the direction of the copy. Must be hcMemcpyHostToHost, hcMemcpyHostToDevice, hcMemcpyDeviceToHost, or hcMemcpyDeviceToDevice.

copyAcc : Specifies which accelerator performs the copy operation. The specified accelerator must have access to the source and destination pointers, either because the memory is allocated on those devices or because the accelerator has peer access to the memory. If copyAcc is nullptr, then the copy will be performed by the host. In this case, the host accelerator must have access to both pointers. The copy operation will be performed by the specified engine but is not synchronized with respect to any operations on that device.

void hc::accelerator_view::copy_ext ( const void *  src,
void *  dst,
size_t  size_bytes,
hcCommandKind  copyDir,
const hc::AmPointerInfo &  srcInfo,
const hc::AmPointerInfo &  dstInfo,
const hc::accelerator *  copyAcc,
bool  forceUnpinnedCopy 
)
inline

Copies size_bytes bytes from src to dst.

The src and dst ranges must not overlap. Note that src is the first parameter and dst is the second, following C++ convention. The copy command will execute after any commands already inserted into the accelerator_view finish. This is a synchronous copy command, and the copy operation completes before this call returns. The copy_ext flavor allows the caller to provide additional information about each pointer, which can improve performance by eliminating replicated lookups. This interface is intended for language runtimes such as HIP.

copyDir : Specifies the direction of the copy. Must be hcMemcpyHostToHost, hcMemcpyHostToDevice, hcMemcpyDeviceToHost, or hcMemcpyDeviceToDevice.

forceUnpinnedCopy : Forces the copy to be performed with host involvement rather than with accelerator copy engines.

completion_future hc::accelerator_view::create_blocking_marker ( completion_future &  dependent_future,
memory_scope  fence_scope = system_scope 
) const
inline

This command inserts a marker event into the accelerator_view's command queue with a prior dependent asynchronous event.

This marker is returned as a completion_future object. When its dependent event and all commands submitted prior to the marker event creation have been completed, the future is ready.

Regardless of the accelerator_view's execute_order (execute_any_order, execute_in_order), the marker always ensures older commands complete before the returned completion_future is marked ready. Thus, markers provide a mechanism to enforce order between commands in an execute_any_order accelerator_view.

fence_scope controls the scope of the acquire and release fences applied after the marker executes. Options are:

  • no_scope : No fence operation is performed.
  • accelerator_scope: Memory is acquired from and released to the accelerator scope where the marker executes.
  • system_scope: Memory is acquired from and released to system scope (all accelerators including CPUs)

dependent_future may be recorded in another queue or another accelerator. If in another accelerator, the runtime performs cross-accelerator synchronization.

Returns
A future which can be waited on; waiting blocks until the current batch of commands and the dependent event have completed.
completion_future hc::accelerator_view::create_blocking_marker ( std::initializer_list< completion_future >  dependent_future_list,
memory_scope  fence_scope = system_scope 
) const
inline

This command inserts a marker event into the accelerator_view's command queue with an arbitrary number of dependent asynchronous events.

This marker is returned as a completion_future object. When its dependent events and all commands submitted prior to the marker event creation have been completed, the completion_future is ready.

Regardless of the accelerator_view's execute_order (execute_any_order, execute_in_order), the marker always ensures older commands complete before the returned completion_future is marked ready. Thus, markers provide a mechanism to enforce order between commands in an execute_any_order accelerator_view.

fence_scope controls the scope of the acquire and release fences applied after the marker executes. Options are:

  • no_scope : No fence operation is performed.
  • accelerator_scope: Memory is acquired from and released to the accelerator scope where the marker executes.
  • system_scope: Memory is acquired from and released to system scope (all accelerators including CPUs)
Returns
A future which can be waited on; waiting blocks until the current batch of commands and the dependent events have completed.
template<typename InputIterator >
completion_future hc::accelerator_view::create_blocking_marker ( InputIterator  first,
InputIterator  last,
memory_scope  scope 
) const
inline

This command inserts a marker event into the accelerator_view's command queue with an arbitrary number of dependent asynchronous events.

This marker is returned as a completion_future object. When its dependent events and all commands submitted prior to the marker event creation have been completed, the completion_future is ready.

Regardless of the accelerator_view's execute_order (execute_any_order, execute_in_order), the marker always ensures older commands complete before the returned completion_future is marked ready. Thus, markers provide a mechanism to enforce order between commands in an execute_any_order accelerator_view.

Returns
A future which can be waited on; waiting blocks until the current batch of commands and the dependent events have completed.
completion_future hc::accelerator_view::create_marker ( memory_scope  fence_scope = system_scope) const
inline

This command inserts a marker event into the accelerator_view's command queue.

This marker is returned as a completion_future object. When all commands that were submitted prior to the marker event creation have completed, the future is ready.

Regardless of the accelerator_view's execute_order (execute_any_order, execute_in_order), the marker always ensures older commands complete before the returned completion_future is marked ready. Thus, markers provide a mechanism to enforce order between commands in an execute_any_order accelerator_view.

fence_scope controls the scope of the acquire and release fences applied after the marker executes. Options are:

  • no_scope : No fence operation is performed.
  • accelerator_scope: Memory is acquired from and released to the accelerator scope where the marker executes.
  • system_scope: Memory is acquired from and released to system scope (all accelerators including CPUs)
Returns
A future which can be waited on, and will block until the current batch of commands has completed.
void hc::accelerator_view::dispatch_hsa_kernel ( const hsa_kernel_dispatch_packet_t *  aql,
const void *  args,
size_t  argsize,
hc::completion_future *  cf = nullptr,
const char *  kernel_name = nullptr 
)
inline

Dispatch a kernel into the accelerator_view.

This function is intended to provide a gateway to dispatch code objects, with some assistance from HCC. Kernels are specified in the standard code object format, and can be created from a variety of compiler tools including the assembler, offline cl compilers, or other tools. The caller also specifies the execution configuration and kernel arguments. HCC will copy the kernel arguments into an appropriate segment and insert the packet into the queue. HCC will also automatically handle signal and kernarg allocation and deallocation for the command.

The kernel is dispatched asynchronously, and thus this API may return before the kernel finishes executing.

Kernels dispatched with this API may be interleaved with other copy and kernel commands generated from copy or parallel_for_each commands. The kernel honors the execute_order associated with the accelerator_view. Specifically, if execute_order is execute_in_order, then the kernel will wait for older data and kernel commands in the same queue before beginning execution. If execute_order is execute_any_order, then the kernel may begin executing without regard to the state of older kernels. This call honors the packet barrier bit (1 << HSA_PACKET_HEADER_BARRIER) if set in the aql.header field. If set, this provides the same synchronization behavior as execute_in_order for the command generated by this API.

aql is an HSA-format "AQL" packet. The following fields must be set by the caller:

  • aql.kernel_object
  • aql.group_segment_size : includes static + dynamic group size
  • aql.private_segment_size
  • aql.grid_size_x, aql.grid_size_y, aql.grid_size_z
  • aql.group_size_x, aql.group_size_y, aql.group_size_z
  • aql.setup : The 2 bits at HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS.
  • aql.header : Must specify the desired memory fence operations, and barrier bit (if desired). A typical conservative setting would be: aql.header = (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE) | (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE) | (1 << HSA_PACKET_HEADER_BARRIER);
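
To make the bit layout of the conservative aql.header setting concrete, here is a small self-contained sketch. The constants in namespace sketch are local stand-ins whose values are assumed to match the enumerators in hsa.h; real code should include hsa.h and use its definitions rather than these copies. The helper names (conservative_header, has_barrier, setup_for_dimensions) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Stand-in constants. ASSUMPTION: these values mirror hsa.h; they are
// redefined here only so the example compiles without the HSA headers.
constexpr unsigned HSA_FENCE_SCOPE_SYSTEM = 2;
constexpr unsigned HSA_PACKET_HEADER_BARRIER = 8;
constexpr unsigned HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE = 9;
constexpr unsigned HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE = 11;
constexpr unsigned HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS = 0;

// Builds the conservative header described above: system-scope acquire and
// release fences plus the barrier bit.
constexpr std::uint16_t conservative_header() {
    return static_cast<std::uint16_t>(
        (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE) |
        (HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE) |
        (1u << HSA_PACKET_HEADER_BARRIER));
}

// Reports whether the barrier bit is set in a header value.
constexpr bool has_barrier(std::uint16_t header) {
    return ((header >> HSA_PACKET_HEADER_BARRIER) & 1u) != 0;
}

// Encodes the number of grid dimensions (1 to 3) into the 2-bit field
// that aql.setup carries at HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS.
inline std::uint16_t setup_for_dimensions(unsigned dims) {
    assert(dims >= 1 && dims <= 3);
    return static_cast<std::uint16_t>(
        dims << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS);
}

}  // namespace sketch
```

Under the assumed constant values, conservative_header() evaluates to 0x1500. Clearing the barrier bit from that value would leave ordering to the execute_order of the accelerator_view.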

The following fields are ignored. The API will set up these fields before dispatching the AQL packet: aql.completion_signal, aql.kernarg

args : Pointer to kernel arguments with the size and alignment expected by the kernel. The args are copied and then passed directly to the kernel. After this function returns, the args memory may be deallocated.

argsize : Size of the arguments.

cf : Written with a completion_future that can be used to track the status of the dispatch. May be NULL, in which case no completion_future is returned and the caller must use other synchronization techniques such as calling accelerator_view::wait() or waiting on a younger command in the same queue.

kernel_name : Optionally specifies the name of the kernel for debug and profiling. May be null. If specified, the caller is responsible for ensuring the memory for the name remains allocated until the kernel completes.

The dispatch_hsa_kernel call will perform the following operations:

  • Efficiently allocate a kernarg region and copy the arguments.
  • Efficiently allocate a signal, if required.
  • Dispatch the command into the queue and flush it to the GPU.
  • Kernargs and signals are automatically reclaimed by the HCC runtime.
void hc::accelerator_view::flush ( )
inline

Sends the queued up commands in the accelerator_view to the device for execution.

An accelerator_view internally maintains a buffer of commands such as data transfers between the host memory and device buffers, and kernel invocations (parallel_for_each calls). This member function sends the commands to the device for processing. Normally, these commands are sent to the GPU automatically whenever the runtime determines that they need to be, such as when the command buffer is full or when waiting for transfer of data from the device buffers to host memory. The flush member function will send the commands manually to the device.

Calling this member function incurs an overhead and must be used with discretion. A typical use of this member function would be when the CPU waits for an arbitrary amount of time and would like to force the execution of queued device commands in the meantime. It can also be used to ensure that resources on the accelerator are reclaimed after all references to them have been removed.

Because flush operates asynchronously, it can return either before or after the device finishes executing the buffered commands. However, the commands will eventually always complete.

If the queuing_mode is queuing_mode_immediate, this function has no effect.

Returns
None
void* hc::accelerator_view::get_hsa_agent ( )
inline

Returns an opaque handle which points to the underlying HSA agent.

Returns
An opaque handle of the underlying HSA agent, if the accelerator view is based on HSA. NULL otherwise.
void* hc::accelerator_view::get_hsa_am_finegrained_system_region ( )
inline

Returns an opaque handle which points to the AM fine-grained system region on the HSA agent.

This region can be used to allocate finegrained system memory which is accessible from the specified accelerator.

Returns
An opaque handle of the region, if the accelerator is based on HSA. NULL otherwise.
void* hc::accelerator_view::get_hsa_am_region ( )
inline

Returns an opaque handle which points to the AM region on the HSA agent.

This region can be used to allocate accelerator memory which is accessible from the specified accelerator.

Returns
An opaque handle of the region, if the accelerator is based on HSA. NULL otherwise.
void* hc::accelerator_view::get_hsa_am_system_region ( )
inline

Returns an opaque handle which points to the AM system region on the HSA agent.

This region can be used to allocate system memory which is accessible from the specified accelerator.

Returns
An opaque handle of the region, if the accelerator is based on HSA. NULL otherwise.
void* hc::accelerator_view::get_hsa_kernarg_region ( )
inline

Returns an opaque handle which points to the Kernarg region on the HSA agent.

Returns
An opaque handle of the region, if the accelerator view is based on HSA. NULL otherwise.
void* hc::accelerator_view::get_hsa_queue ( )
inline

Returns an opaque handle which points to the underlying HSA queue.

Returns
An opaque handle of the underlying HSA queue, if the accelerator view is based on HSA. NULL otherwise.
bool hc::accelerator_view::get_is_auto_selection ( )
inline

Returns a boolean value indicating whether the accelerator view when passed to a parallel_for_each would result in automatic selection of an appropriate execution target by the runtime.

In other words, this is the accelerator view that will be automatically selected if parallel_for_each is invoked without explicitly specifying an accelerator view.

Returns
A boolean value indicating if the accelerator_view is the auto selection accelerator_view.
bool hc::accelerator_view::get_is_debug ( ) const
inline

Returns a boolean value indicating whether the accelerator_view supports debugging through extensive error reporting.

The is_debug property of the accelerator view is usually the same as that of the parent accelerator.

bool hc::accelerator_view::get_is_empty ( )
inline

Returns true if the accelerator_view is currently empty.

Care must be taken to use this API in a thread-safe manner. As the accelerator completes work, the queue may become empty after this function returns false.

int hc::accelerator_view::get_pending_async_ops ( )
inline

Returns the number of pending asynchronous operations on this accelerator view.

Care must be taken to use this API in a thread-safe manner.

queuing_mode hc::accelerator_view::get_queuing_mode ( ) const
inline

Returns the queuing mode that this accelerator_view was created with.

See "Queuing Mode".

Returns
The queuing mode.
unsigned int hc::accelerator_view::get_version ( ) const
inline

Returns a 32-bit unsigned integer representing the version number of this accelerator view.

The format of the integer is major.minor, where the major version number is in the high-order 16 bits, and the minor version number is in the low-order bits.

The version of the accelerator view is usually the same as that of the parent accelerator.
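
The major.minor packing can be unpacked with a shift and a mask. The sketch below is illustrative only: the type view_version and the functions decode_version and encode_version are hypothetical names, not part of hc.hpp, and the code assumes that unsigned int is 32 bits wide and that the minor number occupies the low-order 16 bits.

```cpp
#include <cassert>

// Hypothetical holder for the two halves of the value returned by
// get_version(); not part of hc.hpp.
struct view_version {
    unsigned major_part;  // high-order 16 bits
    unsigned minor_part;  // low-order 16 bits (assumed width)
};

// Splits a packed version number into its major and minor parts.
inline view_version decode_version(unsigned int packed) {
    return view_version{packed >> 16, packed & 0xFFFFu};
}

// Inverse of decode_version, useful for building comparison values.
inline unsigned int encode_version(unsigned major_part, unsigned minor_part) {
    assert(major_part <= 0xFFFFu && minor_part <= 0xFFFFu);
    return (major_part << 16) | minor_part;
}
```

With an accelerator_view named av, a caller might write decode_version(av.get_version()) and compare the result against a required minimum.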

bool hc::accelerator_view::operator!= ( const accelerator_view &  other) const
inline

Compares "this" accelerator_view with the passed accelerator_view object to determine if they represent different underlying objects.

Parameters
[in]  other  The accelerator_view object to be compared against.
Returns
A boolean value indicating whether the passed accelerator_view object is different from "this" accelerator_view.
accelerator_view& hc::accelerator_view::operator= ( const accelerator_view &  other)
inline

Assigns an accelerator_view object to "this" accelerator_view object and returns a reference to "this" object.

This function does a shallow assignment, with "this" accelerator_view object pointing to the same underlying view as the passed accelerator_view parameter.

Parameters
[in]  other  The accelerator_view object to be assigned from.
Returns
A reference to "this" accelerator_view object.
bool hc::accelerator_view::operator== ( const accelerator_view &  other) const
inline

Compares "this" accelerator_view with the passed accelerator_view object to determine if they represent the same underlying object.

Parameters
[in]  other  The accelerator_view object to be compared against.
Returns
A boolean value indicating whether the passed accelerator_view object is the same as "this" accelerator_view.
bool hc::accelerator_view::set_cu_mask ( const std::vector< bool > &  cu_mask)
inline

Set a CU affinity to specific command queues.

The setting is permanent until the queue is destroyed or CU affinity is set again. This setting is "atomic": it does not affect dispatches already in flight.

Parameters
cu_mask  A bool vector indicating which CUs to use. true means the CU is used. The first 32 elements represent the first 32 CUs, and so on. If the vector is larger than the physical CU count, the extra elements are ignored. It is the user's responsibility to make sure the input is meaningful.
Returns
true if the operation succeeds, false otherwise.
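
A mask of the kind set_cu_mask expects can be built with ordinary std::vector<bool> code. The helpers below are a hedged sketch: make_cu_mask and enabled_cu_count are hypothetical names that do not exist in hc.hpp, and the CU counts used with them are made-up illustration values, not properties of any particular device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds a mask of `total` entries in which the first
// `enabled` CUs are switched on (true) and the remaining ones are off.
inline std::vector<bool> make_cu_mask(std::size_t total, std::size_t enabled) {
    assert(enabled <= total);
    std::vector<bool> mask(total, false);
    for (std::size_t i = 0; i < enabled; ++i) {
        mask[i] = true;
    }
    return mask;
}

// Hypothetical helper: counts how many CUs a mask enables, which can serve
// as a sanity check before handing the mask to the runtime.
inline std::size_t enabled_cu_count(const std::vector<bool>& mask) {
    std::size_t count = 0;
    for (bool on : mask) {
        if (on) ++count;
    }
    return count;
}
```

Given an accelerator_view named av, a caller could then pass the result along the lines of av.set_cu_mask(make_cu_mask(64, 16)) and check the returned bool, where 64 and 16 are placeholder counts.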
void hc::accelerator_view::wait ( hcWaitMode  waitMode = hcWaitModeBlocked)
inline

Performs a blocking wait for completion of all commands submitted to the accelerator view prior to calling wait().

Parameters
[in]  waitMode  An optional parameter to specify the wait mode. The default is hcWaitModeBlocked. hcWaitModeActive can be used to reduce latency at the expense of using one CPU core for active waiting.

The documentation for this class was generated from the following file: