HCC
HCC is a single-source, C/C++ compiler for heterogeneous computing. It's optimized with HSA (http://www.hsafoundation.com/).
hc Namespace Reference

Heterogeneous C++ (HC) namespace. More...

Classes

struct  __has_data
 
struct  __has_size
 
struct  __is_container
 
union  __u
 
class  accelerator
 Represents a physical accelerated computing device. More...
 
class  accelerator_view
 Represents a logical (isolated) accelerator view of a compute accelerator. More...
 
struct  am_allocator
 
class  AmPointerInfo
 
class  array
 Represents an N-dimensional region of memory (with type T) located on an accelerator. More...
 
struct  array_projection_helper
 
struct  array_projection_helper< T, 1 >
 
class  array_view
 The array_view<T,N> type represents a possibly cached view into the data held in an array<T,N>, or a section thereof. More...
 
class  array_view< const T, N >
 The partial specialization array_view<const T,N> represents a view over elements of type const T with rank N. More...
 
class  completion_future
 This class is the return type of all asynchronous APIs and has an interface analogous to std::shared_future<void>. More...
 
struct  copy_bidir
 
struct  copy_bidir< T, N, N >
 
struct  copy_input
 
struct  copy_input< InputIter, T, N, N >
 
struct  copy_output
 
struct  copy_output< OutputIter, T, N, N >
 
struct  do_copy
 
struct  do_copy< Iter, T, 1 >
 
struct  do_copy< T *, T, 1 >
 
struct  do_copy< T *, T, N >
 
class  extent
 Represents the bounds (size in each dimension) of an N-dimensional space. More...
 
struct  pfe_helper
 
struct  pfe_helper< 0, Kernel, _Tp >
 
class  pfe_wrapper
 
class  PrintfPacket
 
union  PrintfPacketData
 
struct  projection_helper
 
struct  projection_helper< const T, 1 >
 
struct  projection_helper< const T, N >
 
struct  projection_helper< T, 1 >
 
class  tile_barrier
 The tile_barrier class is a capability class that is only creatable by the system, and passed to a tiled parallel_for_each function object as part of the tiled_index parameter. More...
 
class  tiled_extent
 Represents an extent subdivided into tiles. More...
 
class  tiled_extent< 1 >
 Represents an extent subdivided into tiles. More...
 
class  tiled_extent< 2 >
 Represents an extent subdivided into tiles. More...
 
class  tiled_extent< 3 >
 Represents an extent subdivided into tiles. More...
 
class  tiled_index
 Represents a set of related indices subdivided into 1-, 2-, or 3-dimensional tiles. More...
 
class  tiled_index< 1 >
 Represents a set of related indices subdivided into 1-, 2-, or 3-dimensional tiles. More...
 
class  tiled_index< 2 >
 Represents a set of related indices subdivided into 1-, 2-, or 3-dimensional tiles. More...
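The tiled_extent and tiled_index entries above describe a compute domain cut into fixed-size tiles. The following is a rough host-only illustration of that relationship in one dimension. It is plain C++, not the hc API; `TiledIndex1D`, `tile_count`, and `make_tiled_index` are hypothetical names, and rounding a partial last tile up is an assumption of the sketch.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for a 1-D tiled index; not hc::tiled_index.
struct TiledIndex1D {
    int global;  // position in the whole compute domain
    int tile;    // which tile the position falls in
    int local;   // position inside that tile
};

// Number of tiles needed to cover `extent` elements with tiles of `tile_size`.
// Assumption of this sketch: a partial last tile counts as a full tile.
inline int tile_count(int extent, int tile_size) {
    assert(extent >= 0 && tile_size > 0);
    return (extent + tile_size - 1) / tile_size;
}

// Split a global index into (tile, local) for a given tile size.
inline TiledIndex1D make_tiled_index(int global, int tile_size) {
    assert(global >= 0 && tile_size > 0);
    return TiledIndex1D{global, global / tile_size, global % tile_size};
}
```

With a tile size of 4, global index 9 falls in tile 2 at local position 1, and `tile * tile_size + local` gives back the global index.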
 

Typedefs

template<int N>
using index = Kalmar::index< N >
 Represents a unique position in N-dimensional space.
 
using runtime_exception = Kalmar::runtime_exception
 
using invalid_compute_domain = Kalmar::invalid_compute_domain
 
using accelerator_view_removed = Kalmar::accelerator_view_removed
 
typedef __fp16 half
 
template<typename T >
using pinned_vector = std::vector< T, am_allocator< T >>
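The pinned_vector alias plugs a custom allocator into std::vector. The sketch below shows the same pattern with a hypothetical `counting_allocator` standing in for am_allocator. It uses ordinary host memory and only counts allocations; what the real am_allocator does is not shown in this listing.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for am_allocator<T>; uses plain host memory.
template <typename T>
struct counting_allocator {
    using value_type = T;

    counting_allocator() = default;
    template <typename U>
    counting_allocator(const counting_allocator<U>&) {}

    T* allocate(std::size_t n) {
        assert(n <= static_cast<std::size_t>(-1) / sizeof(T));  // no overflow
        ++allocations();
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }

    // Per-type counter so callers can observe that the vector used this allocator.
    static int& allocations() {
        static int count = 0;
        return count;
    }
};

// This sketch's allocator is stateless, so all instances compare equal.
template <class T, class U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) {
    return true;
}
template <class T, class U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) {
    return false;
}

// Same shape as the pinned_vector alias, with the stand-in allocator.
template <typename T>
using counted_vector = std::vector<T, counting_allocator<T>>;
```

A `counted_vector<int>` behaves like any std::vector; only the source of its storage changes.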
 

Enumerations

enum  PrintfPacketDataType {
  PRINTF_BUFFER_SIZE = 0, PRINTF_STRING_BUFFER = 1, PRINTF_STRING_BUFFER_SIZE = 2, PRINTF_OFFSETS = 3,
  PRINTF_HEADER_SIZE = 4, PRINTF_MIN_SIZE = 5, PRINTF_UNUSED, PRINTF_UNSIGNED_INT,
  PRINTF_SIGNED_INT, PRINTF_FLOAT, PRINTF_DOUBLE, PRINTF_VOID_PTR,
  PRINTF_CONST_VOID_PTR, PRINTF_CHAR_PTR, PRINTF_CONST_CHAR_PTR
}
 
enum  PrintfError { PRINTF_SUCCESS = 0, PRINTF_BUFFER_OVERFLOW = 1, PRINTF_STRING_BUFFER_OVERFLOW = 2, PRINTF_UNKNOWN_ERROR = 3 }
 

Functions

uint64_t get_system_ticks ()
 Get the current tick count for the GPU platform. More...
 
uint64_t get_tick_frequency ()
 Get the frequency of ticks per second for the underlying asynchronous operation. More...
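A tick count and a tick frequency are typically combined to get elapsed time. The sketch below shows that arithmetic, assuming both functions refer to the same clock (the listing does not confirm this). `ticks_to_seconds` is a hypothetical helper, and the readings are passed in as plain parameters so the code runs without an accelerator.

```cpp
#include <cassert>
#include <cstdint>

// Convert a tick interval to seconds.
// `start` and `end` stand for two get_system_ticks() readings and `frequency`
// for a get_tick_frequency() reading (ticks per second).
inline double ticks_to_seconds(std::uint64_t start, std::uint64_t end,
                               std::uint64_t frequency) {
    assert(frequency != 0 && end >= start);
    return static_cast<double>(end - start) / static_cast<double>(frequency);
}
```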
 
unsigned int __wavesize () __HC__
 
unsigned int __popcount_u32_b32 (unsigned int input) __HC__
 Count number of 1 bits in the input. More...
 
unsigned int __popcount_u32_b64 (unsigned long long int input) __HC__
 Count number of 1 bits in the input. More...
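For checking kernel results on the host, a portable reference for the population count can be handy. This is a plain C++ sketch, not the device intrinsic; `popcount_ref` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Portable reference: count the 1 bits in a value of up to 64 bits.
inline unsigned popcount_ref(std::uint64_t input) {
    unsigned count = 0;
    while (input != 0) {
        input &= input - 1;  // clear the lowest set bit
        ++count;
    }
    assert(count <= 64);  // a 64-bit value has at most 64 set bits
    return count;
}
```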
 
unsigned int __firstbit_u32_u32 (unsigned int input) __HC__
 Count leading zero bits in the input. More...
 
unsigned int __firstbit_u32_u64 (unsigned long long int input) __HC__
 Count leading zero bits in the input. More...
 
unsigned int __firstbit_u32_s32 (int input) __HC__
 Count leading zero bits in the input. More...
 
unsigned int __firstbit_u32_s64 (long long int input) __HC__
 Count leading zero bits in the input. More...
 
unsigned int __bitalign_b32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__
 Align 32 bits within 64 bits of data on an arbitrary bit boundary. More...
 
unsigned int __bytealign_b32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__
 Align 32 bits within 64 bits of data on an arbitrary byte boundary. More...
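The two alignment functions can be modeled on the host as a 64-bit shift. The sketch assumes the operand roles of the HSAIL bitalign and bytealign operations (src0 as the low word, src1 as the high word, src2 as the shift amount). This listing does not confirm those roles, and the `_ref` names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed semantics: src1:src0 form a 64-bit value with src0 in the low half;
// the result is the 32-bit window starting (src2 & 31) bits above bit 0.
inline std::uint32_t bitalign_ref(std::uint32_t src0, std::uint32_t src1,
                                  std::uint32_t src2) {
    const std::uint64_t wide = (static_cast<std::uint64_t>(src1) << 32) | src0;
    const unsigned shift = src2 & 31u;
    return static_cast<std::uint32_t>(wide >> shift);
}

// Same idea on a byte boundary: the shift is (src2 & 3) * 8 bits.
inline std::uint32_t bytealign_ref(std::uint32_t src0, std::uint32_t src1,
                                   std::uint32_t src2) {
    return bitalign_ref(src0, src1, (src2 & 3u) * 8u);
}
```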
 
unsigned int __lerp_u8x4 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__
 Performs linear interpolation and computes the unsigned 8-bit average of packed data. More...
 
unsigned int __packcvt_u8x4_f32 (float src0, float src1, float src2, float src3) __HC__
 Takes four floating-point numbers, converts them to unsigned integer values, and packs them into a packed u8x4 value. More...
 
float __unpackcvt_f32_u8x4 (unsigned int src0, unsigned int src1) __HC__
 Unpacks a single element from a packed u8x4 value and converts it to an f32. More...
 
unsigned int __sadhi_u16x2_u8x4 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__
 This function is mostly the same as sad except the sum of absolute differences is added to the most significant 16 bits of the result. More...
 
uint64_t __clock_u64 () __HC__
 Get system timestamp.
 
uint64_t __cycle_u64 () __HC__
 Get hardware cycle count. More...
 
unsigned int __activelaneid_u32 () __HC__
 Get the number of earlier (in flattened work-item order) active work-items within the same wavefront. More...
 
uint64_t __activelanemask_v4_b64_b1 (unsigned int input) __HC__
 Return a bit mask showing which active work-items in the wavefront have a non-zero input. More...
 
unsigned int __activelanecount_u32_b1 (unsigned int input) __HC__
 Count the number of active work-items in the current wavefront that have a non-zero input. More...
 
int __any (int predicate) __HC__
 Evaluate predicate for all active work-items in the wavefront and return non-zero if and only if predicate evaluates to non-zero for any of them.
 
int __all (int predicate) __HC__
 Evaluate predicate for all active work-items in the wavefront and return non-zero if and only if predicate evaluates to non-zero for all of them.
 
uint64_t __ballot (int predicate) __HC__
 Evaluate predicate for all active work-items in the wavefront and return an integer whose Nth bit is set if and only if predicate evaluates to non-zero for the Nth work-item of the wavefront and the Nth work-item is active.
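The three vote functions are related: any and all can both be expressed through the ballot mask. The host-side model below works over a small hypothetical wavefront, where "any" means at least one active lane and "all" means every active lane. `Lane` and the `_ref` functions are illustrative names, not part of hc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry per work-item of a small, hypothetical wavefront.
struct Lane {
    bool active;    // whether the work-item is currently executing
    int predicate;  // the value it passes to the vote function
};

// Bit N is set iff lane N is active and its predicate is non-zero.
inline std::uint64_t ballot_ref(const std::vector<Lane>& wave) {
    assert(wave.size() <= 64);
    std::uint64_t mask = 0;
    for (std::size_t n = 0; n < wave.size(); ++n) {
        if (wave[n].active && wave[n].predicate != 0) {
            mask |= std::uint64_t{1} << n;
        }
    }
    return mask;
}

// Bit N is set iff lane N is active.
inline std::uint64_t active_mask_ref(const std::vector<Lane>& wave) {
    assert(wave.size() <= 64);
    std::uint64_t mask = 0;
    for (std::size_t n = 0; n < wave.size(); ++n) {
        if (wave[n].active) mask |= std::uint64_t{1} << n;
    }
    return mask;
}

// Non-zero iff at least one active lane has a non-zero predicate.
inline int any_ref(const std::vector<Lane>& wave) {
    return ballot_ref(wave) != 0;
}

// Non-zero iff every active lane has a non-zero predicate.
inline int all_ref(const std::vector<Lane>& wave) {
    return ballot_ref(wave) == active_mask_ref(wave);
}
```

Inactive lanes never contribute a bit to the ballot, which is why `all_ref` compares against the active mask rather than against a mask of all ones.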
 
unsigned int __shfl_xor (unsigned int var, int laneMask, int width=__HSA_WAVEFRONT_SIZE__) __HC__
 
unsigned int __mul24 (unsigned int x, unsigned int y)[[hc]]
 Multiply two unsigned integers (x,y) but only the lower 24 bits will be used in the multiplication. More...
 
int __mul24 (int x, int y)[[hc]]
 Multiply two integers (x,y) but only the lower 24 bits will be used in the multiplication. More...
 
unsigned int __mad24 (unsigned int x, unsigned int y, unsigned int z)[[hc]]
 Multiply two unsigned integers (x,y) but only the lower 24 bits will be used in the multiplication and then add the product to a 32-bit unsigned integer. More...
 
int __mad24 (int x, int y, int z)[[hc]]
 Multiply two integers (x,y) but only the lower 24 bits will be used in the multiplication and then add the product to a 32-bit integer. More...
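To make "only the lower 24 bits will be used" concrete, here is a host-side sketch of the unsigned variants. It assumes the result is the low 32 bits of the 24-bit by 24-bit product, which this listing does not spell out. `mul24_ref` and `mad24_ref` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned 24-bit multiply: only bits 0..23 of each operand take part.
// Assumption of this sketch: the result is the low 32 bits of the product.
inline std::uint32_t mul24_ref(std::uint32_t x, std::uint32_t y) {
    const std::uint64_t product =
        static_cast<std::uint64_t>(x & 0xFFFFFFu) * (y & 0xFFFFFFu);
    return static_cast<std::uint32_t>(product);
}

// Multiply as above, then add the 32-bit value z (wrapping on overflow).
inline std::uint32_t mad24_ref(std::uint32_t x, std::uint32_t y,
                               std::uint32_t z) {
    return mul24_ref(x, y) + z;
}
```

Bits 24 and above of an operand are ignored, so an input of `0x01000003` multiplies as 3.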
 
void abort () __HC__
 
unsigned int get_group_segment_size () __HC__
 Fetch the size of group segment. More...
 
unsigned int get_static_group_segment_size () __HC__
 Fetch the size of static group segment. More...
 
void * get_group_segment_base_pointer () __HC__
 Fetch the address of the beginning of group segment.
 
void * get_dynamic_group_segment_base_pointer () __HC__
 Fetch the address of the beginning of dynamic group segment.
 
void all_memory_fence (const tile_barrier &) __HC__
 Establishes a thread-tile scoped memory fence for both global and tile-static memory operations. More...
 
void global_memory_fence (const tile_barrier &) __HC__
 Establishes a thread-tile scoped memory fence for global (but not tile-static) memory operations. More...
 
void tile_static_memory_fence (const tile_barrier &) __HC__
 Establishes a thread-tile scoped memory fence for tile-static (but not global) memory operations. More...
 
template<int N>
const extent< N > & check (const extent< N > &ext)
 
template<typename T , int N>
void copy (const array< T, N > &src, array< T, N > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename OutputIter , typename T , int N>
void copy (const array_view< T, N > &src, OutputIter destBegin)
 The contents of a source array are copied into "dest" starting with iterator destBegin. More...
 
template<typename OutputIter , typename T , int N>
void copy (const array< T, N > &src, OutputIter destBegin)
 The contents of a source array are copied into "dest" starting with iterator destBegin. More...
 
template<typename T , int N>
completion_future copy_async (const array< T, N > &src, array< T, N > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename T , int N>
completion_future copy_async (const array< T, N > &src, const array_view< T, N > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename OutputIter , typename T , int N>
completion_future copy_async (const array< T, N > &src, OutputIter destBegin)
 The contents of a source array are copied into "dest" starting with iterator destBegin. More...
 
template<typename OutputIter , typename T , int N>
completion_future copy_async (const array_view< T, N > &src, OutputIter destBegin)
 The contents of a source array are copied into "dest" starting with iterator destBegin. More...
 
template<typename T , int N>
completion_future copy_async (const array< T, N > &src, const array< T, N > &dest)
 
template<typename T , int N>
completion_future copy_async (const array_view< const T, N > &src, const array< T, N > &dest)
 
template<typename T , int N>
completion_future copy_async (const array_view< T, N > &src, const array< T, N > &dest)
 
unsigned int __atomic_wrapinc (unsigned int *address, unsigned int val) __HC__
 Atomically do the following operations: More...
 
unsigned int __atomic_wrapdec (unsigned int *address, unsigned int val) __HC__
 Atomically do the following operations: More...
 
template<int N, typename Kernel >
completion_future parallel_for_each (const accelerator_view &, const extent< N > &, const Kernel &)
 
template<typename Kernel >
completion_future parallel_for_each (const accelerator_view &, const tiled_extent< 3 > &, const Kernel &)
 
template<typename Kernel >
completion_future parallel_for_each (const accelerator_view &, const tiled_extent< 2 > &, const Kernel &)
 
template<typename Kernel >
completion_future parallel_for_each (const accelerator_view &, const tiled_extent< 1 > &, const Kernel &)
 
template<int N, typename Kernel >
completion_future parallel_for_each (const extent< N > &compute_domain, const Kernel &f)
 
template<typename Kernel >
completion_future parallel_for_each (const tiled_extent< 3 > &compute_domain, const Kernel &f)
 
template<typename Kernel >
completion_future parallel_for_each (const tiled_extent< 2 > &compute_domain, const Kernel &f)
 
template<typename Kernel >
completion_future parallel_for_each (const tiled_extent< 1 > &compute_domain, const Kernel &f)
 
auto_voidp am_aligned_alloc (std::size_t size, hc::accelerator &acc, unsigned flags, std::size_t alignment=0)
 Allocate a block of size bytes of memory on the specified acc. More...
 
auto_voidp am_alloc (std::size_t size, hc::accelerator &acc, unsigned flags)
 Allocate a block of size bytes of memory on the specified acc. More...
 
am_status_t am_free (void *ptr)
 Free a block of memory previously allocated with am_alloc. More...
 
template<class T , class U >
bool operator== (const am_allocator< T > &, const am_allocator< U > &)
 
template<class T , class U >
bool operator!= (const am_allocator< T > &, const am_allocator< U > &)
 
template<int N>
extent< N > operator+ (const extent< N > &lhs, const extent< N > &rhs) __CPU__ __HC__
 Adds (or subtracts) two objects of extent<N> to form a new extent. More...
 
template<int N>
extent< N > operator- (const extent< N > &lhs, const extent< N > &rhs) __CPU__ __HC__
 Adds (or subtracts) two objects of extent<N> to form a new extent. More...
 
template<int N>
extent< N > operator+ (const extent< N > &ext, int value) __CPU__ __HC__
 Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More...
 
template<int N>
extent< N > operator+ (int value, const extent< N > &ext) __CPU__ __HC__
 Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More...
 
template<int N>
extent< N > operator- (const extent< N > &ext, int value) __CPU__ __HC__
 Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More...
 
template<int N>
extent< N > operator- (int value, const extent< N > &ext) __CPU__ __HC__
 Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More...
 
template<int N>
extent< N > operator* (const extent< N > &ext, int value) __CPU__ __HC__
 Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More...
 
template<int N>
extent< N > operator* (int value, const extent< N > &ext) __CPU__ __HC__
 Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More...
 
template<int N>
extent< N > operator/ (const extent< N > &ext, int value) __CPU__ __HC__
 Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More...
 
template<int N>
extent< N > operator/ (int value, const extent< N > &ext) __CPU__ __HC__
 Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More...
 
template<int N>
extent< N > operator% (const extent< N > &ext, int value) __CPU__ __HC__
 Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More...
 
template<int N>
extent< N > operator% (int value, const extent< N > &ext) __CPU__ __HC__
 Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More...
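The extent operators above all work component by component. The sketch below illustrates that on a hypothetical `Extent<N>` struct (not hc::extent). It assumes that for the `int op extent` forms the scalar is the left operand for every component.

```cpp
#include <cassert>

// Hypothetical stand-in for extent<N>: just N integer components.
template <int N>
struct Extent {
    int dim[N];
};

// extent + extent: add matching components.
template <int N>
Extent<N> extent_add(const Extent<N>& lhs, const Extent<N>& rhs) {
    Extent<N> out{};
    for (int i = 0; i < N; ++i) out.dim[i] = lhs.dim[i] + rhs.dim[i];
    return out;
}

// int - extent: the scalar is the left operand for every component.
template <int N>
Extent<N> extent_subtract_from(int value, const Extent<N>& ext) {
    Extent<N> out{};
    for (int i = 0; i < N; ++i) out.dim[i] = value - ext.dim[i];
    return out;
}

// extent / int: divide every component by the scalar.
template <int N>
Extent<N> extent_divide(const Extent<N>& ext, int value) {
    assert(value != 0);
    Extent<N> out{};
    for (int i = 0; i < N; ++i) out.dim[i] = ext.dim[i] / value;
    return out;
}
```

For a 2-D extent (4, 6): adding (1, 2) gives (5, 8), `10 - extent` gives (6, 4), and dividing by 2 gives (2, 3).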
 
unsigned int __bitextract_u32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__
 Extract a range of bits. More...
 
uint64_t __bitextract_u64 (uint64_t src0, unsigned int src1, unsigned int src2) __HC__
 Extract a range of bits. More...
 
int __bitextract_s32 (int src0, unsigned int src1, unsigned int src2) __HC__
 Extract a range of bits. More...
 
int64_t __bitextract_s64 (int64_t src0, unsigned int src1, unsigned int src2) __HC__
 Extract a range of bits. More...
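The listing does not say which argument is the offset and which is the width. The host sketch below assumes the HSAIL convention (src1 is the bit offset, src2 is the field width, both masked to 5 bits for the 32-bit form) and covers only the unsigned case. `bitextract_u32_ref` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Assumed operand roles: src1 = offset of the field, src2 = width of the field.
inline std::uint32_t bitextract_u32_ref(std::uint32_t src0, std::uint32_t src1,
                                        std::uint32_t src2) {
    const unsigned offset = src1 & 31u;
    const unsigned width = src2 & 31u;
    if (width == 0) return 0;  // an empty field extracts nothing
    const std::uint32_t mask = (std::uint32_t{1} << width) - 1u;
    return (src0 >> offset) & mask;
}
```

Under these assumptions, extracting 8 bits at offset 8 from `0xABCD1234` yields `0x12`.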
 
unsigned int __bitinsert_u32 (unsigned int src0, unsigned int src1, unsigned int src2, unsigned int src3) __HC__
 Replace a range of bits. More...
 
uint64_t __bitinsert_u64 (uint64_t src0, uint64_t src1, unsigned int src2, unsigned int src3) __HC__
 Replace a range of bits. More...
 
int __bitinsert_s32 (int src0, int src1, unsigned int src2, unsigned int src3) __HC__
 Replace a range of bits. More...
 
int64_t __bitinsert_s64 (int64_t src0, int64_t src1, unsigned int src2, unsigned int src3) __HC__
 Replace a range of bits. More...
 
unsigned int __bitmask_b32 (unsigned int src0, unsigned int src1) __HC__
 Create a bit mask that can be used with bitselect. More...
 
uint64_t __bitmask_b64 (unsigned int src0, unsigned int src1) __HC__
 Create a bit mask that can be used with bitselect. More...
 
unsigned int __bitrev_b32 (unsigned int src0)[[hc]] __asm("llvm.bitreverse.i32")
 Reverse the bits. More...
 
uint64_t __bitrev_b64 (uint64_t src0)[[hc]] __asm("llvm.bitreverse.i64")
 Reverse the bits. More...
 
unsigned int __bitselect_b32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__
 Do bit field selection. More...
 
uint64_t __bitselect_b64 (uint64_t src0, uint64_t src1, uint64_t src2) __HC__
 Do bit field selection. More...
 
unsigned int __lastbit_u32_u32 (unsigned int input) __HC__
 Find the first bit set to 1 in a number starting from the least significant bit. More...
 
unsigned int __lastbit_u32_u64 (unsigned long long int input) __HC__
 Find the first bit set to 1 in a number starting from the least significant bit. More...
 
unsigned int __lastbit_u32_s32 (int input) __HC__
 Find the first bit set to 1 in a number starting from the least significant bit. More...
 
unsigned int __lastbit_u32_s64 (unsigned long long input) __HC__
 Find the first bit set to 1 in a number starting from the least significant bit. More...
 
unsigned int __unpacklo_u8x4 (unsigned int src0, unsigned int src1) __HC__
 Copy and interleave the lower half of the elements from each source into the destination. More...
 
uint64_t __unpacklo_u8x8 (uint64_t src0, uint64_t src1) __HC__
 Copy and interleave the lower half of the elements from each source into the destination. More...
 
unsigned int __unpacklo_u16x2 (unsigned int src0, unsigned int src1) __HC__
 Copy and interleave the lower half of the elements from each source into the destination. More...
 
uint64_t __unpacklo_u16x4 (uint64_t src0, uint64_t src1) __HC__
 Copy and interleave the lower half of the elements from each source into the destination. More...
 
uint64_t __unpacklo_u32x2 (uint64_t src0, uint64_t src1) __HC__
 Copy and interleave the lower half of the elements from each source into the destination. More...
 
int __unpacklo_s8x4 (int src0, int src1) __HC__
 Copy and interleave the lower half of the elements from each source into the destination. More...
 
int64_t __unpacklo_s8x8 (int64_t src0, int64_t src1) __HC__
 Copy and interleave the lower half of the elements from each source into the destination. More...
 
int __unpacklo_s16x2 (int src0, int src1) __HC__
 Copy and interleave the lower half of the elements from each source into the destination. More...
 
int64_t __unpacklo_s16x4 (int64_t src0, int64_t src1) __HC__
 Copy and interleave the lower half of the elements from each source into the destination. More...
 
int64_t __unpacklo_s32x2 (int64_t src0, int64_t src1) __HC__
 Copy and interleave the lower half of the elements from each source into the destination. More...
 
unsigned int __unpackhi_u8x4 (unsigned int src0, unsigned int src1) __HC__
 Copy and interleave the upper half of the elements from each source into the destination. More...
 
uint64_t __unpackhi_u8x8 (uint64_t src0, uint64_t src1) __HC__
 Copy and interleave the upper half of the elements from each source into the destination. More...
 
unsigned int __unpackhi_u16x2 (unsigned int src0, unsigned int src1) __HC__
 Copy and interleave the upper half of the elements from each source into the destination. More...
 
uint64_t __unpackhi_u16x4 (uint64_t src0, uint64_t src1) __HC__
 Copy and interleave the upper half of the elements from each source into the destination. More...
 
uint64_t __unpackhi_u32x2 (uint64_t src0, uint64_t src1) __HC__
 Copy and interleave the upper half of the elements from each source into the destination. More...
 
int __unpackhi_s8x4 (int src0, int src1) __HC__
 Copy and interleave the upper half of the elements from each source into the destination. More...
 
int64_t __unpackhi_s8x8 (int64_t src0, int64_t src1) __HC__
 Copy and interleave the upper half of the elements from each source into the destination. More...
 
int __unpackhi_s16x2 (int src0, int src1) __HC__
 Copy and interleave the upper half of the elements from each source into the destination. More...
 
int64_t __unpackhi_s16x4 (int64_t src0, int64_t src1) __HC__
 Copy and interleave the upper half of the elements from each source into the destination. More...
 
int64_t __unpackhi_s32x2 (int64_t src0, int64_t src1) __HC__
 Copy and interleave the upper half of the elements from each source into the destination. More...
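"Interleave the lower (or upper) half" is easier to see with concrete bytes. The host sketch below covers the u8x4 case. It assumes element 0 is the least significant byte and that results alternate src0 and src1 elements starting with src0; neither detail is stated in this listing. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Read element i (0 = least significant byte, an assumption) of a packed u8x4.
inline std::uint32_t byte_at(std::uint32_t packed, unsigned i) {
    assert(i < 4);
    return (packed >> (8u * i)) & 0xFFu;
}

// Build a packed u8x4 from four elements, element 0 first.
inline std::uint32_t make_u8x4(std::uint32_t e0, std::uint32_t e1,
                               std::uint32_t e2, std::uint32_t e3) {
    return (e0 & 0xFFu) | ((e1 & 0xFFu) << 8) | ((e2 & 0xFFu) << 16) |
           ((e3 & 0xFFu) << 24);
}

// Lower halves interleaved: {s0[0], s1[0], s0[1], s1[1]}.
inline std::uint32_t unpacklo_u8x4_ref(std::uint32_t s0, std::uint32_t s1) {
    return make_u8x4(byte_at(s0, 0), byte_at(s1, 0), byte_at(s0, 1), byte_at(s1, 1));
}

// Upper halves interleaved: {s0[2], s1[2], s0[3], s1[3]}.
inline std::uint32_t unpackhi_u8x4_ref(std::uint32_t s0, std::uint32_t s1) {
    return make_u8x4(byte_at(s0, 2), byte_at(s1, 2), byte_at(s0, 3), byte_at(s1, 3));
}
```

With s0 holding elements (1, 2, 3, 4) and s1 holding (5, 6, 7, 8), the low interleave is (1, 5, 2, 6) and the high interleave is (3, 7, 4, 8).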
 
unsigned int __pack_u8x4_u32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__
 Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More...
 
uint64_t __pack_u8x8_u32 (uint64_t src0, unsigned int src1, unsigned int src2) __HC__
 Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More...
 
unsigned __pack_u16x2_u32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__
 Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More...
 
uint64_t __pack_u16x4_u32 (uint64_t src0, unsigned int src1, unsigned int src2) __HC__
 Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More...
 
uint64_t __pack_u32x2_u32 (uint64_t src0, unsigned int src1, unsigned int src2) __HC__
 Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More...
 
int __pack_s8x4_s32 (int src0, int src1, unsigned int src2) __HC__
 Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More...
 
int64_t __pack_s8x8_s32 (int64_t src0, int src1, unsigned int src2) __HC__
 Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More...
 
int __pack_s16x2_s32 (int src0, int src1, unsigned int src2) __HC__
 Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More...
 
int64_t __pack_s16x4_s32 (int64_t src0, int src1, unsigned int src2) __HC__
 Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More...
 
int64_t __pack_s32x2_s32 (int64_t src0, int src1, unsigned int src2) __HC__
 Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More...
 
double __pack_f32x2_f32 (double src0, float src1, unsigned int src2) __HC__
 Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More...
 
unsigned int __unpack_u32_u8x4 (unsigned int src0, unsigned int src1) __HC__
 Assign the elements specified by src1 from the packed value in src0. More...
 
unsigned int __unpack_u32_u8x8 (uint64_t src0, unsigned int src1) __HC__
 Assign the elements specified by src1 from the packed value in src0. More...
 
unsigned int __unpack_u32_u16x2 (unsigned int src0, unsigned int src1) __HC__
 Assign the elements specified by src1 from the packed value in src0. More...
 
unsigned int __unpack_u32_u16x4 (uint64_t src0, unsigned int src1) __HC__
 Assign the elements specified by src1 from the packed value in src0. More...
 
unsigned int __unpack_u32_u32x2 (uint64_t src0, unsigned int src1) __HC__
 Assign the elements specified by src1 from the packed value in src0. More...
 
int __unpack_s32_s8x4 (int src0, unsigned int src1) __HC__
 Assign the elements specified by src1 from the packed value in src0. More...
 
int __unpack_s32_s8x8 (int64_t src0, unsigned int src1) __HC__
 Assign the elements specified by src1 from the packed value in src0. More...
 
int __unpack_s32_s16x2 (int src0, unsigned int src1) __HC__
 Assign the elements specified by src1 from the packed value in src0. More...
 
int __unpack_s32_s16x4 (int64_t src0, unsigned int src1) __HC__
 Assign the elements specified by src1 from the packed value in src0. More...
 
int __unpack_s32_s3x2 (int64_t src0, unsigned int src1) __HC__
 Assign the elements specified by src1 from the packed value in src0. More...
 
float __unpack_f32_f32x2 (double src0, unsigned int src1) __HC__
 Assign the elements specified by src1 from the packed value in src0. More...
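The pack and unpack families are inverses on a single element: pack writes one element of a packed value, unpack reads one back. The host sketch below covers the u8x4 forms. It assumes element 0 is the least significant byte and that the element index wraps modulo 4, which this listing does not confirm. The `_ref` names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Replace element (src2 mod 4) of the packed value src0 with the low byte of src1.
inline std::uint32_t pack_u8x4_ref(std::uint32_t src0, std::uint32_t src1,
                                   std::uint32_t src2) {
    const unsigned shift = (src2 & 3u) * 8u;
    const std::uint32_t mask = 0xFFu << shift;
    return (src0 & ~mask) | ((src1 & 0xFFu) << shift);
}

// Read element (src1 mod 4) of the packed value src0.
inline std::uint32_t unpack_u8x4_ref(std::uint32_t src0, std::uint32_t src1) {
    return (src0 >> ((src1 & 3u) * 8u)) & 0xFFu;
}
```

Writing `0xAB` into element 2 of `0x44332211` gives `0x44AB2211`, and reading element 2 back returns `0xAB`.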
 
unsigned int __sad_u32_u32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__
 Computes the sum of the absolute differences of src0 and src1 and then adds src2 to the result. More...
 
unsigned int __sad_u32_u16x2 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__
 Computes the sum of the absolute differences of src0 and src1 and then adds src2 to the result. More...
 
unsigned int __sad_u32_u8x4 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__
 Computes the sum of the absolute differences of src0 and src1 and then adds src2 to the result. More...
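As a worked instance of the description above, the following host sketch computes the u8x4 form: the four byte-wise absolute differences are summed and src2 is added. Treating byte 0 as the least significant byte is an assumption, although the sum does not depend on element order. `sad_u32_u8x4_ref` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Sum of |a_i - b_i| over the four packed bytes of src0 and src1, plus src2.
inline std::uint32_t sad_u32_u8x4_ref(std::uint32_t src0, std::uint32_t src1,
                                      std::uint32_t src2) {
    std::uint32_t sum = src2;
    for (unsigned i = 0; i < 4; ++i) {
        const std::uint32_t a = (src0 >> (8u * i)) & 0xFFu;
        const std::uint32_t b = (src1 >> (8u * i)) & 0xFFu;
        sum += (a > b) ? a - b : b - a;
    }
    return sum;
}
```

For bytes (1, 3, 5, 10) against (4, 7, 5, 1) the differences are 3, 4, 0, and 9, so with src2 = 100 the result is 116.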
 
int __amdgcn_mbcnt_lo (int mask, int src)[[hc]] __asm("llvm.amdgcn.mbcnt.lo")
 Masked bit count: counts the bits set in mask for lanes below the caller in the lower 32 lanes of the wavefront and adds src. More...
 
int __amdgcn_mbcnt_hi (int mask, int src)[[hc]] __asm("llvm.amdgcn.mbcnt.hi")
 Masked bit count: counts the bits set in mask for lanes below the caller in the upper 32 lanes of the wavefront and adds src. More...
 
int __lane_id (void)[[hc]]
 Get the ID of the calling work-item (lane) within its wavefront. More...
 
int __amdgcn_ds_bpermute (int index, int src)[[hc]] __asm("llvm.amdgcn.ds.bpermute")
 ds_bpermute intrinsic FIXME: We need to add __builtin_amdgcn_ds_bpermute to clang and call it here instead.
 
unsigned int __amdgcn_ds_bpermute (int index, unsigned int src)[[hc]]
 Direct copy from indexed active work-item within a wavefront. More...
 
float __amdgcn_ds_bpermute (int index, float src)[[hc]]
 Direct copy from indexed active work-item within a wavefront. More...
 
int __amdgcn_ds_permute (int index, int src)[[hc]]
 ds_permute intrinsic
 
unsigned int __amdgcn_ds_permute (int index, unsigned int src)[[hc]]
 Direct copy from indexed active work-item within a wavefront. More...
 
float __amdgcn_ds_permute (int index, float src)[[hc]]
 Direct copy from indexed active work-item within a wavefront. More...
 
int __amdgcn_ds_swizzle (int src, int pattern)[[hc]]
 ds_swizzle intrinsic
 
unsigned int __amdgcn_ds_swizzle (unsigned int src, int pattern)[[hc]]
 Direct copy from indexed active work-item within a wavefront. More...
 
float __amdgcn_ds_swizzle (float src, int pattern)[[hc]]
 Direct copy from indexed active work-item within a wavefront. More...
 
int __amdgcn_move_dpp (int src, int dpp_ctrl, int row_mask, int bank_mask, bool bound_ctrl)[[hc]]
 move DPP intrinsic
 
int __amdgcn_wave_sr1 (int src, bool bound_ctrl)[[hc]]
 Shift the value of src to the right by one thread within a wavefront. More...
 
unsigned int __amdgcn_wave_sr1 (unsigned int src, bool bound_ctrl)[[hc]]
 Shift the value of src to the right by one thread within a wavefront. More...
 
float __amdgcn_wave_sr1 (float src, bool bound_ctrl)[[hc]]
 Shift the value of src to the right by one thread within a wavefront. More...
 
int __amdgcn_wave_sl1 (int src, bool bound_ctrl)[[hc]]
 Shift the value of src to the left by one thread within a wavefront. More...
 
unsigned int __amdgcn_wave_sl1 (unsigned int src, bool bound_ctrl)[[hc]]
 Shift the value of src to the left by one thread within a wavefront. More...
 
float __amdgcn_wave_sl1 (float src, bool bound_ctrl)[[hc]]
 Shift the value of src to the left by one thread within a wavefront. More...
 
int __amdgcn_wave_rr1 (int src)[[hc]]
 Rotate the value of src to the right by one thread within a wavefront. More...
 
unsigned int __amdgcn_wave_rr1 (unsigned int src)[[hc]]
 Rotate the value of src to the right by one thread within a wavefront. More...
 
float __amdgcn_wave_rr1 (float src)[[hc]]
 Rotate the value of src to the right by one thread within a wavefront. More...
 
int __amdgcn_wave_rl1 (int src)[[hc]]
 Rotate the value of src to the left by one thread within a wavefront. More...
 
unsigned int __amdgcn_wave_rl1 (unsigned int src)[[hc]]
 Rotate the value of src to the left by one thread within a wavefront. More...
 
float __amdgcn_wave_rl1 (float src)[[hc]]
 Rotate the value of src to the left by one thread within a wavefront. More...
 
int __shfl (int var, int srcLane, int width=__HSA_WAVEFRONT_SIZE__) __HC__
 Direct copy from indexed active work-item within a wavefront. More...
 
unsigned int __shfl (unsigned int var, int srcLane, int width=__HSA_WAVEFRONT_SIZE__) __HC__
 Direct copy from indexed active work-item within a wavefront. More...
 
float __shfl (float var, int srcLane, int width=__HSA_WAVEFRONT_SIZE__) __HC__
 Direct copy from indexed active work-item within a wavefront. More...
 
int __shfl_up (int var, const unsigned int delta, const int width=__HSA_WAVEFRONT_SIZE__) __HC__
 Copy from an active work-item with lower ID relative to caller within a wavefront. More...
 
unsigned int __shfl_up (unsigned int var, const unsigned int delta, const int width=__HSA_WAVEFRONT_SIZE__) __HC__
 Copy from an active work-item with lower ID relative to caller within a wavefront. More...
 
float __shfl_up (float var, const unsigned int delta, const int width=__HSA_WAVEFRONT_SIZE__) __HC__
 Copy from an active work-item with lower ID relative to caller within a wavefront. More...
 
int __shfl_down (int var, const unsigned int delta, const int width=__HSA_WAVEFRONT_SIZE__) __HC__
 Copy from an active work-item with higher ID relative to caller within a wavefront. More...
 
unsigned int __shfl_down (unsigned int var, const unsigned int delta, const int width=__HSA_WAVEFRONT_SIZE__) __HC__
 Copy from an active work-item with higher ID relative to caller within a wavefront. More...
 
float __shfl_down (float var, const unsigned int delta, const int width=__HSA_WAVEFRONT_SIZE__) __HC__
 Copy from an active work-item with higher ID relative to caller within a wavefront. More...
 
int __shfl_xor (int var, int laneMask, int width=__HSA_WAVEFRONT_SIZE__) __HC__
 Copy from an active work-item based on bitwise XOR of caller work-item ID within a wavefront. More...
 
float __shfl_xor (float var, int laneMask, int width=__HSA_WAVEFRONT_SIZE__) __HC__
 Copy from an active work-item based on bitwise XOR of caller work-item ID within a wavefront. More...
 
template<typename T , int N>
void copy (const array_view< const T, N > &src, const array_view< T, N > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename T , int N>
void copy (const array_view< T, N > &src, const array_view< T, N > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename T >
void copy (const array_view< const T, 1 > &src, const array_view< T, 1 > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename T , int N>
void copy (const array< T, N > &src, const array_view< T, N > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename T >
void copy (const array< T, 1 > &src, const array_view< T, 1 > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename T , int N>
void copy (const array_view< const T, N > &src, array< T, N > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename T , int N>
void copy (const array_view< T, N > &src, array< T, N > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename T >
void copy (const array_view< const T, 1 > &src, array< T, 1 > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename InputIter , typename T , int N>
void copy (InputIter srcBegin, InputIter srcEnd, const array_view< T, N > &dest)
 The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest". More...
 
template<typename InputIter , typename T , int N>
void copy (InputIter srcBegin, const array_view< T, N > &dest)
 The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest". More...
 
template<typename InputIter , typename T , int N>
void copy (InputIter srcBegin, InputIter srcEnd, array< T, N > &dest)
 The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest". More...
 
template<typename InputIter , typename T , int N>
void copy (InputIter srcBegin, array< T, N > &dest)
 The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest". More...
 
template<typename T , int N>
completion_future copy_async (const array_view< const T, N > &src, array< T, N > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename T , int N>
completion_future copy_async (const array_view< T, N > &src, array< T, N > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename T , int N>
completion_future copy_async (const array_view< const T, N > &src, const array_view< T, N > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename T , int N>
completion_future copy_async (const array_view< T, N > &src, const array_view< T, N > &dest)
 The contents of "src" are copied into "dest". More...
 
template<typename InputIter , typename T , int N>
completion_future copy_async (InputIter srcBegin, InputIter srcEnd, array< T, N > &dest)
 The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest". More...
 
template<typename InputIter , typename T , int N>
completion_future copy_async (InputIter srcBegin, array< T, N > &dest)
 The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest". More...
 
template<typename InputIter , typename T , int N>
completion_future copy_async (InputIter srcBegin, InputIter srcEnd, const array_view< T, N > &dest)
 The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest". More...
 
template<typename InputIter , typename T , int N>
completion_future copy_async (InputIter srcBegin, const array_view< T, N > &dest)
 The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest". More...
 
unsigned int atomic_exchange (unsigned int *dest, unsigned int val) __CPU__ __HC__
 Atomically read the value stored in dest , replace it with the value given in val and return the old value to the caller. More...
 
int atomic_exchange (int *dest, int val) __CPU__ __HC__
 Atomically read the value stored in dest , replace it with the value given in val and return the old value to the caller. More...
 
float atomic_exchange (float *dest, float val) __CPU__ __HC__
 Atomically read the value stored in dest , replace it with the value given in val and return the old value to the caller. More...
 
uint64_t atomic_exchange (uint64_t *dest, uint64_t val) __CPU__ __HC__
 Atomically read the value stored in dest , replace it with the value given in val and return the old value to the caller. More...
 
bool atomic_compare_exchange (unsigned int *dest, unsigned int *expected_val, unsigned int val) __CPU__ __HC__
 These functions attempt to perform these three steps atomically: More...
 
bool atomic_compare_exchange (int *dest, int *expected_val, int val) __CPU__ __HC__
 These functions attempt to perform these three steps atomically: More...
 
bool atomic_compare_exchange (uint64_t *dest, uint64_t *expected_val, uint64_t val) __CPU__ __HC__
 These functions attempt to perform these three steps atomically: More...
 
unsigned atomic_fetch_add (unsigned *x, unsigned y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
int atomic_fetch_add (int *x, int y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
float atomic_fetch_add (float *x, float y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
uint64_t atomic_fetch_add (uint64_t *x, uint64_t y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
unsigned atomic_fetch_sub (unsigned *x, unsigned y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
int atomic_fetch_sub (int *x, int y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
float atomic_fetch_sub (float *x, float y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
unsigned atomic_fetch_and (unsigned *x, unsigned y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
int atomic_fetch_and (int *x, int y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
uint64_t atomic_fetch_and (uint64_t *x, uint64_t y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
unsigned atomic_fetch_or (unsigned *x, unsigned y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
int atomic_fetch_or (int *x, int y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
uint64_t atomic_fetch_or (uint64_t *x, uint64_t y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
unsigned atomic_fetch_xor (unsigned *x, unsigned y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
int atomic_fetch_xor (int *x, int y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
uint64_t atomic_fetch_xor (uint64_t *x, uint64_t y) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
int atomic_fetch_max (int *dest, int val) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
unsigned int atomic_fetch_max (unsigned int *dest, unsigned int val) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
uint64_t atomic_fetch_max (uint64_t *dest, uint64_t val) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
int atomic_fetch_min (int *dest, int val) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
unsigned int atomic_fetch_min (unsigned int *dest, unsigned int val) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
uint64_t atomic_fetch_min (uint64_t *dest, uint64_t val) __CPU__ __HC__
 Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed by dest. More...
 
int atomic_fetch_inc (int *_Dest) __CPU__ __HC__
 Atomically increment or decrement the value stored at the location pointed to by dest. More...
 
unsigned int atomic_fetch_inc (unsigned int *_Dest) __CPU__ __HC__
 Atomically increment or decrement the value stored at the location pointed to by dest. More...
 
int atomic_fetch_dec (int *_Dest) __CPU__ __HC__
 Atomically increment or decrement the value stored at the location pointed to by dest. More...
 
unsigned int atomic_fetch_dec (unsigned int *_Dest) __CPU__ __HC__
 Atomically increment or decrement the value stored at the location pointed to by dest. More...
 
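The read-modify-write atomics summarised above share one pattern: read dest, combine it with val, and write the result back as a single indivisible step. Below is a minimal host-side sketch of that pattern for the max operation. It assumes std::atomic<int> in place of the raw int* that the HCC signature takes. The name fetch_max_sketch and the choice to return the previously stored value are illustrative assumptions, not taken from the HCC headers.

```cpp
#include <atomic>

// Host-side sketch of a fetch-and-max read-modify-write (not the HCC implementation).
// Assumption: the value observed before the update is returned to the caller.
inline int fetch_max_sketch(std::atomic<int>& dest, int val) {
    int old = dest.load();
    // Retry while val would raise the stored value and another thread has
    // changed dest between our read and our write. On failure,
    // compare_exchange_weak reloads the current value into old.
    while (old < val && !dest.compare_exchange_weak(old, val)) {
    }
    return old;
}
```

When val is not larger than the stored value, the loop exits without writing, so dest is left unchanged.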

Variables

const extent< N > & compute_domain
 
const extent< N > const Kernel &f __CPU__ __HC__
 
const pfe_wrapper< N, Kernel > _pf (compute_domain, f)
 
size_t ext = compute_domain[0]
 
size_t tile = compute_domain.tile_dim[0]
 
void * kernel = Kalmar::mcw_cxxamp_get_kernel<Kernel>(av.pQueue, f)
 
PrintfPacketprintf_buffer
 
static unsigned int cpu
 
return size
 

Detailed Description

Heterogeneous C++ (HC) namespace.

Function Documentation

unsigned int hc::__activelanecount_u32_b1 ( unsigned int  input)
inline

Count the number of active work-items in the current wavefront that have a non-zero input.

Parameters
[in]  input  An unsigned 32-bit integer.
Returns
The number of active work-items in the current wavefront that have a non-zero input.

References __activelanemask_v4_b64_b1(), and __popcount_u32_b64().

Referenced by __all().
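The count described above is a population count over a per-lane mask. Below is a rough host-side sketch, assuming a 64-lane wavefront modelled as a std::array and an explicit `active` bit mask standing in for the hardware execution mask. The names lane_mask_sketch and lane_count_sketch are hypothetical.

```cpp
#include <array>
#include <bitset>
#include <cstdint>

constexpr int kWaveSize = 64;  // assumed wavefront size

// Bit i of the result is set when lane i is active and holds a non-zero input.
inline std::uint64_t lane_mask_sketch(const std::array<unsigned, kWaveSize>& input,
                                      std::uint64_t active) {
    std::uint64_t mask = 0;
    for (int lane = 0; lane < kWaveSize; ++lane) {
        const bool is_active = ((active >> lane) & 1u) != 0;
        if (is_active && input[lane] != 0) {
            mask |= std::uint64_t{1} << lane;
        }
    }
    return mask;
}

// Number of active lanes with a non-zero input: popcount of the mask.
inline unsigned lane_count_sketch(const std::array<unsigned, kWaveSize>& input,
                                  std::uint64_t active) {
    return static_cast<unsigned>(std::bitset<64>(lane_mask_sketch(input, active)).count());
}
```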

unsigned int hc::__activelaneid_u32 ( )

Get the count of the number of earlier (in flattened work-item order) active work-items within the same wavefront.

Returns
The result will be in the range 0 to WAVESIZE - 1.

Referenced by __lastbit_u32_s64().
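Counting the earlier active work-items amounts to a population count over the bits below the caller's lane. Below is a small host-side sketch, assuming a 64-lane wavefront and an explicit `active` mask in place of the hardware execution mask. The function name is hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Number of active lanes with a lower lane index than `lane`.
inline unsigned active_lane_id_sketch(std::uint64_t active, int lane) {
    assert(lane >= 0 && lane < 64);
    const std::uint64_t earlier = (std::uint64_t{1} << lane) - 1;  // bits below `lane`
    return static_cast<unsigned>(std::bitset<64>(active & earlier).count());
}
```

With every lane active, lane 63 sees 63 earlier lanes, which matches the stated range of 0 to WAVESIZE - 1.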

uint64_t hc::__activelanemask_v4_b64_b1 ( unsigned int  input)

Return a bit mask that shows which active work-items in the wavefront have a non-zero input.

The affected bit position within the registers of dest corresponds to each work-item's lane ID.

The HSAIL instruction returns four 64-bit registers, but the current implementation returns only the first one and ignores the other three, because all HSA agents currently have a wavefront size of 64.

Parameters
[in]  input  An unsigned 32-bit integer.
Returns
The bitmask calculated.

Referenced by __activelanecount_u32_b1(), __all(), __any(), __ballot(), and __lastbit_u32_s64().

unsigned int hc::__amdgcn_ds_bpermute ( int  index,
unsigned int  src 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_ds_bpermute().
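The subsection rule for var, srcLane and width described above can be sketched on the host. This is an emulation only, assuming a 64-lane wavefront with every lane active and modelling each lane's var as one array element. The name shfl_sketch is hypothetical.

```cpp
#include <array>
#include <cassert>

constexpr int kWaveSize = 64;  // assumed wavefront size

// For every lane, return the value of var held by lane (srcLane modulo width)
// of the caller's own width-sized subsection.
inline std::array<int, kWaveSize> shfl_sketch(const std::array<int, kWaveSize>& var,
                                              int srcLane, int width = kWaveSize) {
    // width must be a power of two no larger than the wavefront size.
    assert(width > 0 && width <= kWaveSize && (width & (width - 1)) == 0);
    std::array<int, kWaveSize> out{};
    for (int lane = 0; lane < kWaveSize; ++lane) {
        const int base = lane - (lane % width);          // first lane of this subsection
        const int src = base + (srcLane & (width - 1));  // srcLane modulo width
        out[lane] = var[src];
    }
    return out;
}
```

With width 8 and srcLane 3, lanes 8 to 15 all read the value held by lane 11, the fourth lane of their own subsection.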

float hc::__amdgcn_ds_bpermute ( int  index,
float  src 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_ds_permute().

Referenced by __amdgcn_ds_bpermute(), __lane_id(), __shfl(), __shfl_down(), __shfl_up(), and __shfl_xor().

unsigned int hc::__amdgcn_ds_permute ( int  index,
unsigned int  src 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_ds_permute().

float hc::__amdgcn_ds_permute ( int  index,
float  src 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_ds_swizzle().

Referenced by __amdgcn_ds_bpermute(), and __amdgcn_ds_permute().

unsigned int hc::__amdgcn_ds_swizzle ( unsigned int  src,
int  pattern 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_ds_swizzle().

float hc::__amdgcn_ds_swizzle ( float  src,
int  pattern 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_move_dpp(), and __amdgcn_wave_sr1().

Referenced by __amdgcn_ds_permute(), and __amdgcn_ds_swizzle().

int hc::__amdgcn_mbcnt_hi ( int  mask,
int  src 
)

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

Referenced by __lane_id().

int hc::__amdgcn_mbcnt_lo ( int  mask,
int  src 
)

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

Referenced by __lane_id().

int hc::__amdgcn_wave_rl1 ( int  src)

Rotate the value of src to the left by one thread within a wavefront.

Parameters
[in]  src  The variable being rotated.
Returns
The value of src rotated in from the neighboring lane.

unsigned int hc::__amdgcn_wave_rl1 ( unsigned int  src)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_wave_rl1().

float hc::__amdgcn_wave_rl1 ( float  src)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

Referenced by __amdgcn_wave_rl1(), and __amdgcn_wave_rr1().

int hc::__amdgcn_wave_rr1 ( int  src)

Rotate the value of src to the right by one thread within a wavefront.

Parameters
[in]  src  The variable being rotated.
Returns
The value of src rotated in from the neighboring lane.

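The two one-lane rotations can be pictured as rotations of a 64-element array. The sketch below is a host-side emulation under two assumptions: the wavefront has 64 lanes, and "left" means toward lower lane IDs (lane i receives the value of lane i + 1, with lane 63 wrapping around to lane 0), by analogy with the left shift that feeds thread 63. The function names are hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWaveSize = 64;  // assumed wavefront size
using Wave = std::array<int, kWaveSize>;

// Rotate left by one lane: lane i receives the value held by lane i + 1.
inline Wave rotate_left1_sketch(const Wave& src) {
    Wave out{};
    for (std::size_t lane = 0; lane < kWaveSize; ++lane) {
        out[lane] = src[(lane + 1) % kWaveSize];
    }
    return out;
}

// Rotate right by one lane: lane i receives the value held by lane i - 1.
inline Wave rotate_right1_sketch(const Wave& src) {
    Wave out{};
    for (std::size_t lane = 0; lane < kWaveSize; ++lane) {
        out[lane] = src[(lane + kWaveSize - 1) % kWaveSize];
    }
    return out;
}
```

Under these assumptions the two rotations are inverses of each other, so applying one after the other returns the original lane values.
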
unsigned int hc::__amdgcn_wave_rr1 ( unsigned int  src)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_wave_rr1().

float hc::__amdgcn_wave_rr1 ( float  src)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_wave_rl1().

Referenced by __amdgcn_wave_rr1(), and __amdgcn_wave_sl1().

int hc::__amdgcn_wave_sl1 ( int  src,
bool  bound_ctrl 
)

Shift the value of src to the left by one thread within a wavefront.

Parameters
[in]  src         The variable being shifted.
[in]  bound_ctrl  When set to true, a zero is shifted into thread 63; otherwise, thread 63 returns its original value.
Returns
The value of src shifted in from the neighboring lane.

unsigned int hc::__amdgcn_wave_sl1 ( unsigned int  src,
bool  bound_ctrl 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_wave_sl1().

float hc::__amdgcn_wave_sl1 ( float  src,
bool  bound_ctrl 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_wave_rr1().

Referenced by __amdgcn_wave_sl1(), and __amdgcn_wave_sr1().

int hc::__amdgcn_wave_sr1 ( int  src,
bool  bound_ctrl 
)

Shift the value of src to the right by one thread within a wavefront.

Parameters
[in]  src         The variable being shifted.
[in]  bound_ctrl  When set to true, a zero is shifted into thread 0; otherwise, thread 0 returns its original value.
Returns
The value of src shifted in from the neighboring lane.

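The effect of bound_ctrl on the one-lane shifts is easiest to see on a plain array. Below is a host-side sketch, assuming a 64-lane wavefront with all lanes active. The function names are hypothetical and the code only models the boundary behaviour stated in the parameter descriptions.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWaveSize = 64;  // assumed wavefront size
using Wave = std::array<int, kWaveSize>;

// Shift left by one lane: lane i receives lane i + 1. Thread 63 has no
// neighbor, so it gets 0 when bound_ctrl is true and keeps its own value otherwise.
inline Wave shift_left1_sketch(const Wave& src, bool bound_ctrl) {
    Wave out{};
    for (std::size_t lane = 0; lane + 1 < kWaveSize; ++lane) {
        out[lane] = src[lane + 1];
    }
    out[kWaveSize - 1] = bound_ctrl ? 0 : src[kWaveSize - 1];
    return out;
}

// Shift right by one lane: lane i receives lane i - 1. Thread 0 has no
// neighbor, so it gets 0 when bound_ctrl is true and keeps its own value otherwise.
inline Wave shift_right1_sketch(const Wave& src, bool bound_ctrl) {
    Wave out{};
    for (std::size_t lane = 1; lane < kWaveSize; ++lane) {
        out[lane] = src[lane - 1];
    }
    out[0] = bound_ctrl ? 0 : src[0];
    return out;
}
```
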
unsigned int hc::__amdgcn_wave_sr1 ( unsigned int  src,
bool  bound_ctrl 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_wave_sr1().

float hc::__amdgcn_wave_sr1 ( float  src,
bool  bound_ctrl 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must have a value which is a power of 2; results are undefined if it is not a power of 2, or is a number greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_wave_sl1().

Referenced by __amdgcn_ds_swizzle(), and __amdgcn_wave_sr1().

unsigned int hc::__atomic_wrapdec ( unsigned int *  address,
unsigned int  val 
)

Atomically do the following operations:

  • reads the 32-bit value (original) from address pointer in global or group segment
  • computes ((original == 0) || (original > val)) ? val : (original - 1)
  • stores the result back to the address
Returns
The original value retrieved from address pointer.

Please refer to atomic_wrapdec in HSA PRM 6.6 for more detailed specification of the function.

Referenced by copy_async().
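The three steps listed for the wrapping decrement can be emulated on the host with a compare-and-swap loop. This is a sketch only: it assumes std::atomic<unsigned> in place of the raw pointer into the global or group segment, and wrapdec_sketch is a hypothetical name.

```cpp
#include <atomic>

// Host-side emulation of the wrapping decrement described above.
inline unsigned wrapdec_sketch(std::atomic<unsigned>& address, unsigned val) {
    unsigned original = address.load();
    unsigned desired;
    do {
        // ((original == 0) || (original > val)) ? val : (original - 1)
        desired = (original == 0 || original > val) ? val : original - 1;
        // On failure, compare_exchange_weak reloads `original` and the loop retries.
    } while (!address.compare_exchange_weak(original, desired));
    return original;  // the value read before the update
}
```

Starting from 0 with val = 5, successive calls store 5, 4, 3 and so on, which makes the counter cycle through the values 0 to val.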

unsigned int hc::__atomic_wrapinc ( unsigned int *  address,
unsigned int  val 
)

Atomically do the following operations:

  • reads the 32-bit value (original) from address pointer in global or group segment
  • computes ((original >= val) ? 0 : (original + 1))
  • stores the result back to the address
Returns
The original value retrieved from address pointer.

Please refer to atomic_wrapinc in HSA PRM 6.6 for more detailed specification of the function.

Referenced by copy_async().
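A wrapping increment of this kind is a natural fit for handing out slots of a ring with val + 1 entries. Below is a host-side sketch of the listed steps, assuming std::atomic<unsigned> in place of the raw pointer. The name wrapinc_sketch is hypothetical.

```cpp
#include <atomic>

// Host-side emulation of the wrapping increment described above.
inline unsigned wrapinc_sketch(std::atomic<unsigned>& address, unsigned val) {
    unsigned original = address.load();
    unsigned desired;
    do {
        // (original >= val) ? 0 : (original + 1)
        desired = (original >= val) ? 0u : original + 1u;
        // On failure, compare_exchange_weak reloads `original` and the loop retries.
    } while (!address.compare_exchange_weak(original, desired));
    return original;  // the value read before the update, usable as a slot index
}
```

With val = 2, successive calls return 0, 1, 2, 0, 1, 2 and so on.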

unsigned int hc::__bitalign_b32 ( unsigned int  src0,
unsigned int  src1,
unsigned int  src2 
)

Align 32 bits within 64 bits of data on an arbitrary bit boundary.

Please refer to HSA PRM 5.15 for more detailed specification.

Referenced by __lastbit_u32_s64().
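As a rough illustration of what aligning 32 bits within 64 bits means, the sketch below treats src1:src0 as one 64-bit value, shifts it right by the low five bits of src2, and keeps the low 32 bits. This reflects one reading of the HSA PRM operation and should be checked against the PRM. The function name is hypothetical.

```cpp
#include <cstdint>

// Host-side sketch: select a 32-bit window from the 64-bit value src1:src0.
inline std::uint32_t bitalign_sketch(std::uint32_t src0, std::uint32_t src1,
                                     std::uint32_t src2) {
    const std::uint32_t shift = src2 & 31u;                         // bit offset, 0..31
    const std::uint64_t wide = (std::uint64_t{src1} << 32) | src0;  // src1 is the high half
    return static_cast<std::uint32_t>(wide >> shift);               // keep the low 32 bits
}
```

For example, src0 = 0x11223344, src1 = 0xAABBCCDD and src2 = 8 select the window 0xDD112233.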

int hc::__bitextract_s32 ( int  src0,
unsigned int  src1,
unsigned int  src2 
)

Extract a range of bits.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().

int64_t hc::__bitextract_s64 ( int64_t  src0,
unsigned int  src1,
unsigned int  src2 
)

Extract a range of bits.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().

unsigned int hc::__bitextract_u32 ( unsigned int  src0,
unsigned int  src1,
unsigned int  src2 
)
inline
uint64_t hc::__bitextract_u64 ( uint64_t  src0,
unsigned int  src1,
unsigned int  src2 
)

Extract a range of bits.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().
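A host-side sketch of the 32-bit extraction, assuming src1 is the bit offset, src2 is the field width, only the low 5 bits of each are used, and offset + width does not exceed 32. These assumptions follow a reading of the HSA PRM and should be verified there. The function names are hypothetical, not HCC APIs.

```cpp
#include <cstdint>

// Hypothetical model (not part of HCC): extract `width` bits starting at
// bit `offset`, zero-extended.
constexpr std::uint32_t bitextract_u32_model(std::uint32_t src0, std::uint32_t src1,
                                             std::uint32_t src2) {
    const std::uint32_t offset = src1 & 31u;
    const std::uint32_t width = src2 & 31u;
    if (width == 0) return 0;
    const std::uint32_t mask = (1u << width) - 1u;  // width is 1..31 here
    return (src0 >> offset) & mask;
}

// Signed variant: the extracted field is sign-extended from its top bit.
constexpr std::int32_t bitextract_s32_model(std::int32_t src0, std::uint32_t src1,
                                            std::uint32_t src2) {
    const std::uint32_t width = src2 & 31u;
    if (width == 0) return 0;
    const std::uint32_t field =
        bitextract_u32_model(static_cast<std::uint32_t>(src0), src1, src2);
    const std::uint32_t sign = 1u << (width - 1u);
    return static_cast<std::int32_t>((field ^ sign) - sign);
}
```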

int hc::__bitinsert_s32 ( int  src0,
int  src1,
unsigned int  src2,
unsigned int  src3 
)

Replace a range of bits.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().

int64_t hc::__bitinsert_s64 ( int64_t  src0,
int64_t  src1,
unsigned int  src2,
unsigned int  src3 
)

Replace a range of bits.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().

unsigned int hc::__bitinsert_u32 ( unsigned int  src0,
unsigned int  src1,
unsigned int  src2,
unsigned int  src3 
)

Replace a range of bits.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().

uint64_t hc::__bitinsert_u64 ( uint64_t  src0,
uint64_t  src1,
unsigned int  src2,
unsigned int  src3 
)

Replace a range of bits.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().

unsigned int hc::__bitmask_b32 ( unsigned int  src0,
unsigned int  src1 
)

Create a bit mask that can be used with bitselect.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().

uint64_t hc::__bitmask_b64 ( unsigned int  src0,
unsigned int  src1 
)

Create a bit mask that can be used with bitselect.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().

unsigned int hc::__bitrev_b32 ( unsigned int  src0)

Reverse the bits.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().

uint64_t hc::__bitrev_b64 ( uint64_t  src0)

Reverse the bits.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().
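For reference, bit reversal of a 32-bit value can be sketched on the host as follows. bitrev_model is a hypothetical name used for illustration, not an HCC function.

```cpp
#include <cstdint>

// Hypothetical model (not part of HCC): bit 0 becomes bit 31, bit 1 becomes
// bit 30, and so on.
constexpr std::uint32_t bitrev_model(std::uint32_t x) {
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}
```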

unsigned int hc::__bitselect_b32 ( unsigned int  src0,
unsigned int  src1,
unsigned int  src2 
)

Do bit field selection.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().

uint64_t hc::__bitselect_b64 ( uint64_t  src0,
uint64_t  src1,
uint64_t  src2 
)

Do bit field selection.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __bitextract_u32().
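The mask and select operations are typically used together. The sketch below models both on the host, assuming that bitmask takes (offset, width) with only the low 5 bits of each used, and that bitselect picks bits from src1 where the mask in src0 is 1 and from src2 where it is 0. Both assumptions should be checked against the HSA PRM; the function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model (not part of HCC): `width` one-bits starting at `offset`.
constexpr std::uint32_t bitmask_model(std::uint32_t src0, std::uint32_t src1) {
    return ((1u << (src1 & 31u)) - 1u) << (src0 & 31u);
}

// Hypothetical model (not part of HCC): per-bit choice between src1 and src2.
constexpr std::uint32_t bitselect_model(std::uint32_t src0, std::uint32_t src1,
                                        std::uint32_t src2) {
    return (src1 & src0) | (src2 & ~src0);
}
```

For example, bitselect_model(bitmask_model(8, 8), a, b) replaces bits 8 to 15 of b with the corresponding bits of a.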

unsigned int hc::__bytealign_b32 ( unsigned int  src0,
unsigned int  src1,
unsigned int  src2 
)

Align 32 bits within 64 bits of data on an arbitrary byte boundary.

Please refer to HSA PRM 5.15 for more detailed specification.

Referenced by __lastbit_u32_s64().
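A host-side sketch of the byte-granular variant, assuming src1:src0 forms the 64-bit value and that the low 2 bits of src2 select the byte offset. This reading of the HSA PRM should be verified there; bytealign_model is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical model (not part of HCC). Assumes the shift is (src2 & 3) bytes.
constexpr std::uint32_t bytealign_model(std::uint32_t src0, std::uint32_t src1,
                                        std::uint32_t src2) {
    return static_cast<std::uint32_t>(
        (((static_cast<std::uint64_t>(src1) << 32) | src0) >> ((src2 & 3u) * 8u)) &
        0xFFFFFFFFull);
}
```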

uint64_t hc::__cycle_u64 ( )

Get hardware cycle count.

Note that the return value of this function is implementation-defined.

Referenced by __lastbit_u32_s64().

unsigned int hc::__firstbit_u32_s32 ( int  input)
inline

Count leading zero bits in the input.

Parameters
[in]inputA signed 32-bit integer.
Returns
Position of the first set bit in a non-negative integer, or of the first clear bit in a negative integer, counting from the most significant bit. If no bits in the input are set, the result is -1.

References __firstbit_u32_u32().

unsigned int hc::__firstbit_u32_s64 ( long long int  input)
inline

Count leading zero bits in the input.

Parameters
[in]inputA signed 64-bit integer.
Returns
Position of the first set bit in a non-negative integer, or of the first clear bit in a negative integer, counting from the most significant bit. If no bits in the input are set, the result is -1.

References __firstbit_u32_u64().

unsigned int hc::__firstbit_u32_u32 ( unsigned int  input)
inline

Count leading zero bits in the input.

Parameters
[in]inputAn unsigned 32-bit integer.
Returns
Number of 0 bits before the first 1 bit, counting from the most significant bit. -1 if there is no 1 bit (the input is 0).

Referenced by __firstbit_u32_s32().

unsigned int hc::__firstbit_u32_u64 ( unsigned long long int  input)
inline

Count leading zero bits in the input.

Parameters
[in]inputAn unsigned 64-bit integer.
Returns
Number of 0 bits before the first 1 bit, counting from the most significant bit. -1 if there is no 1 bit (the input is 0).

Referenced by __firstbit_u32_s64().
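The firstbit variants can be sketched on the host as follows. The model assumes that the unsigned form counts leading zeros and yields -1 (all bits set in the unsigned result) for a zero input, and that the signed form inverts a negative input before counting. The names are hypothetical, not HCC functions.

```cpp
#include <cstdint>

// Hypothetical model (not part of HCC): number of zero bits above the
// highest set bit, or 0xFFFFFFFF (-1) when no bit is set.
constexpr std::uint32_t firstbit_u32_model(std::uint32_t input) {
    if (input == 0) return 0xFFFFFFFFu;
    std::uint32_t n = 0;
    while ((input & 0x80000000u) == 0) {
        input <<= 1;
        ++n;
    }
    return n;
}

// Signed variant: for negative inputs, look for the first clear bit instead.
constexpr std::uint32_t firstbit_s32_model(std::int32_t input) {
    return input >= 0 ? firstbit_u32_model(static_cast<std::uint32_t>(input))
                      : firstbit_u32_model(~static_cast<std::uint32_t>(input));
}
```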

int hc::__lane_id ( void  )
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_ds_bpermute(), __amdgcn_mbcnt_hi(), and __amdgcn_mbcnt_lo().

Referenced by __shfl(), __shfl_down(), __shfl_up(), and __shfl_xor().

unsigned int hc::__lastbit_u32_s32 ( int  input)
inline

Find the first bit set to 1 in a number starting from the least significant bit.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

References __lastbit_u32_u32().

unsigned int hc::__lastbit_u32_s64 ( unsigned long long  input)
inline
unsigned int hc::__lastbit_u32_u32 ( unsigned int  input)
inline

Find the first bit set to 1 in a number starting from the least significant bit.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __lastbit_u32_s32().

unsigned int hc::__lastbit_u32_u64 ( unsigned long long int  input)
inline

Find the first bit set to 1 in a number starting from the least significant bit.

Please refer to HSA PRM 5.7 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().
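A host-side sketch of the lastbit operation for a 32-bit input. The -1 result for a zero input is an assumption taken from a reading of the HSA PRM, not from the text above; lastbit_u32_model is a hypothetical name.

```cpp
#include <cstdint>

// Hypothetical model (not part of HCC): index of the lowest set bit,
// or 0xFFFFFFFF (-1) when no bit is set (assumed behavior).
constexpr std::uint32_t lastbit_u32_model(std::uint32_t input) {
    if (input == 0) return 0xFFFFFFFFu;
    std::uint32_t n = 0;
    while ((input & 1u) == 0) {
        input >>= 1;
        ++n;
    }
    return n;
}
```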

unsigned int hc::__lerp_u8x4 ( unsigned int  src0,
unsigned int  src1,
unsigned int  src2 
)

Performs linear interpolation and computes the unsigned 8-bit average of packed data.

Please refer to HSA PRM 5.15 for more detailed specification.

Referenced by __lastbit_u32_s64().

unsigned int hc::__mad24 ( unsigned int  x,
unsigned int  y,
unsigned int  z 
)
inline

Multiplies two unsigned integers (x, y), using only the lower 24 bits of each, and then adds the product to a 32-bit unsigned integer.

Parameters
[in]x24-bit unsigned integer multiplier
[in]y24-bit unsigned integer multiplicand
[in]z32-bit unsigned integer to be added to the product
Returns
32-bit unsigned integer result of mad24

References __mul24().

int hc::__mad24 ( int  x,
int  y,
int  z 
)
inline

Multiplies two integers (x, y), using only the lower 24 bits of each, and then adds the product to a 32-bit integer.

Parameters
[in]x24-bit integer multiplier
[in]y24-bit integer multiplicand
[in]z32-bit integer to be added to the product
Returns
32-bit integer result of mad24

References __mul24(), get_dynamic_group_segment_base_pointer(), get_group_segment_base_pointer(), get_group_segment_size(), and get_static_group_segment_size().

unsigned int hc::__mul24 ( unsigned int  x,
unsigned int  y 
)
inline

Multiplies two unsigned integers (x, y), using only the lower 24 bits of each.

Parameters
[in]x24-bit unsigned integer multiplier
[in]y24-bit unsigned integer multiplicand
Returns
32-bit unsigned integer product
int hc::__mul24 ( int  x,
int  y 
)
inline

Multiplies two integers (x, y), using only the lower 24 bits of each.

Parameters
[in]x24-bit integer multiplier
[in]y24-bit integer multiplicand
Returns
32-bit integer product

Referenced by __mad24().
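The unsigned forms of __mul24 and __mad24 can be modelled on the host as below. The sketch assumes the result is the low 32 bits of the product of the masked operands; the signed overloads are not modelled. The names are hypothetical, not HCC functions.

```cpp
#include <cstdint>

// Hypothetical model (not part of HCC): only the low 24 bits of each operand
// take part in the multiplication; the result wraps to 32 bits.
constexpr std::uint32_t mul24_u_model(std::uint32_t x, std::uint32_t y) {
    return (x & 0x00FFFFFFu) * (y & 0x00FFFFFFu);
}

// Multiply-add built on the model above.
constexpr std::uint32_t mad24_u_model(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
    return mul24_u_model(x, y) + z;
}
```

Bits 24 to 31 of x and y are ignored, so callers should ensure both operands fit in 24 bits when an exact product is expected.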

double hc::__pack_f32x2_f32 ( double  src0,
float  src1,
unsigned int  src2 
)

Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int hc::__pack_s16x2_s32 ( int  src0,
int  src1,
unsigned int  src2 
)

Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int64_t hc::__pack_s16x4_s32 ( int64_t  src0,
int  src1,
unsigned int  src2 
)

Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int64_t hc::__pack_s32x2_s32 ( int64_t  src0,
int  src1,
unsigned int  src2 
)

Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int hc::__pack_s8x4_s32 ( int  src0,
int  src1,
unsigned int  src2 
)

Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int64_t hc::__pack_s8x8_s32 ( int64_t  src0,
int  src1,
unsigned int  src2 
)

Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

unsigned hc::__pack_u16x2_u32 ( unsigned int  src0,
unsigned int  src1,
unsigned int  src2 
)

Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

uint64_t hc::__pack_u16x4_u32 ( uint64_t  src0,
unsigned int  src1,
unsigned int  src2 
)

Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

uint64_t hc::__pack_u32x2_u32 ( uint64_t  src0,
unsigned int  src1,
unsigned int  src2 
)

Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

unsigned int hc::__pack_u8x4_u32 ( unsigned int  src0,
unsigned int  src1,
unsigned int  src2 
)

Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().
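A host-side sketch of the u8x4 pack operation, assuming element 0 is the least significant byte, only the low 2 bits of src2 select the element, and only the low 8 bits of src1 are stored. pack_u8x4_model is a hypothetical name, not an HCC function.

```cpp
#include <cstdint>

// Hypothetical model (not part of HCC): replace one byte of src0 with the
// low byte of src1; src2 selects which byte (0 = least significant).
constexpr std::uint32_t pack_u8x4_model(std::uint32_t src0, std::uint32_t src1,
                                        std::uint32_t src2) {
    const std::uint32_t shift = (src2 & 3u) * 8u;
    return (src0 & ~(0xFFu << shift)) | ((src1 & 0xFFu) << shift);
}
```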

uint64_t hc::__pack_u8x8_u32 ( uint64_t  src0,
unsigned int  src1,
unsigned int  src2 
)

Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

unsigned int hc::__packcvt_u8x4_f32 ( float  src0,
float  src1,
float  src2,
float  src3 
)

Takes four floating-point numbers, converts them to unsigned integer values, and packs them into a packed u8x4 value.

Please refer to HSA PRM 5.15 for more detailed specification.

Referenced by __lastbit_u32_s64().

unsigned int hc::__popcount_u32_b32 ( unsigned int  input)
inline

Count number of 1 bits in the input.

Parameters
[in]inputAn unsigned 32-bit integer.
Returns
Number of 1 bits in the input.
unsigned int hc::__popcount_u32_b64 ( unsigned long long int  input)
inline

Count number of 1 bits in the input.

Parameters
[in]inputAn unsigned 64-bit integer.
Returns
Number of 1 bits in the input.

Referenced by __activelanecount_u32_b1(), __all(), and __any().
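For reference, the population count can be sketched on the host as follows. popcount_model is a hypothetical name; it accepts a 64-bit value so that it covers both input widths.

```cpp
#include <cstdint>

// Hypothetical model (not part of HCC): counts set bits by clearing the
// lowest set bit until none remain.
constexpr std::uint32_t popcount_model(std::uint64_t x) {
    std::uint32_t n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}
```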

unsigned int hc::__sad_u32_u16x2 ( unsigned int  src0,
unsigned int  src1,
unsigned int  src2 
)

Computes the sum of the absolute differences of src0 and src1 and then adds src2 to the result.

Please refer to HSA PRM 5.15 for more detailed specification.

Referenced by __lastbit_u32_s64().

unsigned int hc::__sad_u32_u32 ( unsigned int  src0,
unsigned int  src1,
unsigned int  src2 
)

Computes the sum of the absolute differences of src0 and src1 and then adds src2 to the result.

Please refer to HSA PRM 5.15 for more detailed specification.

Referenced by __lastbit_u32_s64().

unsigned int hc::__sad_u32_u8x4 ( unsigned int  src0,
unsigned int  src1,
unsigned int  src2 
)

Computes the sum of the absolute differences of src0 and src1 and then adds src2 to the result.

Please refer to HSA PRM 5.15 for more detailed specification.

Referenced by __lastbit_u32_s64().

unsigned int hc::__sadhi_u16x2_u8x4 ( unsigned int  src0,
unsigned int  src1,
unsigned int  src2 
)

This function is mostly the same as sad except the sum of absolute differences is added to the most significant 16 bits of the result.

Please refer to HSA PRM 5.15 for more detailed specification.

Referenced by __lastbit_u32_s64().
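The u8x4 sad and sadhi operations can be sketched on the host as below, assuming each 32-bit source holds four unsigned bytes and that sadhi shifts the sum into the upper 16 bits before adding src2. The names are hypothetical, not HCC functions, and the exact semantics should be checked against the HSA PRM.

```cpp
#include <cstdint>

// Absolute difference of two unsigned values.
constexpr std::uint32_t abs_diff(std::uint32_t a, std::uint32_t b) {
    return a > b ? a - b : b - a;
}

// Hypothetical model (not part of HCC): sum of byte-wise absolute
// differences of src0 and src1, plus src2.
constexpr std::uint32_t sad_u8x4_model(std::uint32_t src0, std::uint32_t src1,
                                       std::uint32_t src2) {
    std::uint32_t sum = 0;
    for (int i = 0; i < 4; ++i) {
        sum += abs_diff((src0 >> (8 * i)) & 0xFFu, (src1 >> (8 * i)) & 0xFFu);
    }
    return sum + src2;
}

// Hypothetical model: same sum, added into the most significant 16 bits.
constexpr std::uint32_t sadhi_u16x2_u8x4_model(std::uint32_t src0, std::uint32_t src1,
                                               std::uint32_t src2) {
    return (sad_u8x4_model(src0, src1, 0) << 16) + src2;
}
```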

int hc::__shfl ( int  var,
int  srcLane,
int  width = __HSA_WAVEFRONT_SIZE__ 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_ds_bpermute(), and __lane_id().

unsigned int hc::__shfl ( unsigned int  var,
int  srcLane,
int  width = __HSA_WAVEFRONT_SIZE__ 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.

References __shfl().

float hc::__shfl ( float  var,
int  srcLane,
int  width = __HSA_WAVEFRONT_SIZE__ 
)
inline

Direct copy from indexed active work-item within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).

The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.

Referenced by __shfl().
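The lane-selection rule described for __shfl can be emulated on the host. The sketch below assumes a 64-lane wavefront with all lanes active and the same srcLane in every lane; Wave, kWaveSize, iota_wave, and shfl_model are hypothetical names, not part of HCC.

```cpp
constexpr int kWaveSize = 64;  // assumed wavefront size

// One value of `var` per lane of the wavefront.
struct Wave {
    int lane[kWaveSize];
};

// Test helper: lane i holds the value i.
constexpr Wave iota_wave() {
    Wave w{};
    for (int i = 0; i < kWaveSize; ++i) w.lane[i] = i;
    return w;
}

// Hypothetical model (not part of HCC): every lane reads `var` from lane
// (srcLane modulo width) of its own width-sized subsection.
constexpr Wave shfl_model(const Wave& var, int srcLane, int width = kWaveSize) {
    Wave out{};
    for (int id = 0; id < kWaveSize; ++id) {
        const int base = id - (id % width);  // first lane of this subsection
        const int src = base + (((srcLane % width) + width) % width);
        out.lane[id] = var.lane[src];
    }
    return out;
}
```

With width = 16, lane 40 belongs to the subsection starting at lane 32, so a request for srcLane 3 reads lane 35.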

int hc::__shfl_down ( int  var,
const unsigned int  delta,
const int  width = __HSA_WAVEFRONT_SIZE__ 
)
inline

Copy from an active work-item with higher ID relative to caller within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function calculates a source work-item ID by adding delta to the caller's work-item ID within the wavefront. The value of var held by the resulting lane ID is returned: this has the effect of shifting var down the wavefront by delta work-items. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. The source work-item index will not wrap around the value of width, so the upper delta work-items will remain unchanged.

The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_ds_bpermute(), and __lane_id().

unsigned int hc::__shfl_down ( unsigned int  var,
const unsigned int  delta,
const int  width = __HSA_WAVEFRONT_SIZE__ 
)
inline

Copy from an active work-item with higher ID relative to caller within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function calculates a source work-item ID by adding delta to the caller's work-item ID within the wavefront. The value of var held by the resulting lane ID is returned: this has the effect of shifting var down the wavefront by delta work-items. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. The source work-item index will not wrap around the value of width, so the upper delta work-items will remain unchanged.

The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.

References __shfl_down().

float hc::__shfl_down ( float  var,
const unsigned int  delta,
const int  width = __HSA_WAVEFRONT_SIZE__ 
)
inline

Copy from an active work-item with higher ID relative to caller within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function calculates a source work-item ID by adding delta to the caller's work-item ID within the wavefront. The value of var held by the resulting lane ID is returned: this has the effect of shifting var down the wavefront by delta work-items. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. The source work-item index will not wrap around the value of width, so the upper delta work-items will remain unchanged.

The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.

Referenced by __shfl_down().
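A common use of __shfl_down is a wavefront-wide reduction. The host-side sketch below emulates the documented lane selection and uses it to sum one value per lane into lane 0. It assumes a 64-lane wavefront with all lanes active; Wave, kWaveSize, iota_wave, shfl_down_model, and wave_sum_model are hypothetical names, not part of HCC.

```cpp
constexpr int kWaveSize = 64;  // assumed wavefront size

// One value of `var` per lane of the wavefront.
struct Wave {
    int lane[kWaveSize];
};

// Test helper: lane i holds the value i.
constexpr Wave iota_wave() {
    Wave w{};
    for (int i = 0; i < kWaveSize; ++i) w.lane[i] = i;
    return w;
}

// Hypothetical model (not part of HCC): each lane reads from lane id + delta;
// lanes whose source would leave the subsection keep their own value.
constexpr Wave shfl_down_model(const Wave& var, unsigned delta, int width = kWaveSize) {
    Wave out{};
    for (int id = 0; id < kWaveSize; ++id) {
        const int pos = id % width;
        const int d = static_cast<int>(delta);
        out.lane[id] = var.lane[(pos + d < width) ? id + d : id];
    }
    return out;
}

// Tree reduction: after log2(kWaveSize) steps lane 0 holds the total.
constexpr int wave_sum_model(Wave v) {
    for (int offset = kWaveSize / 2; offset > 0; offset /= 2) {
        const Wave shifted = shfl_down_model(v, static_cast<unsigned>(offset));
        for (int id = 0; id < kWaveSize; ++id) v.lane[id] += shifted.lane[id];
    }
    return v.lane[0];
}
```

Only lane 0 is guaranteed to hold the full sum; the upper lanes accumulate their own unchanged values and should be ignored.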

int hc::__shfl_up ( int  var,
const unsigned int  delta,
const int  width = __HSA_WAVEFRONT_SIZE__ 
)
inline

Copy from an active work-item with lower ID relative to caller within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function calculates a source work-item ID by subtracting delta from the caller's work-item ID within the wavefront. The value of var held by the resulting lane ID is returned: in effect, var is shifted up the wavefront by delta work-items. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. The source work-item index will not wrap around the value of width, so effectively the lower delta work-items will be unchanged.

The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_ds_bpermute(), and __lane_id().

unsigned int hc::__shfl_up ( unsigned int  var,
const unsigned int  delta,
const int  width = __HSA_WAVEFRONT_SIZE__ 
)
inline

Copy from an active work-item with lower ID relative to caller within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function calculates a source work-item ID by subtracting delta from the caller's work-item ID within the wavefront. The value of var held by the resulting lane ID is returned: in effect, var is shifted up the wavefront by delta work-items. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. The source work-item index will not wrap around the value of width, so effectively the lower delta work-items will be unchanged.

The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.

References __shfl_up().

float hc::__shfl_up ( float  var,
const unsigned int  delta,
const int  width = __HSA_WAVEFRONT_SIZE__ 
)
inline

Copy from an active work-item with lower ID relative to caller within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function calculates a source work-item ID by subtracting delta from the caller's work-item ID within the wavefront. The value of var held by the resulting lane ID is returned: in effect, var is shifted up the wavefront by delta work-items. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. The source work-item index will not wrap around the value of width, so effectively the lower delta work-items will be unchanged.

The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.

Referenced by __shfl_up().
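A typical use of __shfl_up is an inclusive prefix sum across a wavefront. The host-side sketch below emulates the documented lane selection and builds the scan from it, assuming a 64-lane wavefront with all lanes active. Wave, kWaveSize, iota_wave, shfl_up_model, and inclusive_scan_model are hypothetical names, not part of HCC.

```cpp
constexpr int kWaveSize = 64;  // assumed wavefront size

// One value of `var` per lane of the wavefront.
struct Wave {
    int lane[kWaveSize];
};

// Test helper: lane i holds the value i.
constexpr Wave iota_wave() {
    Wave w{};
    for (int i = 0; i < kWaveSize; ++i) w.lane[i] = i;
    return w;
}

// Hypothetical model (not part of HCC): each lane reads from lane id - delta;
// the lowest delta lanes of each subsection keep their own value.
constexpr Wave shfl_up_model(const Wave& var, unsigned delta, int width = kWaveSize) {
    Wave out{};
    for (int id = 0; id < kWaveSize; ++id) {
        const int pos = id % width;
        const int d = static_cast<int>(delta);
        out.lane[id] = var.lane[(pos >= d) ? id - d : id];
    }
    return out;
}

// Inclusive scan: lane i ends up with the sum of lanes 0..i.
constexpr Wave inclusive_scan_model(Wave v) {
    for (int offset = 1; offset < kWaveSize; offset *= 2) {
        const Wave shifted = shfl_up_model(v, static_cast<unsigned>(offset));
        for (int id = offset; id < kWaveSize; ++id) v.lane[id] += shifted.lane[id];
    }
    return v;
}
```

Lanes below the current offset skip the addition because their shuffled value is just their own, unchanged input.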

int hc::__shfl_xor ( int  var,
int  laneMask,
int  width = __HSA_WAVEFRONT_SIZE__ 
)
inline

Copy from an active work-item based on bitwise XOR of caller work-item ID within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function calculates a source work-item ID by performing a bitwise XOR of the caller's work-item ID with laneMask: the value of var held by the resulting work-item ID is returned.

The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.

References __amdgcn_ds_bpermute(), __lane_id(), and __shfl_xor().

Referenced by __shfl_xor().

float hc::__shfl_xor ( float  var,
int  laneMask,
int  width = __HSA_WAVEFRONT_SIZE__ 
)
inline

Copy from an active work-item based on bitwise XOR of caller work-item ID within a wavefront.

Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.

The function calculates a source work-item ID by performing a bitwise XOR of the caller's work-item ID with laneMask: the value of var held by the resulting work-item ID is returned.

The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.

References __HSA_WAVEFRONT_SIZE__, and __shfl_xor().
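The XOR exchange pattern is commonly used for a butterfly reduction in which every lane ends up with the total. The host-side sketch below models only the full-width case (width equal to an assumed 64-lane wavefront, all lanes active, laneMask in the range 0 to 63). Wave, kWaveSize, iota_wave, shfl_xor_model, and wave_allreduce_model are hypothetical names, not part of HCC.

```cpp
constexpr int kWaveSize = 64;  // assumed wavefront size

// One value of `var` per lane of the wavefront.
struct Wave {
    int lane[kWaveSize];
};

// Test helper: lane i holds the value i.
constexpr Wave iota_wave() {
    Wave w{};
    for (int i = 0; i < kWaveSize; ++i) w.lane[i] = i;
    return w;
}

// Hypothetical model (not part of HCC), full width only: each lane reads
// `var` from lane (id XOR laneMask).
constexpr Wave shfl_xor_model(const Wave& var, int laneMask) {
    Wave out{};
    for (int id = 0; id < kWaveSize; ++id) {
        out.lane[id] = var.lane[(id ^ laneMask) & (kWaveSize - 1)];
    }
    return out;
}

// Butterfly all-reduce: after log2(kWaveSize) steps every lane holds the sum.
constexpr Wave wave_allreduce_model(Wave v) {
    for (int mask = kWaveSize / 2; mask > 0; mask /= 2) {
        const Wave other = shfl_xor_model(v, mask);
        for (int id = 0; id < kWaveSize; ++id) v.lane[id] += other.lane[id];
    }
    return v;
}
```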

float hc::__unpack_f32_f32x2 ( double  src0,
unsigned int  src1 
)

Assign the elements specified by src1 from the packed value in src0.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int hc::__unpack_s32_s16x2 ( int  src0,
unsigned int  src1 
)

Assign the elements specified by src1 from the packed value in src0.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int hc::__unpack_s32_s16x4 ( int64_t  src0,
unsigned int  src1 
)

Assign the elements specified by src1 from the packed value in src0.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int hc::__unpack_s32_s3x2 ( int64_t  src0,
unsigned int  src1 
)

Assign the elements specified by src1 from the packed value in src0.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int hc::__unpack_s32_s8x4 ( int  src0,
unsigned int  src1 
)

Assign the elements specified by src1 from the packed value in src0.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int hc::__unpack_s32_s8x8 ( int64_t  src0,
unsigned int  src1 
)

Assign the elements specified by src1 from the packed value in src0.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

unsigned int hc::__unpack_u32_u16x2 ( unsigned int  src0,
unsigned int  src1 
)

Assign the elements specified by src1 from the packed value in src0.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

unsigned int hc::__unpack_u32_u16x4 ( uint64_t  src0,
unsigned int  src1 
)

Assign the elements specified by src1 from the packed value in src0.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

unsigned int hc::__unpack_u32_u32x2 ( uint64_t  src0,
unsigned int  src1 
)

Assign the elements specified by src1 from the packed value in src0.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

unsigned int hc::__unpack_u32_u8x4 ( unsigned int  src0,
unsigned int  src1 
)

Assign the elements specified by src1 from the packed value in src0.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().
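A host-side sketch of the u8x4 unpack operation, assuming element 0 is the least significant byte and only the low 2 bits of src1 select the element. unpack_u8x4_model is a hypothetical name, not an HCC function.

```cpp
#include <cstdint>

// Hypothetical model (not part of HCC): return the byte of src0 selected
// by src1 (0 = least significant), zero-extended to 32 bits.
constexpr std::uint32_t unpack_u8x4_model(std::uint32_t src0, std::uint32_t src1) {
    return (src0 >> ((src1 & 3u) * 8u)) & 0xFFu;
}
```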

unsigned int hc::__unpack_u32_u8x8 ( uint64_t  src0,
unsigned int  src1 
)

Assign the elements specified by src1 from the packed value in src0.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

float hc::__unpackcvt_f32_u8x4 ( unsigned int  src0,
unsigned int  src1 
)

Unpacks a single element from a packed u8x4 value and converts it to an f32.

Please refer to HSA PRM 5.15 for more detailed specification.

Referenced by __lastbit_u32_s64().

int hc::__unpackhi_s16x2 ( int  src0,
int  src1 
)

Copy and interleave the upper half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int64_t hc::__unpackhi_s16x4 ( int64_t  src0,
int64_t  src1 
)

Copy and interleave the upper half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int64_t hc::__unpackhi_s32x2 ( int64_t  src0,
int64_t  src1 
)

Copy and interleave the upper half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int hc::__unpackhi_s8x4 ( int  src0,
int  src1 
)

Copy and interleave the upper half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int64_t hc::__unpackhi_s8x8 ( int64_t  src0,
int64_t  src1 
)

Copy and interleave the upper half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

unsigned int hc::__unpackhi_u16x2 ( unsigned int  src0,
unsigned int  src1 
)

Copy and interleave the upper half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

uint64_t hc::__unpackhi_u16x4 ( uint64_t  src0,
uint64_t  src1 
)

Copy and interleave the upper half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

uint64_t hc::__unpackhi_u32x2 ( uint64_t  src0,
uint64_t  src1 
)

Copy and interleave the upper half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

unsigned int hc::__unpackhi_u8x4 ( unsigned int  src0,
unsigned int  src1 
)

Copy and interleave the upper half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

uint64_t hc::__unpackhi_u8x8 ( uint64_t  src0,
uint64_t  src1 
)

Copy and interleave the upper half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int hc::__unpacklo_s16x2 ( int  src0,
int  src1 
)

Copy and interleave the lower half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int64_t hc::__unpacklo_s16x4 ( int64_t  src0,
int64_t  src1 
)

Copy and interleave the lower half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int64_t hc::__unpacklo_s32x2 ( int64_t  src0,
int64_t  src1 
)

Copy and interleave the lower half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int hc::__unpacklo_s8x4 ( int  src0,
int  src1 
)

Copy and interleave the lower half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

int64_t hc::__unpacklo_s8x8 ( int64_t  src0,
int64_t  src1 
)

Copy and interleave the lower half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

unsigned int hc::__unpacklo_u16x2 ( unsigned int  src0,
unsigned int  src1 
)

Copy and interleave the lower half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().
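
As an illustration of the lower-half interleave, the following sketch models the two-lane 16-bit case in standard C++. It is not the HCC intrinsic: ref_unpacklo_u16x2 and demo_unpacklo_u16x2 are hypothetical names, and the model assumes that lane 0 is the least-significant 16 bits and that src0 supplies the even destination lane. HSA PRM 5.9 remains the authoritative definition.

```cpp
#include <cassert>
#include <cstdint>

// Reference model only (hypothetical helper, not part of the hc namespace).
// Assumption: lane 0 is the low 16 bits; src0 fills destination lane 0.
inline std::uint32_t ref_unpacklo_u16x2(std::uint32_t src0, std::uint32_t src1) {
    const std::uint32_t a0 = src0 & 0xFFFFu;  // src0 lane 0
    const std::uint32_t b0 = src1 & 0xFFFFu;  // src1 lane 0
    // Destination lanes, low to high: a0, b0.
    return a0 | (b0 << 16);
}

// Usage: only the low lane of each source survives.
inline void demo_unpacklo_u16x2() {
    assert(ref_unpacklo_u16x2(0xAAAA1111u, 0xBBBB2222u) == 0x22221111u);
}
```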

uint64_t hc::__unpacklo_u16x4 ( uint64_t  src0,
uint64_t  src1 
)

Copy and interleave the lower half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

uint64_t hc::__unpacklo_u32x2 ( uint64_t  src0,
uint64_t  src1 
)

Copy and interleave the lower half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

unsigned int hc::__unpacklo_u8x4 ( unsigned int  src0,
unsigned int  src1 
)

Copy and interleave the lower half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

uint64_t hc::__unpacklo_u8x8 ( uint64_t  src0,
uint64_t  src1 
)

Copy and interleave the lower half of the elements from each source into the destination.

Please refer to HSA PRM 5.9 for more detailed specification of these functions.

Referenced by __lastbit_u32_s64().

void hc::all_memory_fence ( const tile_barrier )

Establishes a thread-tile scoped memory fence for both global and tile-static memory operations.

This function does not imply a barrier and is therefore permitted in divergent code.

Referenced by hc::tile_barrier::wait_with_tile_static_memory_fence().

auto_voidp hc::am_aligned_alloc ( std::size_t  size,
hc::accelerator acc,
unsigned  flags,
std::size_t  alignment = 0 
)

Allocate a block of size bytes of memory on the specified acc.

The contents of the newly allocated block of memory are not initialized.

If size == 0, 0 is returned.

Flags: amHostPinned: Allocate pinned host memory and map it into the address space of the specified accelerator.

Returns
On success, a pointer to the newly allocated memory is returned. The pointer is typecast to the desired return type.

If an error occurs while allocating the requested memory, 0 is returned.

See also
am_free, am_copy
auto_voidp hc::am_alloc ( std::size_t  size,
hc::accelerator acc,
unsigned  flags 
)

Allocate a block of size bytes of memory on the specified acc.

The contents of the newly allocated block of memory are not initialized.

If size == 0, 0 is returned.

Flags: amHostPinned: Allocate pinned host memory and map it into the address space of the specified accelerator.

Returns
On success, a pointer to the newly allocated memory is returned. The pointer is typecast to the desired return type.

If an error occurs while allocating the requested memory, 0 is returned.

See also
am_free, am_copy
am_status_t hc::am_free ( void *  ptr)

Free a block of memory previously allocated with am_alloc.

Returns
AM_SUCCESS
See also
am_alloc, am_copy
bool hc::atomic_compare_exchange ( unsigned int *  dest,
unsigned int *  expected_val,
unsigned int  val 
)

These functions attempt to perform the following three steps atomically:

  1. Read the value stored in the location pointed to by dest.
  2. Compare the value read in the previous step with the value contained in the location pointed to by expected_val.
  3. Carry out one of the following operations, depending on the result of the comparison in the previous step:
     a. If the values are identical, the function tries to atomically change the value pointed to by dest to the value in val. The return value indicates whether this change succeeded.
     b. If the values are not identical, the function stores the value read in step (1) into the location pointed to by expected_val and returns false.
Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[out] expected_val: A pointer to a local variable or function parameter. Upon calling the function, the location pointed to by expected_val contains the value the caller expects dest to contain. Upon return from the function, expected_val will contain the most recent value read from dest.
[in] val: The new value to be stored in the location pointed to by dest.
Returns
The return value indicates whether the function has been successful in atomically reading, comparing and modifying the contents of the memory location.
bool hc::atomic_compare_exchange ( int *  dest,
int *  expected_val,
int  val 
)

These functions attempt to perform the following three steps atomically:

  1. Read the value stored in the location pointed to by dest.
  2. Compare the value read in the previous step with the value contained in the location pointed to by expected_val.
  3. Carry out one of the following operations, depending on the result of the comparison in the previous step:
     a. If the values are identical, the function tries to atomically change the value pointed to by dest to the value in val. The return value indicates whether this change succeeded.
     b. If the values are not identical, the function stores the value read in step (1) into the location pointed to by expected_val and returns false.
Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[out] expected_val: A pointer to a local variable or function parameter. Upon calling the function, the location pointed to by expected_val contains the value the caller expects dest to contain. Upon return from the function, expected_val will contain the most recent value read from dest.
[in] val: The new value to be stored in the location pointed to by dest.
Returns
The return value indicates whether the function has been successful in atomically reading, comparing and modifying the contents of the memory location.
bool hc::atomic_compare_exchange ( uint64_t *  dest,
uint64_t *  expected_val,
uint64_t  val 
)

These functions attempt to perform the following three steps atomically:

  1. Read the value stored in the location pointed to by dest.
  2. Compare the value read in the previous step with the value contained in the location pointed to by expected_val.
  3. Carry out one of the following operations, depending on the result of the comparison in the previous step:
     a. If the values are identical, the function tries to atomically change the value pointed to by dest to the value in val. The return value indicates whether this change succeeded.
     b. If the values are not identical, the function stores the value read in step (1) into the location pointed to by expected_val and returns false.
Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[out] expected_val: A pointer to a local variable or function parameter. Upon calling the function, the location pointed to by expected_val contains the value the caller expects dest to contain. Upon return from the function, expected_val will contain the most recent value read from dest.
[in] val: The new value to be stored in the location pointed to by dest.
Returns
The return value indicates whether the function has been successful in atomically reading, comparing and modifying the contents of the memory location.

Referenced by copy_async().
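
The read-compare-update steps described for atomic_compare_exchange are typically used in a retry loop. The sketch below shows that pattern on the host with std::atomic, whose compare_exchange_strong has the same observable contract (on failure, the observed value is written back to the expected variable). It is an analogue under that assumption, not HCC device code; atomic_multiply and demo_atomic_multiply are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Host-side analogue using std::atomic, not hc::atomic_compare_exchange.
// Atomically multiplies `target` by `factor` and returns the previous value.
inline unsigned atomic_multiply(std::atomic<unsigned>& target, unsigned factor) {
    unsigned expected = target.load();
    // On failure the call stores the value it observed into `expected`,
    // so every retry recomputes the product from fresh data.
    while (!target.compare_exchange_strong(expected, expected * factor)) {
    }
    return expected;
}

// Usage: a successful update, then a deliberately stale expectation.
inline void demo_atomic_multiply() {
    std::atomic<unsigned> value{6u};
    assert(atomic_multiply(value, 7u) == 6u);
    assert(value.load() == 42u);

    unsigned stale = 5u;                                 // wrong guess
    assert(!value.compare_exchange_strong(stale, 0u));   // comparison fails
    assert(stale == 42u);                                // refreshed with the observed value
    assert(value.load() == 42u);                         // target left unchanged
}
```

The loop form is useful for operations that have no dedicated atomic function, since any pure update of the old value can be retried until it lands.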

unsigned int hc::atomic_exchange ( unsigned int *  dest,
unsigned int  val 
)

Atomically read the value stored in dest, replace it with the value given in val, and return the old value to the caller.

This function provides overloads for int, unsigned int, float and uint64_t parameters.

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The new value to be stored in the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
int hc::atomic_exchange ( int *  dest,
int  val 
)

Atomically read the value stored in dest, replace it with the value given in val, and return the old value to the caller.

This function provides overloads for int, unsigned int, float and uint64_t parameters.

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The new value to be stored in the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
float hc::atomic_exchange ( float *  dest,
float  val 
)

Atomically read the value stored in dest, replace it with the value given in val, and return the old value to the caller.

This function provides overloads for int, unsigned int, float and uint64_t parameters.

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The new value to be stored in the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
uint64_t hc::atomic_exchange ( uint64_t *  dest,
uint64_t  val 
)

Atomically read the value stored in dest, replace it with the value given in val, and return the old value to the caller.

This function provides overloads for int, unsigned int, float and uint64_t parameters.

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The new value to be stored in the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.

Referenced by copy_async().
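
Because atomic_exchange returns the value it replaced, it can decide which of several contenders was first to set a flag. The sketch below illustrates that idea on the host with std::atomic<int>::exchange, which is assumed here to behave like the int overload above; it is not HCC device code, and try_claim and demo_try_claim are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Host-side analogue using std::atomic, not hc::atomic_exchange.
// Returns true only for the caller that flips the flag from 0 to 1.
inline bool try_claim(std::atomic<int>& flag) {
    // exchange() always stores 1; the returned old value tells us who won.
    return flag.exchange(1) == 0;
}

// Usage: the first claim succeeds, later claims see the old value 1.
inline void demo_try_claim() {
    std::atomic<int> flag{0};
    assert(try_claim(flag));
    assert(!try_claim(flag));
    assert(flag.load() == 1);
}
```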

unsigned hc::atomic_fetch_add ( unsigned *  x,
unsigned  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
int hc::atomic_fetch_add ( int *  x,
int  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
float hc::atomic_fetch_add ( float *  x,
float  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
uint64_t hc::atomic_fetch_add ( uint64_t *  x,
uint64_t  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.

Referenced by copy_async().
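
Since atomic_fetch_add returns the value stored before the addition, a common use is reserving a contiguous range of slots in a shared buffer. The sketch below illustrates this on the host with std::atomic<unsigned>::fetch_add, which is assumed to mirror the unsigned overload above; it is not HCC device code, and reserve_slots and demo_reserve_slots are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Host-side analogue using std::atomic, not hc::atomic_fetch_add.
// Reserves `count` consecutive slots and returns the index of the first one.
inline unsigned reserve_slots(std::atomic<unsigned>& cursor, unsigned count) {
    // fetch_add returns the old cursor, which is the start of our range.
    return cursor.fetch_add(count);
}

// Usage: two reservations receive non-overlapping ranges [0,3) and [3,5).
inline void demo_reserve_slots() {
    std::atomic<unsigned> cursor{0u};
    assert(reserve_slots(cursor, 3u) == 0u);
    assert(reserve_slots(cursor, 2u) == 3u);
    assert(cursor.load() == 5u);
}
```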

unsigned hc::atomic_fetch_and ( unsigned *  x,
unsigned  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
int hc::atomic_fetch_and ( int *  x,
int  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
uint64_t hc::atomic_fetch_and ( uint64_t *  x,
uint64_t  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.

Referenced by copy_async().

int hc::atomic_fetch_dec ( int *  _Dest)

Atomically decrement the value stored at the location pointed to by dest.

Parameters
[in,out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
unsigned int hc::atomic_fetch_dec ( unsigned int *  _Dest)

Atomically decrement the value stored at the location pointed to by dest.

Parameters
[in,out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.

Referenced by copy_async().

int hc::atomic_fetch_inc ( int *  _Dest)

Atomically increment the value stored at the location pointed to by dest.

Parameters
[in,out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
unsigned int hc::atomic_fetch_inc ( unsigned int *  _Dest)

Atomically increment the value stored at the location pointed to by dest.

Parameters
[in,out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.

Referenced by copy_async().

int hc::atomic_fetch_max ( int *  dest,
int  val 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
unsigned int hc::atomic_fetch_max ( unsigned int *  dest,
unsigned int  val 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
uint64_t hc::atomic_fetch_max ( uint64_t *  dest,
uint64_t  val 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.

Referenced by copy_async().
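
The sequential pseudo-code for atomic_fetch_max amounts to "store the larger of the old value and val, return the old value". The sketch below derives that behaviour from a compare-exchange loop on the host, which can be handy for element types that have no dedicated overload. It uses std::atomic rather than the hc functions, and fetch_max_emulated and demo_fetch_max_emulated are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Host-side emulation using std::atomic, not hc::atomic_fetch_max.
// Stores max(old, val) into `target` and returns the old value.
inline int fetch_max_emulated(std::atomic<int>& target, int val) {
    int observed = target.load();
    // Stop as soon as the stored value is already >= val; otherwise try to
    // install val. A failed exchange refreshes `observed` for the next try.
    while (observed < val && !target.compare_exchange_weak(observed, val)) {
    }
    return observed;
}

// Usage: a larger operand replaces the value, a smaller one leaves it alone.
inline void demo_fetch_max_emulated() {
    std::atomic<int> peak{5};
    assert(fetch_max_emulated(peak, 9) == 5);
    assert(peak.load() == 9);
    assert(fetch_max_emulated(peak, 3) == 9);
    assert(peak.load() == 9);
}
```

Swapping the comparison to `observed > val` gives the corresponding minimum operation.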

int hc::atomic_fetch_min ( int *  dest,
int  val 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
unsigned int hc::atomic_fetch_min ( unsigned int *  dest,
unsigned int  val 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
uint64_t hc::atomic_fetch_min ( uint64_t *  dest,
uint64_t  val 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.

Referenced by copy_async().

unsigned hc::atomic_fetch_or ( unsigned *  x,
unsigned  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
int hc::atomic_fetch_or ( int *  x,
int  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
uint64_t hc::atomic_fetch_or ( uint64_t *  x,
uint64_t  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of these functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.

Referenced by copy_async().
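The read-modify-write contract described for these functions (apply the operation, return the previous value) is the same one the standard std::atomic fetch operations follow, so it can be illustrated on the host without HCC. The sketch below uses std::atomic<std::uint64_t>::fetch_or as a stand-in for hc::atomic_fetch_or. The helper names set_flag and first_to_set are hypothetical and are not part of the hc namespace.

```cpp
#include <atomic>
#include <cstdint>

// Host-side stand-in for hc::atomic_fetch_or (an assumption made for
// illustration): ORs `bits` into `word` and returns the value that was
// stored before the update.
inline std::uint64_t set_flag(std::atomic<std::uint64_t>& word, std::uint64_t bits) {
    return word.fetch_or(bits);
}

// Uses the returned old value to decide whether this caller was the first
// one to set `bits`. Returns true only if none of the bits were set before.
inline bool first_to_set(std::atomic<std::uint64_t>& word, std::uint64_t bits) {
    return (set_flag(word, bits) & bits) == 0;
}
```

Because the old value is returned, callers can tell whether their update changed anything without a second, racy read.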

unsigned hc::atomic_fetch_sub ( unsigned *  x,
unsigned  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of the above functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out]destA pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in]valThe second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
int hc::atomic_fetch_sub ( int *  x,
int  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of the above functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out]destA pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in]valThe second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
float hc::atomic_fetch_sub ( float *  x,
float  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of the above functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out]destA pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in]valThe second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.

Referenced by copy_async().
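One common way to obtain a floating-point fetch-and-subtract with the semantics described above is a compare-and-swap retry loop. The sketch below models that technique on the host with std::atomic<float>. It is an illustration under that assumption only, says nothing about how HCC implements the float overload, and the name fetch_sub_float is hypothetical.

```cpp
#include <atomic>

// Hypothetical host-side model of a float fetch-and-subtract built from
// compare-and-swap. Not taken from the HCC sources.
inline float fetch_sub_float(std::atomic<float>& dest, float val) {
    float old_value = dest.load();
    // On failure compare_exchange_weak reloads old_value with the current
    // contents of dest, so each retry recomputes the difference from the
    // freshest value until the update is applied.
    while (!dest.compare_exchange_weak(old_value, old_value - val)) {
    }
    return old_value;  // the value stored before the subtraction
}
```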

unsigned hc::atomic_fetch_xor ( unsigned *  x,
unsigned  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of the above functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out]destA pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in]valThe second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
int hc::atomic_fetch_xor ( int *  x,
int  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of the above functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out]destA pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in]valThe second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.
uint64_t hc::atomic_fetch_xor ( uint64_t *  x,
uint64_t  y 
)

Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val serving as input operands, and store the result back to the location pointed to by dest.

In terms of sequential semantics, the operation performed by any of the above functions is described by the following piece of pseudo-code:

*dest = *dest $\otimes$ val;

Where the operation denoted by $\otimes$ is one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), find maximum (atomic_fetch_max), find minimum (atomic_fetch_min), bit-wise AND (atomic_fetch_and), bit-wise OR (atomic_fetch_or), bit-wise XOR (atomic_fetch_xor).

Parameters
[out]destA pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in]valThe second operand which participates in the calculation of the binary operation whose result is stored into the location pointed to by dest.
Returns
These functions return the old value which was previously stored at dest, and that was atomically replaced. These functions always succeed.

Referenced by copy_async().

template<typename T , int N>
void hc::copy ( const array_view< const T, N > &  src,
const array_view< T, N > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array_view<T,N> (or array_view<const T, N>) to be copied from.
[out]destAn object of type array_view<T,N> to be copied to.

References hc::array_view< T, N >::get_extent().

template<typename T , int N>
void hc::copy ( const array_view< T, N > &  src,
const array_view< T, N > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array_view<T,N> (or array_view<const T, N>) to be copied from.
[out]destAn object of type array_view<T,N> to be copied to.

References copy().

template<typename T , int N>
void hc::copy ( const array< T, N > &  src,
const array_view< T, N > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array<T,N> to be copied from.
[out]destAn object of type array_view<T,N> to be copied to.

References hc::array_view< T, N >::get_extent().

template<typename T , int N>
void hc::copy ( const array< T, N > &  src,
array< T, N > &  dest 
)

The contents of "src" are copied into "dest".

The source and destination may reside on different accelerators. If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array<T,N> to be copied from.
[out]destAn object of type array<T,N> to be copied to.
template<typename T , int N>
void hc::copy ( const array_view< const T, N > &  src,
array< T, N > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array_view<T,N> (or array_view<const T, N>) to be copied from.
[out]destAn object of type array<T,N> to be copied to.

References hc::array< T, N >::get_extent().

template<typename T , int N>
void hc::copy ( const array_view< T, N > &  src,
array< T, N > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array_view<T,N> (or array_view<const T, N>) to be copied from.
[out]destAn object of type array<T,N> to be copied to.

References copy().

template<typename InputIter , typename T , int N>
void hc::copy ( InputIter  srcBegin,
InputIter  srcEnd,
const array_view< T, N > &  dest 
)

The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".

If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.

In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).

Parameters
[in]srcBeginAn iterator to the first element of a source container.
[in]srcEndAn iterator to the end of a source container.
[out]destAn object of type array_view<T,N> to be copied to.
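The size check described for the [srcBegin,srcEnd) overloads can be modelled on the host as follows. This is a minimal sketch, not HCC code: a std::vector stands in for the destination array_view (its size() plays the role of dest.extent.size()), std::length_error stands in for whatever exception type HCC throws, and the name checked_copy is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <vector>

// Hypothetical model of the checked overload: copies [srcBegin, srcEnd)
// into dest only if the element counts match exactly.
template <typename InputIter, typename T>
void checked_copy(InputIter srcBegin, InputIter srcEnd, std::vector<T>& dest) {
    const auto count = std::distance(srcBegin, srcEnd);
    if (count < 0 || static_cast<std::size_t>(count) != dest.size()) {
        // Stand-in exception type, chosen for this sketch only.
        throw std::length_error("source range does not match destination size");
    }
    std::copy(srcBegin, srcEnd, dest.begin());
}
```

The overloads without an end iterator cannot perform this check, which is why the caller must guarantee that enough source elements are available.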
template<typename InputIter , typename T , int N>
void hc::copy ( InputIter  srcBegin,
InputIter  srcEnd,
array< T, N > &  dest 
)

The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".

If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.

In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).

Parameters
[in]srcBeginAn iterator to the first element of a source container.
[in]srcEndAn iterator to the end of a source container.
[out]destAn object of type array<T,N> to be copied to.

References hc::array< T, N >::get_extent().

template<typename InputIter , typename T , int N>
void hc::copy ( InputIter  srcBegin,
const array_view< T, N > &  dest 
)

The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".

If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.

In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).

Parameters
[in]srcBeginAn iterator to the first element of a source container.
[in]srcEndAn iterator to the end of a source container.
[out]destAn object of type array_view<T,N> to be copied to.

References copy(), and hc::array_view< T, N >::get_extent().

template<typename InputIter , typename T , int N>
void hc::copy ( InputIter  srcBegin,
array< T, N > &  dest 
)

The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".

If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.

In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).

Parameters
[in]srcBeginAn iterator to the first element of a source container.
[in]srcEndAn iterator to the end of a source container.
[out]destAn object of type array<T,N> to be copied to.

References copy(), and hc::array< T, N >::get_extent().

template<typename OutputIter , typename T , int N>
void hc::copy ( const array_view< T, N > &  src,
OutputIter  destBegin 
)

The contents of a source array are copied into "dest" starting with iterator destBegin.

If the number of elements in the range starting at destBegin in the destination container is smaller than "src.extent.size()", the behavior is undefined.

Parameters
[in]srcAn object of type array_view<T,N> to be copied from.
[out]destBeginAn output iterator addressing the position of the first element in the destination container.
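Since writing past the end of the destination is undefined behavior, the destination has to provide room for every source element before the copy starts. The sketch below shows two usual ways of guaranteeing that, using std::copy as a host-side stand-in for hc::copy and a std::vector as a stand-in for the source view. The function names are hypothetical, and whether hc::copy accepts an inserting iterator is an assumption that this page does not confirm.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Option 1: size the destination up front so that the range
// [destBegin, destBegin + src.size()) is valid before copying.
inline std::vector<int> copy_out_presized(const std::vector<int>& src) {
    std::vector<int> dest(src.size());
    std::copy(src.begin(), src.end(), dest.begin());
    return dest;
}

// Option 2: write through an inserting iterator, which grows the
// container on every assignment instead of relying on existing storage.
inline std::vector<int> copy_out_inserting(const std::vector<int>& src) {
    std::vector<int> dest;
    std::copy(src.begin(), src.end(), std::back_inserter(dest));
    return dest;
}
```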
template<typename OutputIter , typename T , int N>
void hc::copy ( const array< T, N > &  src,
OutputIter  destBegin 
)

The contents of a source array are copied into "dest" starting with iterator destBegin.

If the number of elements in the range starting at destBegin in the destination container is smaller than "src.extent.size()", the behavior is undefined.

Parameters
[in]srcAn object of type array<T,N> to be copied from.
[out]destBeginAn output iterator addressing the position of the first element in the destination container.
template<typename T >
void hc::copy ( const array< T, 1 > &  src,
const array_view< T, 1 > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array<T,N> to be copied from.
[out]destAn object of type array_view<T,N> to be copied to.

References hc::array_view< T, N >::get_extent().

template<typename T >
void hc::copy ( const array_view< const T, 1 > &  src,
array< T, 1 > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array_view<T,N> (or array_view<const T, N>) to be copied from.
[out]destAn object of type array<T,N> to be copied to.

References hc::array< T, N >::get_extent().

template<typename T >
void hc::copy ( const array_view< const T, 1 > &  src,
const array_view< T, 1 > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array_view<T,N> (or array_view<const T, N>) to be copied from.
[out]destAn object of type array_view<T,N> to be copied to.

References hc::array_view< T, N >::get_extent().

Referenced by hc::array< T, N >::array(), copy(), copy_async(), hc::array< T, N >::copy_to(), hc::array_view< T, N >::copy_to(), hc::array_view< const T, N >::copy_to(), hc::array< T, N >::operator std::vector< T >(), hc::array< T, N >::view_as(), hc::array_view< T, N >::view_as(), and hc::array_view< const T, N >::view_as().

template<typename T , int N>
completion_future hc::copy_async ( const array< T, N > &  src,
array< T, N > &  dest 
)

The contents of "src" are copied into "dest".

The source and destination may reside on different accelerators. If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array<T,N> to be copied from.
[out]destAn object of type array<T,N> to be copied to.

References copy().

template<typename T , int N>
completion_future hc::copy_async ( const array< T, N > &  src,
const array_view< T, N > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array<T,N> to be copied from.
[out]destAn object of type array_view<T,N> to be copied to.

References copy().

template<typename T , int N>
completion_future hc::copy_async ( const array_view< const T, N > &  src,
array< T, N > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array_view<T,N> (or array_view<const T, N>) to be copied from.
[out]destAn object of type array<T,N> to be copied to.

References copy().

template<typename T , int N>
completion_future hc::copy_async ( const array_view< T, N > &  src,
array< T, N > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array_view<T,N> (or array_view<const T, N>) to be copied from.
[out]destAn object of type array<T,N> to be copied to.

References copy().

template<typename T , int N>
completion_future hc::copy_async ( const array_view< const T, N > &  src,
const array_view< T, N > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array_view<T,N> (or array_view<const T, N>) to be copied from.
[out]destAn object of type array_view<T,N> to be copied to.

References copy().

template<typename T , int N>
completion_future hc::copy_async ( const array_view< T, N > &  src,
const array_view< T, N > &  dest 
)

The contents of "src" are copied into "dest".

If the extents of "src" and "dest" don't match, a runtime exception is thrown.

Parameters
[in]srcAn object of type array_view<T,N> (or array_view<const T, N>) to be copied from.
[out]destAn object of type array_view<T,N> to be copied to.

References copy().

template<typename InputIter , typename T , int N>
completion_future hc::copy_async ( InputIter  srcBegin,
InputIter  srcEnd,
array< T, N > &  dest 
)

The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".

If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.

In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).

Parameters
[in]srcBeginAn iterator to the first element of a source container.
[in]srcEndAn iterator to the end of a source container.
[out]destAn object of type array<T,N> to be copied to.

References copy().

template<typename InputIter , typename T , int N>
completion_future hc::copy_async ( InputIter  srcBegin,
array< T, N > &  dest 
)

The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".

If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.

In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).

Parameters
[in]srcBeginAn iterator to the first element of a source container.
[in]srcEndAn iterator to the end of a source container.
[out]destAn object of type array<T,N> to be copied to.

References copy().

template<typename InputIter , typename T , int N>
completion_future hc::copy_async ( InputIter  srcBegin,
InputIter  srcEnd,
const array_view< T, N > &  dest 
)

The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".

If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.

In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).

Parameters
[in]srcBeginAn iterator to the first element of a source container.
[in]srcEndAn iterator to the end of a source container.
[out]destAn object of type array_view<T,N> to be copied to.

References copy().

template<typename InputIter , typename T , int N>
completion_future hc::copy_async ( InputIter  srcBegin,
const array_view< T, N > &  dest 
)

The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".

If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.

In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).

Parameters
[in]srcBeginAn iterator to the first element of a source container.
[in]srcEndAn iterator to the end of a source container.
[out]destAn object of type array_view<T,N> to be copied to.

References copy().

template<typename OutputIter , typename T , int N>
completion_future hc::copy_async ( const array< T, N > &  src,
OutputIter  destBegin 
)

The contents of a source array are copied into "dest" starting with iterator destBegin.

If the number of elements in the range starting at destBegin in the destination container is smaller than "src.extent.size()", the behavior is undefined.

Parameters
[in]srcAn object of type array<T,N> to be copied from.
[out]destBeginAn output iterator addressing the position of the first element in the destination container.

References copy().

template<typename OutputIter , typename T , int N>
completion_future hc::copy_async ( const array_view< T, N > &  src,
OutputIter  destBegin 
)

The contents of a source array are copied into "dest" starting with iterator destBegin.

If the number of elements in the range starting at destBegin in the destination container is smaller than "src.extent.size()", the behavior is undefined.

Parameters
[in]srcAn object of type array_view<T,N> to be copied from.
[out]destBeginAn output iterator addressing the position of the first element in the destination container.

References __atomic_wrapdec(), __atomic_wrapinc(), atomic_compare_exchange(), atomic_exchange(), atomic_fetch_add(), atomic_fetch_and(), atomic_fetch_dec(), atomic_fetch_inc(), atomic_fetch_max(), atomic_fetch_min(), atomic_fetch_or(), atomic_fetch_sub(), atomic_fetch_xor(), and copy().

unsigned int hc::get_group_segment_size ( )

Fetch the size of group segment.

This includes both static group segment and dynamic group segment.

Returns
The size of group segment used by the kernel in bytes. The value includes both static group segment and dynamic group segment.

Referenced by __mad24().

unsigned int hc::get_static_group_segment_size ( )

Fetch the size of static group segment.

Returns
The size of static group segment used by the kernel in bytes.

Referenced by __mad24().

uint64_t hc::get_system_ticks ( )
inline

Get the current tick count for the GPU platform.

Returns
An implementation-defined tick count
uint64_t hc::get_tick_frequency ( )
inline

Get the frequency of ticks per second for the underlying asynchronous operation.

Returns
An implementation-defined frequency in Hz in case the instance is created by a kernel dispatch or a barrier packet. 0 otherwise.
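A tick count only becomes a duration once it is divided by the tick frequency. The helper below sketches that conversion for two readings such as those returned by hc::get_system_ticks(), with the frequency taken from hc::get_tick_frequency(). The helper name ticks_to_seconds is hypothetical, and treating a frequency of 0 as "no timing available" is an assumption based on the return description above.

```cpp
#include <cstdint>

// Converts the interval between two tick readings to seconds.
// The arguments stand in for values obtained from hc::get_system_ticks()
// and hc::get_tick_frequency(); this helper is not part of the hc namespace.
inline double ticks_to_seconds(std::uint64_t start, std::uint64_t end,
                               std::uint64_t frequency_hz) {
    if (frequency_hz == 0 || end < start) {
        return 0.0;  // frequency unavailable, or readings out of order
    }
    return static_cast<double>(end - start) / static_cast<double>(frequency_hz);
}
```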
void hc::global_memory_fence ( const tile_barrier )

Establishes a thread-tile scoped memory fence for global (but not tile-static) memory operations.

This function does not imply a barrier and is therefore permitted in divergent code.

Referenced by hc::tile_barrier::wait_with_tile_static_memory_fence().

template<int N>
extent<N> hc::operator% ( const extent< N > &  ext,
int  value 
)

Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.

The result extent<N> is such that for a given operator $\oplus$, result[i] = ext[i] $\oplus$ value or result[i] = value $\oplus$ ext[i] for every i from 0 to N-1.

Parameters
[in]extThe extent<N> operand
[in]valueThe integer operand
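The element-wise rule above can be sketched with a plain array standing in for hc::extent<N>. The alias Extent and the function name mod are hypothetical. The two overloads mirror the two operand orders, which give different results for non-commutative operators such as %, - and /.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for hc::extent<N>: a plain array of N ints.
template <std::size_t N>
using Extent = std::array<int, N>;

// Models operator%(ext, value): result[i] = ext[i] % value.
// `value` must be non-zero.
template <std::size_t N>
Extent<N> mod(const Extent<N>& ext, int value) {
    Extent<N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        result[i] = ext[i] % value;
    }
    return result;
}

// Models operator%(value, ext): result[i] = value % ext[i].
// Every component of `ext` must be non-zero.
template <std::size_t N>
Extent<N> mod(int value, const Extent<N>& ext) {
    Extent<N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        result[i] = value % ext[i];
    }
    return result;
}
```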
template<int N>
extent<N> hc::operator% ( int  value,
const extent< N > &  ext 
)

Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.

The result extent<N> is such that for a given operator $\oplus$, result[i] = ext[i] $\oplus$ value or result[i] = value $\oplus$ ext[i] for every i from 0 to N-1.

Parameters
[in]extThe extent<N> operand
[in]valueThe integer operand
template<int N>
extent<N> hc::operator* ( const extent< N > &  ext,
int  value 
)

Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.

The result extent<N> is such that for a given operator $\oplus$, result[i] = ext[i] $\oplus$ value or result[i] = value $\oplus$ ext[i] for every i from 0 to N-1.

Parameters
[in]extThe extent<N> operand
[in]valueThe integer operand
template<int N>
extent<N> hc::operator* ( int  value,
const extent< N > &  ext 
)

Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.

The result extent<N> is such that for a given operator $\oplus$, result[i] = ext[i] $\oplus$ value or result[i] = value $\oplus$ ext[i] for every i from 0 to N-1.

Parameters
[in]extThe extent<N> operand
[in]valueThe integer operand
template<int N>
extent<N> hc::operator+ ( const extent< N > &  lhs,
const extent< N > &  rhs 
)

Adds (or subtracts) two objects of extent<N> to form a new extent.

The result extent<N> is such that for a given operator $\oplus$, result[i] = lhs[i] $\oplus$ rhs[i] for every i from 0 to N-1.

Parameters
[in]lhsThe left-hand extent<N> operand.
[in]rhsThe right-hand extent<N> operand.
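As a small illustration of the component-wise rule for two extents, the sketch below adds and subtracts plain arrays that stand in for extent<N>. The function names extent_add and extent_sub are hypothetical and are not part of the hc namespace.

```cpp
#include <array>
#include <cstddef>

// Models operator+(lhs, rhs) on a stand-in for extent<N>:
// result[i] = lhs[i] + rhs[i] for every i from 0 to N-1.
template <std::size_t N>
std::array<int, N> extent_add(const std::array<int, N>& lhs,
                              const std::array<int, N>& rhs) {
    std::array<int, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        result[i] = lhs[i] + rhs[i];
    }
    return result;
}

// Models operator-(lhs, rhs): result[i] = lhs[i] - rhs[i].
template <std::size_t N>
std::array<int, N> extent_sub(const std::array<int, N>& lhs,
                              const std::array<int, N>& rhs) {
    std::array<int, N> result{};
    for (std::size_t i = 0; i < N; ++i) {
        result[i] = lhs[i] - rhs[i];
    }
    return result;
}
```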
template<int N>
extent<N> hc::operator+ ( const extent< N > &  ext,
int  value 
)

Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.

The result extent<N> is such that for a given operator $\oplus$, result[i] = ext[i] $\oplus$ value or result[i] = value $\oplus$ ext[i] for every i from 0 to N-1.

Parameters
[in]extThe extent<N> operand
[in]valueThe integer operand
template<int N>
extent<N> hc::operator+ ( int  value,
const extent< N > &  ext 
)

Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.

The result extent<N> is such that for a given operator $\oplus$, result[i] = ext[i] $\oplus$ value or result[i] = value $\oplus$ ext[i] for every i from 0 to N-1.

Parameters
[in]extThe extent<N> operand
[in]valueThe integer operand
template<int N>
extent<N> hc::operator- ( const extent< N > &  lhs,
const extent< N > &  rhs 
)

Adds (or subtracts) two objects of extent<N> to form a new extent.

The result extent<N> is such that for a given operator $\oplus$, result[i] = lhs[i] $\oplus$ rhs[i] for every i from 0 to N-1.

Parameters
[in]lhsThe left-hand extent<N> operand.
[in]rhsThe right-hand extent<N> operand.
template<int N>
extent<N> hc::operator- ( const extent< N > &  ext,
int  value 
)

Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.

The result extent<N> is such that for a given operator $\oplus$, result[i] = ext[i] $\oplus$ value or result[i] = value $\oplus$ ext[i] for every i from 0 to N-1.

Parameters
[in]extThe extent<N> operand
[in]valueThe integer operand
template<int N>
extent<N> hc::operator- ( int  value,
const extent< N > &  ext 
)

Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.

The result extent<N> is such that for a given operator $\oplus$, result[i] = ext[i] $\oplus$ value or result[i] = value $\oplus$ ext[i] for every i from 0 to N-1.

Parameters
[in]extThe extent<N> operand
[in]valueThe integer operand
template<int N>
extent<N> hc::operator/ ( const extent< N > &  ext,
int  value 
)

Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.

The result extent<N> is such that for a given operator $\oplus$, result[i] = ext[i] $\oplus$ value or result[i] = value $\oplus$ ext[i] for every i from 0 to N-1.

Parameters
[in]extThe extent<N> operand
[in]valueThe integer operand
template<int N>
extent<N> hc::operator/ ( int  value,
const extent< N > &  ext 
)

Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.

The result extent<N> is such that for a given operator $\oplus$, result[i] = ext[i] $\oplus$ value or result[i] = value $\oplus$ ext[i] for every i from 0 to N-1.

Parameters
[in]extThe extent<N> operand
[in]valueThe integer operand
void hc::tile_static_memory_fence ( const tile_barrier )

Establishes a thread-tile scoped memory fence for tile-static (but not global) memory operations.

This function does not imply a barrier and is therefore permitted in divergent code.

Referenced by hc::tile_barrier::wait_with_tile_static_memory_fence().

Variable Documentation

const tiled_extent< 3 > const Kernel &f __CPU__ hc::__HC__
Initial value:
{
for(int i = 0 ; i < N ; i++)
{
if (compute_domain[i] == 0)
return completion_future();
if (compute_domain[i] < 0)
throw invalid_compute_domain("Extent is less than 0.");
if (static_cast<size_t>(compute_domain[i]) > 4294967295L)
throw invalid_compute_domain("Extent size too large.");
}
size_t ext[3] = {static_cast<size_t>(compute_domain[N - 1]),
static_cast<size_t>(compute_domain[N - 2]),
static_cast<size_t>(compute_domain[N - 3])}
static void hc::cpu
Initial value:
{
unsigned int size = 0