HCC
HCC is a single-source C/C++ compiler for heterogeneous computing. It is optimized for HSA, the Heterogeneous System Architecture (http://www.hsafoundation.com/).
Heterogeneous C++ (HC) namespace. More...
Classes | |
struct | __has_data |
struct | __has_size |
struct | __is_container |
union | __u |
class | accelerator |
Represents a physical accelerated computing device. More... | |
class | accelerator_view |
Represents a logical (isolated) accelerator view of a compute accelerator. More... | |
struct | am_allocator |
class | AmPointerInfo |
class | array |
Represents an N-dimensional region of memory (with type T) located on an accelerator. More... | |
struct | array_projection_helper |
struct | array_projection_helper< T, 1 > |
class | array_view |
The array_view<T,N> type represents a possibly cached view into the data held in an array<T,N>, or a section thereof. More... | |
class | array_view< const T, N > |
The partial specialization array_view<const T,N> represents a view over elements of type const T with rank N. More... | |
class | completion_future |
This class is the return type of all asynchronous APIs and has an interface analogous to std::shared_future<void>. More... | |
struct | copy_bidir |
struct | copy_bidir< T, N, N > |
struct | copy_input |
struct | copy_input< InputIter, T, N, N > |
struct | copy_output |
struct | copy_output< OutputIter, T, N, N > |
struct | do_copy |
struct | do_copy< Iter, T, 1 > |
struct | do_copy< T *, T, 1 > |
struct | do_copy< T *, T, N > |
class | extent |
Represents the size (bounds) of an N-dimensional space whose origin is at 0. More... | |
struct | pfe_helper |
struct | pfe_helper< 0, Kernel, _Tp > |
class | pfe_wrapper |
class | PrintfPacket |
union | PrintfPacketData |
struct | projection_helper |
struct | projection_helper< const T, 1 > |
struct | projection_helper< const T, N > |
struct | projection_helper< T, 1 > |
class | tile_barrier |
The tile_barrier class is a capability class that is only creatable by the system, and passed to a tiled parallel_for_each function object as part of the tiled_index parameter. More... | |
class | tiled_extent |
Represents an extent subdivided into tiles. More... | |
class | tiled_extent< 1 > |
Represents an extent subdivided into tiles. More... | |
class | tiled_extent< 2 > |
Represents an extent subdivided into tiles. More... | |
class | tiled_extent< 3 > |
Represents an extent subdivided into tiles. More... | |
class | tiled_index |
Represents a set of related indices subdivided into 1-, 2-, or 3-dimensional tiles. More... | |
class | tiled_index< 1 > |
Represents a set of related indices subdivided into 1-, 2-, or 3-dimensional tiles. More... | |
class | tiled_index< 2 > |
Represents a set of related indices subdivided into 1-, 2-, or 3-dimensional tiles. More... | |
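The tiled classes above (tiled_extent and tiled_index) split a compute domain into tiles, so each work-item has both a global position and a position within its tile. The host-side C++ sketch below models that relationship for one dimension, assuming the C++ AMP convention that tile_origin = tile × tile size and global = tile_origin + local. TiledIndex1D, decompose, and decomposition_is_consistent are hypothetical names for illustration only; they are not part of the hc namespace, and the real hc::tiled_index is supplied by the runtime to a tiled parallel_for_each kernel.

```cpp
#include <cassert>

// Hypothetical host-side model of the positions carried by a 1-D tiled index.
// Field names mirror hc::tiled_index (global, local, tile, tile_origin),
// but this struct is an illustration only; it is not the HC type.
struct TiledIndex1D {
    int global;       // position in the whole compute domain
    int local;        // position within the tile, in [0, tile_dim)
    int tile;         // index of the tile that contains the work-item
    int tile_origin;  // global position of the tile's first work-item
};

// Decompose a non-negative global position for tiles of tile_dim work-items.
inline TiledIndex1D decompose(int global, int tile_dim) {
    assert(global >= 0 && tile_dim > 0);
    TiledIndex1D t;
    t.global = global;
    t.tile = global / tile_dim;
    t.local = global % tile_dim;
    t.tile_origin = t.tile * tile_dim;
    return t;
}

// Verify global == tile_origin + local (and that local stays inside the tile)
// for every position of a 1-D extent of the given size.
inline bool decomposition_is_consistent(int extent, int tile_dim) {
    for (int g = 0; g < extent; ++g) {
        const TiledIndex1D t = decompose(g, tile_dim);
        if (t.global != t.tile_origin + t.local) return false;
        if (t.local < 0 || t.local >= tile_dim) return false;
    }
    return true;
}
```

For example, decompose(10, 4) places global position 10 in tile 2 at local position 2, and that tile starts at global position 8. In 2-D and 3-D the same arithmetic applies independently to each dimension.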
Typedefs | |
template<int N> | |
using | index = Kalmar::index< N > |
Represents a unique position in N-dimensional space. | |
using | runtime_exception = Kalmar::runtime_exception |
using | invalid_compute_domain = Kalmar::invalid_compute_domain |
using | accelerator_view_removed = Kalmar::accelerator_view_removed |
typedef __fp16 | half |
template<typename T > | |
using | pinned_vector = std::vector< T, am_allocator< T >> |
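Several functions in the table that follows operate across the work-items of a wavefront: __ballot, __any, __all, and __activelanecount_u32_b1. The host-side C++ sketch below models those semantics so they can be checked without a GPU. It assumes a 64-lane wavefront (on the device, __wavesize() reports the actual size) and represents the set of active work-items as a 64-bit mask; kWaveSize and the model_* functions are hypothetical names, not HC APIs.

```cpp
#include <cassert>
#include <cstdint>

// Assumed wavefront size for this model (hypothetical constant).
constexpr int kWaveSize = 64;

// Model of __ballot: bit i is set iff lane i is active and pred[i] != 0.
inline std::uint64_t model_ballot(const int (&pred)[kWaveSize], std::uint64_t active_mask) {
    std::uint64_t result = 0;
    for (int lane = 0; lane < kWaveSize; ++lane) {
        const bool active = ((active_mask >> lane) & 1u) != 0;
        if (active && pred[lane] != 0) {
            result |= std::uint64_t{1} << lane;
        }
    }
    return result;
}

// Model of __any: non-zero iff at least one active lane has a non-zero predicate.
inline int model_any(const int (&pred)[kWaveSize], std::uint64_t active_mask) {
    return model_ballot(pred, active_mask) != 0 ? 1 : 0;
}

// Model of __all: non-zero iff every active lane has a non-zero predicate.
inline int model_all(const int (&pred)[kWaveSize], std::uint64_t active_mask) {
    return model_ballot(pred, active_mask) == active_mask ? 1 : 0;
}

// Model of __activelanecount_u32_b1: number of active lanes with non-zero input.
inline unsigned model_activelanecount(const int (&pred)[kWaveSize], std::uint64_t active_mask) {
    std::uint64_t bits = model_ballot(pred, active_mask);
    unsigned count = 0;
    while (bits != 0) {
        bits &= bits - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}
```

Because every model is defined relative to active_mask, inactive lanes never affect the result, which matches the "for all active work-items" wording of the descriptions.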
Functions | |
uint64_t | get_system_ticks () |
Get the current tick count for the GPU platform. More... | |
uint64_t | get_tick_frequency () |
Get the frequency of ticks per second for the underlying asynchronous operation. More... | |
unsigned int | __wavesize () __HC__ |
unsigned int | __popcount_u32_b32 (unsigned int input) __HC__ |
Count number of 1 bits in the input. More... | |
unsigned int | __popcount_u32_b64 (unsigned long long int input) __HC__ |
Count number of 1 bits in the input. More... | |
unsigned int | __firstbit_u32_u32 (unsigned int input) __HC__ |
Count leading zero bits in the input. More... | |
unsigned int | __firstbit_u32_u64 (unsigned long long int input) __HC__ |
Count leading zero bits in the input. More... | |
unsigned int | __firstbit_u32_s32 (int input) __HC__ |
Count leading zero bits in the input. More... | |
unsigned int | __firstbit_u32_s64 (long long int input) __HC__ |
Count leading zero bits in the input. More... | |
unsigned int | __bitalign_b32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__ |
Align 32 bits within 64 bits of data on an arbitrary bit boundary. More... | |
unsigned int | __bytealign_b32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__ |
Align 32 bits within 64 bits of data on an arbitrary byte boundary. More... | |
unsigned int | __lerp_u8x4 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__ |
Performs linear interpolation, computing the unsigned 8-bit average of packed data. More... | |
unsigned int | __packcvt_u8x4_f32 (float src0, float src1, float src2, float src3) __HC__ |
Takes four floating-point numbers, converts them to unsigned integer values, and packs them into a u8x4 value. More... | |
float | __unpackcvt_f32_u8x4 (unsigned int src0, unsigned int src1) __HC__ |
Unpacks a single element from a packed u8x4 value and converts it to an f32. More... | |
unsigned int | __sadhi_u16x2_u8x4 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__ |
This function is mostly the same as sad except the sum of absolute differences is added to the most significant 16 bits of the result. More... | |
uint64_t | __clock_u64 () __HC__ |
Get system timestamp. | |
uint64_t | __cycle_u64 () __HC__ |
Get hardware cycle count. More... | |
unsigned int | __activelaneid_u32 () __HC__ |
Get the count of the number of earlier (in flattened work-item order) active work-items within the same wavefront. More... | |
uint64_t | __activelanemask_v4_b64_b1 (unsigned int input) __HC__ |
Returns a bit mask showing which active work-items in the wavefront have a non-zero input. More... | |
unsigned int | __activelanecount_u32_b1 (unsigned int input) __HC__ |
Count the number of active work-items in the current wavefront that have a non-zero input. More... | |
int | __any (int predicate) __HC__ |
Evaluate predicate for all active work-items in the wavefront and return non-zero if and only if predicate evaluates to non-zero for any of them. | |
int | __all (int predicate) __HC__ |
Evaluate predicate for all active work-items in the wavefront and return non-zero if and only if predicate evaluates to non-zero for all of them. | |
uint64_t | __ballot (int predicate) __HC__ |
Evaluate predicate for all active work-items in the wavefront and return an integer whose Nth bit is set if and only if predicate evaluates to non-zero for the Nth work-item of the wavefront and the Nth work-item is active. | |
unsigned int | __shfl_xor (unsigned int var, int laneMask, int width=__HSA_WAVEFRONT_SIZE__) __HC__ |
Copy from an active work-item based on bitwise XOR of caller work-item ID within a wavefront. | |
unsigned int | __mul24 (unsigned int x, unsigned int y)[[hc]] |
Multiply two unsigned integers (x,y) but only the lower 24 bits will be used in the multiplication. More... | |
int | __mul24 (int x, int y)[[hc]] |
Multiply two integers (x,y) but only the lower 24 bits will be used in the multiplication. More... | |
unsigned int | __mad24 (unsigned int x, unsigned int y, unsigned int z)[[hc]] |
Multiply two unsigned integers (x,y) but only the lower 24 bits will be used in the multiplication and then add the product to a 32-bit unsigned integer. More... | |
int | __mad24 (int x, int y, int z)[[hc]] |
Multiply two integers (x,y) but only the lower 24 bits will be used in the multiplication and then add the product to a 32-bit integer. More... | |
void | abort () __HC__ |
unsigned int | get_group_segment_size () __HC__ |
Fetch the size of group segment. More... | |
unsigned int | get_static_group_segment_size () __HC__ |
Fetch the size of static group segment. More... | |
void * | get_group_segment_base_pointer () __HC__ |
Fetch the address of the beginning of group segment. | |
void * | get_dynamic_group_segment_base_pointer () __HC__ |
Fetch the address of the beginning of dynamic group segment. | |
void | all_memory_fence (const tile_barrier &) __HC__ |
Establishes a thread-tile scoped memory fence for both global and tile-static memory operations. More... | |
void | global_memory_fence (const tile_barrier &) __HC__ |
Establishes a thread-tile scoped memory fence for global (but not tile-static) memory operations. More... | |
void | tile_static_memory_fence (const tile_barrier &) __HC__ |
Establishes a thread-tile scoped memory fence for tile-static (but not global) memory operations. More... | |
template<int N> | |
const extent< N > & | check (const extent< N > &ext) |
template<typename T , int N> | |
void | copy (const array< T, N > &src, array< T, N > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename OutputIter , typename T , int N> | |
void | copy (const array_view< T, N > &src, OutputIter destBegin) |
The contents of the source array_view are copied to the destination starting at iterator destBegin. More... | |
template<typename OutputIter , typename T , int N> | |
void | copy (const array< T, N > &src, OutputIter destBegin) |
The contents of the source array are copied to the destination starting at iterator destBegin. More... | |
template<typename T , int N> | |
completion_future | copy_async (const array< T, N > &src, array< T, N > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename T , int N> | |
completion_future | copy_async (const array< T, N > &src, const array_view< T, N > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename OutputIter , typename T , int N> | |
completion_future | copy_async (const array< T, N > &src, OutputIter destBegin) |
The contents of the source array are copied to the destination starting at iterator destBegin. More... | |
template<typename OutputIter , typename T , int N> | |
completion_future | copy_async (const array_view< T, N > &src, OutputIter destBegin) |
The contents of the source array_view are copied to the destination starting at iterator destBegin. More... | |
template<typename T , int N> | |
completion_future | copy_async (const array< T, N > &src, const array< T, N > &dest) |
template<typename T , int N> | |
completion_future | copy_async (const array_view< const T, N > &src, const array< T, N > &dest) |
template<typename T , int N> | |
completion_future | copy_async (const array_view< T, N > &src, const array< T, N > &dest) |
unsigned int | __atomic_wrapinc (unsigned int *address, unsigned int val) __HC__ |
Atomically reads the 32-bit word old at address, stores ((old >= val) ? 0 : (old + 1)) back to the same address, and returns old. More... | |
unsigned int | __atomic_wrapdec (unsigned int *address, unsigned int val) __HC__ |
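Atomically reads the 32-bit word old at address, stores (((old == 0) || (old > val)) ? val : (old - 1)) back to the same address, and returns old. More... | |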
template<int N, typename Kernel > | |
completion_future | parallel_for_each (const accelerator_view &, const extent< N > &, const Kernel &) |
template<typename Kernel > | |
completion_future | parallel_for_each (const accelerator_view &, const tiled_extent< 3 > &, const Kernel &) |
template<typename Kernel > | |
completion_future | parallel_for_each (const accelerator_view &, const tiled_extent< 2 > &, const Kernel &) |
template<typename Kernel > | |
completion_future | parallel_for_each (const accelerator_view &, const tiled_extent< 1 > &, const Kernel &) |
template<int N, typename Kernel > | |
completion_future | parallel_for_each (const extent< N > &compute_domain, const Kernel &f) |
template<typename Kernel > | |
completion_future | parallel_for_each (const tiled_extent< 3 > &compute_domain, const Kernel &f) |
template<typename Kernel > | |
completion_future | parallel_for_each (const tiled_extent< 2 > &compute_domain, const Kernel &f) |
template<typename Kernel > | |
completion_future | parallel_for_each (const tiled_extent< 1 > &compute_domain, const Kernel &f) |
Implementation notes (fragments of the inline definitions above that Doxygen rendered as declarations):
The accelerator_view overloads of parallel_for_each are declared __attribute__((noinline, used)), and the generic extent<N> overload tests whether av.get_accelerator().get_device_path() == L"cpu".
Before dispatch, the dimensions of compute_domain are checked for negative values, and any dimension larger than 4294967295 (2^32 - 1) throws invalid_compute_domain("Extent size too large.").
The generic extent<N> overload wraps the kernel in pfe_wrapper<N, Kernel> and launches it as a 3-dimensional dispatch through Kalmar::mcw_cxxamp_launch_kernel_async; the 1-, 2-, and 3-dimensional extent overloads call Kalmar::mcw_cxxamp_launch_kernel_async<Kernel, N> directly.
The tiled_extent overloads launch through Kalmar::mcw_cxxamp_execute_kernel_with_dynamic_group_memory_async<Kernel, N>, passing compute_domain.get_dynamic_group_segment_size() as the dynamic group segment size.
auto_voidp | am_aligned_alloc (std::size_t size, hc::accelerator &acc, unsigned flags, std::size_t alignment=0) |
Allocate a block of size bytes of memory on the specified accelerator acc. More... | |
auto_voidp | am_alloc (std::size_t size, hc::accelerator &acc, unsigned flags) |
Allocate a block of size bytes of memory on the specified accelerator acc. More... | |
am_status_t | am_free (void *ptr) |
Free a block of memory previously allocated with am_alloc. More... | |
template<class T , class U > | |
bool | operator== (const am_allocator< T > &, const am_allocator< U > &) |
template<class T , class U > | |
bool | operator!= (const am_allocator< T > &, const am_allocator< U > &) |
template<int N> | |
extent< N > | operator+ (const extent< N > &lhs, const extent< N > &rhs) __CPU__ __HC__ |
Adds (or subtracts) two objects of extent<N> to form a new extent. More... | |
template<int N> | |
extent< N > | operator- (const extent< N > &lhs, const extent< N > &rhs) __CPU__ __HC__ |
Adds (or subtracts) two objects of extent<N> to form a new extent. More... | |
template<int N> | |
extent< N > | operator+ (const extent< N > &ext, int value) __CPU__ __HC__ |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More... | |
template<int N> | |
extent< N > | operator+ (int value, const extent< N > &ext) __CPU__ __HC__ |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More... | |
template<int N> | |
extent< N > | operator- (const extent< N > &ext, int value) __CPU__ __HC__ |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More... | |
template<int N> | |
extent< N > | operator- (int value, const extent< N > &ext) __CPU__ __HC__ |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More... | |
template<int N> | |
extent< N > | operator* (const extent< N > &ext, int value) __CPU__ __HC__ |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More... | |
template<int N> | |
extent< N > | operator* (int value, const extent< N > &ext) __CPU__ __HC__ |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More... | |
template<int N> | |
extent< N > | operator/ (const extent< N > &ext, int value) __CPU__ __HC__ |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More... | |
template<int N> | |
extent< N > | operator/ (int value, const extent< N > &ext) __CPU__ __HC__ |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More... | |
template<int N> | |
extent< N > | operator% (const extent< N > &ext, int value) __CPU__ __HC__ |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More... | |
template<int N> | |
extent< N > | operator% (int value, const extent< N > &ext) __CPU__ __HC__ |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands. More... | |
unsigned int | __bitextract_u32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__ |
Extract a range of bits. More... | |
uint64_t | __bitextract_u64 (uint64_t src0, unsigned int src1, unsigned int src2) __HC__ |
Extract a range of bits. More... | |
int | __bitextract_s32 (int src0, unsigned int src1, unsigned int src2) __HC__ |
Extract a range of bits. More... | |
int64_t | __bitextract_s64 (int64_t src0, unsigned int src1, unsigned int src2) __HC__ |
Extract a range of bits. More... | |
unsigned int | __bitinsert_u32 (unsigned int src0, unsigned int src1, unsigned int src2, unsigned int src3) __HC__ |
Replace a range of bits. More... | |
uint64_t | __bitinsert_u64 (uint64_t src0, uint64_t src1, unsigned int src2, unsigned int src3) __HC__ |
Replace a range of bits. More... | |
int | __bitinsert_s32 (int src0, int src1, unsigned int src2, unsigned int src3) __HC__ |
Replace a range of bits. More... | |
int64_t | __bitinsert_s64 (int64_t src0, int64_t src1, unsigned int src2, unsigned int src3) __HC__ |
Replace a range of bits. More... | |
unsigned int | __bitmask_b32 (unsigned int src0, unsigned int src1) __HC__ |
Create a bit mask that can be used with bitselect. More... | |
uint64_t | __bitmask_b64 (unsigned int src0, unsigned int src1) __HC__ |
Create a bit mask that can be used with bitselect. More... | |
unsigned int | __bitrev_b32 (unsigned int src0)[[hc]] __asm("llvm.bitreverse.i32") |
Reverse the bits. More... | |
uint64_t | __bitrev_b64 (uint64_t src0)[[hc]] __asm("llvm.bitreverse.i64") |
Reverse the bits. More... | |
unsigned int | __bitselect_b32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__ |
Do bit field selection. More... | |
uint64_t | __bitselect_b64 (uint64_t src0, uint64_t src1, uint64_t src2) __HC__ |
Do bit field selection. More... | |
unsigned int | __lastbit_u32_u32 (unsigned int input) __HC__ |
Find the first bit set to 1 in a number starting from the least significant bit. More... | |
unsigned int | __lastbit_u32_u64 (unsigned long long int input) __HC__ |
Find the first bit set to 1 in a number starting from the least significant bit. More... | |
unsigned int | __lastbit_u32_s32 (int input) __HC__ |
Find the first bit set to 1 in a number starting from the least significant bit. More... | |
unsigned int | __lastbit_u32_s64 (unsigned long long input) __HC__ |
Find the first bit set to 1 in a number starting from the least significant bit. More... | |
unsigned int | __unpacklo_u8x4 (unsigned int src0, unsigned int src1) __HC__ |
Copy and interleave the lower half of the elements from each source into the destination. More... | |
uint64_t | __unpacklo_u8x8 (uint64_t src0, uint64_t src1) __HC__ |
Copy and interleave the lower half of the elements from each source into the destination. More... | |
unsigned int | __unpacklo_u16x2 (unsigned int src0, unsigned int src1) __HC__ |
Copy and interleave the lower half of the elements from each source into the destination. More... | |
uint64_t | __unpacklo_u16x4 (uint64_t src0, uint64_t src1) __HC__ |
Copy and interleave the lower half of the elements from each source into the destination. More... | |
uint64_t | __unpacklo_u32x2 (uint64_t src0, uint64_t src1) __HC__ |
Copy and interleave the lower half of the elements from each source into the destination. More... | |
int | __unpacklo_s8x4 (int src0, int src1) __HC__ |
Copy and interleave the lower half of the elements from each source into the destination. More... | |
int64_t | __unpacklo_s8x8 (int64_t src0, int64_t src1) __HC__ |
Copy and interleave the lower half of the elements from each source into the destination. More... | |
int | __unpacklo_s16x2 (int src0, int src1) __HC__ |
Copy and interleave the lower half of the elements from each source into the destination. More... | |
int64_t | __unpacklo_s16x4 (int64_t src0, int64_t src1) __HC__ |
Copy and interleave the lower half of the elements from each source into the destination. More... | |
int64_t | __unpacklo_s32x2 (int64_t src0, int64_t src1) __HC__ |
Copy and interleave the lower half of the elements from each source into the destination. More... | |
unsigned int | __unpackhi_u8x4 (unsigned int src0, unsigned int src1) __HC__ |
Copy and interleave the upper half of the elements from each source into the destination. More... | |
uint64_t | __unpackhi_u8x8 (uint64_t src0, uint64_t src1) __HC__ |
Copy and interleave the upper half of the elements from each source into the destination. More... | |
unsigned int | __unpackhi_u16x2 (unsigned int src0, unsigned int src1) __HC__ |
Copy and interleave the upper half of the elements from each source into the destination. More... | |
uint64_t | __unpackhi_u16x4 (uint64_t src0, uint64_t src1) __HC__ |
Copy and interleave the upper half of the elements from each source into the destination. More... | |
uint64_t | __unpackhi_u32x2 (uint64_t src0, uint64_t src1) __HC__ |
Copy and interleave the upper half of the elements from each source into the destination. More... | |
int | __unpackhi_s8x4 (int src0, int src1) __HC__ |
Copy and interleave the upper half of the elements from each source into the destination. More... | |
int64_t | __unpackhi_s8x8 (int64_t src0, int64_t src1) __HC__ |
Copy and interleave the upper half of the elements from each source into the destination. More... | |
int | __unpackhi_s16x2 (int src0, int src1) __HC__ |
Copy and interleave the upper half of the elements from each source into the destination. More... | |
int64_t | __unpackhi_s16x4 (int64_t src0, int64_t src1) __HC__ |
Copy and interleave the upper half of the elements from each source into the destination. More... | |
int64_t | __unpackhi_s32x2 (int64_t src0, int64_t src1) __HC__ |
Copy and interleave the upper half of the elements from each source into the destination. More... | |
unsigned int | __pack_u8x4_u32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__ |
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More... | |
uint64_t | __pack_u8x8_u32 (uint64_t src0, unsigned int src1, unsigned int src2) __HC__ |
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More... | |
unsigned | __pack_u16x2_u32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__ |
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More... | |
uint64_t | __pack_u16x4_u32 (uint64_t src0, unsigned int src1, unsigned int src2) __HC__ |
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More... | |
uint64_t | __pack_u32x2_u32 (uint64_t src0, unsigned int src1, unsigned int src2) __HC__ |
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More... | |
int | __pack_s8x4_s32 (int src0, int src1, unsigned int src2) __HC__ |
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More... | |
int64_t | __pack_s8x8_s32 (int64_t src0, int src1, unsigned int src2) __HC__ |
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More... | |
int | __pack_s16x2_s32 (int src0, int src1, unsigned int src2) __HC__ |
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More... | |
int64_t | __pack_s16x4_s32 (int64_t src0, int src1, unsigned int src2) __HC__ |
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More... | |
int64_t | __pack_s32x2_s32 (int64_t src0, int src1, unsigned int src2) __HC__ |
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More... | |
double | __pack_f32x2_f32 (double src0, float src1, unsigned int src2) __HC__ |
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1. More... | |
unsigned int | __unpack_u32_u8x4 (unsigned int src0, unsigned int src1) __HC__ |
Extract the element specified by src1 from the packed value in src0. More... | |
unsigned int | __unpack_u32_u8x8 (uint64_t src0, unsigned int src1) __HC__ |
Extract the element specified by src1 from the packed value in src0. More... | |
unsigned int | __unpack_u32_u16x2 (unsigned int src0, unsigned int src1) __HC__ |
Extract the element specified by src1 from the packed value in src0. More... | |
unsigned int | __unpack_u32_u16x4 (uint64_t src0, unsigned int src1) __HC__ |
Extract the element specified by src1 from the packed value in src0. More... | |
unsigned int | __unpack_u32_u32x2 (uint64_t src0, unsigned int src1) __HC__ |
Extract the element specified by src1 from the packed value in src0. More... | |
int | __unpack_s32_s8x4 (int src0, unsigned int src1) __HC__ |
Extract the element specified by src1 from the packed value in src0. More... | |
int | __unpack_s32_s8x8 (int64_t src0, unsigned int src1) __HC__ |
Extract the element specified by src1 from the packed value in src0. More... | |
int | __unpack_s32_s16x2 (int src0, unsigned int src1) __HC__ |
Extract the element specified by src1 from the packed value in src0. More... | |
int | __unpack_s32_s16x4 (int64_t src0, unsigned int src1) __HC__ |
Extract the element specified by src1 from the packed value in src0. More... | |
int | __unpack_s32_s3x2 (int64_t src0, unsigned int src1) __HC__ |
Extract the element specified by src1 from the packed value in src0. More... | |
float | __unpack_f32_f32x2 (double src0, unsigned int src1) __HC__ |
Extract the element specified by src1 from the packed value in src0. More... | |
unsigned int | __sad_u32_u32 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__ |
Computes the sum of the absolute differences of src0 and src1 and then adds src2 to the result. More... | |
unsigned int | __sad_u32_u16x2 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__ |
Computes the sum of the absolute differences of src0 and src1 and then adds src2 to the result. More... | |
unsigned int | __sad_u32_u8x4 (unsigned int src0, unsigned int src1, unsigned int src2) __HC__ |
Computes the sum of the absolute differences of src0 and src1 and then adds src2 to the result. More... | |
int | __amdgcn_mbcnt_lo (int mask, int src)[[hc]] __asm("llvm.amdgcn.mbcnt.lo") |
Masked bit count (low half): adds to src the number of bits set in mask that correspond to lanes 0-31 below the calling work-item's lane. More... | |
int | __amdgcn_mbcnt_hi (int mask, int src)[[hc]] __asm("llvm.amdgcn.mbcnt.hi") |
Masked bit count (high half): adds to src the number of bits set in mask that correspond to lanes 32-63 below the calling work-item's lane. More... | |
int | __lane_id (void)[[hc]] |
Returns the lane index of the calling work-item within its wavefront. More... | |
int | __amdgcn_ds_bpermute (int index, int src)[[hc]] __asm("llvm.amdgcn.ds.bpermute") |
ds_bpermute intrinsic (backward permute): each work-item reads src from the work-item selected by index, where index is a byte address (lane ID × 4). | |
unsigned int | __amdgcn_ds_bpermute (int index, unsigned int src)[[hc]] |
Overload of __amdgcn_ds_bpermute for unsigned int values. More... | |
float | __amdgcn_ds_bpermute (int index, float src)[[hc]] |
Overload of __amdgcn_ds_bpermute for float values. More... | |
int | __amdgcn_ds_permute (int index, int src)[[hc]] |
ds_permute intrinsic (forward permute): each work-item writes src to the work-item selected by index, where index is a byte address (lane ID × 4). | |
unsigned int | __amdgcn_ds_permute (int index, unsigned int src)[[hc]] |
Overload of __amdgcn_ds_permute for unsigned int values. More... | |
float | __amdgcn_ds_permute (int index, float src)[[hc]] |
Overload of __amdgcn_ds_permute for float values. More... | |
int | __amdgcn_ds_swizzle (int src, int pattern)[[hc]] |
ds_swizzle intrinsic | |
unsigned int | __amdgcn_ds_swizzle (unsigned int src, int pattern)[[hc]] |
Overload of __amdgcn_ds_swizzle for unsigned int values. More... | |
float | __amdgcn_ds_swizzle (float src, int pattern)[[hc]] |
Overload of __amdgcn_ds_swizzle for float values. More... | |
int | __amdgcn_move_dpp (int src, int dpp_ctrl, int row_mask, int bank_mask, bool bound_ctrl)[[hc]] |
Move DPP (data-parallel primitives) intrinsic. | |
int | __amdgcn_wave_sr1 (int src, bool bound_ctrl)[[hc]] |
Shift the value of src to the right by one thread within a wavefront. More... | |
unsigned int | __amdgcn_wave_sr1 (unsigned int src, bool bound_ctrl)[[hc]] |
Overload of __amdgcn_wave_sr1 for unsigned int values. More... | |
float | __amdgcn_wave_sr1 (float src, bool bound_ctrl)[[hc]] |
Overload of __amdgcn_wave_sr1 for float values. More... | |
int | __amdgcn_wave_sl1 (int src, bool bound_ctrl)[[hc]] |
Shift the value of src to the left by one thread within a wavefront. More... | |
unsigned int | __amdgcn_wave_sl1 (unsigned int src, bool bound_ctrl)[[hc]] |
Overload of __amdgcn_wave_sl1 for unsigned int values. More... | |
float | __amdgcn_wave_sl1 (float src, bool bound_ctrl)[[hc]] |
Overload of __amdgcn_wave_sl1 for float values. More... | |
int | __amdgcn_wave_rr1 (int src)[[hc]] |
Rotate the value of src to the right by one thread within a wavefront. More... | |
unsigned int | __amdgcn_wave_rr1 (unsigned int src)[[hc]] |
Overload of __amdgcn_wave_rr1 for unsigned int values. More... | |
float | __amdgcn_wave_rr1 (float src)[[hc]] |
Overload of __amdgcn_wave_rr1 for float values. More... | |
int | __amdgcn_wave_rl1 (int src)[[hc]] |
Rotate the value of src to the left by one thread within a wavefront. More... | |
unsigned int | __amdgcn_wave_rl1 (unsigned int src)[[hc]] |
Overload of __amdgcn_wave_rl1 for unsigned int values. More... | |
float | __amdgcn_wave_rl1 (float src)[[hc]] |
Overload of __amdgcn_wave_rl1 for float values. More... | |
int | __shfl (int var, int srcLane, int width=__HSA_WAVEFRONT_SIZE__) __HC__ |
Direct copy from indexed active work-item within a wavefront. More... | |
unsigned int | __shfl (unsigned int var, int srcLane, int width=__HSA_WAVEFRONT_SIZE__) __HC__ |
Direct copy from indexed active work-item within a wavefront. More... | |
float | __shfl (float var, int srcLane, int width=__HSA_WAVEFRONT_SIZE__) __HC__ |
Direct copy from indexed active work-item within a wavefront. More... | |
int | __shfl_up (int var, const unsigned int delta, const int width=__HSA_WAVEFRONT_SIZE__) __HC__ |
Copy from an active work-item with lower ID relative to caller within a wavefront. More... | |
unsigned int | __shfl_up (unsigned int var, const unsigned int delta, const int width=__HSA_WAVEFRONT_SIZE__) __HC__ |
Copy from an active work-item with lower ID relative to caller within a wavefront. More... | |
float | __shfl_up (float var, const unsigned int delta, const int width=__HSA_WAVEFRONT_SIZE__) __HC__ |
Copy from an active work-item with lower ID relative to caller within a wavefront. More... | |
int | __shfl_down (int var, const unsigned int delta, const int width=__HSA_WAVEFRONT_SIZE__) __HC__ |
Copy from an active work-item with higher ID relative to caller within a wavefront. More... | |
unsigned int | __shfl_down (unsigned int var, const unsigned int delta, const int width=__HSA_WAVEFRONT_SIZE__) __HC__ |
Copy from an active work-item with higher ID relative to caller within a wavefront. More... | |
float | __shfl_down (float var, const unsigned int delta, const int width=__HSA_WAVEFRONT_SIZE__) __HC__ |
Copy from an active work-item with higher ID relative to caller within a wavefront. More... | |
int | __shfl_xor (int var, int laneMask, int width=__HSA_WAVEFRONT_SIZE__) __HC__ |
Copy from an active work-item based on bitwise XOR of caller work-item ID within a wavefront. More... | |
float | __shfl_xor (float var, int laneMask, int width=__HSA_WAVEFRONT_SIZE__) __HC__ |
Copy from an active work-item based on bitwise XOR of caller work-item ID within a wavefront. More... | |
template<typename T , int N> | |
void | copy (const array_view< const T, N > &src, const array_view< T, N > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename T , int N> | |
void | copy (const array_view< T, N > &src, const array_view< T, N > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename T > | |
void | copy (const array_view< const T, 1 > &src, const array_view< T, 1 > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename T , int N> | |
void | copy (const array< T, N > &src, const array_view< T, N > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename T > | |
void | copy (const array< T, 1 > &src, const array_view< T, 1 > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename T , int N> | |
void | copy (const array_view< const T, N > &src, array< T, N > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename T , int N> | |
void | copy (const array_view< T, N > &src, array< T, N > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename T > | |
void | copy (const array_view< const T, 1 > &src, array< T, 1 > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename InputIter , typename T , int N> | |
void | copy (InputIter srcBegin, InputIter srcEnd, const array_view< T, N > &dest) |
The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest". More... | |
template<typename InputIter , typename T , int N> | |
void | copy (InputIter srcBegin, const array_view< T, N > &dest) |
The contents of a source container, starting at iterator srcBegin, are copied into "dest". More... | |
template<typename InputIter , typename T , int N> | |
void | copy (InputIter srcBegin, InputIter srcEnd, array< T, N > &dest) |
The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest". More... | |
template<typename InputIter , typename T , int N> | |
void | copy (InputIter srcBegin, array< T, N > &dest) |
The contents of a source container, starting at iterator srcBegin, are copied into "dest". More... | |
template<typename T , int N> | |
completion_future | copy_async (const array_view< const T, N > &src, array< T, N > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename T , int N> | |
completion_future | copy_async (const array_view< T, N > &src, array< T, N > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename T , int N> | |
completion_future | copy_async (const array_view< const T, N > &src, const array_view< T, N > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename T , int N> | |
completion_future | copy_async (const array_view< T, N > &src, const array_view< T, N > &dest) |
The contents of "src" are copied into "dest". More... | |
template<typename InputIter , typename T , int N> | |
completion_future | copy_async (InputIter srcBegin, InputIter srcEnd, array< T, N > &dest) |
The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest". More... | |
template<typename InputIter , typename T , int N> | |
completion_future | copy_async (InputIter srcBegin, array< T, N > &dest) |
The contents of a source container, starting at iterator srcBegin, are copied into "dest". More... | |
template<typename InputIter , typename T , int N> | |
completion_future | copy_async (InputIter srcBegin, InputIter srcEnd, const array_view< T, N > &dest) |
The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest". More... | |
template<typename InputIter , typename T , int N> | |
completion_future | copy_async (InputIter srcBegin, const array_view< T, N > &dest) |
The contents of a source container, starting at iterator srcBegin, are copied into "dest". More... | |
unsigned int | atomic_exchange (unsigned int *dest, unsigned int val) __CPU__ __HC__ |
Atomically read the value stored in dest, replace it with the value given in val, and return the old value to the caller. More... | |
int | atomic_exchange (int *dest, int val) __CPU__ __HC__ |
Atomically read the value stored in dest, replace it with the value given in val, and return the old value to the caller. More... | |
float | atomic_exchange (float *dest, float val) __CPU__ __HC__ |
Atomically read the value stored in dest, replace it with the value given in val, and return the old value to the caller. More... | |
uint64_t | atomic_exchange (uint64_t *dest, uint64_t val) __CPU__ __HC__ |
Atomically read the value stored in dest, replace it with the value given in val, and return the old value to the caller. More... | |
bool | atomic_compare_exchange (unsigned int *dest, unsigned int *expected_val, unsigned int val) __CPU__ __HC__ |
These functions attempt to perform these three steps atomically: More... | |
bool | atomic_compare_exchange (int *dest, int *expected_val, int val) __CPU__ __HC__ |
These functions attempt to perform these three steps atomically: More... | |
bool | atomic_compare_exchange (uint64_t *dest, uint64_t *expected_val, uint64_t val) __CPU__ __HC__ |
These functions attempt to perform these three steps atomically: More... | |
unsigned | atomic_fetch_add (unsigned *x, unsigned y) __CPU__ __HC__ |
Atomically read the value stored in x, add y to it, and store the result back to the location pointed to by x. More... | |
int | atomic_fetch_add (int *x, int y) __CPU__ __HC__ |
Atomically read the value stored in x, add y to it, and store the result back to the location pointed to by x. More... | |
float | atomic_fetch_add (float *x, float y) __CPU__ __HC__ |
Atomically read the value stored in x, add y to it, and store the result back to the location pointed to by x. More... | |
uint64_t | atomic_fetch_add (uint64_t *x, uint64_t y) __CPU__ __HC__ |
Atomically read the value stored in x, add y to it, and store the result back to the location pointed to by x. More... | |
unsigned | atomic_fetch_sub (unsigned *x, unsigned y) __CPU__ __HC__ |
Atomically read the value stored in x, subtract y from it, and store the result back to the location pointed to by x. More... | |
int | atomic_fetch_sub (int *x, int y) __CPU__ __HC__ |
Atomically read the value stored in x, subtract y from it, and store the result back to the location pointed to by x. More... | |
float | atomic_fetch_sub (float *x, float y) __CPU__ __HC__ |
Atomically read the value stored in x, subtract y from it, and store the result back to the location pointed to by x. More... | |
unsigned | atomic_fetch_and (unsigned *x, unsigned y) __CPU__ __HC__ |
Atomically read the value stored in x, compute its bitwise AND with y, and store the result back to the location pointed to by x. More... | |
int | atomic_fetch_and (int *x, int y) __CPU__ __HC__ |
Atomically read the value stored in x, compute its bitwise AND with y, and store the result back to the location pointed to by x. More... | |
uint64_t | atomic_fetch_and (uint64_t *x, uint64_t y) __CPU__ __HC__ |
Atomically read the value stored in x, compute its bitwise AND with y, and store the result back to the location pointed to by x. More... | |
unsigned | atomic_fetch_or (unsigned *x, unsigned y) __CPU__ __HC__ |
Atomically read the value stored in x, compute its bitwise OR with y, and store the result back to the location pointed to by x. More... | |
int | atomic_fetch_or (int *x, int y) __CPU__ __HC__ |
Atomically read the value stored in x, compute its bitwise OR with y, and store the result back to the location pointed to by x. More... | |
uint64_t | atomic_fetch_or (uint64_t *x, uint64_t y) __CPU__ __HC__ |
Atomically read the value stored in x, compute its bitwise OR with y, and store the result back to the location pointed to by x. More... | |
unsigned | atomic_fetch_xor (unsigned *x, unsigned y) __CPU__ __HC__ |
Atomically read the value stored in x, compute its bitwise XOR with y, and store the result back to the location pointed to by x. More... | |
int | atomic_fetch_xor (int *x, int y) __CPU__ __HC__ |
Atomically read the value stored in x, compute its bitwise XOR with y, and store the result back to the location pointed to by x. More... | |
uint64_t | atomic_fetch_xor (uint64_t *x, uint64_t y) __CPU__ __HC__ |
Atomically read the value stored in x, compute its bitwise XOR with y, and store the result back to the location pointed to by x. More... | |
int | atomic_fetch_max (int *dest, int val) __CPU__ __HC__ |
Atomically read the value stored in dest, compute the maximum of that value and val, and store the result back to the location pointed to by dest. More... | |
unsigned int | atomic_fetch_max (unsigned int *dest, unsigned int val) __CPU__ __HC__ |
Atomically read the value stored in dest, compute the maximum of that value and val, and store the result back to the location pointed to by dest. More... | |
uint64_t | atomic_fetch_max (uint64_t *dest, uint64_t val) __CPU__ __HC__ |
Atomically read the value stored in dest, compute the maximum of that value and val, and store the result back to the location pointed to by dest. More... | |
int | atomic_fetch_min (int *dest, int val) __CPU__ __HC__ |
Atomically read the value stored in dest, compute the minimum of that value and val, and store the result back to the location pointed to by dest. More... | |
unsigned int | atomic_fetch_min (unsigned int *dest, unsigned int val) __CPU__ __HC__ |
Atomically read the value stored in dest, compute the minimum of that value and val, and store the result back to the location pointed to by dest. More... | |
uint64_t | atomic_fetch_min (uint64_t *dest, uint64_t val) __CPU__ __HC__ |
Atomically read the value stored in dest, compute the minimum of that value and val, and store the result back to the location pointed to by dest. More... | |
int | atomic_fetch_inc (int *_Dest) __CPU__ __HC__ |
Atomically increment the value stored at the location pointed to by _Dest. More... | |
unsigned int | atomic_fetch_inc (unsigned int *_Dest) __CPU__ __HC__ |
Atomically increment the value stored at the location pointed to by _Dest. More... | |
int | atomic_fetch_dec (int *_Dest) __CPU__ __HC__ |
Atomically decrement the value stored at the location pointed to by _Dest. More... | |
unsigned int | atomic_fetch_dec (unsigned int *_Dest) __CPU__ __HC__ |
Atomically decrement the value stored at the location pointed to by _Dest. More... | |
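Operations that have no dedicated atomic instruction are usually built from a compare-exchange loop. The sketch below is a host-side analogue using std::atomic rather than the HC functions: cas_fetch_max is a hypothetical name, and the detail that a failed compare-exchange writes the current value back into the expected value mirrors the C++ AMP convention for atomic_compare_exchange (an assumption here, since the full three-step description is not reproduced above). It returns the value stored before the update, the usual fetch_ convention.

```cpp
#include <atomic>
#include <cassert>

// Host analogue of a fetch-max built from compare-exchange (std::atomic has
// no fetch_max before C++26). Returns the value stored before the update.
int cas_fetch_max(std::atomic<int>& cell, int val) {
    int old = cell.load();
    // Only attempt a store while val would raise the stored value. On failure,
    // compare_exchange_weak reloads `old` with the current contents, so the
    // condition is re-evaluated against fresh data.
    while (old < val && !cell.compare_exchange_weak(old, val)) {
    }
    return old;
}
```

For example, with a cell holding 5, cas_fetch_max(cell, 9) returns 5 and leaves 9 in the cell, while cas_fetch_max(cell, 3) returns 5 and leaves the cell unchanged.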
Variables | |
PrintfPacket * | printf_buffer |
static unsigned int | cpu |
Heterogeneous C++ (HC) namespace.
unsigned int hc::__activelanecount_u32_b1 | ( | unsigned int | input | ) |
inline |
Count the number of active work-items in the current wavefront that have a non-zero input.
[in] | input | An unsigned 32-bit integer. |
References __activelanemask_v4_b64_b1(), and __popcount_u32_b64().
Referenced by __all().
unsigned int hc::__activelaneid_u32 | ( | ) |
Get the number of earlier (in flattened work-item order) active work-items within the same wavefront.
Referenced by __lastbit_u32_s64().
uint64_t hc::__activelanemask_v4_b64_b1 | ( | unsigned int | input | ) |
Return a bit mask that shows which active work-items in the wavefront have a non-zero input.
The affected bit position within the registers of dest corresponds to each work-item's lane ID.
The HSAIL instruction returns four 64-bit registers, but the current implementation returns only the first one and ignores the other three, because all current HSA agents have a wavefront size of 64.
[in] | input | An unsigned 32-bit integer. |
Referenced by __activelanecount_u32_b1(), __all(), __any(), __ballot(), and __lastbit_u32_s64().
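The wavefront-query functions above can be illustrated with a host-side reference model. This is a minimal sketch, not the HC implementation: it assumes a 64-wide wavefront, represents the set of active work-items as a 64-bit mask, and uses hypothetical model_* names that are not part of the hc namespace.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstdint>

constexpr int kWavefrontSize = 64;  // assumed wavefront size

// Models __activelanemask_v4_b64_b1: bit i is set when lane i is active
// and its input is non-zero.
uint64_t model_activelanemask(const std::array<unsigned, kWavefrontSize>& input,
                              uint64_t active) {
    uint64_t mask = 0;
    for (int lane = 0; lane < kWavefrontSize; ++lane) {
        bool is_active = (active >> lane) & 1u;
        if (is_active && input[lane] != 0) mask |= uint64_t{1} << lane;
    }
    return mask;
}

// Models __activelanecount_u32_b1: population count of the mask above.
unsigned model_activelanecount(const std::array<unsigned, kWavefrontSize>& input,
                               uint64_t active) {
    return static_cast<unsigned>(std::bitset<64>(model_activelanemask(input, active)).count());
}

// Models __activelaneid_u32: number of active lanes with a lower lane ID.
unsigned model_activelaneid(int lane, uint64_t active) {
    assert(lane >= 0 && lane < kWavefrontSize);
    uint64_t lower = active & ((uint64_t{1} << lane) - 1);  // active lanes below `lane`
    return static_cast<unsigned>(std::bitset<64>(lower).count());
}
```

In this model __activelanecount_u32_b1 is simply the popcount of the ballot mask, which matches the References list given for it.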
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_ds_bpermute().
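The lane-selection rule described above can be checked with a host-side model. This sketch assumes every work-item in a 64-wide wavefront is active (so the "inactive reads return 0" case is not modelled) and uses hypothetical model_* functions rather than hc::__shfl and hc::__shfl_xor. It also shows the common butterfly reduction built from XOR shuffles, restricted here to laneMask values below width.

```cpp
#include <array>
#include <cassert>

constexpr int kWavefrontSize = 64;  // assumed wavefront size
using Wave = std::array<int, kWavefrontSize>;

static bool is_pow2(int w) { return w > 0 && (w & (w - 1)) == 0; }

// Models __shfl: each width-sized subsection acts independently and
// srcLane is taken modulo width.
Wave model_shfl(const Wave& var, int srcLane, int width = kWavefrontSize) {
    assert(is_pow2(width) && width <= kWavefrontSize);
    Wave out{};
    for (int lane = 0; lane < kWavefrontSize; ++lane) {
        int base = lane & ~(width - 1);             // first lane of this subsection
        int src = base + (srcLane & (width - 1));   // srcLane modulo width
        out[lane] = var[src];
    }
    return out;
}

// Models __shfl_xor for 0 <= laneMask < width: read from lane ^ laneMask,
// which then always stays inside the caller's subsection.
Wave model_shfl_xor(const Wave& var, int laneMask, int width = kWavefrontSize) {
    assert(is_pow2(width) && width <= kWavefrontSize);
    assert(laneMask >= 0 && laneMask < width);
    Wave out{};
    for (int lane = 0; lane < kWavefrontSize; ++lane) out[lane] = var[lane ^ laneMask];
    return out;
}

// Butterfly sum: after log2(width) steps every lane holds its subsection total.
Wave model_xor_reduce_sum(Wave v, int width = kWavefrontSize) {
    for (int offset = width / 2; offset > 0; offset /= 2) {
        Wave other = model_shfl_xor(v, offset, width);
        for (int lane = 0; lane < kWavefrontSize; ++lane) v[lane] += other[lane];
    }
    return v;
}
```

With v[i] = i, model_shfl(v, 18, 16) gives lane 16 the value 18 (18 modulo 16 is 2, and lane 16 starts the second subsection), and model_xor_reduce_sum(v) leaves 2016 in every lane.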
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_ds_permute().
Referenced by __amdgcn_ds_bpermute(), __lane_id(), __shfl(), __shfl_down(), __shfl_up(), and __shfl_xor().
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_ds_permute().
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_ds_swizzle().
Referenced by __amdgcn_ds_bpermute(), and __amdgcn_ds_permute().
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_ds_swizzle().
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_move_dpp(), and __amdgcn_wave_sr1().
Referenced by __amdgcn_ds_permute(), and __amdgcn_ds_swizzle().
int hc::__amdgcn_mbcnt_hi | ( | int | mask, |
int | src | ||
) |
Count the set bits of mask that correspond to work-items with a lower lane ID than the caller, restricted to lanes 32-63 of the wavefront, and return that count plus src.
This corresponds to the AMDGCN v_mbcnt_hi_u32_b32 instruction; passing an all-ones mask together with the result of __amdgcn_mbcnt_lo() produces the caller's lane ID.
Referenced by __lane_id().
int hc::__amdgcn_mbcnt_lo | ( | int | mask, |
int | src | ||
) |
Count the set bits of mask that correspond to work-items with a lower lane ID than the caller, restricted to lanes 0-31 of the wavefront, and return that count plus src.
This corresponds to the AMDGCN v_mbcnt_lo_u32_b32 instruction and is typically used as the src input of __amdgcn_mbcnt_hi().
Referenced by __lane_id().
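The two mbcnt entries can be modelled on the host, assuming they wrap the AMDGCN v_mbcnt_lo_u32_b32 and v_mbcnt_hi_u32_b32 instructions: each adds to src the number of set bits in mask that belong to lanes below the caller (lanes 0-31 for lo, lanes 32-63 for hi). The sketch uses unsigned operands instead of the int parameters of the HC signatures, and model_lane_id() shows the presumed way __lane_id() chains the two with an all-ones mask. All model_* names are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Bit i is set for every lane i strictly below `lane` (0 <= lane < 64).
static uint64_t lanes_below(int lane) {
    assert(lane >= 0 && lane < 64);
    return (uint64_t{1} << lane) - 1;
}

// Models mbcnt_lo: src + popcount(mask & lower-lane bits for lanes 0..31).
unsigned model_mbcnt_lo(uint32_t mask, unsigned src, int lane) {
    uint32_t lo = static_cast<uint32_t>(lanes_below(lane));
    return src + static_cast<unsigned>(std::bitset<32>(mask & lo).count());
}

// Models mbcnt_hi: src + popcount(mask & lower-lane bits for lanes 32..63).
unsigned model_mbcnt_hi(uint32_t mask, unsigned src, int lane) {
    uint32_t hi = static_cast<uint32_t>(lanes_below(lane) >> 32);
    return src + static_cast<unsigned>(std::bitset<32>(mask & hi).count());
}

// Presumed __lane_id composition: count every lower lane using all-ones masks.
unsigned model_lane_id(int lane) {
    return model_mbcnt_hi(0xFFFFFFFFu, model_mbcnt_lo(0xFFFFFFFFu, 0u, lane), lane);
}
```

For lane 40, mbcnt_lo contributes 32 (all of lanes 0-31) and mbcnt_hi adds 8 (lanes 32-39), giving 40.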
int hc::__amdgcn_wave_rl1 | ( | int | src | ) |
Rotate the value of src to the left by one thread within a wavefront.
[in] | src | variable being rotated |
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_wave_rl1().
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
Referenced by __amdgcn_wave_rl1(), and __amdgcn_wave_rr1().
int hc::__amdgcn_wave_rr1 | ( | int | src | ) |
Rotate the value of src to the right by one thread within a wavefront.
[in] | src | variable being rotated |
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_wave_rr1().
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_wave_rl1().
Referenced by __amdgcn_wave_rr1(), and __amdgcn_wave_sl1().
int hc::__amdgcn_wave_sl1 | ( | int | src, |
bool | bound_ctrl | ||
) |
Shift the value of src to the left by one thread within a wavefront.
[in] | src | variable being shifted |
[in] | bound_ctrl | When set to true, a zero will be shifted into thread 63; otherwise, the original value will be returned for thread 63 |
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_wave_sl1().
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_wave_rr1().
Referenced by __amdgcn_wave_sl1(), and __amdgcn_wave_sr1().
int hc::__amdgcn_wave_sr1 | ( | int | src, |
bool | bound_ctrl | ||
) |
Shift the value of src to the right by one thread within a wavefront.
[in] | src | variable being shifted |
[in] | bound_ctrl | When set to true, a zero will be shifted into thread 0; otherwise, the original value will be returned for thread 0 |
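The shift and rotate entries can be summarised with a host-side model. The direction is inferred from the bound_ctrl descriptions (a left shift vacates thread 63, a right shift vacates thread 0), and the rotates are assumed to move data in the same directions. The model_* names are hypothetical, and a fully active 64-wide wavefront is assumed.

```cpp
#include <array>
#include <cassert>

constexpr int kWavefrontSize = 64;  // assumed wavefront size
using Wave = std::array<int, kWavefrontSize>;

// Shift left by one thread: lane i receives lane i+1's value. Lane 63 gets 0
// when bound_ctrl is true, otherwise it keeps its own value.
Wave model_wave_sl1(const Wave& src, bool bound_ctrl) {
    Wave out{};
    for (int lane = 0; lane < kWavefrontSize - 1; ++lane) out[lane] = src[lane + 1];
    out[kWavefrontSize - 1] = bound_ctrl ? 0 : src[kWavefrontSize - 1];
    return out;
}

// Shift right by one thread: lane i receives lane i-1's value. Lane 0 gets 0
// when bound_ctrl is true, otherwise it keeps its own value.
Wave model_wave_sr1(const Wave& src, bool bound_ctrl) {
    Wave out{};
    for (int lane = 1; lane < kWavefrontSize; ++lane) out[lane] = src[lane - 1];
    out[0] = bound_ctrl ? 0 : src[0];
    return out;
}

// Rotate left: like sl1, but lane 63 receives lane 0's value.
Wave model_wave_rl1(const Wave& src) {
    Wave out{};
    for (int lane = 0; lane < kWavefrontSize; ++lane)
        out[lane] = src[(lane + 1) % kWavefrontSize];
    return out;
}

// Rotate right: like sr1, but lane 0 receives lane 63's value.
Wave model_wave_rr1(const Wave& src) {
    Wave out{};
    for (int lane = 0; lane < kWavefrontSize; ++lane)
        out[lane] = src[(lane + kWavefrontSize - 1) % kWavefrontSize];
    return out;
}
```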
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_wave_sr1().
|
inline |
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than __HSA_WAVEFRONT_SIZE__, each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned is the one held by work-item srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than __HSA_WAVEFRONT_SIZE__.
References __amdgcn_wave_sl1().
Referenced by __amdgcn_ds_swizzle(), and __amdgcn_wave_sr1().
unsigned int hc::__atomic_wrapdec | ( | unsigned int * | address, |
unsigned int | val | ||
) |
Atomically do the following operations:
Please refer to atomic_wrapdec in HSA PRM 6.6 for more detailed specification of the function.
Referenced by copy_async().
unsigned int hc::__atomic_wrapinc | ( | unsigned int * | address, |
unsigned int | val | ||
) |
Atomically do the following operations:
Please refer to atomic_wrapinc in HSA PRM 6.6 for more detailed specification of the function.
Referenced by copy_async().
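The operation lists for __atomic_wrapinc and __atomic_wrapdec are not reproduced above. As a sketch, assuming the usual HSA PRM semantics (wrapinc stores (old >= val) ? 0 : old + 1, wrapdec stores (old == 0 || old > val) ? val : old - 1, and both return old), a host analogue can be written with a std::atomic compare-exchange loop. The model_* names are hypothetical; check the exact rules against HSA PRM 6.6.

```cpp
#include <atomic>
#include <cassert>

// Assumed wrapinc rule: count up, wrapping to 0 once old reaches val.
unsigned model_atomic_wrapinc(std::atomic<unsigned>& cell, unsigned val) {
    unsigned old = cell.load();
    unsigned next;
    do {
        next = (old >= val) ? 0u : old + 1u;
    } while (!cell.compare_exchange_weak(old, next));  // `old` is reloaded on failure
    return old;
}

// Assumed wrapdec rule: count down, wrapping to val at 0 (or when old > val).
unsigned model_atomic_wrapdec(std::atomic<unsigned>& cell, unsigned val) {
    unsigned old = cell.load();
    unsigned next;
    do {
        next = (old == 0u || old > val) ? val : old - 1u;
    } while (!cell.compare_exchange_weak(old, next));
    return old;
}
```

A typical use of this pattern is a ring-buffer index that cycles through 0..val without a separate modulo step.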
unsigned int hc::__bitalign_b32 | ( | unsigned int | src0, |
unsigned int | src1, | ||
unsigned int | src2 | ||
) |
Align 32 bits within 64 bits of data on an arbitrary bit boundary.
Please refer to HSA PRM 5.15 for more detailed specification.
Referenced by __lastbit_u32_s64().
int hc::__bitextract_s32 | ( | int | src0, |
unsigned int | src1, | ||
unsigned int | src2 | ||
) |
Extract a range of bits.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
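As a reference model, assuming the HSA PRM 5.7 convention offset = src1 & 31 and width = src2 & 31 (and offset + width <= 32), unsigned extraction returns width bits of src0 starting at bit offset, and the signed variant sign-extends that field from its top bit. The model_* names are hypothetical; consult the PRM for the exact edge cases.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned extraction: `width` bits of src0 starting at bit `offset`.
uint32_t model_bitextract_u32(uint32_t src0, unsigned src1, unsigned src2) {
    unsigned offset = src1 & 31u;
    unsigned width = src2 & 31u;
    assert(offset + width <= 32u);  // assumed precondition for this model
    if (width == 0) return 0;
    return (src0 >> offset) & ((uint32_t{1} << width) - 1u);
}

// Signed extraction: same field, sign-extended from its most significant bit.
int32_t model_bitextract_s32(int32_t src0, unsigned src1, unsigned src2) {
    unsigned width = src2 & 31u;
    uint32_t field = model_bitextract_u32(static_cast<uint32_t>(src0), src1, src2);
    if (width == 0) return 0;
    uint32_t sign_bit = uint32_t{1} << (width - 1);
    // (field ^ sign_bit) - sign_bit maps the field onto its two's-complement value.
    return static_cast<int32_t>(field ^ sign_bit) - static_cast<int32_t>(sign_bit);
}
```

For example, extracting 3 bits at offset 4 from 0x70 yields the field 0b111, which is 7 unsigned and -1 signed.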
int64_t hc::__bitextract_s64 | ( | int64_t | src0, |
unsigned int | src1, | ||
unsigned int | src2 | ||
) |
Extract a range of bits.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
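unsigned int hc::__bitextract_u32 | ( | unsigned int | src0, |
unsigned int | src1, | ||
unsigned int | src2 | ||
) |
inline |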
Extract a range of bits.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
References __bitextract_s32(), __bitextract_s64(), __bitextract_u64(), __bitinsert_s32(), __bitinsert_s64(), __bitinsert_u32(), __bitinsert_u64(), __bitmask_b32(), __bitmask_b64(), __bitrev_b32(), __bitrev_b64(), __bitselect_b32(), and __bitselect_b64().
uint64_t hc::__bitextract_u64(uint64_t src0, unsigned int src1, unsigned int src2)
Extract a range of bits.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
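The extraction rule can be sketched in portable C++ to check device results on the host. The sketch assumes the HSA PRM 5.7 definition as we understand it: src1 is the bit offset, src2 the field width, both taken modulo 32, and a width of 0 yields 0. The function names are hypothetical and this is not the HCC implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of hc::__bitextract_u32: extract 'width' bits of src0
// starting at bit 'offset' (bit 0 = least significant), zero-extended.
// Assumes offset + width <= 32, as the shift arithmetic requires.
uint32_t bitextract_u32_model(uint32_t src0, uint32_t src1, uint32_t src2) {
    uint32_t offset = src1 & 31u;  // bit position of the field
    uint32_t width = src2 & 31u;   // field length in bits
    if (width == 0) return 0;
    // Shift the field to the top of the word, then back down to bit 0.
    return (src0 << (32 - offset - width)) >> (32 - width);
}

// Hypothetical model of hc::__bitextract_s32: the same field, sign-extended
// from its top bit. Right-shifting a negative int is arithmetic on mainstream
// compilers and guaranteed since C++20.
int32_t bitextract_s32_model(int32_t src0, uint32_t src1, uint32_t src2) {
    uint32_t offset = src1 & 31u;
    uint32_t width = src2 & 31u;
    if (width == 0) return 0;
    uint32_t shifted = static_cast<uint32_t>(src0) << (32 - offset - width);
    return static_cast<int32_t>(shifted) >> (32 - width);
}
```

For example, extracting 8 bits at offset 8 from 0xABCD1234 yields 0x12.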
int hc::__bitinsert_s32(int src0, int src1, unsigned int src2, unsigned int src3)
Replace a range of bits.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
int64_t hc::__bitinsert_s64(int64_t src0, int64_t src1, unsigned int src2, unsigned int src3)
Replace a range of bits.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
unsigned int hc::__bitinsert_u32(unsigned int src0, unsigned int src1, unsigned int src2, unsigned int src3)
Replace a range of bits.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
uint64_t hc::__bitinsert_u64(uint64_t src0, uint64_t src1, unsigned int src2, unsigned int src3)
Replace a range of bits.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
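Bit insertion is the inverse of extraction: a field of src0 is overwritten with the low bits of src1. The sketch below assumes src2 is the bit offset and src3 the width (both modulo 32), following our reading of HSA PRM 5.7; bitinsert_u32_model is a hypothetical host-side helper.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of hc::__bitinsert_u32: replace 'width' bits of src0,
// starting at bit 'offset', with the low 'width' bits of src1.
uint32_t bitinsert_u32_model(uint32_t src0, uint32_t src1,
                             uint32_t src2, uint32_t src3) {
    uint32_t offset = src2 & 31u;
    uint32_t width = src3 & 31u;
    uint32_t mask = ((1u << width) - 1u) << offset;  // ones over the target field
    // Clear the field in src0, then OR in the shifted, masked bits of src1.
    return (src0 & ~mask) | ((src1 << offset) & mask);
}
```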
unsigned int hc::__bitmask_b32(unsigned int src0, unsigned int src1)
Create a bit mask that can be used with bitselect.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
uint64_t hc::__bitmask_b64(unsigned int src0, unsigned int src1)
Create a bit mask that can be used with bitselect.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
unsigned int hc::__bitrev_b32(unsigned int src0)
Reverse the bits.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
uint64_t hc::__bitrev_b64(uint64_t src0)
Reverse the bits.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
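Bit reversal maps bit i of the input to bit 31 - i of the result. A portable host-side reference such as the hypothetical helper below (not part of hc) can be used to check device output from __bitrev_b32 in unit tests.

```cpp
#include <cassert>
#include <cstdint>

// Portable reference for 32-bit bit reversal:
// bit i of the input becomes bit (31 - i) of the result.
uint32_t bitrev_b32_model(uint32_t x) {
    uint32_t result = 0;
    for (int i = 0; i < 32; ++i) {
        result = (result << 1) | (x & 1u);  // append the current lowest bit
        x >>= 1;
    }
    return result;
}
```

Reversal is its own inverse, so applying it twice returns the original value.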
unsigned int hc::__bitselect_b32(unsigned int src0, unsigned int src1, unsigned int src2)
Do bit field selection.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
uint64_t hc::__bitselect_b64(uint64_t src0, uint64_t src1, uint64_t src2)
Do bit field selection.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __bitextract_u32().
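Because bitmask is meant to feed bitselect, the two are easiest to understand together. This sketch rests on assumptions about argument roles taken from our reading of HSA PRM 5.7: for bitmask, src0 is the offset and src1 the width; for bitselect, src0 is the mask, set bits are taken from src1 and cleared bits from src2. Verify the operand order against the PRM before relying on it; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of hc::__bitmask_b32 (assumed: src0 = offset, src1 = width,
// both modulo 32). Produces 'width' consecutive ones starting at bit 'offset'.
uint32_t bitmask_b32_model(uint32_t src0, uint32_t src1) {
    uint32_t offset = src0 & 31u;
    uint32_t width = src1 & 31u;
    return ((1u << width) - 1u) << offset;
}

// Hypothetical model of hc::__bitselect_b32 (assumed: src0 = mask).
// Bits set in the mask come from src1; cleared bits come from src2.
uint32_t bitselect_b32_model(uint32_t src0, uint32_t src1, uint32_t src2) {
    return (src1 & src0) | (src2 & ~src0);
}
```

Combined, bitselect(bitmask(8, 8), a, b) takes bits 8..15 from a and every other bit from b.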
unsigned int hc::__bytealign_b32(unsigned int src0, unsigned int src1, unsigned int src2)
Align 32 bits within 64 bits of data on an arbitrary byte boundary.
Please refer to HSA PRM 5.15 for more detailed specification.
Referenced by __lastbit_u32_s64().
uint64_t hc::__cycle_u64()
Get the hardware cycle count.
Note that the return value of this function is implementation defined.
Referenced by __lastbit_u32_s64().
inline
Count leading zero bits in the input.
[in] input: A signed 32-bit integer.
References __firstbit_u32_u32().
inline
Count leading zero bits in the input.
[in] input: A signed 64-bit integer.
References __firstbit_u32_u64().
inline
Count leading zero bits in the input.
[in] input: An unsigned 32-bit integer.
Referenced by __firstbit_u32_s32().
inline
Count leading zero bits in the input.
[in] input: An unsigned 64-bit integer.
Referenced by __firstbit_u32_s64().
inline
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.
References __amdgcn_ds_bpermute(), __amdgcn_mbcnt_hi(), and __amdgcn_mbcnt_lo().
Referenced by __shfl(), __shfl_down(), __shfl_up(), and __shfl_xor().
inline
Find the first bit set to 1 in a number starting from the least significant bit.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
References __lastbit_u32_u32().
inline
Find the first bit set to 1 in a number starting from the least significant bit.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
References __activelaneid_u32(), __activelanemask_v4_b64_b1(), __bitalign_b32(), __bytealign_b32(), __clock_u64(), __cycle_u64(), __lastbit_u32_u64(), __lerp_u8x4(), __pack_f32x2_f32(), __pack_s16x2_s32(), __pack_s16x4_s32(), __pack_s32x2_s32(), __pack_s8x4_s32(), __pack_s8x8_s32(), __pack_u16x2_u32(), __pack_u16x4_u32(), __pack_u32x2_u32(), __pack_u8x4_u32(), __pack_u8x8_u32(), __packcvt_u8x4_f32(), __sad_u32_u16x2(), __sad_u32_u32(), __sad_u32_u8x4(), __sadhi_u16x2_u8x4(), __unpack_f32_f32x2(), __unpack_s32_s16x2(), __unpack_s32_s16x4(), __unpack_s32_s3x2(), __unpack_s32_s8x4(), __unpack_s32_s8x8(), __unpack_u32_u16x2(), __unpack_u32_u16x4(), __unpack_u32_u32x2(), __unpack_u32_u8x4(), __unpack_u32_u8x8(), __unpackcvt_f32_u8x4(), __unpackhi_s16x2(), __unpackhi_s16x4(), __unpackhi_s32x2(), __unpackhi_s8x4(), __unpackhi_s8x8(), __unpackhi_u16x2(), __unpackhi_u16x4(), __unpackhi_u32x2(), __unpackhi_u8x4(), __unpackhi_u8x8(), __unpacklo_s16x2(), __unpacklo_s16x4(), __unpacklo_s32x2(), __unpacklo_s8x4(), __unpacklo_s8x8(), __unpacklo_u16x2(), __unpacklo_u16x4(), __unpacklo_u32x2(), __unpacklo_u8x4(), and __unpacklo_u8x8().
inline
Find the first bit set to 1 in a number starting from the least significant bit.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __lastbit_u32_s32().
inline
Find the first bit set to 1 in a number starting from the least significant bit.
Please refer to HSA PRM 5.7 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
unsigned int hc::__lerp_u8x4(unsigned int src0, unsigned int src1, unsigned int src2)
Perform linear interpolation and compute the unsigned 8-bit average of packed data.
Please refer to HSA PRM 5.15 for more detailed specification.
Referenced by __lastbit_u32_s64().
inline
Multiply two unsigned integers x and y using only the lower 24 bits of each, then add the product to the 32-bit unsigned integer z.
[in] x: 24-bit unsigned integer multiplier
[in] y: 24-bit unsigned integer multiplicand
[in] z: 32-bit unsigned integer to be added to the product
References __mul24().
inline
Multiply two integers x and y using only the lower 24 bits of each, then add the product to the 32-bit integer z.
[in] x: 24-bit integer multiplier
[in] y: 24-bit integer multiplicand
[in] z: 32-bit integer to be added to the product
References __mul24(), get_dynamic_group_segment_base_pointer(), get_group_segment_base_pointer(), get_group_segment_size(), and get_static_group_segment_size().
inline
Multiply two unsigned integers x and y using only the lower 24 bits of each.
[in] x: 24-bit unsigned integer multiplier
[in] y: 24-bit unsigned integer multiplicand
inline
Multiply two integers x and y using only the lower 24 bits of each.
[in] x: 24-bit integer multiplier
[in] y: 24-bit integer multiplicand
Referenced by __mad24().
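The effect of ignoring the upper 8 bits of each operand can be seen in a host-side sketch. It covers only the unsigned overloads and assumes the product is truncated to its low 32 bits; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the unsigned __mul24: only the low 24 bits of each
// operand take part, and the product wraps to 32 bits.
uint32_t mul24_u32_model(uint32_t x, uint32_t y) {
    return (x & 0x00FFFFFFu) * (y & 0x00FFFFFFu);
}

// Hypothetical model of the unsigned __mad24: 24-bit multiply, then add z.
uint32_t mad24_u32_model(uint32_t x, uint32_t y, uint32_t z) {
    return mul24_u32_model(x, y) + z;
}
```

With x = 0x01000003, bit 24 is dropped and the model returns 3 * 5 = 15 for y = 5, so __mul24 is only a safe substitute for ordinary multiplication when both operands are known to fit in 24 bits.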
double hc::__pack_f32x2_f32(double src0, float src1, unsigned int src2)
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int hc::__pack_s16x2_s32(int src0, int src1, unsigned int src2)
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int64_t hc::__pack_s16x4_s32(int64_t src0, int src1, unsigned int src2)
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int64_t hc::__pack_s32x2_s32(int64_t src0, int src1, unsigned int src2)
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int hc::__pack_s8x4_s32(int src0, int src1, unsigned int src2)
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int64_t hc::__pack_s8x8_s32(int64_t src0, int src1, unsigned int src2)
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
unsigned hc::__pack_u16x2_u32(unsigned int src0, unsigned int src1, unsigned int src2)
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
uint64_t hc::__pack_u16x4_u32(uint64_t src0, unsigned int src1, unsigned int src2)
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
uint64_t hc::__pack_u32x2_u32(uint64_t src0, unsigned int src1, unsigned int src2)
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
unsigned int hc::__pack_u8x4_u32(unsigned int src0, unsigned int src1, unsigned int src2)
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
uint64_t hc::__pack_u8x8_u32(uint64_t src0, unsigned int src1, unsigned int src2)
Assign the elements of the packed value in src0, replacing the element specified by src2 with the value from src1.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
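The pack functions all follow one pattern: locate the element selected by src2 inside src0 and overwrite it with src1. The sketch below models the u8x4 case on the host, assuming element 0 occupies the least significant byte and that the index is taken modulo the element count; both are assumptions, and pack_u8x4_model is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of hc::__pack_u8x4_u32: replace element src2 (0..3) of the
// packed value src0 with the low 8 bits of src1. Element 0 is assumed to be the
// least significant byte.
uint32_t pack_u8x4_model(uint32_t src0, uint32_t src1, uint32_t src2) {
    uint32_t shift = 8u * (src2 & 3u);  // bit offset of the selected element
    uint32_t mask = 0xFFu << shift;     // ones over that element
    return (src0 & ~mask) | ((src1 & 0xFFu) << shift);
}
```

The wider variants (u16x2, u16x4, u32x2, u8x8) differ only in element size and count.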
unsigned int hc::__packcvt_u8x4_f32(float src0, float src1, float src2, float src3)
Takes four floating-point numbers, converts them to unsigned integer values, and packs them into a u8x4 value.
Please refer to HSA PRM 5.15 for more detailed specification.
Referenced by __lastbit_u32_s64().
inline
Count the number of 1 bits in the input.
[in] input: An unsigned 32-bit integer.
inline
Count the number of 1 bits in the input.
[in] input: An unsigned 64-bit integer.
Referenced by __activelanecount_u32_b1(), __all(), and __any().
unsigned int hc::__sad_u32_u16x2(unsigned int src0, unsigned int src1, unsigned int src2)
Computes the sum of the absolute differences of src0 and src1 and then adds src2 to the result.
Please refer to HSA PRM 5.15 for more detailed specification.
Referenced by __lastbit_u32_s64().
unsigned int hc::__sad_u32_u32(unsigned int src0, unsigned int src1, unsigned int src2)
Computes the sum of the absolute differences of src0 and src1 and then adds src2 to the result.
Please refer to HSA PRM 5.15 for more detailed specification.
Referenced by __lastbit_u32_s64().
unsigned int hc::__sad_u32_u8x4(unsigned int src0, unsigned int src1, unsigned int src2)
Computes the sum of the absolute differences of src0 and src1 and then adds src2 to the result.
Please refer to HSA PRM 5.15 for more detailed specification.
Referenced by __lastbit_u32_s64().
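Sum of absolute differences is a common building block for block matching, for example in motion estimation. The host-side sketch below models the scalar and packed-byte variants; the helper names are hypothetical, and the packed version treats src0 and src1 as four unsigned 8-bit elements each.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of hc::__sad_u32_u32: |src0 - src1| + src2.
uint32_t sad_u32_u32_model(uint32_t src0, uint32_t src1, uint32_t src2) {
    uint32_t diff = src0 > src1 ? src0 - src1 : src1 - src0;
    return diff + src2;
}

// Hypothetical model of hc::__sad_u32_u8x4: the sum of the per-byte absolute
// differences of the four packed unsigned 8-bit elements, plus src2.
uint32_t sad_u32_u8x4_model(uint32_t src0, uint32_t src1, uint32_t src2) {
    uint32_t sum = src2;
    for (int i = 0; i < 4; ++i) {
        uint32_t a = (src0 >> (8 * i)) & 0xFFu;
        uint32_t b = (src1 >> (8 * i)) & 0xFFu;
        sum += a > b ? a - b : b - a;
    }
    return sum;
}
```

Passing a running total as src2 lets a loop accumulate the SAD of a whole block without a separate add.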
unsigned int hc::__sadhi_u16x2_u8x4(unsigned int src0, unsigned int src1, unsigned int src2)
This function works like sad, except that the sum of absolute differences is added to the most significant 16 bits of the result.
Please refer to HSA PRM 5.15 for more detailed specification.
Referenced by __lastbit_u32_s64().
inline
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.
References __amdgcn_ds_bpermute(), and __lane_id().
inline
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.
References __shfl().
inline
Direct copy from indexed active work-item within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function returns the value of var held by the work-item whose ID is given by srcLane. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. If srcLane is outside the range [0:width-1], the value returned corresponds to the value of var held by: srcLane modulo width (i.e. within the same subsection).
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.
Referenced by __shfl().
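The indexing rule described above can be simulated on the host by treating a wavefront as an array with one value per work-item. This sketch assumes every work-item is active and a wavefront of 64 (kWave is a hypothetical stand-in for HSA_WAVEFRONT_SIZE); shfl_model and lane_ids are hypothetical helpers that mirror the documented behaviour, not the device intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for HSA_WAVEFRONT_SIZE.
constexpr std::size_t kWave = 64;

// Builds a wavefront where each lane holds its own lane ID.
std::array<int, kWave> lane_ids() {
    std::array<int, kWave> v{};
    for (std::size_t i = 0; i < kWave; ++i) v[i] = static_cast<int>(i);
    return v;
}

// Simulated __shfl: every lane reads var from logical lane srcLane of its own
// width-sized subsection; srcLane is reduced modulo width as documented.
// Assumes width is a power of 2 no larger than kWave.
std::array<int, kWave> shfl_model(const std::array<int, kWave>& var,
                                  std::size_t srcLane, std::size_t width = kWave) {
    std::array<int, kWave> out{};
    for (std::size_t lane = 0; lane < kWave; ++lane) {
        std::size_t base = lane & ~(width - 1);   // first lane of this subsection
        out[lane] = var[base + (srcLane % width)];
    }
    return out;
}
```

With the default width, every lane receives the same value, which is the usual way to broadcast one work-item's result across the wavefront.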
inline
Copy from an active work-item with higher ID relative to caller within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function calculates a source work-item ID by adding delta to the caller's work-item ID within the wavefront. The value of var held by the resulting lane ID is returned: this has the effect of shifting var down the wavefront by delta work-items. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. The source work-item ID does not wrap around the value of width, so the upper delta work-items remain unchanged.
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.
References __amdgcn_ds_bpermute(), and __lane_id().
inline
Copy from an active work-item with higher ID relative to caller within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function calculates a source work-item ID by adding delta to the caller's work-item ID within the wavefront. The value of var held by the resulting lane ID is returned: this has the effect of shifting var down the wavefront by delta work-items. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. The source work-item ID does not wrap around the value of width, so the upper delta work-items remain unchanged.
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.
References __shfl_down().
inline
Copy from an active work-item with higher ID relative to caller within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function calculates a source work-item ID by adding delta to the caller's work-item ID within the wavefront. The value of var held by the resulting lane ID is returned: this has the effect of shifting var down the wavefront by delta work-items. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. The source work-item ID does not wrap around the value of width, so the upper delta work-items remain unchanged.
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.
Referenced by __shfl_down().
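A typical use of the downward shift is a tree reduction, where each step adds the value held delta work-items higher. The host simulation below assumes all 64 work-items are active (kWave stands in for HSA_WAVEFRONT_SIZE); shfl_down_model and wave_sum_model are hypothetical helpers that follow the rule described above, not device code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWave = 64;  // hypothetical stand-in for HSA_WAVEFRONT_SIZE

std::array<int, kWave> lane_ids() {
    std::array<int, kWave> v{};
    for (std::size_t i = 0; i < kWave; ++i) v[i] = static_cast<int>(i);
    return v;
}

// Simulated __shfl_down: read from lane + delta; a lane whose source falls past
// the end of its width-sized subsection keeps its own value.
std::array<int, kWave> shfl_down_model(const std::array<int, kWave>& var,
                                       std::size_t delta, std::size_t width = kWave) {
    std::array<int, kWave> out{};
    for (std::size_t lane = 0; lane < kWave; ++lane) {
        std::size_t base = lane & ~(width - 1);
        std::size_t src = lane + delta;
        out[lane] = (src < base + width) ? var[src] : var[lane];
    }
    return out;
}

// Tree reduction: after log2(kWave) halving steps, lane 0 holds the total.
// Higher lanes end up with partial or doubled sums and are ignored.
int wave_sum_model(std::array<int, kWave> var) {
    for (std::size_t delta = kWave / 2; delta > 0; delta /= 2) {
        std::array<int, kWave> shifted = shfl_down_model(var, delta);
        for (std::size_t lane = 0; lane < kWave; ++lane) var[lane] += shifted[lane];
    }
    return var[0];
}
```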
inline
Copy from an active work-item with lower ID relative to caller within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function calculates a source work-item ID by subtracting delta from the caller's work-item ID within the wavefront. The value of var held by the resulting lane ID is returned: in effect, var is shifted up the wavefront by delta work-items. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. The source work-item index will not wrap around the value of width, so effectively the lower delta work-items will be unchanged.
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.
References __amdgcn_ds_bpermute(), and __lane_id().
inline
Copy from an active work-item with lower ID relative to caller within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function calculates a source work-item ID by subtracting delta from the caller's work-item ID within the wavefront. The value of var held by the resulting lane ID is returned: in effect, var is shifted up the wavefront by delta work-items. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. The source work-item index will not wrap around the value of width, so effectively the lower delta work-items will be unchanged.
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.
References __shfl_up().
inline
Copy from an active work-item with lower ID relative to caller within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function calculates a source work-item ID by subtracting delta from the caller's work-item ID within the wavefront. The value of var held by the resulting lane ID is returned: in effect, var is shifted up the wavefront by delta work-items. If width is less than HSA_WAVEFRONT_SIZE then each subsection of the wavefront behaves as a separate entity with a starting logical work-item ID of 0. The source work-item index will not wrap around the value of width, so effectively the lower delta work-items will be unchanged.
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.
Referenced by __shfl_up().
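The upward shift is the natural primitive for an inclusive prefix sum (a Hillis-Steele scan). The simulation below assumes all 64 work-items are active (kWave stands in for HSA_WAVEFRONT_SIZE); shfl_up_model and wave_inclusive_scan_model are hypothetical host-side helpers that follow the rule described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWave = 64;  // hypothetical stand-in for HSA_WAVEFRONT_SIZE

std::array<int, kWave> lane_ids() {
    std::array<int, kWave> v{};
    for (std::size_t i = 0; i < kWave; ++i) v[i] = static_cast<int>(i);
    return v;
}

// Simulated __shfl_up: read from lane - delta; lanes with fewer than delta
// predecessors in their width-sized subsection keep their own value.
std::array<int, kWave> shfl_up_model(const std::array<int, kWave>& var,
                                     std::size_t delta, std::size_t width = kWave) {
    std::array<int, kWave> out{};
    for (std::size_t lane = 0; lane < kWave; ++lane) {
        std::size_t base = lane & ~(width - 1);
        out[lane] = (lane >= base + delta) ? var[lane - delta] : var[lane];
    }
    return out;
}

// Inclusive scan: lane i ends up with var[0] + ... + var[i].
// Only lanes >= delta add, because lower lanes received their own value back.
std::array<int, kWave> wave_inclusive_scan_model(std::array<int, kWave> var) {
    for (std::size_t delta = 1; delta < kWave; delta *= 2) {
        std::array<int, kWave> shifted = shfl_up_model(var, delta);
        for (std::size_t lane = delta; lane < kWave; ++lane) var[lane] += shifted[lane];
    }
    return var;
}
```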
inline
Copy from an active work-item based on bitwise XOR of caller work-item ID within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function calculates a source work-item ID by performing a bitwise XOR of the caller's work-item ID with laneMask: the value of var held by the resulting work-item ID is returned.
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.
References __amdgcn_ds_bpermute(), __lane_id(), and __shfl_xor().
Referenced by __shfl_xor().
inline
Copy from an active work-item based on bitwise XOR of caller work-item ID within a wavefront.
Work-items may only read data from another work-item which is active in the current wavefront. If the target work-item is inactive, the retrieved value is fixed as 0.
The function calculates a source work-item ID by performing a bitwise XOR of the caller's work-item ID with laneMask: the value of var held by the resulting work-item ID is returned.
The optional width parameter must be a power of 2; results are undefined if it is not a power of 2 or if it is greater than HSA_WAVEFRONT_SIZE.
References __HSA_WAVEFRONT_SIZE__, and __shfl_xor().
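Exchanging values with the partner lane given by lane ^ laneMask is the basis of a butterfly all-reduce, after which every work-item holds the total. The host simulation below assumes all 64 work-items are active, omits the width parameter, and requires laneMask < kWave; kWave, shfl_xor_model and wave_allreduce_model are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWave = 64;  // hypothetical stand-in for HSA_WAVEFRONT_SIZE

std::array<int, kWave> lane_ids() {
    std::array<int, kWave> v{};
    for (std::size_t i = 0; i < kWave; ++i) v[i] = static_cast<int>(i);
    return v;
}

// Simulated __shfl_xor with the full wavefront as one group:
// each lane reads var from lane ^ laneMask (laneMask assumed < kWave).
std::array<int, kWave> shfl_xor_model(const std::array<int, kWave>& var,
                                      std::size_t laneMask) {
    std::array<int, kWave> out{};
    for (std::size_t lane = 0; lane < kWave; ++lane) {
        out[lane] = var[lane ^ laneMask];
    }
    return out;
}

// Butterfly all-reduce: after log2(kWave) exchanges, every lane holds the sum.
std::array<int, kWave> wave_allreduce_model(std::array<int, kWave> var) {
    for (std::size_t mask = kWave / 2; mask > 0; mask /= 2) {
        std::array<int, kWave> partner = shfl_xor_model(var, mask);
        for (std::size_t lane = 0; lane < kWave; ++lane) var[lane] += partner[lane];
    }
    return var;
}
```

Unlike the shfl_down reduction, no final broadcast is needed because every lane finishes with the same total.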
float hc::__unpack_f32_f32x2(double src0, unsigned int src1)
Extract the element specified by src1 from the packed value in src0.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int hc::__unpack_s32_s16x2(int src0, unsigned int src1)
Extract the element specified by src1 from the packed value in src0.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int hc::__unpack_s32_s16x4(int64_t src0, unsigned int src1)
Extract the element specified by src1 from the packed value in src0.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int hc::__unpack_s32_s3x2(int64_t src0, unsigned int src1)
Extract the element specified by src1 from the packed value in src0.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int hc::__unpack_s32_s8x4(int src0, unsigned int src1)
Extract the element specified by src1 from the packed value in src0.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int hc::__unpack_s32_s8x8(int64_t src0, unsigned int src1)
Extract the element specified by src1 from the packed value in src0.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
unsigned int hc::__unpack_u32_u16x2(unsigned int src0, unsigned int src1)
Extract the element specified by src1 from the packed value in src0.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
unsigned int hc::__unpack_u32_u16x4(uint64_t src0, unsigned int src1)
Extract the element specified by src1 from the packed value in src0.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
unsigned int hc::__unpack_u32_u32x2(uint64_t src0, unsigned int src1)
Extract the element specified by src1 from the packed value in src0.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
unsigned int hc::__unpack_u32_u8x4(unsigned int src0, unsigned int src1)
Extract the element specified by src1 from the packed value in src0.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
unsigned int hc::__unpack_u32_u8x8(uint64_t src0, unsigned int src1)
Extract the element specified by src1 from the packed value in src0.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
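Unpacking reads one element back out of a packed value, zero-extending for unsigned types and sign-extending for signed ones. The sketch assumes element 0 is the least significant byte and that the index is taken modulo the element count; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of hc::__unpack_u32_u8x4: return element src1 (0..3) of
// the packed value src0, zero-extended.
uint32_t unpack_u32_u8x4_model(uint32_t src0, uint32_t src1) {
    return (src0 >> (8u * (src1 & 3u))) & 0xFFu;
}

// Hypothetical model of hc::__unpack_s32_s8x4: the same element, sign-extended
// by reinterpreting the byte as int8_t.
int32_t unpack_s32_s8x4_model(int32_t src0, uint32_t src1) {
    uint32_t byte = (static_cast<uint32_t>(src0) >> (8u * (src1 & 3u))) & 0xFFu;
    return static_cast<int32_t>(static_cast<int8_t>(byte));
}
```

The byte 0xF0 unpacks as 240 through the unsigned variant but as -16 through the signed one.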
float hc::__unpackcvt_f32_u8x4(unsigned int src0, unsigned int src1)
Unpacks a single element from a packed u8x4 value and converts it to an f32.
Please refer to HSA PRM 5.15 for more detailed specification.
Referenced by __lastbit_u32_s64().
int hc::__unpackhi_s16x2(int src0, int src1)
Copy and interleave the upper half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int64_t hc::__unpackhi_s16x4(int64_t src0, int64_t src1)
Copy and interleave the upper half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int64_t hc::__unpackhi_s32x2(int64_t src0, int64_t src1)
Copy and interleave the upper half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int hc::__unpackhi_s8x4(int src0, int src1)
Copy and interleave the upper half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int64_t hc::__unpackhi_s8x8(int64_t src0, int64_t src1)
Copy and interleave the upper half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
unsigned int hc::__unpackhi_u16x2(unsigned int src0, unsigned int src1)
Copy and interleave the upper half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
uint64_t hc::__unpackhi_u16x4(uint64_t src0, uint64_t src1)
Copy and interleave the upper half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
uint64_t hc::__unpackhi_u32x2(uint64_t src0, uint64_t src1)
Copy and interleave the upper half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
unsigned int hc::__unpackhi_u8x4(unsigned int src0, unsigned int src1)
Copy and interleave the upper half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
uint64_t hc::__unpackhi_u8x8(uint64_t src0, uint64_t src1)
Copy and interleave the upper half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int hc::__unpacklo_s16x2(int src0, int src1)
Copy and interleave the lower half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int64_t hc::__unpacklo_s16x4(int64_t src0, int64_t src1)
Copy and interleave the lower half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int64_t hc::__unpacklo_s32x2(int64_t src0, int64_t src1)
Copy and interleave the lower half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int hc::__unpacklo_s8x4(int src0, int src1)
Copy and interleave the lower half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
int64_t hc::__unpacklo_s8x8(int64_t src0, int64_t src1)
Copy and interleave the lower half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
unsigned int hc::__unpacklo_u16x2(unsigned int src0, unsigned int src1)
Copy and interleave the lower half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
uint64_t hc::__unpacklo_u16x4(uint64_t src0, uint64_t src1)
Copy and interleave the lower half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
uint64_t hc::__unpacklo_u32x2(uint64_t src0, uint64_t src1)
Copy and interleave the lower half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
unsigned int hc::__unpacklo_u8x4(unsigned int src0, unsigned int src1)
Copy and interleave the lower half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
uint64_t hc::__unpacklo_u8x8(uint64_t src0, uint64_t src1)
Copy and interleave the lower half of the elements from each source into the destination.
Please refer to HSA PRM 5.9 for more detailed specification of these functions.
Referenced by __lastbit_u32_s64().
void hc::all_memory_fence(const tile_barrier&)
Establishes a thread-tile scoped memory fence for both global and tile-static memory operations.
This function does not imply a barrier and is therefore permitted in divergent code.
Referenced by hc::tile_barrier::wait_with_tile_static_memory_fence().
auto_voidp hc::am_aligned_alloc(std::size_t size, hc::accelerator& acc, unsigned flags, std::size_t alignment = 0)
Allocate a block of size bytes of memory on the specified acc.
The contents of the newly allocated block of memory are not initialized.
If size == 0, 0 is returned.
Flags: amHostPinned: Allocate pinned host memory and map it into the address space of the specified accelerator.
If an error occurs while trying to allocate the requested memory, 0 is returned.
auto_voidp hc::am_alloc(std::size_t size, hc::accelerator& acc, unsigned flags)
Allocate a block of size bytes of memory on the specified acc.
The contents of the newly allocated block of memory are not initialized.
If size == 0, 0 is returned.
Flags: amHostPinned: Allocate pinned host memory and map it into the address space of the specified accelerator.
If an error occurs while trying to allocate the requested memory, 0 is returned.
am_status_t hc::am_free(void* ptr)
Free a block of memory previously allocated with am_alloc.
bool hc::atomic_compare_exchange(unsigned int* dest, unsigned int* expected_val, unsigned int val)
These functions attempt to perform the following three steps atomically:
1. Read the value stored in the location pointed to by dest.
2. Compare that value with the value stored in the location pointed to by expected_val.
3. If the two values are equal, store val into the location pointed to by dest and return true; otherwise, store the value read in step 1 into the location pointed to by expected_val and return false.
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[out] expected_val: A pointer to a local variable or function parameter. Upon calling the function, the location pointed to by expected_val contains the value the caller expects dest to contain. Upon return from the function, expected_val will contain the most recent value read from dest.
[in] val: The new value to be stored in the location pointed to by dest.
bool hc::atomic_compare_exchange(int* dest, int* expected_val, int val)
These functions attempt to perform the following three steps atomically:
1. Read the value stored in the location pointed to by dest.
2. Compare that value with the value stored in the location pointed to by expected_val.
3. If the two values are equal, store val into the location pointed to by dest and return true; otherwise, store the value read in step 1 into the location pointed to by expected_val and return false.
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[out] expected_val: A pointer to a local variable or function parameter. Upon calling the function, the location pointed to by expected_val contains the value the caller expects dest to contain. Upon return from the function, expected_val will contain the most recent value read from dest.
[in] val: The new value to be stored in the location pointed to by dest.
bool hc::atomic_compare_exchange(uint64_t* dest, uint64_t* expected_val, uint64_t val)
These functions attempt to perform the following three steps atomically:
1. Read the value stored in the location pointed to by dest.
2. Compare that value with the value stored in the location pointed to by expected_val.
3. If the two values are equal, store val into the location pointed to by dest and return true; otherwise, store the value read in step 1 into the location pointed to by expected_val and return false.
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[out] expected_val: A pointer to a local variable or function parameter. Upon calling the function, the location pointed to by expected_val contains the value the caller expects dest to contain. Upon return from the function, expected_val will contain the most recent value read from dest.
[in] val: The new value to be stored in the location pointed to by dest.
Referenced by copy_async().
unsigned int hc::atomic_exchange(unsigned int* dest, unsigned int val)
Atomically read the value stored in dest, replace it with the value given in val, and return the old value to the caller.
This function provides overloads for int, unsigned int, float and uint64_t parameters.
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The new value to be stored in the location pointed to by dest.
int hc::atomic_exchange(int* dest, int val)
Atomically read the value stored in dest, replace it with the value given in val, and return the old value to the caller.
This function provides overloads for int, unsigned int, float and uint64_t parameters.
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The new value to be stored in the location pointed to by dest.
float hc::atomic_exchange(float* dest, float val)
Atomically read the value stored in dest, replace it with the value given in val, and return the old value to the caller.
This function provides overloads for int, unsigned int, float and uint64_t parameters.
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The new value to be stored in the location pointed to by dest.
uint64_t hc::atomic_exchange(uint64_t* dest, uint64_t val)
Atomically read the value stored in dest, replace it with the value given in val, and return the old value to the caller.
This function provides overloads for int, unsigned int, float and uint64_t parameters.
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within an hc::array or hc::array_view or within a tile_static variable.
[in] val: The new value to be stored in the location pointed to by dest.
Referenced by copy_async().
unsigned hc::atomic_fetch_add(unsigned* x, unsigned y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
int hc::atomic_fetch_add(int* x, int y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
float hc::atomic_fetch_add(float* x, float y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
uint64_t hc::atomic_fetch_add(uint64_t* x, uint64_t y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
Referenced by copy_async().
unsigned hc::atomic_fetch_and(unsigned* x, unsigned y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
int hc::atomic_fetch_and(int* x, int y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
uint64_t hc::atomic_fetch_and(uint64_t* x, uint64_t y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
Referenced by copy_async().
int hc::atomic_fetch_dec(int* _Dest)
Atomically increment (atomic_fetch_inc) or decrement (atomic_fetch_dec) the value stored at the location pointed to by dest.
[in,out] dest (_Dest): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
unsigned int hc::atomic_fetch_dec(unsigned int* _Dest)
Atomically increment (atomic_fetch_inc) or decrement (atomic_fetch_dec) the value stored at the location pointed to by dest.
[in,out] dest (_Dest): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
Referenced by copy_async().
int hc::atomic_fetch_inc(int* _Dest)
Atomically increment (atomic_fetch_inc) or decrement (atomic_fetch_dec) the value stored at the location pointed to by dest.
[in,out] dest (_Dest): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
unsigned int hc::atomic_fetch_inc(unsigned int* _Dest)
Atomically increment (atomic_fetch_inc) or decrement (atomic_fetch_dec) the value stored at the location pointed to by dest.
[in,out] dest (_Dest): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
Referenced by copy_async().
int hc::atomic_fetch_max(int* dest, int val)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val: The second operand of the binary operation whose result is stored into the location pointed to by dest.
unsigned int hc::atomic_fetch_max(unsigned int* dest, unsigned int val)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val: The second operand of the binary operation whose result is stored into the location pointed to by dest.
uint64_t hc::atomic_fetch_max(uint64_t* dest, uint64_t val)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val: The second operand of the binary operation whose result is stored into the location pointed to by dest.
Referenced by copy_async().
int hc::atomic_fetch_min(int* dest, int val)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val: The second operand of the binary operation whose result is stored into the location pointed to by dest.
unsigned int hc::atomic_fetch_min(unsigned int* dest, unsigned int val)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val: The second operand of the binary operation whose result is stored into the location pointed to by dest.
uint64_t hc::atomic_fetch_min(uint64_t* dest, uint64_t val)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest: A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val: The second operand of the binary operation whose result is stored into the location pointed to by dest.
Referenced by copy_async().
unsigned hc::atomic_fetch_or(unsigned* x, unsigned y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
int hc::atomic_fetch_or(int* x, int y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
uint64_t hc::atomic_fetch_or(uint64_t* x, uint64_t y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
Referenced by copy_async().
unsigned hc::atomic_fetch_sub(unsigned* x, unsigned y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
int hc::atomic_fetch_sub(int* x, int y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
float hc::atomic_fetch_sub(float* x, float y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
Referenced by copy_async().
unsigned hc::atomic_fetch_xor(unsigned* x, unsigned y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
int hc::atomic_fetch_xor(int* x, int y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
uint64_t hc::atomic_fetch_xor(uint64_t* x, uint64_t y)
Atomically read the value stored in dest, apply the binary numerical operation specific to the function with the read value and val as operands, and store the result back to the location pointed to by dest.
In terms of sequential semantics, the operation performed by each of these functions is described by the following pseudo-code:
*dest = *dest ⊕ val;
where ⊕ denotes one of: addition (atomic_fetch_add), subtraction (atomic_fetch_sub), maximum (atomic_fetch_max), minimum (atomic_fetch_min), bitwise AND (atomic_fetch_and), bitwise OR (atomic_fetch_or), or bitwise XOR (atomic_fetch_xor).
[out] dest (x): A pointer to the location which needs to be atomically modified. The location may reside within a concurrency::array or concurrency::array_view or within a tile_static variable.
[in] val (y): The second operand of the binary operation whose result is stored into the location pointed to by dest.
Referenced by copy_async().
void hc::copy(const array_view<const T, N>& src, const array_view<T, N>& dest)
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] src: An object of type array_view<T,N> (or array_view<const T, N>) to be copied from.
[out] dest: An object of type array_view<T,N> to be copied to.
References hc::array_view< T, N >::get_extent().
void hc::copy(const array_view<T, N>& src, const array_view<T, N>& dest)
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] src: An object of type array_view<T,N> (or array_view<const T, N>) to be copied from.
[out] dest: An object of type array_view<T,N> to be copied to.
References copy().
void hc::copy | ( | const array< T, N > & | src, |
const array_view< T, N > & | dest | ||
) |
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array<T,N> to be copied from. |
[out] | dest | An object of type array_view<T,N> to be copied to. |
References hc::array_view< T, N >::get_extent().
void hc::copy | ( | const array< T, N > & | src, |
array< T, N > & | dest | ||
) |
The contents of "src" are copied into "dest".
The source and destination may reside on different accelerators. If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array<T,N> to be copied from. |
[out] | dest | An object of type array<T,N> to be copied to. |
void hc::copy | ( | const array_view< const T, N > & | src, |
array< T, N > & | dest | ||
) |
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array_view<T,N> (or array_view<const T, N>) to be copied from. |
[out] | dest | An object of type array<T,N> to be copied to. |
References hc::array< T, N >::get_extent().
void hc::copy | ( | const array_view< T, N > & | src, |
array< T, N > & | dest | ||
) |
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array_view<T,N> (or array_view<const T, N>) to be copied from. |
[out] | dest | An object of type array<T,N> to be copied to. |
References copy().
void hc::copy | ( | InputIter | srcBegin, |
InputIter | srcEnd, | ||
const array_view< T, N > & | dest | ||
) |
The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".
If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.
In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).
[in] | srcBegin | An iterator to the first element of a source container. |
[in] | srcEnd | An iterator to the end of a source container. |
[out] | dest | An object of type array_view<T,N> to be copied to. |
void hc::copy | ( | InputIter | srcBegin, |
InputIter | srcEnd, | ||
array< T, N > & | dest | ||
) |
The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".
If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.
In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).
[in] | srcBegin | An iterator to the first element of a source container. |
[in] | srcEnd | An iterator to the end of a source container. |
[out] | dest | An object of type array<T,N> to be copied to. |
References hc::array< T, N >::get_extent().
void hc::copy | ( | InputIter | srcBegin, |
const array_view< T, N > & | dest | ||
) |
The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".
If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.
In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).
[in] | srcBegin | An iterator to the first element of a source container. |
[in] | srcEnd | An iterator to the end of a source container. |
[out] | dest | An object of type array_view<T,N> to be copied to. |
References copy(), and hc::array_view< T, N >::get_extent().
void hc::copy | ( | InputIter | srcBegin, |
array< T, N > & | dest | ||
) |
The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".
If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.
In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).
[in] | srcBegin | An iterator to the first element of a source container. |
[in] | srcEnd | An iterator to the end of a source container. |
[out] | dest | An object of type array<T,N> to be copied to. |
References copy(), and hc::array< T, N >::get_extent().
void hc::copy | ( | const array_view< T, N > & | src, |
OutputIter | destBegin | ||
) |
The contents of "src" are copied into the destination container, starting at the position addressed by destBegin.
If the number of elements in the destination container from destBegin onward is smaller than "src.extent.size()", the behavior is undefined.
[in] | src | An object of type array_view<T,N> to be copied from. |
[out] | destBegin | An output iterator addressing the position of the first element in the destination container. |
void hc::copy | ( | const array< T, N > & | src, |
OutputIter | destBegin | ||
) |
The contents of "src" are copied into the destination container, starting at the position addressed by destBegin.
If the number of elements in the destination container from destBegin onward is smaller than "src.extent.size()", the behavior is undefined.
[in] | src | An object of type array<T,N> to be copied from. |
[out] | destBegin | An output iterator addressing the position of the first element in the destination container. |
void hc::copy | ( | const array< T, 1 > & | src, |
const array_view< T, 1 > & | dest | ||
) |
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array<T,N> to be copied from. |
[out] | dest | An object of type array_view<T,N> to be copied to. |
References hc::array_view< T, N >::get_extent().
void hc::copy | ( | const array_view< const T, 1 > & | src, |
array< T, 1 > & | dest | ||
) |
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array_view<T,N> (or array_view<const T, N>) to be copied from. |
[out] | dest | An object of type array<T,N> to be copied to. |
References hc::array< T, N >::get_extent().
void hc::copy | ( | const array_view< const T, 1 > & | src, |
const array_view< T, 1 > & | dest | ||
) |
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array_view<T,N> (or array_view<const T, N>) to be copied from. |
[out] | dest | An object of type array_view<T,N> to be copied to. |
References hc::array_view< T, N >::get_extent().
Referenced by hc::array< T, N >::array(), copy(), copy_async(), hc::array< T, N >::copy_to(), hc::array_view< T, N >::copy_to(), hc::array_view< const T, N >::copy_to(), hc::array< T, N >::operator std::vector< T >(), hc::array< T, N >::view_as(), hc::array_view< T, N >::view_as(), and hc::array_view< const T, N >::view_as().
completion_future hc::copy_async | ( | const array< T, N > & | src, |
array< T, N > & | dest | ||
) |
The contents of "src" are copied into "dest".
The source and destination may reside on different accelerators. If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array<T,N> to be copied from. |
[out] | dest | An object of type array<T,N> to be copied to. |
References copy().
completion_future hc::copy_async | ( | const array< T, N > & | src, |
const array_view< T, N > & | dest | ||
) |
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array<T,N> to be copied from. |
[out] | dest | An object of type array_view<T,N> to be copied to. |
References copy().
completion_future hc::copy_async | ( | const array_view< const T, N > & | src, |
array< T, N > & | dest | ||
) |
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array_view<T,N> (or array_view<const T, N>) to be copied from. |
[out] | dest | An object of type array<T,N> to be copied to. |
References copy().
completion_future hc::copy_async | ( | const array_view< T, N > & | src, |
array< T, N > & | dest | ||
) |
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array_view<T,N> (or array_view<const T, N>) to be copied from. |
[out] | dest | An object of type array<T,N> to be copied to. |
References copy().
completion_future hc::copy_async | ( | const array_view< const T, N > & | src, |
const array_view< T, N > & | dest | ||
) |
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array_view<T,N> (or array_view<const T, N>) to be copied from. |
[out] | dest | An object of type array_view<T,N> to be copied to. |
References copy().
completion_future hc::copy_async | ( | const array_view< T, N > & | src, |
const array_view< T, N > & | dest | ||
) |
The contents of "src" are copied into "dest".
If the extents of "src" and "dest" don't match, a runtime exception is thrown.
[in] | src | An object of type array_view<T,N> (or array_view<const T, N>) to be copied from. |
[out] | dest | An object of type array_view<T,N> to be copied to. |
References copy().
completion_future hc::copy_async | ( | InputIter | srcBegin, |
InputIter | srcEnd, | ||
array< T, N > & | dest | ||
) |
The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".
If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.
In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).
[in] | srcBegin | An iterator to the first element of a source container. |
[in] | srcEnd | An iterator to the end of a source container. |
[out] | dest | An object of type array<T,N> to be copied to. |
References copy().
completion_future hc::copy_async | ( | InputIter | srcBegin, |
array< T, N > & | dest | ||
) |
The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".
If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.
In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).
[in] | srcBegin | An iterator to the first element of a source container. |
[in] | srcEnd | An iterator to the end of a source container. |
[out] | dest | An object of type array<T,N> to be copied to. |
References copy().
completion_future hc::copy_async | ( | InputIter | srcBegin, |
InputIter | srcEnd, | ||
const array_view< T, N > & | dest | ||
) |
The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".
If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.
In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).
[in] | srcBegin | An iterator to the first element of a source container. |
[in] | srcEnd | An iterator to the end of a source container. |
[out] | dest | An object of type array_view<T,N> to be copied to. |
References copy().
completion_future hc::copy_async | ( | InputIter | srcBegin, |
const array_view< T, N > & | dest | ||
) |
The contents of a source container from the iterator range [srcBegin,srcEnd) are copied into "dest".
If the number of elements in the iterator range is not equal to "dest.extent.size()", an exception is thrown.
In the overloads which don't take an end-iterator it is assumed that the source iterator is able to provide at least dest.extent.size() elements, but no checking is performed (nor possible).
[in] | srcBegin | An iterator to the first element of a source container. |
[in] | srcEnd | An iterator to the end of a source container. |
[out] | dest | An object of type array_view<T,N> to be copied to. |
References copy().
completion_future hc::copy_async | ( | const array< T, N > & | src, |
OutputIter | destBegin | ||
) |
The contents of "src" are copied into the destination container, starting at the position addressed by destBegin.
If the number of elements in the destination container from destBegin onward is smaller than "src.extent.size()", the behavior is undefined.
[in] | src | An object of type array<T,N> to be copied from. |
[out] | destBegin | An output iterator addressing the position of the first element in the destination container. |
References copy().
completion_future hc::copy_async | ( | const array_view< T, N > & | src, |
OutputIter | destBegin | ||
) |
The contents of "src" are copied into the destination container, starting at the position addressed by destBegin.
If the number of elements in the destination container from destBegin onward is smaller than "src.extent.size()", the behavior is undefined.
[in] | src | An object of type array_view<T,N> to be copied from. |
[out] | destBegin | An output iterator addressing the position of the first element in the destination container. |
References __atomic_wrapdec(), __atomic_wrapinc(), atomic_compare_exchange(), atomic_exchange(), atomic_fetch_add(), atomic_fetch_and(), atomic_fetch_dec(), atomic_fetch_inc(), atomic_fetch_max(), atomic_fetch_min(), atomic_fetch_or(), atomic_fetch_sub(), atomic_fetch_xor(), and copy().
unsigned int hc::get_group_segment_size | ( | ) |
Fetch the size of the group segment.
This includes both the static and the dynamic group segment.
Referenced by __mad24().
unsigned int hc::get_static_group_segment_size | ( | ) |
Fetch the size of the static group segment.
Referenced by __mad24().
|
inline |
Get the current tick count for the GPU platform.
|
inline |
Get the frequency of ticks per second for the underlying asynchronous operation.
void hc::global_memory_fence | ( | const tile_barrier & | ) |
Establishes a thread-tile scoped memory fence for global (but not tile-static) memory operations.
This function does not imply a barrier and is therefore permitted in divergent code.
Referenced by hc::tile_barrier::wait_with_tile_static_memory_fence().
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.
The result extent<N> is such that, for a given operator ⊕, result[i] = ext[i] ⊕ value or result[i] = value ⊕ ext[i] for every i from 0 to N-1.
[in] | ext | The extent<N> operand |
[in] | value | The integer operand |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.
The result extent<N> is such that, for a given operator ⊕, result[i] = ext[i] ⊕ value or result[i] = value ⊕ ext[i] for every i from 0 to N-1.
[in] | ext | The extent<N> operand |
[in] | value | The integer operand |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.
The result extent<N> is such that, for a given operator ⊕, result[i] = ext[i] ⊕ value or result[i] = value ⊕ ext[i] for every i from 0 to N-1.
[in] | ext | The extent<N> operand |
[in] | value | The integer operand |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.
The result extent<N> is such that, for a given operator ⊕, result[i] = ext[i] ⊕ value or result[i] = value ⊕ ext[i] for every i from 0 to N-1.
[in] | ext | The extent<N> operand |
[in] | value | The integer operand |
Adds (or subtracts) two objects of extent<N> to form a new extent.
The result extent<N> is such that, for a given operator ⊕ (+ or -), result[i] = lhs[i] ⊕ rhs[i] for every i from 0 to N-1.
[in] | lhs | The left-hand extent<N> operand. |
[in] | rhs | The right-hand extent<N> operand. |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.
The result extent<N> is such that, for a given operator ⊕, result[i] = ext[i] ⊕ value or result[i] = value ⊕ ext[i] for every i from 0 to N-1.
[in] | ext | The extent<N> operand |
[in] | value | The integer operand |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.
The result extent<N> is such that, for a given operator ⊕, result[i] = ext[i] ⊕ value or result[i] = value ⊕ ext[i] for every i from 0 to N-1.
[in] | ext | The extent<N> operand |
[in] | value | The integer operand |
Adds (or subtracts) two objects of extent<N> to form a new extent.
The result extent<N> is such that, for a given operator ⊕ (+ or -), result[i] = lhs[i] ⊕ rhs[i] for every i from 0 to N-1.
[in] | lhs | The left-hand extent<N> operand. |
[in] | rhs | The right-hand extent<N> operand. |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.
The result extent<N> is such that, for a given operator ⊕, result[i] = ext[i] ⊕ value or result[i] = value ⊕ ext[i] for every i from 0 to N-1.
[in] | ext | The extent<N> operand |
[in] | value | The integer operand |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.
The result extent<N> is such that, for a given operator ⊕, result[i] = ext[i] ⊕ value or result[i] = value ⊕ ext[i] for every i from 0 to N-1.
[in] | ext | The extent<N> operand |
[in] | value | The integer operand |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.
The result extent<N> is such that, for a given operator ⊕, result[i] = ext[i] ⊕ value or result[i] = value ⊕ ext[i] for every i from 0 to N-1.
[in] | ext | The extent<N> operand |
[in] | value | The integer operand |
Binary arithmetic operations that produce a new extent<N> that is the result of performing the corresponding binary arithmetic operation on the elements of the extent operands.
The result extent<N> is such that, for a given operator ⊕, result[i] = ext[i] ⊕ value or result[i] = value ⊕ ext[i] for every i from 0 to N-1.
[in] | ext | The extent<N> operand |
[in] | value | The integer operand |
void hc::tile_static_memory_fence | ( | const tile_barrier & | ) |
Establishes a thread-tile scoped memory fence for tile-static (but not global) memory operations.
This function does not imply a barrier and is therefore permitted in divergent code.
Referenced by hc::tile_barrier::wait_with_tile_static_memory_fence().