A. R. Shajii
|
ebd344f894
|
GPU and other updates (#52)
* Add nvptx pass
* Fix spaces
* Don't change name
* Add runtime support
* Add init call
* Add more runtime functions
* Add launch function
* Add intrinsics
* Fix codegen
* Run GPU pass between general opt passes
* Set data layout
* Create context
* Link libdevice
* Add function remapping
* Fix linkage
* Fix libdevice link
* Fix linking
* Fix personality
* Fix linking
* Fix linking
* Fix linking
* Add internalize pass
* Add more math conversions
* Add more re-mappings
* Fix conversions
* Fix __str__
* Add decorator attribute for any decorator
* Update kernel decorator
* Fix kernel decorator
* Fix kernel decorator
* Fix kernel decorator
* Fix kernel decorator
* Remove old decorator
* Fix pointer calc
* Fix fill-in codegen
* Fix linkage
* Add comment
* Update list conversion
* Add more conversions
* Add dict and set conversions
* Add float32 type to IR/LLVM
* Add float32
* Add float32 stdlib
* Keep required global values in PTX module
* Fix PTX module pruning
* Fix malloc
* Set will-return
* Fix name cleanup
* Fix access
* Fix name cleanup
* Fix function renaming
* Update dimension API
* Fix args
* Clean up API
* Move GPU transformations to end of opt pipeline
* Fix alloc replacements
* Fix naming
* Target PTX 4.2
* Fix global renaming
* Fix early return in static blocks; Add __realized__ function
* Format
* Add __llvm_name__ for functions
* Add vector type to IR
* SIMD support [wip]
* Update kernel naming
* Fix early returns; Fix SIMD calls
* Fix kernel naming
* Fix IR matcher
* Remove module print
* Update realloc
* Add overloads for 32-bit float math ops
* Add gpu.Pointer type for working with raw pointers
* Add float32 conversion
* Add to_gpu and from_gpu
* clang-format
* Add f32 reduction support to OpenMP
* Fix automatic GPU class conversions
* Fix conversion functions
* Fix conversions
* Rename self
* Fix tuple conversion
* Fix conversions
* Fix conversions
* Update PTX filename
* Fix filename
* Add raw function
* Add GPU docs
* Allow nested object conversions
* Add tests (WIP)
* Update SIMD
* Add staticrange and statictuple loop support
* SIMD updates
* Add new Vec constructors
* Fix UInt conversion
* Fix size-0 allocs
* Add more tests
* Add matmul test
* Rename gpu test file
* Add more tests
* Add alloc cache
* Fix object_to_gpu
* Fix frees
* Fix str conversion
* Fix set conversion
* Fix conversions
* Fix class conversion
* Fix str conversion
* Fix byte conversion
* Fix list conversion
* Fix pointer conversions
* Fix conversions
* Fix conversions
* Update tests
* Fix conversions
* Fix tuple conversion
* Fix tuple conversion
* Fix auto conversions
* Fix conversion
* Fix magics
* Update tests
* Support GPU in JIT mode
* Fix GPU+JIT
* Fix kernel filename in JIT mode
* Add __static_print__; Add earlyDefines; Various domination bugfixes; SimplifyContext RAII base handling
* Fix global static handling
* Fix float32 tests
* Fix gpu module
* Support OpenMP "collapse" option
* Add more collapse tests
* Capture generics and statics
* TraitVar handling
* Python exceptions / isinstance [wip; no_ci]
* clang-format
* Add list comparison operators
* Support empty raise in IR
* Add dict 'or' operator
* Fix repr
* Add copy module
* Fix spacing
* Use sm_30
* Python exceptions
* TypeTrait support; Fix defaultDict
* Fix earlyDefines
* Add defaultdict
* clang-format
* Fix invalid canonicalizations
* Fix empty raise
* Fix copyright
* Add Python numerics option
* Support py-numerics in math module
* Update docs
* Add static Python division / modulus
* Add static py numerics tests
* Fix staticrange/tuple; Add KwTuple.__getitem__
* clang-format
* Add gpu parameter to par
* Fix globals
* Don't init loop vars on loop collapse
* Add par-gpu tests
* Update gpu docs
* Fix isinstance check
* Remove invalid test
* Add -libdevice to set custom path [skip ci]
* Add release notes; bump version [skip ci]
* Add libdevice docs [skip ci]
Co-authored-by: Ibrahim Numanagić <ibrahimpasa@gmail.com>
|
2022-09-15 15:40:00 -04:00 |