I've been building a C++ tensor library that tries to bring the ergonomics of NumPy/PyTorch to native code. The main goals were:
- Familiar API (operator overloading, method chaining, same function names; see the sketch after this list)
- Actual GPU acceleration via Metal (not just matmul, but all ops)
- Cross-platform with SIMD on x86/ARM/RISC-V via xsimd
- einops-style rearrange/reduce patterns
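To make the "familiar API" and einops goals concrete, here's a minimal usage sketch. All of the names here (the `axiom/axiom.hpp` header, `axiom::arange`, `ones`, `reshape`, `reduce`) are my illustrative assumptions about what a NumPy-flavored C++ API looks like, not confirmed identifiers from the repo:

```cpp
// Hypothetical usage sketch -- the header path and function names below
// are assumptions for illustration, not the library's confirmed API.
#include <axiom/axiom.hpp>  // assumed umbrella header

int main() {
    // NumPy-style construction, reshaping, and broadcasting
    auto a = axiom::arange(12.0f).reshape({3, 4});  // 3x4 matrix
    auto b = axiom::ones({4});                      // broadcast over rows
    auto c = (a + b) * 2.0f;                        // element-wise ops via operator overloading

    // einops-style reduce pattern: collapse the row axis by summing
    auto d = axiom::reduce(c, "r c -> c", "sum");
    return 0;
}
```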
It started because I wanted NumPy's API but needed to deploy on edge devices without Python. I ended up going deeper than expected (28k+ LOC) into BLAS backends, memory views, and GPU kernels.
Some things I'm genuinely unsure about and would love feedback on:
- Does Axiom actually seem useful?
- What features might make you use it?
GitHub: https://github.com/frikallo/axiom
Happy to answer questions about the implementation.