I've looked into the functions provided by IPP and MKL, but they're not worth it for us. Most of them are far too complex for what we are doing in our code.
More than 90% (99% before optimization) of the run time is spent within a single loop that once had about a dozen lines of C code, containing only simple multiplications and additions (and the very few divisions we couldn't avoid - I think there's actually only one of them left - hm, gives me another idea...). We have parallelized/vectorized as much as we could.
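To illustrate the point about divisions: a common way to keep a division out of a hot loop is to compute the reciprocal once and multiply inside the loop. The following is only a sketch of that general technique, not our actual loop; the function name scale_products and its parameters are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not the project's real loop): multiply two arrays
 * element-wise and normalize by a constant. The one division is done
 * once, outside the loop; the loop body is multiplications only. */
void scale_products(float *out, const float *a, const float *b,
                    size_t n, float norm)
{
    assert(norm != 0.0f);                 /* the reciprocal must exist */
    const float inv_norm = 1.0f / norm;   /* the single division */
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i] * inv_norm;
}
```

Note that multiplying by a rounded reciprocal is not always bit-identical to dividing, so whether this is acceptable depends on the accuracy the results need.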
You can map the operations we perform to matrix operations, but because of the overhead of executing these, even with a highly optimized library, this will not be faster than doing just the necessary low-level operations.
The latest speedup came mostly from avoiding type conversions (which require setting the rounding mode, and that is slow), a bit of optimizing the interface to the assembler-coded parts, and some playing with the C code. It's always a tradeoff you have to make - global static variables are the fastest in many cases, but with too many of them your code becomes unreadable and unmaintainable.
I also found that, apparently due to caching effects, some compiler switches that worked well for previous versions (e.g. unrolling loops) weren't optimal for this code.
Finally, I now use the same compiler and settings we use for the Linux version (a gcc-4.1) for Windows, too (at least for this critical module), which saved some interfacing and quite some maintenance, and brought the Windows App to the speed our Linux App had before.
So - the libraries don't help us, as we don't perform standard operations (e.g. FFT), and the latest speedup of the Windows App was due to a combination of things, roughly half of which didn't come from the assembler coding.
I'll have another try with the Intel compiler, but I think all critical parts have by now been taken out of the hands of the compiler by our assembler coding anyway, so I don't expect much from it.
Since v4.24 is now the "official" Windows application, shouldn't it be removed from the Beta Test page?
Thanks for the hint. For the moment, having it at hand for manual installation may help people who have problems automatically downloading the official App. It will be removed from the Beta App page at the next update.
BM