CUDA, Stream Computing and Ct

Anonymous
Topic 13678

I got some code from a master's student who worked on porting the FStat engine to CUDA. It looks like a factor-of-7 speedup, but he's still struggling with the few calculations in there that require double precision. There might be an app some time during S5R4.

BM

Bernd Machenschalk
Joined: 15 Oct 04
Posts: 2,684
Credit: 25,950,161
RAC: 34,820

CUDA, Stream Computing and Ct

Quote:
Quote:

I got some code from a master's student who worked on porting the FStat engine to CUDA. It looks like a factor-of-7 speedup, but he's still struggling with the few calculations in there that require double precision. There might be an app some time during S5R4.

BM

If double precision is a major requirement, you could just limit CUDA to the GT200 series of cards. Unlike their predecessors, they have 64-bit FPUs.


There are rather few of those cards out there. Right now I'm not sure that supporting GPUs is worth the effort at all. Anyway, I'm pretty sure that the remaining issues can be resolved by emulating double precision, e.g. with two floats or a float and an int. But first you have to find out what precisely goes wrong, and that's where we're stuck at the moment.

BM


Bernd Machenschalk

RE: RE: Right now I'm not

Quote:
Quote:
Right now I'm not sure that supporting GPU is worth the effort at all.
BM

How come? Won't CUDA become more sophisticated, and won't GPU processing power continue to outpace serialised instruction computing on the x86 architecture? Doesn't that imply that parallelising the e@h app is rather tricky, even though parallelisation is the key to unlocking shedloads of processing power?

I am sorry that I do not understand the precise problems (e.g. double float vs. int+float); however, from a long-term perspective I would say that harnessing the power of GPUs holds more potential than harnessing the power of CPUs.

BM or somebody else equally knowledgeable (AkosF, etc.): could you please explain why Folding@home has managed to get a GPU client working whereas e@h proves to be difficult? Please explain from the point of view of the architecture of the apps :)

Thank you ever so much!


There is no standard for GPU computing (yet). Picking one particular model: how many Einstein@home participants have an NVidia Quadro card that they actually want to use for crunching? Remember that displaying anything is not (yet) possible while the GPU is being used for numerical calculations.

As far as I understand, the Folding@home application is based on Brook or some similar higher-level language; the Einstein@home application is (currently) not. Our "Fstat engine" could be thought of as an FFT for narrow frequency bands. It is actually possible to use standard FFT implementations to calculate it, but in the current framework this would be rather inefficient. The current code was chosen for Einstein@home because it allows us to split the frequency bands into many small pieces (workunits), keeping computing time and data transfer volume within the bounds of a volunteer computing project.

Pinkesh Patel (an LSC member) is working on a program that actually uses standard FFT algorithms (I think with small modifications) for calculating the F-statistic, but his code isn't ready to be used yet (at least not on E@H). Using it would require a completely different search and workunit design, and it would be much more demanding on machines and their connections to the servers than what we currently expect our participants to have.

I definitely think that using high-level languages / libraries like Brook that have efficient implementations for every platform is the way to go in the future, but for the moment (i.e. S5R4) we need to stick to what we have.

BM


Bernd Machenschalk

RE: I understand that even

Quote:
I understand that even for Folding@Home, the workunits crunched by the GPU beta clients are different from those for the other platforms. But they have now managed to do visualization and GPU processing at the same time, so you can still use your PC's video capabilities while crunching, which should improve acceptance.


That's quite amazing. I've been told that this is impossible.

Actually, running a second application (and its workunits) in the same project is quite possible with BOINC, though I don't know how many projects actually do this (I could imagine Leiden Classical). Erik Korpela is visiting the AEI this week; he told us that SETI@home will run Astropulse as a second application some time soon. We're currently looking into implementing this; it might become an option for Einstein@home, too. That way we could actually run a "stream computing" search in parallel.

BM


Bernd Machenschalk

RE: RE: RE: I

Quote:
Quote:
Quote:
I understand that even for Folding@Home, the workunits crunched by the GPU beta clients are different from those for the other platforms. But they have now managed to do visualization and GPU processing at the same time, so you can still use your PC's video capabilities while crunching, which should improve acceptance.

That's quite amazing. I've been told that this is impossible.

At least for the ATI variant. It seems to be a recent change though, coming after Folding@Home's GPU client switched from a DirectX-driven API to the "CAL" abstraction layer:

http://folding.stanford.edu/English/FAQ-ATI2#ntoc23

CU
Bikeman


I see. The information I got apparently applied only to CUDA / NVidia.

BM

