Alex Saint Croix

Curious to revisit an earlier post about computations over lists versus arrays in Scala, and to examine how the environment handles heavier floating-point operations with OpenCL acceleration under the hood, I put together this very simple benchmark on scalacl/0.2.Beta10:

import scalacl._
import scala.math._

implicit val context = new ScalaCLContext

// Times a map over an immutable List; returns nanoseconds per element.
def testLists(num: Int): Float = {
  val a = List.range(0, num) // replaces the deprecated List.fromArray
  val start = System.nanoTime
  val result = a.map(s => cos(s / 100.0f).toFloat)
  val end = System.nanoTime
  (end - start).toFloat / num
}

// Times the same map over a plain Array; returns nanoseconds per element.
def testArrays(num: Int): Float = {
  val a = Array.range(0, num)
  val start = System.nanoTime
  val result = a.map(s => cos(s / 100.0f).toFloat)
  val end = System.nanoTime
  (end - start).toFloat / num
}

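// Despite the name, this times the same map over a ScalaCL CLArray;
// returns nanoseconds per element.
def testParallelLists(num: Int): Float = {
  // `until` keeps the element count at num, matching the other two tests
  val r = (0 until num).cl
  val a = r.toCLArray

  val start = System.nanoTime
  val result = a.map(s => cos(s / 100.0f).toFloat)
  val end = System.nanoTime
  (end - start).toFloat / num
}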

def testSuite() = {
  val n = 10000000
  println(testLists(n))
  println(testArrays(n))
  println(testParallelLists(n))
}

testSuite()

:wq!
ListFlopParallel.scala (END) 

$ JAVA_OPTS="-Xmx1g" scala ListFlopParallel.scala

1050.4584
49.255398
22.513

The three numbers are nanoseconds per element for the List, Array, and ScalaCL versions, in that order. I’m using an NVIDIA GeForce 330M GPU with 48 CUDA cores, and I suspect there’s significant overhead in shuttling the data between main memory and the GPU (and back). Despite this overhead, there’s still a roughly 2.2x speedup over plain arrays (and about 47x over lists) from pushing these floating-point computations onto the GPU. Exciting stuff!
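
One caveat about timings like these: each test is measured once, on a cold JVM, so JIT compilation can distort the numbers. Below is a minimal sketch of a timing helper that runs a few untimed warm-up passes first and then reports the best of several timed runs. The `Bench` object, `nanosPerElement`, and its default parameters are my own illustrative names, not part of ScalaCL, and this is no substitute for a proper benchmarking harness.

```scala
import scala.math.cos

object Bench {
  // Runs `body` `warmups` times untimed, then `runs` times timed, and
  // returns the fastest timed run in nanoseconds per element.
  def nanosPerElement(n: Int, warmups: Int = 3, runs: Int = 5)(body: => Any): Double = {
    var i = 0
    while (i < warmups) { body; i += 1 }
    val timings = (1 to runs).map { _ =>
      val start = System.nanoTime
      body
      val end = System.nanoTime
      (end - start).toDouble / n
    }
    timings.min
  }

  def main(args: Array[String]): Unit = {
    val n = 1000000
    val data = Array.range(0, n)
    // Same per-element work as the benchmark above, on a plain Array.
    println(nanosPerElement(n) { data.map(s => cos(s / 100.0f).toFloat) })
  }
}
```

The same helper could wrap the List and CLArray versions, so that all three are measured under the same conditions.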

Unfortunately, scalacl/0.2.Beta11 can’t yet convert functions that capture external symbols, so the code above doesn’t run on that version. Still, I’m looking forward to seeing where this project goes. In the meantime, I plan to use good functional design principles and leave myself room to hook into OpenCL hardware acceleration later on.
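
As a rough sketch of what "leaving room" might look like: keep the per-element work in a function that refers only to its own parameter, and have callers depend on a small backend interface. Everything named here (`Kernels`, `MapBackend`, `SequentialBackend`, `Pipeline`) is hypothetical and not part of ScalaCL; an OpenCL-backed implementation of the trait is assumed to be something one could add later.

```scala
import scala.math.cos

object Kernels {
  // Refers only to its parameter `s`, so it captures no local variables.
  val cosKernel: Int => Float = s => cos(s / 100.0f).toFloat
}

// Hypothetical abstraction: callers depend on this trait, not on a
// concrete collection type.
trait MapBackend {
  def mapRange(n: Int)(f: Int => Float): Array[Float]
}

// Plain CPU implementation: applies f to 0, 1, ..., n - 1.
object SequentialBackend extends MapBackend {
  def mapRange(n: Int)(f: Int => Float): Array[Float] = Array.tabulate(n)(f)
}

object Pipeline {
  // The calling code never changes when the backend is swapped.
  def run(backend: MapBackend, n: Int): Array[Float] =
    backend.mapRange(n)(Kernels.cosKernel)
}
```

With this shape, `Pipeline.run(SequentialBackend, 1000)` works today, and only the backend argument would need to change once hardware acceleration is available.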

  • Pure object-oriented language: everything is an object.
  • Functional language: functions are first-class objects, similar to ML or Haskell.
  • Uniform object model, similar to Smalltalk and Ruby.
  • Universal nesting: any type of object can be nested in any other type of object.
  • Uniform access principle for method invocation, similar to Eiffel.
  • Actor-based concurrency inspired by Erlang.
  • Treats infix operators like functions, similar to ISWIM and Smalltalk.
  • Permits function literals or blocks as parameters, allowing libraries to define control structures.
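
A few of the features in this list are easy to see in a short sketch. The names (`Features`, `repeat`, `Square`, and so on) are made up for illustration.

```scala
object Features {
  // A block passed as a by-name parameter: `repeat` reads like a
  // built-in control structure but is an ordinary library method.
  def repeat(times: Int)(body: => Unit): Unit = {
    var i = 0
    while (i < times) { body; i += 1 }
  }

  // Infix operators are method calls: 1 + 2 is the same as (1).+(2).
  val sum: Int = (1).+(2)

  // Uniform access: callers write sq.side and sq.area the same way,
  // whether the member is a stored field or a computed method.
  class Square(val side: Double) { def area: Double = side * side }

  // Functions are first-class values that can be stored and passed around.
  val double: Int => Int = _ * 2
  val doubled: List[Int] = List(1, 2, 3).map(double)
}
```

For example, `Features.repeat(3) { println("hi") }` runs the block three times, with no special syntax support from the language.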

And now you can hook right into the GPU to perform parallel operations with hardware acceleration:

$ vi ScalaCLTest.scala

import scalacl._
import scala.math._

implicit val context = Context.best
// prefer CPUs ? Context.best(CPU)

val a = (0 until 100000).cl // this gives a CLRange

val result = a.map(x => cos(x / 100.0f).toFloat).zipWithIndex map { 
  case (c, i) => c * 10 + i 
} filter { 
  v => (v.toInt % 2) == 1 
}

result.foreach(println)

:wq! 
  
$ scala ScalaCLTest.scala 
11.998
13.992001
15.982005
17.968018
19.950043
21.928085
23.90216
25.872272
27.838436
29.800667
31.758974
33.71338
35.6639
37.610554
39.553364
...
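
As a sanity check, the same pipeline can be written against ordinary Scala collections, with no ScalaCL involved. This is only a CPU-side sketch (`Reference` and `compute` are names I made up); the values should agree with the output above, though the last digits may differ depending on how the OpenCL device handles single-precision math.

```scala
import scala.math.cos

object Reference {
  // Same map / zipWithIndex / map / filter chain as the ScalaCL version,
  // run on a standard Range instead of a CLRange.
  def compute(n: Int): IndexedSeq[Float] =
    (0 until n)
      .map(x => cos(x / 100.0f).toFloat)
      .zipWithIndex
      .map { case (c, i) => c * 10 + i }
      .filter(v => (v.toInt % 2) == 1) // keep values whose integer part is odd
}
```

Calling `Reference.compute(100000).take(15).foreach(println)` gives a list to compare against the first lines printed by the GPU run.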

I love this. It’s the first step toward really cracking into the world of GPU-accelerated parallelism and the awesome potential of Scala and OpenCL for computational intelligence purposes. Thanks to Olivier Chafik for his continued efforts in building the ScalaCL library.