ICE vs VOP, a performance comparison.

Since the release of ICE last year, I have often read comparisons between this new Softimage tool and Houdini.
At first glance, ICE looks like Houdini VOPs: a node network for manipulating low-level data (attributes).
Most of the time, the claim is that ICE is faster and easier than VOPs.
It is hard for me to argue about whether ICE is easier. Very often, we find an application easier simply because we have spent more time in it.
In this article I will focus on the performance side. I will try to be as neutral as I can, so you won’t know which one is my favorite ;).

But does ICE really look like VOPs?

ICE is a node-based system for manipulating data on points. Each ICE node takes one or more inputs, performs some operations and outputs the results.
The set of all those ICE nodes forms an ICEtree, a Softimage operator.
An ICE node uses a SIMD architecture and is multithreaded.
SIMD means single instruction, multiple data. So if you need to add a vector A (a single instruction) to all your point positions (multiple data),
it will execute much faster than processing the same addition in a loop, one point after the other.
Multithreaded means that for a given task (like the vector A addition) several threads will be used at the same time (concurrently).
Here is a concrete example from the Softimage SDK doc.
This way, for each node in an ICEtree, multithreading is used whenever possible. That’s the important point to understand: ICE runs several small multithreaded programs (ICE nodes) instead of one big multithreaded program (the ICEtree operator).
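
To make the "same instruction on many points, split across threads" idea more tangible, here is a minimal Python sketch. It is not ICE code and does not use the Softimage SDK: the function names and the chunking strategy are assumptions made for illustration. Python threads will not give a real speed-up for this kind of work, so the sketch only shows how the data is partitioned.

```python
from concurrent.futures import ThreadPoolExecutor

def add_vector_chunk(points, offset):
    """Apply the same instruction (add offset) to every point of one chunk."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for (x, y, z) in points]

def add_vector_parallel(points, offset, num_threads=4):
    """Split the points into chunks and hand each chunk to a worker thread."""
    chunk_size = max(1, (len(points) + num_threads - 1) // num_threads)
    chunks = [points[i:i + chunk_size] for i in range(0, len(points), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() keeps the chunk order, so the output matches the input order.
        results = list(pool.map(add_vector_chunk, chunks, [offset] * len(chunks)))
    return [p for chunk in results for p in chunk]

points = [(float(i), 0.0, 0.0) for i in range(10)]
moved = add_vector_parallel(points, (1.0, 2.0, 3.0))
```

Each chunk is independent of the others, which is what makes this kind of per-point operation easy to run concurrently.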

VOPs are a little bit different. A VOP node is just a representation of a VEX function. VEX is a language designed for writing custom nodes (and Mantra shaders),
and it also supports SIMD and multithreading. A set of VOPs gives a VOP operator (something like an ICEtree operator). This VOP network will “just” build some VEX code (you can read this code by right-clicking on the VOP output node). This is a big difference compared to the ICE architecture. The first time a VOP network is called, Houdini pre-optimizes the compiled VEX code and culls any instructions which don’t contribute to the final data. It also converts “non-varying” variables to constants. This way the VEX code is highly optimized and can even be faster than a custom C++ Houdini operator!
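
The culling of instructions that don’t contribute to the final data can be pictured with a toy example. The Python sketch below is not how the VEX compiler is implemented: the instruction format (destination, operation, arguments) and the names are invented for illustration, and each variable is assumed to be assigned only once.

```python
def cull_dead_instructions(program, outputs):
    """Keep only the instructions whose result contributes to the outputs."""
    live = set(outputs)
    kept = []
    # Walk backwards: an instruction is needed only if its destination is live.
    for dest, op, args in reversed(program):
        if dest in live:
            kept.append((dest, op, args))
            live.discard(dest)
            live.update(args)  # its inputs become live in turn
    kept.reverse()
    return kept

program = [
    ("n",      "normalize", ("P",)),
    ("unused", "cross",     ("P", "N")),  # never read by the output
    ("d",      "dot",       ("n", "N")),
    ("out",    "mul",       ("P", "d")),
]
optimized = cull_dead_instructions(program, outputs=["out"])
```

Here the `cross` instruction is dropped because nothing downstream reads `unused`, so it never has to run for any point.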

We can now clearly see that ICE is a “compiled programs as nodes” system, whereas VOPs are a “source code as nodes” system.
It looks to me like multithreading can be exploited much better with the ICE system, but to be sure, nothing beats a comparison test!

The comparison

Comparing those two systems is not so straightforward. Simple math operations are rather easy to compare, as we can build very similar graphs in the two
applications. But very often, ICE/VOPs will need to get some data from other objects in the scene. Those geometry queries are done by factory nodes, and
it is hard to really know how they work under the hood. I could stick to comparing pure math operations, but I think it is much more interesting
to compare scenes with common production scenarios too. For the timing I use the Houdini Performance Monitor and the ICE Performance Timer (set to Time all threads) in order to get only the ICEtree operator and VOP operator evaluation time.
Timings are displayed below each ICE/VOP graph picture.

And now the empirical tests!

Test A : vectorOperations_simple.scn/hip

The positions of some particles are modified using dot and cross products between their normalized position and a normalized varying vector.
This is a pure math operation comparison.
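
For readers who want to see the kind of per-point math involved, here is a small Python sketch. The exact graph used in the test scenes is not reproduced here: the way the dot and cross products are combined in `displace` is an assumption chosen only to show the building blocks.

```python
import math

def normalize(v):
    """Return v scaled to unit length (zero vector if v has no length)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v) if length > 0.0 else (0.0, 0.0, 0.0)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def displace(position, vector):
    """Hypothetical displacement: move along cross(n, v), weighted by dot(n, v)."""
    n = normalize(position)
    v = normalize(vector)
    c = cross(n, v)
    d = dot(n, v)
    return tuple(p + ci * d for p, ci in zip(position, c))

moved = displace((1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
```

Every particle runs the same few operations on its own data, which is why this test is a good fit for SIMD and multithreading in both applications.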

vectorOperation_ogl

10000000 particles

vectorOperation_ICEtree

ICE : 1820 ms

vectorOperation_VOPs

VEX : 3270 ms

ICE is almost twice as fast at vector processing (1820 ms versus 3270 ms)!
But let’s see the other test scenes…

————————————————————————————————————————————————————————-

Test B : vectorOperations_basicTapper.scn/hip

In this test, we scale down the X and Z components of each point position along the Y axis of a box (200 × 200 × 200 points).
A Get Bounding Box function is called to re-map the Y value of each point between zero and one.

vectorOperation_tapper_ogl

vectorOperation_tapper_ICEtree

ICE : 29.6 ms

vectorOperation_tapper_VOPs

VEX : 67.3 ms

The only difference is that ICE uses the “Get Minimum in Set” and “Get Maximum in Set” nodes to return scalar values, while Houdini uses the “getbbox” function
to directly return the min and max corners of the box. In this test ICE is again about twice as fast.
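
The taper itself boils down to a few lines. The Python sketch below is an illustration of the idea, not a transcription of either graph: the linear profile (full width at the bottom, zero at the top) is an assumption, and the min/max are computed directly from the points instead of through “Get Minimum in Set” or “getbbox”.

```python
def taper(points):
    """Scale X and Z by a factor that depends on the normalized height."""
    ys = [p[1] for p in points]
    y_min, y_max = min(ys), max(ys)
    height = y_max - y_min
    result = []
    for x, y, z in points:
        # Re-map Y into [0, 1] using the bounding box extent.
        t = (y - y_min) / height if height > 0.0 else 0.0
        scale = 1.0 - t  # assumed profile: 1 at the bottom, 0 at the top
        result.append((x * scale, y, z * scale))
    return result

tapered = taper([(1.0, 0.0, 1.0), (1.0, 1.0, 1.0), (1.0, 2.0, 1.0)])
```

Note that the min and max must be known before any point can be processed, so this test adds one reduction over the whole set to the per-point math.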

————————————————————————————————————————————————————————-

Test C : pcloudQuery.scn/hip
This test compares the “Get Closest Points” ICE node to the set of VOPs “pcopen”, “pciterate” and “pcimport”.
It is very similar to the pcBulletHoles file from the Houdini exchange.
A rather dense mesh (40000 pts) is displaced wherever some particles (400000 in the test) are very close to its surface.
The search radius is set to 0.1 (for a grid of 4.2 by 4.2 units), and the maximum number of points is set to 10.

pcloud_query_ICEtree

ICE : 2741 ms

pcloud_query_VOPs_a

VEX : 1009 ms

This time Houdini is more than twice as fast! Let’s try to understand why.

Checking the timing of each ICEtree node, it is clear that the bottleneck comes from the Get Closest Points node, as it takes 95% of the total processing time.
From this node, you get point locator data. I think this is a Softimage-only concept. This kind of object (programmatically speaking) represents a location on the surface of a geometry and computes the linear variation of each attribute along this surface. This is a very useful feature, but for accessing point clouds it does not seem to be so fast. I don’t think that the locator is trying to interpolate between points (as there is no surface in a point cloud), but the fact is that it is slower than Houdini at getting the point data.

In Houdini, we use the pcopen function to get a bunch of points (a handle) within a radius around a position, and the pcimport function to get attributes from the points in the handle. Those specialized functions are really fast.
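
Conceptually, this kind of query means "give me at most N points within a radius of this position, then read an attribute from them". The Python sketch below shows that logic with a brute-force search. It is neither the VEX functions nor the ICE node: the function names are hypothetical, and a production implementation would typically rely on a spatial acceleration structure that is not modeled here.

```python
import math

def closest_points(cloud, position, radius, max_points):
    """Return indices of up to max_points cloud points within radius, nearest first."""
    found = []
    for index, point in enumerate(cloud):
        distance = math.dist(point, position)
        if distance <= radius:
            found.append((distance, index))
    found.sort()
    return [index for _, index in found[:max_points]]

def average_attribute(values, indices):
    """Average a per-point float attribute over the queried points (None if empty)."""
    if not indices:
        return None
    return sum(values[i] for i in indices) / len(indices)

cloud = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.09, 0.0)]
near = closest_points(cloud, (0.0, 0.0, 0.0), radius=0.1, max_points=10)
mean = average_attribute([1.0, 3.0, 100.0, 5.0], near)
```

With the test settings (radius 0.1, 10 points maximum), each mesh point runs one such query against the particles, so the cost of the lookup dominates the whole graph.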

If ICE is slower in this test, it is because of the Get Closest Points implementation, not because of the ICE graph architecture.

————————————————————————————————————————————————————————-

Test D : raycast_reflexionWmap.scn/hip

A camera ray is computed and set for each grid point. Then, to test some space transformation operations, I use the point reference frame (X and Z axes tangent to the surface at the point and Y axis parallel to the point normal) to compute a reflection ray and then find its intersection with the elephant. The test uses the “Raycast” ICE node and the “Intersect” VOP node.

reflection_wmap_OGL

reflection_wmap_ICEtree

ICE : 32247 ms using space transformation to get the reflection vector

specular_reflexion_VOPs

VEX : 975 ms using space transformation to get the reflection vector

————————————————————————————————————————————————————————-

The difference is huge, in favor of Houdini. This time, the bottleneck is in the Get > self.PointReferenceFrame operation. I was rather surprised, because “PointReferenceFrame” is a factory attribute in Softimage (in Houdini I built it from scratch).

specular_reflexion_ICEAxisAngle

ICE : 402 ms to compute reflection using axis and angle

So I rebuilt the test with another reflection vector function, using the normal as the axis to rotate the camera ray by 180 degrees.

specular_reflexion_VOPsAxisAngle

VEX : 850 ms to compute reflection using axis and angle

This time ICE is twice as fast as Houdini!
On a side note, this setup could be handy to quickly fine-tune a reflection
position on the surface of an object.
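
The axis-and-angle variant can be checked with a few lines of Python. This is a sketch, not the content of the ICE or VOP graphs: it uses Rodrigues' rotation formula and assumes a unit-length normal. Rotating a vector v by 180 degrees around a unit normal n gives 2(n·v)n − v, which is the mirror direction when v points from the surface towards the camera; if the ray points towards the surface instead, the result has to be negated.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate_about_axis(v, axis, angle):
    """Rodrigues' rotation of v around a unit-length axis by angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    k_cross_v = cross(axis, v)
    k_dot_v = dot(axis, v)
    return tuple(v[i] * c + k_cross_v[i] * s + axis[i] * k_dot_v * (1.0 - c)
                 for i in range(3))

def mirror_about_normal(v, normal):
    """Closed form of the 180 degree rotation around a unit normal: 2(n.v)n - v."""
    d = dot(normal, v)
    return tuple(2.0 * d * normal[i] - v[i] for i in range(3))

normal = (0.0, 1.0, 0.0)
to_camera = (1.0, 1.0, 0.0)
rotated = rotate_about_axis(to_camera, normal, math.pi)
mirrored = mirror_about_normal(to_camera, normal)
```

Both functions agree up to floating-point error, and neither needs a per-point reference frame, which is consistent with this variant avoiding the PointReferenceFrame bottleneck.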

————————————————————————————————————————————————————————-

Test E : pcloudQuery_flowAroundBunny.scn/hip

This is a particle simulation test. 200 000 particles are emitted per second (frame rate: 29.97).
Each particle looks for the closest point on the bunny surface. From this closest point, some vector operations are done to get the “flow around surface” behaviour.

This time I chose to split the Houdini setup in two parts in order to use the “Attribute Transfer” POP node. This node is often used in Houdini when you need to get some interpolated data on the object surface, and unfortunately I don’t think a similar operation is available inside a VOP graph.
So in Houdini, I first get the closest bunny point attributes (P and N) with the “Attribute Transfer” node and then use a VOP to edit the particle velocity and get the flow effect.
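
The exact vector operations used for the flow effect are not spelled out in this article. One common way to get such a behaviour, shown here purely as an assumption, is to remove the part of the velocity that points along the surface normal N so that the particle slides along the surface. The function name is hypothetical and the normal is assumed to be unit length.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def slide_along_surface(velocity, normal):
    """Project the velocity onto the plane perpendicular to a unit normal."""
    d = dot(velocity, normal)
    return tuple(velocity[i] - d * normal[i] for i in range(3))

# A particle moving down and to the right over a surface facing up (+Y).
tangent_velocity = slide_along_surface((1.0, -1.0, 0.0), (0.0, 1.0, 0.0))
```

Whatever the exact formula, the per-particle math is cheap compared with finding the closest surface point, which is where the two setups differ.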

bunny_flow_ogl

bunny_flow_ICEtree

ICE : 12 seconds (to go to frame 30).

bunny_flow_VOPs

Houdini : 59 seconds (to go to frame 30) !

————————————————————————————————————————————————————————-

In this scenario, ICE is much faster! In Houdini the bottleneck is precisely the “Attribute Transfer” POP node.
So let’s give it another try in Houdini using the pcopen, pciterate and pcimport functions. The graph takes much longer to set up than in ICE (where you only need one node…), and it is not exactly equivalent, as it doesn’t return interpolated attributes like the ICE locator object on a geometry surface ;).

bunny_flow_VOPs2

VEX : 9 seconds (to go to frame 30) 

————————————————————————————————————————————————————————-

Again, for the specific case of point-cloud-only queries, Houdini is faster than ICE. I also tested replacing the Closest Location node by a Closest Points node in ICE, but it was twice as slow…

————————————————————————————————————————————————————————-

Conclusion :

The original idea for this comparison came from a discussion with Thiago Costa about ICE and Houdini speed. Thiago explained to me that we can think of ICE as a GPU-like engine. I think it is an interesting analogy. ICE is a specialized application for vector processing. Whenever you need to do this kind of job and nothing else, it will be very fast.
VOPs are maybe less optimized for vector processing; however, for heavy particle queries, the Houdini point cloud functions are also very specialized and seem to win in this particular field (no pun intended 😉 ).

The goal of this article was not to prove that one application is faster than the other, but to show that comparing the two systems is not so simple, as timing results depend on what you need to do and how you can do it.
Furthermore, Houdini and Softimage are both rather huge applications and so can’t be reduced to just a VOPs or ICE system.

I hope you enjoyed this (rather long) flight.

Cheers !

Guillaume Laforge


21 responses to “ICE vs VOP, a performance comparison.”

  1. hello

    it might also be interesting to know if either application use SSE3 (and similarly the upcoming AVX) to optimize performance in vector manipulation.

    But a great flight, 1st class 🙂

  2. Hi Guillaume,

    Great article. That really opens up an interesting debate regarding performance vs ease of use.

    Yep, I still find ICE incredibly easy for quickly prototyping effects, whereas dealing with the technical complexities in Houdini can get in the way of the creative process. I think your final test with the point cloud illustrates that.

    Looking forward to reading more of your articles.

    (BTW, just so you know, this post didn’t show up in my RSS reader. I think it’s because the publish date is before the previous post on La Maison’s showreel.)

  3. Hi Andy, I published this post before the la maison one in “password protected” mode. I didn’t think about the RSS thing. At least this article still keeps a little bit of secret :).

  4. Great article Guillaume! 🙂

  5. Thanks a lot for this article.

    cheers
    Martin

  6. @TJ:
    About using AVX or SSE3, if it’s C++, then you can be pretty much sure you can use it in ICE…
    In the case of SSE3 it’s at compilation time, and people have already written ICE nodes that use SSE3.
    I don’t know much about AVX though…

    To simplify, an ICE graph is a description of how pieces of compiled code (DLLs) pass data to the next port.
    It has all these optimization structures, and the multi-threading comes for free if you follow the rules.
    But the way it passes data to the nodes is very basic, there’s nothing “under the hood”.

    For example:
    Some of us have compiled single threaded ICE nodes and implemented our own multi-threaded calls inside of the code, using TBB, openMP and other threading libraries. (Bypassing the ICE multi-threading feature completely).

    I’ve also done some tests with implementing an ICE node that calls a GPU device through CUDA (using Thrust, a C++ interface for CUDA).

    This is an example of how ICE can be extended in many different ways.
    I don’t know how that goes with VEX so I will let the VEX experts comment on this 🙂

  7. Interesting stuff Thiago, Would be good to know the details of any performance comparisons you might have done between the different multithreading APIs and how it compares the performance of a single threaded node.

  8. > This is an example of how ICE can be extended in many different ways.
    > I don’t know how that goes with VEX so I will let the VEX experts comment on this

    As any VOP is a VEX function[s] wrapped into a visual form and any “atomic” VEX function is a C++ code one can easily write a function that uses SSE/CUDA or even other programming languages let’s say OCaml.

  9. @h
    that makes a lot of sense, thanks for sharing.

  10. Great article Guillaume.
    I am really interested in these comparisons between the two models.
    I am mostly a Houdini user and some of the ICE features look really interesting, especially the multithreading one.
    Just curious, are you running the VOPs in 1 thread per cpu?
    It is still not running really well, and depending on the Houdini version you can get a great speed improvement or a crash 🙂
    As far as I know, one of the reasons Houdini still does not have a proper multithreading architecture is all of these different APIs and tools that are appearing for developing on multicore/GPU devices. It seems they are waiting to know who is going to be the king of the hill and the standard in multithreaded programming before redoing the whole thing from scratch.
    Let´s see

  11. Hi Pablo,

    Great you like the article :).

    Yes, for all the VOP networks in the test, I set the number of threads to 8 (the maximum on my workstation). It was okay as all the operations in those vops support multithreading.

    Yes, it looks like the “old” Houdini toolsets are not suited to multithreading (and some VEX functions can crash too), but it is the same for several non-ICE operators in Softimage 😉 .

  12. Definitely, multithreading is an area SESI needs to address as soon as possible.
    On the other hand, since you are talking a lot about point clouds:
    I definitely have to agree with you. I am using them more and more for a lot of things, and it is amazing how you can make, for instance, a skin-wrap-like tool using only point clouds, and it is blazing fast.
    From all the opinions I have heard about ICE, I got the impression that it is more like an add-on in XSI. I mean, in Houdini the whole architecture, from VOPs to SOPs or anything else, works as a homogeneous whole, whereas ICE seems to be something apart in Softimage that eases access to some internals of the tool, so you don’t need to code everything as in many other tools.
    Am I wrong???

  13. ICE gives access to geometry data without scripting and it is fast. I wouldn’t call it an addon :). Of course there is still room for improvement as it is a rather young system. I agree, Houdini VOPs are much more “embedded” in Houdini as you can use VEX everywhere. ICE doesn’t work with xsi hair system and xsi compositing system for example. And of course it doesn’t work to write some shaders like in Houdini.

  14. Excellent!! I’m bummed I missed this one.

  15. I appreciate the effort that went into this post. Thank you for this interesting data.

  16. Salut Guillaume,

    Great article, it helped me clear up some of the finer architectural points about ICE. I continue to look forward to the day I can design an entire character rig setup using a visual programming architecture such as ICE, but get the optimization and threading that certain TDs at large studios have access to through proprietary software.

    I would be especially curious to see, once enough people have setup fully rigged ICEd network characters, what kind of threading happens when, lets say 4 characters are in a shot.

    Very nice explanation of the ICE and Houdini nodal networks.
    THX
    RSM

  17. Raffaele,
    Softimage evaluates the scene graph linearly. It runs through the scene graph looking for an order to resolve things, for example a Cube inside of a model would have to know the transformation of the model to know the transformation of the cube itself.
    Since the cube transformation depends on the Model transformation, this becomes a very linear process.

    eg: 100 bone chain would have to go one by one, finding the transform of the parent to solve the child. Otherwise you can’t solve the child.
    This is the nature of IK solvers actually and a lot of them are not threadable by design.

    ICE doesn’t enforce any of that, an ICE Kine graph is evaluated as any other ICEgraph.
    The operator creates these dynamic dependencies and solve things “on-demand”.
    Let’s say a cube with an ICEgraph requests the transform of its parent: the moment this transform is requested, it will call the parent, resolve it and give it back to the cube.

    So there’s no linear order to resolve transformations anymore.

    Most of the operations don’t need much memory allocation as a lot of these computations are faster if you compute it again than if you allocate read/write memory.
    (Although there’s caching all over the place and optimizations that I don’t even know to make the best use of resources, it’s pretty much like that)

    So all this to say that optimization is now the work of the TDs who design those rigs.

    So your ICE logic has to be “threadable” to be efficient.
    If you design something that has to know the transform of all objects in the scene to resolve its own position in space, then it’s not really efficient.
    It’s in everyone’s hands now to design efficient rigs that can be solved non-linearly and all that.

    Now what you can account for is situations where, for example, you solve all 4 characters in one ICEgraph; then each character could be resolved on a different core. So that’s what ICE would be doing for you naturally.
    Maybe a lot of these computations will be cached and optimized, and no memory will have to be allocated to resolve these 4 characters.
    If there’s a tail, maybe this tail is resolved as a spring system or what not…
    So all sorts of things are possible now instead of having to resort to building parent/child dependencies to make things move.

    If your IK solver is iterative then ICE can’t do much except solve 10.000 of those IKs at the same time.
    Which is big deal of course because Softimage wouldn’t be able to do this without ICE if it had to process the scene graph linearly.

    To finish this long comment, all this power is open to everyone and some people will do Rigid Body solvers, some people will make their liquid solvers transform objects, etc.. Kine is the new playground now.

    -thiago

  18. Great Article Guillaume 🙂

  19. Hi.
    I found this a very interesting test. I’m learning Houdini, and i’m starting also to use ICE.
    I was following the tests and trying them at the same time. But I wasn’t able to find some nodes that are present in your screenshots, for example, in the reflection test, the ‘Get Camera Ray’ in ICE and the VOP node Cmera_ray in Houdini. Have you made them yourself? If so, can you tell how?

    Keep the EXCELLENT work, and thanks!!

  20. Hi, it’s now 2014, maybe a good time for an update to this great article? Some things have changed in both software packages.
