No more pink walls

    Still kickin


    As promised, here are the results from the testing with the SIMD code. If you have no idea what I’m talking about, see my previous entry here.

    Results 1
    Results 2

    As can be seen, there was quite a bit of variance across the different systems. The P4 2.80 really seems to come out on top everywhere, and I can only put it down to the fact that it was the only system in the group with SSE3 capabilities. I also wish to run the tests on this machine underclocked to 1.4GHz under Linux, as I believe it will yield slightly different results.

    Also, I’ve recently learned that the speed of SSE on a chip is largely independent of its core clock speed, hence the similarities that can be seen between the SSE scores.

    Stay tuned for more details

    [Update]: The final results from this work can be found at 

    For the last few weeks I’ve been playing around with Single Instruction Multiple Data (SIMD) instruction sets. More specifically, I’ve been trying to do a very high-level, basic comparison of Altivec and SSE, as it seems both interesting and relevant considering Apple’s imminent move to Intel.

    However, this seemingly simple comparison has a number of problems at the practical level. Firstly, there is no chip in production that supports both Altivec and SSE, nor does it appear as if there ever will be. This immediately rules out a direct comparison. Secondly, and this is obvious but still a necessary point, there is no single architecture (i.e. x86, PPC etc.) that has both Altivec and SSE chips, making comparison even trickier.
    Finally, the clincher: I’m an impoverished uni student, so any grand hopes of testing across many different setups were always going to be impossible.

    The Plan
    Write a simple program that serves very little purpose other than to stress the SIMD units of various chips. Repetition is the name of the game here, as SIMD units come into their own on tight loops that apply the same operation to many values. In this case a pi generator was chosen, and all compiling would be done with gcc-4.0; no direct ASM code was to be written.

    The Hardware
    Obviously I needed test beds capable of running Altivec and SSE. Altivec was covered with both my Mac Mini (G4 1.4GHz) and the iMac (G5 1.8GHz). SSE was taken care of by my trusty P4 2.8 (Prescott). There was, unfortunately, a significant clock speed difference, but it was unavoidable as I do not have access to faster PPC machines or a slower x86 (with Unix).

    What does it do?
    OK, so I wrote a little pi generator that uses basically the most inefficient method of calculating pi, the series 1 - 1/3 + 1/5 - 1/7 + … ≈ pi/4. Initially I wrote up a version that uses the CPU alone. No unrolling of loops, no ‘normal’ optimisations, just basic, raw CPU grunt. I then wrote up versions of this that do their work with Altivec and SSE packed vectors. These two are almost identical except for the intrinsic names (these vary between Altivec and SSE) and one other, probably important, detail: the SSE instruction set contains a hardware divide instruction, whereas Altivec has none and relies upon a software implementation of division.
    The Macs were both running OS X 10.4 (Tiger), whilst the P4 was tested in Linux (Ubuntu, Hoary) and the hacked Intel version of OS X.
    The pi generator itself performs a 128,000-iteration loop 1000 times in order to complete this testing.
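To make the description above concrete, here is a minimal sketch of what the scalar and SSE versions of such a generator might look like. This is not the code that was actually benchmarked: the function names `pi_scalar` and `pi_sse` are hypothetical, single-precision floats are an assumption (chosen because Altivec has no double-precision vectors), and the `#if` guard is only there so the sketch still compiles on a machine without SSE.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_div_ps, ... */
#endif

/* Scalar baseline (hypothetical name): pi/4 ~= 1 - 1/3 + 1/5 - 1/7 + ...
 * One term per iteration, no unrolling. */
float pi_scalar(size_t terms)
{
    float sum = 0.0f;
    float sign = 1.0f;
    for (size_t k = 0; k < terms; k++) {
        sum += sign / (float)(2 * k + 1);
        sign = -sign;
    }
    return 4.0f * sum;
}

/* Packed version (hypothetical name): four terms per iteration.
 * Assumption: terms is rounded down to a multiple of 4.
 * Denominators stay exact in float as long as they are below 2^24. */
float pi_sse(size_t terms)
{
#if defined(__SSE__)
    /* _mm_set_ps takes the highest lane first, so lanes 0..3 hold 1, 3, 5, 7. */
    __m128 denom = _mm_set_ps(7.0f, 5.0f, 3.0f, 1.0f);
    const __m128 signs = _mm_set_ps(-1.0f, 1.0f, -1.0f, 1.0f);
    const __m128 step = _mm_set1_ps(8.0f);  /* each lane skips 4 terms */
    __m128 acc = _mm_setzero_ps();

    for (size_t k = 0; k + 4 <= terms; k += 4) {
        /* The hardware divide: four quotients in one instruction. */
        acc = _mm_add_ps(acc, _mm_div_ps(signs, denom));
        denom = _mm_add_ps(denom, step);
    }

    /* Horizontal sum of the four partial sums. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return 4.0f * ((lanes[0] + lanes[1]) + (lanes[2] + lanes[3]));
#else
    /* No SSE available: fall back to the scalar loop. */
    return pi_scalar(terms - terms % 4);
#endif
}
```

A benchmark in the spirit of the one described would call something like `pi_sse(128000)` 1000 times and time the whole lot. An Altivec port would have the same shape, but since Altivec has no divide instruction the `_mm_div_ps` step would have to become a reciprocal estimate (`vec_re`) plus a refinement step, which is presumably where part of the difference comes from.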

    The results
    OK, the results amazed, confused and annoyed me. Instead of getting the nice spread of results I was hoping for, I came out with one clear leader. The P4 blew the other two machines apart, both in Linux and OS X. By blew away, I mean the other two couldn’t even come close. The raw CPU time for the P4 was nearly twice as fast as the Altivec-enhanced time of the Mini. Of course I expected the P4 to be faster in raw CPU due to its higher clock speed, but I did NOT expect it to be faster than the Altivec version, at least not by such a significant margin.
    Unfortunately the G5 decided to pack it in during testing. It gave results that were (considerably) slower than the Mini and then crapped itself. It’s back with Apple as I write this.
    I will be posting ‘exact’ results tomorrow for this, though roughly this is how it broke down:
    G4 1.4 PPC Raw CPU: 7 seconds
    G4 1.4 PPC Altivec: 2.5 seconds
    P4 2.8 Raw CPU: 1.5 seconds
    P4 2.8 SSE: 0.8 seconds

    There was practically no difference between the P4 in Linux and OS X (they use the same header files etc., so no real surprises). The difference between the Altivec and SSE times really amazed me though. The code is practically the same in each case (just changed for the platform) and yet the difference is more than would be expected from the clock speed difference alone.
    It should also be noted that the code on the P4 was using SSE2, NOT SSE3.
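As a quick sanity check on the "more than clock speed alone" claim, the arithmetic can be sketched as below. The function names are hypothetical, and the inputs are only the rough figures quoted above, so the output is a ballpark and nothing more.

```c
/* How many times faster run B is than run A, given their run times in seconds. */
double speedup(double seconds_a, double seconds_b)
{
    return seconds_a / seconds_b;
}

/* The speedup left over once the clock-frequency ratio has been divided out.
 * A value of 1.0 would mean the gap is fully explained by clock speed. */
double clock_normalised_speedup(double seconds_a, double ghz_a,
                                double seconds_b, double ghz_b)
{
    return speedup(seconds_a, seconds_b) / (ghz_b / ghz_a);
}
```

Plugging in the rough numbers, `clock_normalised_speedup(2.5, 1.4, 0.8, 2.8)` compares the G4 Altivec run against the P4 SSE run: the raw speedup is about 3.1x, the clock ratio is 2x, and so roughly 1.6x is left unexplained by clock speed.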

    So in the hope of getting some results that are even slightly comparable, the next step is to underclock the P4 and try again. I also wish to try some older, slower x86 CPUs if I can get a hold of them.

    OK, this whole thing was aimed at being a programming learning experience rather than a comparison. These are my major notes:
    – Documentation for SSE is _terrible_! I’m sure it’s out there somewhere but I could find very little. Apple provide a small amount, but even that is more related to migrating code (Altivec -> SSE) and there is not much on the additional features of SSE (i.e. the divide intrinsic and double precision variables).
    – Altivec is a ‘nicer’ interface than SSE. I’m sure this is due to Apple’s influence, but there is a lot less ‘ugliness’ about it compared to SSE.
    – It’s quite easy to get a good improvement using both Altivec and SSE.
    – I’ve really only scratched the surface of SIMD and it’s something I’d like to play with more down the line. Adding it to my list of things to play with when I’ve finished this damn year.

    A lot has been said about international students lately, most of which revolves around people’s own experiences. Today the Age is running an article on the hardship faced by these students, including racism.
    Ignoring the fact that most of the problems outlined sound applicable to nearly every university student I know, irrespective of race, what the whole thing fails to show is the reasons for such problems occurring in the first place. People these days are beyond racism purely for racism’s sake (i.e. wanting to feel superior). If people are ‘racist’ today they typically have a reason for being so. Whether this reason is acceptable to others is another issue; however, it seems that people are starting to agree on these reasons (and they are not as far-fetched or paranoia-fuelled as they traditionally have been). It’s these issues that should be addressed.
    Whilst I’m not trying to say that there is some great conspiracy in it all, I think this is information that the universities would not really want out in the public. Of course they’re trying to encourage internationals simply for the dollars, why else? (It’s certainly not for their high quality of work.) But they don’t have to make that too obvious.

    Note: To anyone reading this who has no idea of the background I’m coming from, this must all sound terribly racist. I assure you it’s not. It is simply my take on it as a tutor who has to contend with a number of international students for whom great allowances are being made.

    I know I don’t use this blog much but, well, it’s week 13 and I WANT TO WHINGE! I’m sick to death of crappy software that doesn’t do what it claims to do! Software that just does not make any logical sense! This is promising software, it just hasn’t been thought out all the way through.
    I think that this may be a product of the open source, distributed development model. I hope not though.

    Anyway, I’ve calmed down from earlier in the evening when I was throwing shit around in my frustration. I guess it’s lucky I don’t keep the server in the honours room. Such anger is unusual for me; I think it’s a result of me not having time to fuck about with things that aren’t working when they should. I know that I can do amazing things when I work under pressure. Both my body and mind seem to slip into another mode and just go nuts. But as a consequence it seems it takes a lot less than normal for me to snap. Still, I enjoy doing this once in a while; it reminds me that I can be efficient when I need to be.