Dilbert minus Dilbert’s boss

Alas, when I took Cuil for a spin I found that “David Olive” – the name I share with a PhD in theoretical physics in Swansea, Wales, and a computer-repair geek in Perth, Australia, among others – turned up 381,000 times on Google, 22.3 million times on Yahoo and zero times on Cuil, which also couldn’t find Stephen Harper.
…
Back on ChrisBrogan.com, they’re still cavilling that a Cuil search for “Jaguar” turned up nothing, that other searches yielded long-abandoned Web pages, and that even Cuil’s “About Us” link didn’t work. “In this biz you get one chance,” wrote one of the more charitable reviewers. “They blew it. Back to Google.”
And thus, my previous characterisation of Cuil was right.
This is your fault, dear reader: so many of you have contacted me over this past week about Cuil.com that I can’t just ignore it anymore. Or maybe it’s my fault for giving all of my readers my MSN/AIM; regardless, someone is at fault. Overdramatic? Not quite. I was hoping that I could just ignore it and let it all blow over. After all, in a few months reality will set in with the investors and they will start demanding that Cuil turn a profit; after that they will have to reevaluate their ‘Cuil analyzes the Web, not its users’ slogan.
If they don’t start analyzing search patterns for marketers, then they will have to add ads, or sell their search technology to companies for private indexing, or something else for money. And if they don’t do all of the above then they won’t beat Google’s profit margin. It all comes down to money. How does Cuil make money? Venture capital, and once that runs out… then they will make another business plan. How does Cuil attract visitors? It’s doing something that has already been done, but with a nice web2.0 AJAX UI. Don’t get me wrong, it’s a perfectly fine way of attracting new customers; if your UI sucks no one will use it. However, if you focus solely on UI with no substance your users will abandon you.
I’m reminded of an Xserver horror story I heard earlier today: it had a really nice admin GUI, but that didn’t stop it from bugging out in mixed environments and failing to read HFS+ metadata properly; eventually it got so bad that the hapless administrator had to switch to CentOS. How is that relevant? No matter how much fancy AJAX you put on top, if you return nonsensical results your userbase will eventually abandon you. Common searches that I run on Google returned zero results on Cuil; examples include:
I could go on!
Now these may seem like facetious examples, but they are very important. While searching Cuil I noticed that it doesn’t warn you about typos; after reading this far, you should see that it’s a pretty important feature for me. The “Am I safe from Christians?” search claimed to have 124,744,479 results, but only one showed up in the search results. Where are the others hiding, given that I had safe search off? Or is that some kind of bug? Why is it that I can coax out more search results by rearranging word positions, and what happened to double quotes for searching for specific phrases?
These are problems that Cuil will have to work out before I’ll be willing to consider switching. And as for the source of income? As a user I couldn’t care less; I do use Google, after all. But as an IT professional I don’t like wildcards: every failed Bubble (1.0/2.0/3.0) company ruins investor confidence, even if these companies don’t IPO anymore. Call me 1980s, but I prefer companies who say “These are our products: a, b, and c. They do: x, y, and z. They can be used by: r, b, and g. The unit price is free, average, and expensive. That is how we make money”.
Here is the code that I have been using to play with random numbers these past few days. It comes with the 3D thing and a few statistical tests. It will compile on any platform that has some form of GLUT.
Compile and run on Linux:
On Mac OS X (thanks Aaron):
But you may have to adjust the header files (instructions in the source).
Click and drag to rotate.
Keys ‘1’, ‘2’, and ‘3’ manipulate the points in various ways.
‘[’ and ‘]’ increase and decrease the number of dots.
‘+’ and ‘-’ zoom out and in.
Enjoy. I don’t plan on touching this code again unless someone finds a bug, but I would appreciate it if anyone with a Mac could send back any interesting results.
This reminds me of some 3D tangent graph; I guess I’ll have to search MathWorld for something like this. I don’t get what’s with C library implementers and their fetish for implementing poor rand() functions. As you can see, Linux isn’t quite as bad as Windows, but there is still room for improvement. Once I fix up my program a bit I’ll try the Linux API random number generator, but a programmer shouldn’t have to make calls to the OS to get a proper implementation of any function, as it reduces portability.
On my (really long) todo list is to write a practical application that breaks or doesn’t work well with poor random numbers; once I get that done I’ll release the source along with the source to this program. That way I look a tad less crazy because I can demonstrate how this applies to the real world.

We C++ programmers are forced to content ourselves with a subpar rand() function. Once I get more time, I’ll be coding some functions to check how predictable these numbers actually are.
I had to choose between getting a BlackBerry or an EEE. Yeah, I chose the EEE. I loaded it up with Aircrack-ng, cowpatty, nmap, metasploit, and a few others. I plan on spray painting it too once I’m confident that none of the hardware is borked.

I forget where I saw this originally, but it still amazes me that a major math error like this is still present in modern CPUs.
#include <iostream>
using namespace std;

int main(){
    // i is a float, while 1.01 and 5.05 are double literals
    for(float i=1.01;i<=5.05;i=i+1.01){
        cout << i << ", ";
    }
    cout << endl;
}
The output is:
1.01, 2.02, 3.03, 4.04,
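The 5.05 goes missing because 1.01 has no exact representation in binary floating point: the running float total picks up rounding error and ends up slightly above the 5.05 in the loop condition. Go ahead, try it.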
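If you actually need all five values, one common way around this is to count with an integer and only multiply at the end. What follows is a minimal sketch, not anything from the original program; the names multiples and print_multiples are made up for illustration.

```cpp
#include <iostream>
#include <vector>

// Hypothetical helper: build step, 2*step, ..., count*step using an
// integer loop counter, so the number of iterations never depends on
// floating-point rounding.
std::vector<float> multiples(float step, int count) {
    std::vector<float> out;
    for (int k = 1; k <= count; ++k) {
        out.push_back(step * static_cast<float>(k));
    }
    return out;
}

// Prints the values comma-separated, in the same style as the loop above.
void print_multiples(float step, int count) {
    for (float v : multiples(step, count)) {
        std::cout << v << ", ";
    }
    std::cout << std::endl;
}
```

Calling print_multiples(1.01f, 5) always runs the loop body five times. The values themselves are still rounded floats; only the iteration count is made exact.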
Another Milestone, 128 posts and I still haven’t been kicked off the internet due to lack of lulz. SUCCESS!
Now onto business. I wrote that 3D program to get my fingers wet with OpenGL (yet again) and to test out the Marsaglia effect, where random numbers become not-so-random when graphed in 3D.
Interesting links:
Ben had a stroke of genius (and also a stroke, but I’ll save that for another post) by suggesting I multiply two contiguous random numbers in the Microsoft random number set together and then take that modulo RAND_MAX. It produced the following.

Pretty good, eh? I couldn’t construct a visualization with a recurring pattern, and the numeric series passed a few of the Diehard tests that I implemented in my program. I thought it was foolproof until I realized that you could attack this based on the flaws in the initial PRNG. After all, Seq[t]*Seq[t+1] only requires you to break that weak random number generator.
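For reference, the Seq[t]*Seq[t+1] trick could be sketched roughly like this. This is not the code from my program: the function name multiply_neighbours is made up, the input is assumed to be a list of values already pulled from the weak generator, and the modulus (RAND_MAX in the description above) is passed in rather than hard-coded.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: multiply each pair of neighbouring values,
// Seq[t] * Seq[t+1], and reduce the product modulo `modulus`.
// The product is computed in 64 bits so it cannot overflow.
// `modulus` must be non-zero.
std::vector<std::uint32_t> multiply_neighbours(const std::vector<std::uint32_t>& seq,
                                               std::uint32_t modulus) {
    std::vector<std::uint32_t> out;
    for (std::size_t t = 0; t + 1 < seq.size(); ++t) {
        std::uint64_t product = static_cast<std::uint64_t>(seq[t]) * seq[t + 1];
        out.push_back(static_cast<std::uint32_t>(product % modulus));
    }
    return out;
}
```

Every output is a fixed function of two neighbouring outputs of the underlying generator, which is why predicting the weak generator is enough to predict this one too.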
After collecting a few other random number lists(including QBASIC’s random sequence), I started looking into academic PRNG algorithms and as it turns out, there are quite a few simple to implement ones. Here is ‘xor and shift’ designed by George Marsaglia:
Here is an example with k=5, period about 2^160,
one of the fastest long period RNGs, returns more than
120 million random 32-bit integers/second (1.8MHz CPU),
seems to pass all tests:
/* unsigned, so the shifts and the wrap-around multiply are well defined */
unsigned int x=123456789,y=362436069,z=521288629,w=88675123,v=886756453;
/* replace default x,y,z,w,v with five random seed values in calling program */
unsigned int xorshift() {
    unsigned int t;
    t=(x^(x>>7));
    x=y;
    y=z;
    z=w;
    w=v;
    v=(v^(v<<6))^(t^(t<<13));
    return (y+y+1)*v;
}
-geo@stat.fsu.edu
And how well does that PRNG fare?

Hey, that looks exactly like the badPRNG * badPRNG plot discussed above; in fact the box is in the exact same position, which could probably lead to a little bit of confusion. But they are completely different: one was generated by multiplying two predictable positions of a bad PRNG together (thus providing a thin veil of randomness) and the other was produced by an actually ‘good’ PRNG.
I probably don’t need to explain the ramifications of a poor random number generator, as my audience is mostly tech savvy[1]; I also shouldn’t have to explain how simple the xorshift algorithm above is, and yet Microsoft decided to go with ‘something else’. Recently a bug was discovered in Nmap where the random host generation[2] would start duplicating IPs after ~500 and would reach 50% duplication around ~1000. The problem turns out to be Microsoft using a legacy PRNG for their standard C/C++ rand() and a ‘better’ PRNG for their proprietary rand_s().
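A rough way to measure that kind of duplication yourself is to collect a batch of generator output and count how many values repeat an earlier one. The sketch below assumes the output has already been gathered into a list; duplicate_fraction is a made-up name, and this is not how Nmap or the thread in [2] measured it.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical helper: fraction of values in `draws` that repeat a value
// seen earlier in the same list. Returns 0 for an empty list.
double duplicate_fraction(const std::vector<std::uint32_t>& draws) {
    if (draws.empty()) {
        return 0.0;
    }
    std::unordered_set<std::uint32_t> seen;
    std::size_t duplicates = 0;
    for (std::uint32_t value : draws) {
        // insert() reports whether the value was new; if not, it is a repeat.
        if (!seen.insert(value).second) {
            ++duplicates;
        }
    }
    return static_cast<double>(duplicates) / static_cast<double>(draws.size());
}
```

Feeding it the first 500 and then the first 1000 values from a generator would give numbers to compare against the duplication rates reported above.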
Alas, this isn’t an anti-Microsoft rant, and later next week I will be putting Linux, IRIX, and possibly others to the same task.
Until then.
[1] My audience consists of my friends, random people from the Google summer of code mailing list, a Godaddy employee, and my father - who doesn’t understand a thing I write about.
[2] Thread here. If you’re interested in the math side of things keep reading this thread; Kris Katterjohn, Jah, and Brandon Enright all post extremely insightful comments.

The bounding box is the max value generated by Microsoft’s pseudo random number generator, 2,147,483,647. The equations I used to generate this are as follows:
x = Seq[t] - Seq[t-1]
y = Seq[t-1] - Seq[t-2]
z = Seq[t-1] - Seq[t-3]
It may be hard to see from the image, but all the vertices could be bounded by a rectangular parallelepiped that comfortably sits within the bounds of the PRNG limit. Why aren’t any random numbers generated near any of the other six corners?
I’ll leave that question open till I port my program to Linux.
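In code, turning a sequence into those x, y, z points might look something like the sketch below. It is a stand-alone illustration rather than the OpenGL program itself; Point3 and difference_points are made-up names, and the z term follows the equation exactly as written above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical point type; 64-bit so differences of 31-bit values fit easily.
struct Point3 {
    std::int64_t x, y, z;
};

// Hypothetical helper: one 3D point per index t >= 3, built from the
// differences between nearby sequence values.
std::vector<Point3> difference_points(const std::vector<std::int64_t>& seq) {
    std::vector<Point3> points;
    for (std::size_t t = 3; t < seq.size(); ++t) {
        Point3 p;
        p.x = seq[t] - seq[t - 1];
        p.y = seq[t - 1] - seq[t - 2];
        p.z = seq[t - 1] - seq[t - 3];  // as written in the equations above
        points.push_back(p);
    }
    return points;
}
```

Each point would then be handed to whatever does the drawing.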