EDIT I know this might sound stupid, but please stop voting the question down. The point has been made; there's no need to keep voting it down. In fact, I'm pretty sure this is the most down-voted SO question ever.
I have access to one of the top 50 fastest computers in the world. The task behind this question is part of a job that has to get done. While it might not make sense on an ordinary computer in the civilized world, it makes sense with the hardware and capabilities I have. Please understand that this will be done; it's just a question of how fast.
As the title says.
I have made a console application to count through every possible Guid. That is 2^128 iterations.
I want to start with 00000000000000000000000000000000 and end with ffffffffffffffffffffffffffffffff.
Is this the fastest way to do so? I'm looking for the most efficient code possible, because I know it will take forever and a day to get done.
I'm looking for any and all help to speed this puppy up.
Why, you ask?
I am screen scraping web content. Sadly, the content is not indexed by integers from 0-999999; it is keyed by Guids instead. I want to hit every Guid on the site to see whether content resides there.
Here is my main program.
static void Main(string[] args)
{
    // `site` and `extension` are not defined in the snippet as posted;
    // hypothetical placeholders so it compiles:
    string site = "http://example.com/";
    string extension = ".html";

    Console.WriteLine("Guid to start");
    string start = Console.ReadLine();
    string start1 = "00000000000000000000000000000000"; // unused as posted
    string iteration = start;
    using (StreamWriter outfile = new StreamWriter(@"c:\temp\scanner\guids.txt"))
    {
        for (Double k = 0; k < 5316911983139663491615228241121400000d; k++)
        {
            Console.WriteLine(site + iteration + extension);
            outfile.WriteLine(iteration);
            iteration = NextLetter(iteration);
            try
            {
                // do work on the Guid
            }
            catch
            {
                outfile.WriteLine("broke");
                Console.WriteLine("broke");
            }
        }
        outfile.Close();
        Console.ReadLine();
    }
}
Here is the method to increment the Guid.
public static string NextLetter(string letter)
{
    const string alphabet = "abcdefghijklmnopqrstuvwxyz0123456789";
    if (!string.IsNullOrEmpty(letter))
    {
        char lastLetterInString = letter[letter.Length - 1];
        // if the last letter in the string is the last letter of the alphabet
        if (alphabet.IndexOf(lastLetterInString) == alphabet.Length - 1)
        {
            // replace it with the first letter of the alphabet and
            // get the next letter for the rest of the string
            return NextLetter(letter.Substring(0, letter.Length - 1)) + alphabet[0];
        }
        else
        {
            // replace the last letter in the string with the next letter of the alphabet
            return letter.Remove(letter.Length - 1)
                         .Insert(letter.Length - 1,
                                 alphabet[alphabet.IndexOf(lastLetterInString) + 1].ToString());
        }
    }
    // return the first letter of the alphabet
    return alphabet[0].ToString();
}
Ok, let's start with a math / computer science lesson:
Let's say you have a 3 GHz machine. That's 3e+9 clock cycles every second. You have 2^128 iterations to go through.
If we were to trim your code down to 1 clock cycle per iteration (impossible, by the way), it would take 2^128 / 3e9 ≈ 1.13e+29 seconds to complete.
That is 3.15e+25 hours, or 3.59e+21 years. Better get started soon; you're going to be staring at your console for a LONG time.
Bottom line: the problem is not your algorithm. It's the problem itself. I recommend you look for a better way.
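The back-of-the-envelope numbers above can be checked with a few lines of C#. This is only a sketch; the 3 GHz clock rate is the same assumption the answer uses.

```csharp
using System;
using System.Numerics;

class TimeEstimate
{
    static void Main()
    {
        BigInteger iterations = BigInteger.Pow(2, 128);    // 2^128, about 3.4e38
        BigInteger cyclesPerSecond = 3_000_000_000;        // 3 GHz, one iteration per cycle
        BigInteger seconds = iterations / cyclesPerSecond; // about 1.13e29 seconds
        BigInteger years = seconds / (3600L * 24 * 365);   // about 3.59e21 years

        Console.WriteLine($"seconds: {seconds}");
        Console.WriteLine($"years:   {years}");
    }
}
```

Using BigInteger avoids the precision problems that a Double would have at this magnitude.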
If you look at the GUID spec you will see that GUIDs are constructed from specific sub-fields. If you can take a look at some of the GUIDs in your data, you might be able to get an idea of what those sub-fields contain, and you should be able to shorten your search time. For example, one of the sub-fields is supposed to be a "spatially unique node identifier" (usually the MAC address of the computer that generated the GUID). This is probably invariant across your data set (or possibly takes distinct values from a fairly small set), so that's 48 bits you don't have to iterate through. Likewise, other sub-fields are created from timestamps, so it should be possible to put upper and lower limits on those, thereby limiting your search space. Even if the GUIDs are truly random, there are still some fields (variant and version) that will remain fixed for all GUIDs generated by the same system, so that's at least seven bits you don't have to wade through.
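A sketch of how you might inspect those sub-fields, assuming the standard RFC 4122 layout: in the dashless ("N") string form, the version is the 13th hex digit and the variant bits live in the 17th.

```csharp
using System;

class GuidFields
{
    public static (int version, int variantNibble) Inspect(Guid g)
    {
        // "N" format: 32 hex digits with no dashes, matching the
        // representation used in the question.
        string hex = g.ToString("N");
        int version = Convert.ToInt32(hex[12].ToString(), 16);       // version field
        int variantNibble = Convert.ToInt32(hex[16].ToString(), 16); // variant bits in high bits
        return (version, variantNibble);
    }

    static void Main()
    {
        // Guid.NewGuid() produces random (version 4) GUIDs on .NET
        var (version, variant) = Inspect(Guid.NewGuid());
        Console.WriteLine($"version={version}, variant nibble={variant:x}");
    }
}
```

Sampling a handful of the site's GUIDs this way would tell you quickly whether the node and timestamp fields are fixed or constrained.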
Drop the Console output; it slows everything down. Create a measuring test bench and get some variations running.
Chunk up the 'problem' space and let multiple threads run over different chunks.
Measure what the performance bottlenecks are (my guess would be either RAM throughput or disk throughput) and speed that up.
Can't you change the GUID into a byte array and simply add 1 at the correct index? That would avoid a whole lot of string copying and speed things up.
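That idea might look like this. It is only a sketch; note that Guid.ToByteArray() uses a mixed-endian layout, so this enumerates in byte-array order rather than in the order of the dashless string.

```csharp
using System;

class GuidCounter
{
    // Treat the 16 bytes as a big 128-bit counter and propagate the
    // carry by hand. Returns false once the counter wraps past ff...ff.
    public static bool Increment(byte[] bytes)
    {
        for (int i = bytes.Length - 1; i >= 0; i--)
        {
            if (++bytes[i] != 0) return true; // no carry needed, done
            // byte overflowed to 0: carry into the next byte
        }
        return false; // all bytes wrapped: overflowed 2^128
    }

    static void Main()
    {
        byte[] counter = new byte[16]; // starts at all zeros
        Increment(counter);
        Console.WriteLine(new Guid(counter).ToString("N"));
    }
}
```

No strings are allocated per step, which removes the main cost of the NextLetter approach.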
Not to try to feed the fire on this madness ;), but you could use this as an opportunity to work with the Task Parallel Library: break the work into pieces and run it with multiple threads on a multi-core machine. (One way to do it would be to break the whole range (ooo... to zzz...) into a number of pieces equal to the number of logical cores and have multiple threads working on them.)
Edit
Or, if you want to get even more tricky about it, work out a Map Reduce algorithm for it, and solve it on the Amazon Cloud: http://en.wikipedia.org/wiki/MapReduce
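A minimal sketch of the chunking idea with the TPL. The range here is deliberately tiny (the real 2^128 space would still be hopeless), and DoWork is a hypothetical stand-in for whatever per-GUID check you run.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ParallelScan
{
    static long processed; // counts iterations across all threads

    // Hypothetical placeholder for the per-iteration work.
    static void DoWork(ulong value)
    {
        Interlocked.Increment(ref processed);
    }

    public static long Run(ulong total)
    {
        processed = 0;
        int chunks = Environment.ProcessorCount;
        ulong chunkSize = total / (ulong)chunks;

        // One chunk per logical core; the last chunk absorbs the remainder.
        Parallel.For(0, chunks, chunk =>
        {
            ulong start = (ulong)chunk * chunkSize;
            ulong end = (chunk == chunks - 1) ? total : start + chunkSize;
            for (ulong v = start; v < end; v++)
                DoWork(v);
        });
        return processed;
    }

    static void Main()
    {
        Console.WriteLine(Run(1_000_000));
    }
}
```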
This is the wrong approach to incrementing GUIDs. Use the System.Numerics.BigInteger class with a 16-byte number. (Or write your own number class/struct that uses two ulong fields. It's not too hard, but it would be a lot easier in a native language with access to the carry flag.) Then use the Guid constructor that takes 16 bytes to create the GUID.
Also, I don't believe Double has the necessary precision to increment the number you gave it by one. Did you verify that? Your NextLetter method is awful; ditch it. And don't print anything to the screen until you've found an answer.
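A sketch of the suggested approach, assuming .NET's System.Numerics.BigInteger and the Guid(byte[]) constructor:

```csharp
using System;
using System.Numerics;

class BigIntegerGuid
{
    public static Guid FromCounter(BigInteger counter)
    {
        // ToByteArray() is little-endian and may be shorter than 16 bytes
        // (or carry an extra sign byte), so copy into exactly 16 bytes.
        byte[] raw = counter.ToByteArray();
        byte[] bytes = new byte[16];
        Array.Copy(raw, bytes, Math.Min(raw.Length, 16));
        return new Guid(bytes);
    }

    static void Main()
    {
        BigInteger counter = 0;
        counter += 1; // the whole NextLetter method, replaced by one addition
        Console.WriteLine(FromCounter(counter).ToString("N"));
    }
}
```

As a quick sanity check on the Double concern: 5316911983139663491615228241121400000 needs about 122 bits of precision, while Double has 53, so `k++` would eventually stop changing `k` and the loop would never terminate.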
This question has the most downvotes of any in the c# tag, while a similar but impossible-to-solve question (and linked to this question) has 309 upvotes. I think we need to cut @SpoiledTechie.com a break.
Here's a way to get a sense of the scale. First, time this loop:
uint x = 0; do { x++; } while (x != 0);
That was 32 bits; next, time the 64-bit version:
ulong x = 0; do { x++; } while (x != 0);
How long does that take to run on your "one of the 50 fastest computers in the world"? It should be about 4 billion times longer than the 32-bit version. The analogous loop with 96 bits should take 4 billion times longer than that, and with 128 bits, another 4 billion times longer.