Find an efficient way to create network maps and gather statistics without changing the Gnutella protocol.
Experiment:
After testing Gnucleus’s network browse feature (Example), I thought of a new way to surf the network. Since Gnucleus sends an information page to any browser that connects to it, why not use a web spider to collect information from each node? Once the spider has downloaded all the pages it can find, a simple parser could compute statistics and write a data file for a graphing program.
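The spidering idea amounts to a breadth-first walk over the nodes, with the hop count tracked along the way. Here is a minimal Python sketch of that walk, not the spider actually used. The names `crawl`, `get_neighbors`, and `max_hops` are made up for illustration, and `get_neighbors` stands in for "download a node's info page and list the hosts it links to". A firewalled node would simply return an empty list.

```python
from collections import deque

def crawl(start, get_neighbors, max_hops):
    """Breadth-first walk from `start`; returns {node: hop count}."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Do not expand nodes that already sit at the hop limit.
        if hops[node] >= max_hops:
            continue
        for peer in get_neighbors(node):
            if peer not in hops:  # first visit is the shortest path
                hops[peer] = hops[node] + 1
                queue.append(peer)
    return hops

# Toy network standing in for real info pages (hypothetical data).
toy = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["e"], "e": []}
result = crawl("a", lambda n: toy.get(n, []), max_hops=2)
print(result)
```

With a hop limit of 2, node "e" is never reached because "d" is already at the limit and is not expanded.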
Testing:
As soon as I found a decent (and free) web spider, I tested my theory, and it worked excellently. It took a few tries to get through the first hop of nodes (firewalled nodes can’t send info pages), but once it got past the 2nd hop, it kept going. I finally stopped it after collecting data for a few hundred nodes. In theory you could probably run it as long as you want, and I will be trying that shortly to see if it ever stops. So far the largest data set I have collected was about 80,000 nodes in half an hour (that’s way more than the 10 or 20 thousand that the common user can see). See the stat file here:
http://home.attbi.com/~gregory.bray/UltrapeerStats.html
Procedures:
I have a “working” system now, but there were a few problems (and still are). First, I use WinHTTrack (an open source web spider) to save the data to HTML files. Then I have been working on an HTML parser (in Visual Basic, because I’m not a programmer!) that takes the HTML files and creates GDL map files out of them, as well as some statistics. Finally, I use graphing software to create the actual node maps. I started out using Neato, the same software that Gnucleus uses, but it is very slow at creating large maps. I now use Aisee to create the node maps; even though its maps are not as clean, it is much faster and can handle much larger maps (a full 7 hop map in less than 20 minutes).
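To give an idea of what the parsing step does, here is a rough Python sketch (the real parser is written in Visual Basic and is not shown here). It assumes each saved info page lists the node's connections as ordinary `<a href="http://host:port/...">` links; the real page layout may differ. The GDL attribute names `sourcename` and `targetname` are my assumption, so check them against the Aisee manual. `LinkCollector`, `peers_from_html`, and `to_gdl` are names made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collects the host:port part of every absolute link on a page."""
    def __init__(self):
        super().__init__()
        self.peers = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        netloc = urlparse(href).netloc  # empty for relative links
        if netloc and netloc not in self.peers:
            self.peers.append(netloc)

def peers_from_html(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.peers

def to_gdl(pages):
    """pages: {host: saved html}. Returns the graph as GDL text."""
    edges = {host: peers_from_html(html) for host, html in pages.items()}
    nodes = sorted(set(edges) | {p for ps in edges.values() for p in ps})
    lines = ["graph: {"]
    for n in nodes:
        lines.append(f'  node: {{ title: "{n}" }}')
    for host in sorted(edges):
        for peer in edges[host]:
            lines.append(
                f'  edge: {{ sourcename: "{host}" targetname: "{peer}" }}'
            )
    lines.append("}")
    return "\n".join(lines)

# Hypothetical saved page for one node.
pages = {"10.0.0.1:6346": '<a href="http://10.0.0.2:6346/">peer</a>'}
print(to_gdl(pages))
```

The same `edges` dictionary is also enough for simple statistics, such as the number of connections per node.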
Examples:
Here are some samples of what I have been able to create so far.
Simple small graph:
Same Graph, different colors, color by hop, label hops, and label node by node type:
Same graph, Tree layout algorithm, Color by hop, label by filename (host name)
Supernode Network: Statistic file http://home.attbi.com/~gregory.bray/superstats.html
Ultrapeer Network: Statistic file http://home.attbi.com/~gregory.bray/ultrastats.html
What’s next?:
There is still a lot that can be done. The most interesting feature, which I would like to work on next, is an interactive node map. Neato has an option to create image maps, so that clicking on a node opens a web page. I would like to create web maps that pull up the actual index file for the node you click on. There are also a lot of other little features that I keep thinking of.
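One possible way to get clickable nodes is sketched below in Python. Graphviz lets a node carry a `URL` attribute, and its `cmapx` output format writes an HTML image map to go with the picture. The function `to_dot` and the `index_url` callback are made up for this example, and the URL scheme for a node's index file is an assumption.

```python
def to_dot(edges, index_url):
    """edges: {host: [peers]}. Returns an undirected DOT graph whose
    nodes each carry a URL attribute for use in an image map."""
    nodes = sorted(set(edges) | {p for ps in edges.values() for p in ps})
    lines = ["graph gnutella {"]
    for n in nodes:
        lines.append(f'  "{n}" [URL="{index_url(n)}"];')
    seen = set()
    for host in sorted(edges):
        for peer in edges[host]:
            key = frozenset((host, peer))  # a--b and b--a are one edge
            if key not in seen:
                seen.add(key)
                lines.append(f'  "{host}" -- "{peer}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical two-node network that links both ways.
edges = {"10.0.0.1:6346": ["10.0.0.2:6346"],
         "10.0.0.2:6346": ["10.0.0.1:6346"]}
print(to_dot(edges, lambda n: f"http://{n}/"))
```

Saving the output as, say, nodes.dot and running something like `neato -Tpng -o map.png -Tcmapx -o map.map nodes.dot` should produce the picture and the matching image map, which can then be pasted into a web page.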
So, how can I make maps like this?:
I just finished updating the parser. I fixed some bugs and added a couple of features. It can be downloaded here:
Links:
Winhttrack
http://www.httrack.com/
Gnucleus
http://www.gnucleus.net
Shareaza
http://www.shareaza.com
Aisee
http://www.aisee.com
Questions or comments: