Friday, November 15, 2002
( 9:40 AM ) Matt
Some Back-of-the-Head Calculations
I'm thinking that maybe the cost of broadcasting a signal to the entire distributed network the same way Gnutella does it isn't too much more expensive than the optimal strategy. First, I need to make some rather unreasonable assumptions, and if you want to generalize from there, be my guest. Let's assume that each client is connected to a bounded number of other clients, and the connections within the network are random. Now, let me define what I consider the best possible way of broadcasting a message to the entire network: When a client receives a broadcast message, it resends it to all adjacent clients which have not yet received the message, and won't receive it from any other computer. This way, there will be n transmissions in a network of size n.
What Gnutella does is to send broadcast messages to all of the clients connected to it except for the one from which it received the message. Each message has a unique identifier. If a client receives a message with an identifier which matches a previous message, the message is ignored and not forwarded to any other clients. This way, each client sends at most m-1 transmissions where the connectivity of each node is bounded by m. This means that by using Gnutella style broadcasting, transmissions are increased by at most a factor of m-1. Oops. That's pretty bad. If m is 6, then 80% of transmissions will be extraneous.
If m = O(n) then the number of hops needed on average to get from one client to another is O(1). However, this means that the number of wasted broadcast transmissions per client would increase proportionally with the size of the network. If we limit connectivity to a constant, m = O(1), then the number of hops needed on average in a randomly connected network is O(lg n) which is practically O(1). I posted a little about this on November 4.
Hmm. My conclusion is... I need to think about this more. In particular, the number of messages transmitted isn't a good measure of bandwidth. A TCP/IP connection may travel a long ways through several routers to get to its destination. Thus, a randomly connected network might not really be very efficient. If clients are encouraged to connect to other clients which are closer topologically on the network, then the expected number of hops increases to something like O(sqrt(n)) which is not nearly as good as O(lg(n)). Oh well...# -
Comments: Post a Comment