Pending ideas
From The Okopipi Wiki
Note to reporters: The information contained on this wiki page is not official documentation.
These are ideas that have been suggested but not yet agreed upon. Please add your ideas here!
Network Design
Servers
- The main server(s) do not have static IP addresses. At periodic intervals, the server(s) will initiate secure connections with the servers owned and operated by Okopipi administrators and download the spam reports. After each transaction has been completed, the servers' IP addresses are changed. (Caspian)
- Handlers owned and operated by Okopipi don't forward reports. Instead, they wait until the main server(s) establish a connection with them.
- This concept could also be applied to the handlers owned by Okopipi. These handlers won't have static IP addresses and will open secure connections to trusted handlers.
- Changing IP addresses only moves the point of failure elsewhere -- it doesn't remove it. Somewhere, someone has to have a known IP address. Otherwise none of the nodes will be able to find each other. If the admin nodes aren't static, then the trusted handlers must be static. If the trusted handlers aren't static then the next level of nodes must be static. Static IPs are a vulnerability but you don't gain anything by pushing the vulnerability around the network. Yes, every single node can be dynamic but then the vulnerability becomes DNS. (samc 11:15 EDT 2006-05-26)
- Changing IP addresses has another downside -- the next person to get the old IP could be the target of an attack. This is the unfair side-effect of DNS RBLs like SORBS. Given the ferocity with which spammers are likely to attack this network, passing the buck isn't very friendly. (samc 11:15 EDT 2006-05-26)
- Another way of protecting the servers is shown by a […], where there are unidentified Administrator nodes that connect to the hidden servers. (added by tmey)
- I had a similar idea. My network design was based on the assumption that most end users will, for one reason or another, be unable to accept incoming connections directly: firewalls, routers and so on.
So what does this do to the network?
It means that the network will not work as illustrated. Imagine a two-tier system.
The top tier is made up of Nodes, irrespective of their actual role in the overall system. A Node is simply a machine that is able to accept connections from the outside world. The bottom tier is made up of Basic Clients that are able to initiate connections to Nodes.
A Node will have one of three roles: Administrator, Handler or Standard. Most nodes would be Standard nodes. Admin and Handler Nodes would be able to insert scripts and opt-out launches into the network. Admin Nodes would also have the power to promote and demote Nodes, as well as to blacklist nodes and clients for misbehaviour. All Nodes would still perform the functions of a Basic Client.
Communication between the Nodes would be based on messages passed around the Node Layer. Each node would pass a message on to 5-10 other nodes, and would use a list of 10 node.okopipi.org addresses for, say, 3 hours. Each message would carry a hop counter and a unique identifier, and would be digitally signed. Messages inserted into the Node Layer would start with a random hop counter of between 1 and 10. As a message passes through a Node, the hop counter would be increased by a random number between 1 and 5. Any message received with a hop counter over 100 would be processed but not passed on. Any message the node has previously seen would have its hop counter increased by a random number between 10 and 20. A pitfall of this is that if you received enough low-hop-count messages, you might be able to guess which nodes are injection points.
How do the Nodes and Basic Clients know where to find each other? I suggest that multiple DNS servers be set up, or DNS services used, for the Okopipi domain (a possible weak point, but bear with me here). All Nodes would register themselves (perhaps a la No-IP.com) as node.okopipi.org. The DNS servers would be set up to return a random registered Node IP when queried. BUT if a particular source requests node.okopipi.org excessively, it will be given a 127.0.0.1 response (e.g. after 10 requests in 15 min, the 11th request will result in a 127.0.0.1 response for 15 min. Each new request from the same source within those 15 min restarts the 15-min counter. If this happens more than 4 times, the source is blocked for 24 hours. This would apply to registered Nodes as well.) --Hal9000 03:39, 28 May 2006 (PDT)
- DNS is not a good idea on reflection. Perhaps the messaging system mooted here could be used to pass new node addresses around. This would effectively mean running our own mini-DNS system. Addresses would be updated by the client itself (a la no-ip.com clients). Where a node or client misbehaves, it could be quarantined by passing messages around that raise a quarantined flag against its internal DNS entry, so that its connections would be refused. A Node would typically store only client addresses and 30 other node addresses. The clients would use, say, 10 node addresses at a time, with a TTL of, say, 1 to 7 days. When an address expires, the client would request a new one from the first node it could connect to. --Hal9000 03:56, 28 May 2006 (PDT)
- Because there are no servers per se, just Nodes that perform these functions. Consider that the mooted means of communication is to send a message with no identified origin, but with valid certificates/signatures. Each node receives the message, processes it where appropriate (this may include storing it for Basic Client consumption), then passes it on to 5-10 other nodes. --Hal9000 04:41, 26 May 2006 (PDT)
- I would suggest that the "servers" simply be Nodes in the Node layer mooted above. They would respond exactly the same way as a Standard or Handler Node. This would minimise the effectiveness of DoS attacks, because there would be no way of knowing what a system's role actually is. --Hal9000 04:41, 26 May 2006 (PDT)
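The hop-counter rules from Hal9000's Node Layer proposal can be sketched in Python as below. This is a minimal illustration under assumptions: signature checking and the network transport are left out, a node is assumed to process a given message only the first time it sees it, and the names `Message`, `Node`, `inject` and `pick_peers` are hypothetical.

```python
import random
from dataclasses import dataclass, field

MAX_HOPS = 100  # a message received above this is processed but not passed on


@dataclass
class Message:
    msg_id: str          # unique identifier (the digital signature is omitted here)
    hops: int            # hop counter
    payload: bytes = b""


def inject(msg_id: str, payload: bytes, rng: random.Random) -> Message:
    """Insert a message into the Node Layer with a random starting hop count of 1-10."""
    return Message(msg_id, rng.randint(1, 10), payload)


def pick_peers(peers: list, rng: random.Random) -> list:
    """Choose 5-10 distinct peers to pass a message on to (all of them if fewer are known)."""
    k = min(len(peers), rng.randint(5, 10))
    return rng.sample(peers, k)


@dataclass
class Node:
    seen: set = field(default_factory=set)  # IDs of messages this node has already handled

    def receive(self, msg: Message, rng: random.Random):
        """Apply the hop rules; return (process_it, message_to_forward_or_None)."""
        first_time = msg.msg_id not in self.seen
        self.seen.add(msg.msg_id)
        if msg.hops > MAX_HOPS:
            # over the limit: process it (if new) but do not pass it on
            return first_time, None
        # new messages gain 1-5 hops; repeats gain 10-20 so they die out faster
        bump = rng.randint(1, 5) if first_time else rng.randint(10, 20)
        return first_time, Message(msg.msg_id, msg.hops + bump, msg.payload)
```

Because both the starting count and each bump are random, the count a node sees does not say exactly how far a message has travelled, which is what blurs the injection point; the pitfall with very low counts still applies.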
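The per-source DNS rate limit in the same proposal (10 lookups per 15 minutes, then a 127.0.0.1 answer) can be expressed as a small resolver-side policy. This is a sketch, not tied to any real DNS server software; times are in seconds, "if this happens more than 4 times" is read as "if the source has been blocked more than 4 times", and `NodeResolver` is a hypothetical name.

```python
import random
from collections import defaultdict, deque

WINDOW = 15 * 60      # seconds in which up to LIMIT lookups are allowed
LIMIT = 10            # the 11th lookup inside WINDOW triggers a block
PENALTY = 15 * 60     # length of a block; restarted by each request made while blocked
MAX_STRIKES = 4       # assumption: blocked more than this many times -> long ban
BAN = 24 * 60 * 60    # length of the long ban
SINKHOLE = "127.0.0.1"


class NodeResolver:
    """Hands out a random registered Node IP per lookup, rate limited per source."""

    def __init__(self, node_ips, rng=None):
        self.node_ips = list(node_ips)
        self.rng = rng or random.Random()
        self.history = defaultdict(deque)  # source -> times of its recent lookups
        self.blocked_until = {}            # source -> end of its current 15-min block
        self.banned_until = {}             # source -> end of its 24-hour ban
        self.strikes = defaultdict(int)    # source -> number of blocks so far

    def resolve(self, source, now):
        if now < self.banned_until.get(source, 0):
            return SINKHOLE
        if now < self.blocked_until.get(source, 0):
            self.blocked_until[source] = now + PENALTY  # restart the 15-min counter
            return SINKHOLE
        recent = self.history[source]
        while recent and now - recent[0] >= WINDOW:
            recent.popleft()  # forget lookups older than the window
        recent.append(now)
        if len(recent) > LIMIT:
            recent.clear()
            self.strikes[source] += 1
            if self.strikes[source] > MAX_STRIKES:
                self.banned_until[source] = now + BAN
            else:
                self.blocked_until[source] = now + PENALTY
            return SINKHOLE
        return self.rng.choice(self.node_ips)
```

A source that keeps retrying while blocked never leaves its 15-minute block, which is the intended effect of restarting the counter.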
Clients
- Use I2P instead of Tor, since I2P is geared more towards general bulk data transfer. Each client that runs the I2P proxy automatically functions like a Tor exit node. Since we will probably be reporting a large amount of spam, this may be a necessity. M1t0s1s 22:39, 15 July 2006 (PDT)
- Clients will also serve as mini-handlers to help route spam reports. It's the same concept as "seeding" with BitTorrent: you benefit from this service, so you donate a bit of your bandwidth to the cause. (Caspian)
- Clients should click the unsubscribe links within emails and report the spam to the appropriate authorities -- so that even if all the servers are taken down, Okopipi will still work. Spam reports will still be sent for analysis by Okopipi's servers and/or staff. (Caspian)
- If clients can't click the unsubscribe links (link is hidden within an image, etc.), the client will note this in the spam report and software on the server will take care of this.
- Clients should be able to report SPAM to multiple systems. If a system uses a reporting email address a la SpamCop, the client preferences should provide a way to enter it.
- Hooks to/from Outlook, Thunderbird, Firefox/Mozilla, IE, Firetrust Mailwasher and so on, to report SPAM should be part of the client from day one.
- Writing these extensions will be trickier than it was for BlueFrog -- we aren't forwarding all email to a central email account, but storing it to the file system. Does anyone know if we'll run into sandbox issues around that? (Secondwheel)
- Actually, clients don't need to report spam at all. All that's needed to track spam is to open a few email accounts, publish them and wait for the spam to arrive. This can save a tremendous amount of work: if clients don't report anything, spammers can't flood the system with bogus reports. (Gershon)
--Hal9000 04:41, 26 May 2006 (PDT)
- A Basic Client is unable to receive messages from the Node Layer (because of firewalls, routers, dial-up, etc.). It interacts with the network only by querying a Node for new scripts and by submitting aggregated SPAM information, plus the SPAM itself where it matches a SPAM call (where Admin and Super Nodes call for a particular type of SPAM for further examination).
Also, the Basic Client preprocesses the SPAM, looking for the origin of the message, URLs and any other information the Okopipi system is looking for. This information is stored locally, and where a match is made to a script already downloaded, the opt-out process is triggered on the client.
Where a script is released for a particular SPAM that the client has previously reported, the Client will again be allowed to execute the opt-out script. This will be triggered by the Basic Client going to a node and asking "Are there any new scripts I can download?", to which the Node being queried will transmit the script to the Basic Client. The Basic Client will then compare the script against its list of reported SPAM. Where matches are made, opt-out processes will be triggered and queued for execution.
This will significantly reduce the load on the Nodes, because they do not have to process, store or forward the actual SPAM, just the bits Okopipi needs to develop opt-out scripts.
- Create a script to check the SPF records and/or DomainKeys signatures of emails received. Initially, any spam that manages to pass verification could be fast-tracked, so that such sites can be reported more quickly. --Hedwards 19:10, 30 May 2006 (PDT)
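The client-side preprocessing and script matching described above could look roughly like this. It is a sketch under assumptions: only the host names of http(s) links are extracted (not the message origin), and a downloaded script is modelled as a hypothetical mapping from script ID to the set of domains it targets, since the real script format is not specified on this page.

```python
import re
from email import message_from_string
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_domains(raw_message: str) -> set:
    """Return the lower-cased host names of all http(s) links in the text parts."""
    msg = message_from_string(raw_message)
    domains = set()
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers, images and attachments
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in a (spam) header
            text = payload.decode("utf-8", errors="replace")
        for url in URL_RE.findall(text):
            try:
                host = urlsplit(url).hostname  # urlsplit already lower-cases this
            except ValueError:  # malformed URL, e.g. a broken IPv6 literal
                continue
            if host:
                domains.add(host.rstrip("."))
    return domains


def matching_scripts(domains: set, scripts: dict) -> list:
    """IDs of downloaded scripts whose target domains overlap this spam's links."""
    return sorted(sid for sid, targets in scripts.items() if domains & targets)
```

Only the extracted domains need to be kept locally, which fits the goal of not storing or forwarding the spam itself.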
Handlers
- Handlers will wait a random interval from 0 - 30 seconds before forwarding spam reports. This way, if a spammer sends in a flood of spam reports, they'll be staggered and won't all hit the main server(s) at once. (Caspian)
- If a client starts flooding a handler, the handler will suspend forwarding reports from the client, and will cancel forwarding pending reports from the client. (Caspian)
- The process by which handlers are assigned other handlers to forward spam reports should not be random. Rather, a definite path should be mapped out as spam reports travel from handler to handler. Although no one other than Okopipi administrators knows the entire route, it should be carefully planned. Otherwise, some spam reports may enter an "eternal loop", being forwarded around a ring of handlers. (Caspian)
- I disagree about the not-random part, because any pathed, mapped or systematic delivery system is vulnerable to attacks. Introducing a random delivery system, where a Node simply passes the message along to, say, 15 of the 30-40 Nodes it has on file (and increments a node counter), achieves two things: the source of the signed/encrypted message is obscured, and the path of a message is not predictable. Try and attack that! Refer to other threads in here and on the GoogleGroup for further discussion on this. --Hal9000 19:24, 29 May 2006 (PDT)
- This could easily be worked around by having the handlers remember which packets they have seen. If they see a duplicate, they simply send it to a different handler than the one they sent it to last time. In this way, you could ensure that the packet eventually gets taken off the network, while retaining anonymity. If the routing of spam reports is not random, then with a little work the servers could be exposed and the network taken down. (JDShewey aka morphius)
- One could either have several paths (perhaps 12) chosen at random to route a message, or one could conceivably base the routing upon a Gaussian attractor. Personally, I would go with several paths to keep it simpler. I doubt that it is a good idea to allow the packet to bounce around until it has potentially hit every node. --Hedwards 19:14, 30 May 2006 (PDT)
- Total Anonymity: To guard the identity of the (super-)handlers, spam reports should be sent with a counter attached, which each node increments by a random choice of 1 or 0 (Bernoulli) upon receipt. If a node receives a spam report with the counter set to a certain value, e.g. 20, its journey stops. This value must be high enough for the report to pass through the node it is destined for, e.g. a super-handler. RESULT: no one can be sure which previous IP started sending the spam report, and furthermore, no one knows which of the visited nodes is actually passing it on to a super-node. DANGER: you must be sure the path is long enough to get where it needs to go, but not so long that it wastes valuable resources. RANDOM: the random counting process will have a very small chance to live past e.g. 2 times the set maximum value. This should prevent the "eternal loops" mentioned before! --Danielt34 13:07, 13 June 2007 (PDT)
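Caspian's two handler rules (a random 0-30 second delay before forwarding, and suspending a flooding client while cancelling its pending reports) can be sketched together. The page does not define what counts as flooding, so the threshold of 20 reports per 60 seconds is an assumed value, and `Handler` is a hypothetical class.

```python
import random
from collections import defaultdict, deque

MAX_DELAY = 30.0     # seconds: each report is forwarded after a random 0-30 s wait
FLOOD_LIMIT = 20     # assumed: more than 20 reports ...
FLOOD_WINDOW = 60.0  # ... within 60 s from one client counts as flooding


class Handler:
    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.pending = []                 # (send_at, client, report) not yet forwarded
        self.recent = defaultdict(deque)  # client -> times of its recent submissions
        self.suspended = set()            # clients whose reports are no longer forwarded

    def submit(self, client, report, now):
        """Queue a report; returns False if it was refused because of flooding."""
        if client in self.suspended:
            return False
        times = self.recent[client]
        while times and now - times[0] >= FLOOD_WINDOW:
            times.popleft()
        times.append(now)
        if len(times) > FLOOD_LIMIT:
            self.suspended.add(client)
            # cancel everything from this client that is still waiting to be forwarded
            self.pending = [p for p in self.pending if p[1] != client]
            return False
        self.pending.append((now + self.rng.uniform(0, MAX_DELAY), client, report))
        return True

    def due(self, now):
        """Remove and return, in send order, the reports whose delay has expired."""
        ready = sorted((p for p in self.pending if p[0] <= now), key=lambda p: p[0])
        self.pending = [p for p in self.pending if p[0] > now]
        return [report for _, _, report in ready]
```

A caller would poll `due()` periodically and forward whatever it returns, so a burst of reports arrives at the next hop spread over roughly 30 seconds.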
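Danielt34's Bernoulli counter can be simulated, and its journey length worked out exactly: a report keeps travelling until it has collected `STOP_AT` increments, so the number of hops follows a negative binomial distribution. The sketch assumes each increment happens with probability 1/2 and uses the example stop value of 20; the function names are hypothetical.

```python
import math
import random

STOP_AT = 20  # counter value at which a report's journey ends (example from the proposal)
P_UP = 0.5    # assumed: each node adds 1 with probability 1/2, otherwise 0


def relay(counter, rng):
    """What a node does on receipt: Bernoulli increment, then forward unless STOP_AT is reached."""
    counter += 1 if rng.random() < P_UP else 0
    return counter, counter < STOP_AT


def journey_length(rng):
    """Simulate one report: the number of nodes that receive it before its journey stops."""
    counter, hops, forwarding = 0, 0, True
    while forwarding:
        counter, forwarding = relay(counter, rng)
        hops += 1
    return hops


def prob_journey_exceeds(n, k=STOP_AT, p=P_UP):
    """Exact P(journey > n hops) = P(fewer than k increments in the first n hops)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))
```

With these values a report visits 40 nodes on average (STOP_AT / P_UP). If "2 times the set maximum value" is read as 40 hops, `prob_journey_exceeds(40)` is about 0.44, so the tail is not small at P_UP = 0.5; raising P_UP or lowering STOP_AT shortens it.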
Opting Out
In response to Opting out as single e-mail:
- What makes you think that you can describe a group of spams with a single (or even tens of) hash(es)? If the spammer just changes a single word in every spam sent, you get thousands of hashes for the 'same' spam. Maybe you can cluster by measuring similarity, for example by hashing small chunks of the messages to find highly similar ones. Second, are the other peers really needed? Can't each frog do its own research? - HenkPoley - 19may2006 15:00 CET
- Well, if they slightly change the words, then they have thousands of hashes for the 'same' email, and it serves them damn right to get lots of opt-out requests for trying to slip past the system. Of course, you could check each letter of each word and run a similarity check. But then just send an email for each hash anyway, and get back at them for sending such a STUPID number of emails. --Thematrixeatsyou 01:32, 26 May 2006 (PDT)
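- One way to test similarity is to use compression. Take two files and compress each on its own, then compress the two concatenated together, using your favourite compression algorithm (e.g. zip or rar). The less the combined output grows beyond the larger of the two individual outputs, the more similar the two files are. -Simul.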
- Ok. I did some quick experiments to test out the compression method of identifying similar files. It worked beautifully, except it was _very_ dependent on compression method. See the talk page to see my (little) experiment and results. -Simul.
- Normalize the emails: take out all of the spaces and convert all letters to upper or lower case. Then compare the links and such. I am not an expert, but this is similar to the way fingerprints are compared on computers: give each aspect a weight after comparing the bits. It is not perfect, but if it works, it would be a much more efficient approach. At any rate, it would reduce the number of opt-outs sent to a reasonable number. --Hedwards 19:18, 30 May 2006 (PDT)
- It's not the spam content that will be hashed, but the victim's e-mail address. The spammer will be able to hash the e-mail addresses on his list and compare them to the reported address. If there's a match, it will be removed. Yes, the other peers are indeed needed to prevent false alarms - but they shouldn't be necessary for the frog to work. - Spy der Mann - May 19, 2006 08:15 -0600 GMT
- Okay I edited the text to include 'address'. Did you mean "Do Not Intrude Registry" (DNIR) instead of "Opt out request"? Opt out request seems like a single spam response message to me, not some kind of list/registry. - HenkPoley - 19may2006 17:40 CET
- Can I remove this part of the discussion now? I think it's redundant at this point. - Spy der Mann - May 19, 2006 15:30 -0600 GMT
- What we really need to use to group similar spam is the link destination. Spam may have a zillion different random letters at the end but they can only have so many target domains for email reply or link target. Or phone number if they try to use that, but it's rare.--TexasDex 08:27, 26 May 2006 (PDT)
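The compression test Simul describes is usually formalised as the normalized compression distance (NCD). The sketch below uses Python's `zlib`; the choice of compressor is an assumption, and, as noted in the discussion, results depend heavily on it. DEFLATE also only looks back 32 KB, so for long messages a compressor with a larger window is a better fit.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data at the highest compression level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for near-duplicates, near 1 for unrelated data."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # y compresses well after x only if it repeats x's content
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Two spams that differ by a few altered words score much lower than a spam paired with an unrelated message, which is the property needed for grouping.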
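Hedwards' normalise-then-weight idea, combined with TexasDex's point that link destinations are the most stable feature, might look like the sketch below. The 0.6/0.4 weights and the 5-character shingles are arbitrary assumptions, and `similarity` is a hypothetical helper rather than an agreed design.

```python
import re
from urllib.parse import urlsplit

LINK_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def link_domains(text: str) -> set:
    """Host names of the http(s) links in a message body."""
    hosts = set()
    for url in LINK_RE.findall(text):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL
            continue
        if host:
            hosts.add(host.rstrip("."))
    return hosts


def normalize(text: str) -> str:
    """Drop all whitespace and lower-case, so spacing and case tricks vanish."""
    return re.sub(r"\s+", "", text).lower()


def shingles(s: str, k: int = 5) -> set:
    """Overlapping k-character chunks of s."""
    return {s[i:i + k] for i in range(max(len(s) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Size of the overlap divided by size of the union (1.0 for two empty sets)."""
    return 1.0 if not a and not b else len(a & b) / len(a | b)


def similarity(m1: str, m2: str, w_links: float = 0.6, w_text: float = 0.4) -> float:
    """Weighted score in [0, 1]: shared link destinations plus shared normalised text."""
    link_score = jaccard(link_domains(m1), link_domains(m2))
    text_score = jaccard(shingles(normalize(m1)), shingles(normalize(m2)))
    return w_links * link_score + w_text * text_score
```

Messages scoring above some threshold would share one opt-out group, so randomised filler text alone does not multiply the number of groups.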
On the Do Not Intrude Registry
Generating the registry
An idea I came up with is to generate the registry on the fly, based on the frogs connected at that point. The request would be made by one of the top nodes in the hierarchy, and the registry would be generated by other means.
Once generated, the registry could be published via bittorrent or other means independent of the "frognet", so even if the net is down, it could be downloaded. The registry should be signed with the poster's public key.
This would only be done once a month or so. On subsequent requests, the list would be diffed and published as a patch. Every six months a new complete list could be generated.
The only problem I see with this is that people wouldn't run the frogs 24/7, so there should be a mechanism to add a single e-mail address to the registry.
-Spy der Mann, May 19,2006 15:27 -0600 GMT
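The monthly diff-and-patch cycle could be as simple as a line-based patch over sets of address hashes. The `+`/`-` line format and the SHA-256 check digest are assumptions for illustration; signing the published files with the poster's public key, as proposed above, would need a separate signature tool and is not shown.

```python
import hashlib


def make_patch(old: set, new: set) -> str:
    """One line per change: '-hash' for removals, '+hash' for additions."""
    lines = [f"-{h}" for h in sorted(old - new)] + [f"+{h}" for h in sorted(new - old)]
    return "\n".join(lines)


def apply_patch(registry: set, patch: str) -> set:
    """Return a new registry with the patch applied; the input set is left untouched."""
    result = set(registry)
    for line in patch.splitlines():
        if line.startswith("+"):
            result.add(line[1:])
        elif line.startswith("-"):
            result.discard(line[1:])
    return result


def registry_digest(registry: set) -> str:
    """Order-independent fingerprint, so a patched copy can be checked against the full list."""
    return hashlib.sha256("\n".join(sorted(registry)).encode("utf-8")).hexdigest()
```

Publishing the digest of the full list alongside each patch lets anyone confirm that old list plus patch equals the new list, and each patch grows with that month's changes rather than with the size of the whole registry.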
Adding single e-mails to the registry
A problem comes with adding single e-mail addresses: should the addresses be kept in a cache (by some members of the node hierarchy)? How do we prevent spammers from poisoning the registry with garbage? One possibility is that the addresses should be sent un-hashed to the top authority, encrypted with the authority's public key. But this has not been discussed yet. -Spy der Mann, May 19,2006 15:40 -0600 GMT
If they poison the registry, that's their problem. They will eventually have to download bigger and bigger lists, if they ever download anything at all. Plus, the addition of email addresses to the registry should be confirmed by clicking a link, solving a CAPTCHA, etc. And the registry should store a hash of the address's hash (so hashing twice; this way they can't just use a time-memory trade-off password cracker, because beyond 20-30 characters the trade-off is negligible, and even an MD5 hash is 32 hex characters long). PAStheLoD 08:46, 26 May 2006 (PDT)
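A minimal sketch of the hash-of-a-hash registry entry PAStheLoD describes. MD5 is used only because it is the hash named on this page; hashing the 32-character hex string (rather than the raw digest) and trimming and lower-casing the address first are assumptions. Hashing twice defeats precomputed tables built for plain MD5, but it does not stop anyone who already holds a candidate address from testing it, which is exactly what a list-cleaning spammer is meant to do.

```python
import hashlib


def registry_entry(address: str) -> str:
    """MD5 of the hex MD5 of a normalised address, as it would be stored in the registry."""
    normalised = address.strip().lower()  # assumption: treat addresses case-insensitively
    first = hashlib.md5(normalised.encode("utf-8")).hexdigest()
    return hashlib.md5(first.encode("ascii")).hexdigest()
```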
Is a Registry really needed? (Discussion)
Anti-registry arguments
1. My opinion is that it should NOT be used, since it was one of the spammers' arguments: downloading an 80 MB "Do Not Intrude Registry" file is pointless when tomorrow you'll be downloading an 85 MB "Do Not Intrude Registry" anyway. In other words, having to download the full Do Not Intrude Registry was an obstacle rather than a help in opting out (it was necessary at the time, as a proof of concept, but it isn't now that we know the Blue Frog approach works). So it's better if opt-outs are processed individually. - Spy der Mann - May 19, 2006 11:36 -0600 GMT
2. If users can keep a list of "protected addresses", there's no need for a central database.
3. Follow the KISS principle.
4. Spammers don't need the full list, just the offended addresses to which they spammed.
5. It would save bandwidth.
6. Big companies can keep their own frogs for large lists.
Spy der Mann - May 19, 2006 14:00 -0600 GMT
7. Without a registry to fall back on spammers will only have one way to reduce the bite-back from okopipi... stopping spamming altogether. Saying you'd be happy if spammers removed your name from their lists is a selfish world-view - I'd far rather fight for everyone in the world's email addresses than just users of the service.
Vincevincevince 09:13, 27 May 2006 (PDT)
Pro-registry arguments
1. Regarding anti #1: All that needs to happen to eliminate the spammer argument is incremental lists -- you have one complete list that contains everything, but you also then post weekly updates which contain only the hashes added since the last update. Then, the spammers can never argue that they would have to perform redundant downloads to comply. It would be 80 MB one day, then maybe 5 MB a week later, then another 5 MB in another week, etc. The size of the incremental downloads would not be linked to the size of the whole list, only the rate of new user sign-up.
2. Regarding antis #3 and #5: Building the registry on-the-fly and posting it on bit torrent would both keep the KISS principle and save bandwidth.
3. Regarding antis #4 and #6: If we do both opt-out and keep a registry, the spammers don't need the registry to opt out the requests they've had.
4. Regarding anti #2: (Idea by zacronos at the okopipi-dev group). Having a DNIR list would allow us to proffer an olive branch with one hand and a fist with the other, rather than just the fist. This is good for public opinion as well as effectiveness.
5. (by zacronos) I think that many of us would refuse to participate in a project that is too retaliatory -- for example, one that tries to DDoS a website's sales forms, as you suggested. That is much more of an attack than even sending multiple opt-out requests for each email received, which is still a fairly aggressive stance.
6. (by Tortanick) If you have a DNIR then just by signing up someone gets removed from the spammers lists.
7. (by Secondwheel) Regarding anti #1: The spammer argument isn't logical to begin with. If they stick with the 80MB list even though an 85MB list is available, they are still avoiding all but 6% of the complaints they'd get otherwise. They only need to update when the complaints of new additions to the list reaches their comfort threshold.
8. (by Secondwheel) Regarding anti #4: Consider the POV of a bulk emailer with paying clients. Will they like being forced to send round #1 of emails to a newly purchased list just to find out whom they should remove? That first client's site will be inundated with remove requests, and the client will be history... which, by the way, would likely mean the client won't bother passing on the list of remove requests. If we make compliance as easy as possible (including providing list-cleaning software and an easy-to-update DNIR), the spammers who clean their lists will instantly get a better response percentage and a lower complaint rate, which they can *advertise*, and non-complying spammers will be at a swiftly growing disadvantage.
9. (by Veritas Nikishi) There's NO hope that ANY project can stop all spam to ALL email users. There's just no way. The project goal needs to be lower, and in this respect, Blue Frog had it right: set the goal merely to tell spammers not to spam this select subset of email users, listed here, under pain of opt-out flooding. This gives spammers a realistic choice, which apparently already worked for 7 of the 10 worst spammers: "You can spam the rest of the world, and we won't mess with you. Just leave THIS LIST of email addresses alone OR ELSE." No spammer is ever going to stop spamming 100% of the world. But they will stop spamming the 1% of the world on this list if complying will make their life easier.
10. (by Veritas Nikishi) Many users of Blue Frog NEVER RAN THE APP, but they were on the list anyway and benefitted from the collective effort. Yes, you can consider this leeching, in a way, but consider this: if your parents had a spam problem, and only a dial-up account, wouldn't you add their address to the list and not install the app itself? Sure you would, and that actually helps us all, because the spammers won't know who on the list is running the app and who isn't. All they'll see is a list of 500,000 addresses and think "Uh Oh, that's a pretty big potential for problems."
11. (by Veritas Nikishi) We get to bask in the Light of Reason and Good. To not have a Do Not Spam List is to look like vigilantes, which will NOT work well in the press or the general public. When Blue Frog was reported on, there was always a debate: is this good or evil? Well, thanks to the DNSL, there was at least debate. To not have a DNSL, there will be no "On the other hand..." in any article covering the project. There will be no debate.
-- Why is this a problem? The frog won't do anything wrong. You have the right to request opt-out, so you can click their link no matter what (there isn't any exception saying you can't do it with a program). And if they don't provide an opt-out address, then you can write to the advertised company that you don't want any more of their spam. PAStheLoD
12. (by Veritas Nikishi) Regarding the "But spammers will be able to extract email addresses and spam list members" argument: they already have your email address and are spamming you. Then they got your address again from the Blue Frog list. So what if they get your address again?
13. (By H Edwards) In addition to #10, merely being able to say that we have X number of (hopefully) thousands of addresses provides a much larger incentive for legitimate bulkmailers to check the registry.
14. (by H Edwards) While the ultimate goal of total spam elimination is probably impossible, the ability to reduce it by 80-90% for some and perhaps up to 98% for those that are more careful is a huge step from where we are. Just getting spammers to stop sending junk to random addresses, even if just in our registry is a step forward.
15. (by Chris Knight) Regarding the "But spammers will be able to extract email addresses and spam list members" argument. If the registry is a list of hashed e-mail addresses using a one-way hash algorithm (MD5 comes to mind) then it is impossible for a spammer to determine the original e-mail address but they can, with a simple program, take a list of e-mail addresses that they use and a list of hashes in the registry and determine whether the MD5 hash of that address exists in that list. (Note: a tool should be developed and released for a variety of platforms so that the spammer has no excuse of "I can't use your data 'cause it's in a format I can't handle.) I personally would be against sharing a list of e-mail addresses and this hash mechanism is functionally equivalent while providing security.
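The list-cleaning tool suggested in argument 15 could be little more than the following. Assumptions: the registry is published as a text file with one hex MD5 hash per line, and addresses are trimmed and lower-cased before hashing; the function names are hypothetical, and wiring them to actual files is left to the caller.

```python
import hashlib


def address_hash(address: str) -> str:
    """Hex MD5 of a trimmed, lower-cased address (the assumed registry format)."""
    return hashlib.md5(address.strip().lower().encode("utf-8")).hexdigest()


def load_registry(lines) -> set:
    """Read one hex hash per line, ignoring blank lines and letter case."""
    return {line.strip().lower() for line in lines if line.strip()}


def clean_list(addresses, registry: set) -> list:
    """Keep only the addresses whose hash is NOT in the registry."""
    return [a for a in addresses if a.strip() and address_hash(a) not in registry]
```

The spammer never sees a plain-text address from the registry; they only learn which of the addresses they already hold must be removed.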
Reporting Spam
- Scripts should allow reports to FTC, FFA, software companies, etc... (Vidaloca)
- I still get spam that uses tinyurl or geocities redirects on occasion. We do need to have some way of notifying the website abuse departments without bombarding them like we do the spammers. --TexasDex 23:33, 27 May 2006 (PDT)
- I am concerned when people start suggesting that scripts should report off to FTC, FFA, software companies, etc.
STOP AND THINK ABOUT THIS!!!!
OK thought about it.
The problem I have with this suggestion (and I'll be the first to say I had suggested it as well) is that we become guilty of what we are trying to prevent. We become spammers ourselves by inundating FTC, FFA, software companies, etc. with individual messages.
A far better suggestion is that the Handler or Admin Nodes would aggregate the reports and send a single message along the lines of
"Hi FTC, FFA, software companies, etc.
We represent (the number of clients and nodes reporting the particular SPAM) people who have received the attached spam sample.
Please be advised that this is illegal under relevant legislation.
Regards
Okopipi Network.
<<sample of spam message>>"
I am sure this could be automated to run a week after the script has been released for action.
I am of course assuming that the clients will be reporting back to the network in some way to let us know what's going on out there.
--Hal9000 04:30, 28 May 2006 (PDT)
I like aggregates, but it may make sense to try grouping similar spam together, say by website or product, so as to make it a bit easier on the receiving end. Obviously it isn't particularly useful if it takes manpower to do it, and it should probably be relatively low in the priorities.
--Hedwards 19:32, 28 May 2006 (PDT)
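Hal9000's aggregated notice, grouped by website as Hedwards suggests, could be generated roughly as follows. Each report is assumed to arrive as a hypothetical (reporter ID, linked domain, spam sample) tuple, the addresses are placeholders, and actually sending the message (for example with `smtplib`) is left out.

```python
from collections import defaultdict
from email.message import EmailMessage


def aggregate(reports):
    """Group (reporter_id, domain, sample) tuples by domain, counting distinct reporters."""
    groups = defaultdict(lambda: {"reporters": set(), "sample": None})
    for reporter_id, domain, sample in reports:
        group = groups[domain]
        group["reporters"].add(reporter_id)  # the same reporter counts only once
        if group["sample"] is None:
            group["sample"] = sample  # keep the first sample to include in the notice
    return dict(groups)


def build_notice(domain, group, to_addr, from_addr):
    """One message per domain instead of one per reporter."""
    msg = EmailMessage()
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg["Subject"] = f"Aggregated spam report: {domain}"
    count = len(group["reporters"])
    msg.set_content(
        f"We represent {count} people who have received the spam sample below.\n"
        "Please be advised that this is illegal under relevant legislation.\n\n"
        "Regards\nOkopipi Network\n\n"
        f"{group['sample']}\n"
    )
    return msg
```

Running this once a week after a script's release would send each recipient one message per website, however many clients reported that spam.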
I think it's best to ask the receiving combatants how they would like the reports, so they won't be flooded and can make the most of them.
--Lebewesen 16:34, 14 February 2007 (PST)
--tannoz
As far as I know, the (FTC, other local antispam authorities, ISPs, etc.), the (FFA) and (software companies) do not have the same role and should not be treated in the same way:
- the FTC

