# /robots.txt
User-agent: *
Crawl-delay: 604800
Disallow: /
Disallow: /cgi-bin/lid.cgi
Disallow: /rbtc/
Disallow: /email.php
Disallow: /formmail.php
Disallow: /guestbook.php
Disallow: /images.php
Disallow: /management.php
Disallow: /private.php
Disallow: /personal.php
Disallow: /spider.php
Disallow: /submit.php
Disallow: /tools.php
# [:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:] #
#                                                                             #
# ---------------------------=*> THE BOT BLOG <*=--------------------------   #
#                                                                             #
# [:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:][:] #
# =========================================================================== #
#                                                                             #
#  Can Google Predict The Future?                                             #
#                                                                             #
#  by Brett Tabke                                                             #
#  5/29/2007                                                                  #
#                                                                             #
# =========================================================================== #
#                                                                             #
# Google posted an AdSense blog entry this week stating that most
# misclicks by AdSense vendors were automatically discounted.
# http://adsense.blogspot.com/2007/05/accidents-happen.html
#
# "...chances are we've already detected your clicks on your ads and
# discounted them."
#
# I was surprised to see a few noted SEOs weigh in with naive
# responses of a kind I have rarely seen in the SEO space.
#
# I would guess that Google can ID your click with 95% accuracy. The
# other 5% of the clicks, they can throw out on pure guesstimation.
# They don't have to be 100% certain to be able to toss a click out
# with a high degree of confidence that the click was errant.
#
# How does Google know it is you clicking on your ads?
#
# 1- cookies.
#
# 2- your IP matches IPs that have logged into the AdSense control
# panel. Or a login to the panel matches a previous click on your site.
#
# 3- your page view behavior matches an owner's page view behavior.
# This is by far the most common method used by Google. It is easy to
# ID the owner of a site after very few page views. Google simply
# tracks your IP behavior as you view your own site and ads are served
# to you.
# Read some of the recent stuff on click fraud - it is pretty
# clear this is the top way Google is tracking bad clicks.
#
# #4: Additionally, the majority of IPs on the cable networks are
# dynamic, but dynamic within a block. Thus, it is easy to deduce
# that if Bob's ISP is Comcast and a Comcast address has viewed 200
# pages on his site and the same C block logged into his control
# panel, and the same D block is on the cookie - given his path
# behavior - it is a pretty safe bet we can throw out those clicks.
#
# #5: Here is another one: let's say you are using a stock piece of
# blog software or blog service. Many of those pieces of software
# allow one template and one template only. So you serve Google ad
# code even to your blog admin panel. Google sees an attempt to load
# an ad from a restricted url on your site - presto, it has you. The
# number of blind urls Google would have to check against would be
# less than 10 to match 90+% of the major blogging software out there.
#
#
# #6: Two words: Google Toolbar
#
# Long story short - yes Virginia, Google knows who you are from your
# click. That's not the question - the question is, even if they know
# it is you, how many do they let fly by without discounting them?
#
# Everything talked about so far is child's play that any
# knowledgeable webmaster can duplicate. Now let's get a little more
# advanced:
#
# Often overlooked is the widespread usage of Google AdSense code.
# That code is living on millions (maybe billions) of pages. If you
# surf a lot of sites in a day, you are loading that code hundreds to
# thousands of times a day. As you load it, you are leaving a trail.
# Every time you load that code, you are leaving information on
# Google's ad servers. Sooner or later, those bits of information add
# up into a pattern that can be used to identify you with a high
# degree of accuracy.
#
# For example, if Bob starts his typical morning run by surfing:
#
# 1- foosite.com news.
# 2- bigsite.com blog.
# 3- fooweather.com weather.
# 4- bobs-site.com/wordpress/
#
# Most people do something similar. A few to a dozen favorite
# sites and pages make up the average morning run for most internet
# users (especially webmasters). Even if Bob switches user agents,
# IPs, and even some of the sequence of his daily habits, there is
# little doubt Google could ID Bob out of millions of users, simply
# from his click and advertising behavior.
#
# Deja vu? Any of this sound familiar? It is the same type of pattern
# recognition search engines use to find duplicate content on
# websites.
#
# Every time you load that AdSense code, on any site, you are leaving a
# bread crumb trail of information.
#
# Again...dig up some of what Google has talked about recently at
# conferences in reference to ClickBot detection; it is fascinating to
# see just how far Google has gone in detecting users/bots/mischief.
#
# That's exactly where we were headed, Hobbs.
#
# So we have gone from basic to advanced detection. Now, let's get
# leading edge by looking further at heuristic methods of prediction.
#
# If there is AdSense code on a few associated keyword sites, Google
# already knows:
#
# 1- The path most users take when viewing those sites (due to
# Toolbar and AdSense data).
# 2- What sites most of those visitors visit.
# 3- How long most users stay on those particular pages and sites.
# 4- What type of advertising behavior those pages show.
# 5- What language is on those sites.
# 6- The income range of the audience.
# 7- The sex and the age of the audience.
# 8- The general psychographic makeup of the website
# audience, etc.
#
# Essentially, Google knows that Bob likes roses, daisies, orchids,
# and wild flowers. Therefore, it is a good bet that he likes tulips
# as well.
#
# So what if Bob is on vacation in Paris:
#
# - he visits a public internet cafe.
#
# - he surfs a few of his favorite sites (not necessarily any of those
# from his morning run, but sites that run AdSense).
#
# - he surfs a fresh new site that he has never seen before in this
# space.
#
# Now, here is the fun part. Google knows it is Bob. How?
#
# I don't know the official name for this type of prediction, but it
# is a subset of psychographic behavioral targeting (Click
# Prediction?).
#
# After you dig into this line of thinking, you have to start to
# conclude that:
#
# 1- Google is much further along the path than this.
#
# 2- Google's ability to "predict" user behavior is now a thousand fold
# what we are talking about here.
#
# 3- Google's ability to track, interpret, predict, and act upon
# information is now in the scary all-seeing, all-knowing range.
#
# Number three is the most interesting to me. How good has Google
# become at predicting events? Think about all the web data Google can
# synthesize.
#
# - news tracking.
# - stock tracking around the world.
# - web site tracking.
# - trend tracking.
# - event tracking.
# - gmail email reading.
# - blogs.
#
# Imagine the associations that could be uncovered.
#
#
# -bt
#
# ... I accidentally installed the deer whistles on my car backwards.
# Now everywhere I go, I'm chased by a herd of deer. - Steven Wright
#
#
# =========================================================================== #
#                                                                             #
#  By Webmasters - For Webmasters                                             #
#                                                                             #
#  by Brett Tabke                                                             #
#  4/3/2007                                                                   #
#                                                                             #
# =========================================================================== #
#                                                                             #
# Sunday (April 1) was a fun day around the web. For our part, we did
# a day of advertising on WebmasterWorld. It started as a simple joke
# because we have never had pure advertising on the site. The plan was
# to really load the site up with dozens of ads per page from every
# advertising network we could find. The idea was to be so over-the-top
# that people knew it was clearly an April Fools' Day joke. That was the
# plan anyway...
#
# After starting to get into it we realized that would have been many
# hours of work for just a bit of April Fools' Day fun!
# We did the AdSense
# ads first because that code was easily at hand and was little work
# to drop into the code. It took about five minutes to put the code
# in, and then another hour to figure out why it was not working. Turns
# out, I'd blocked Google AdSense in my hosts file over a year ago and
# forgotten about it. When the code was up and ads were finally
# running, bells started going off that the results would be
# interesting to see.
#
# At the end of the day, the results were mixed. It was an odd day to
# do a legitimate test, but the opportunity presented itself. I was
# going to do it for a few hours at best, but the team was more
# adamant about letting it run. (Secretly, I think they assume there is
# money on the table that we are leaving behind.)
#
# http://www.webmasterworld.com/webmasterworld/3299440.htm
#
# Results were significantly less than they would be during a quality
# week day and during peak fall season.
#
# Variables to consider:
# o Sunday is the lowest historical day of the week around here. We just
# tank on weekends.
# o It was April 1 - which is a notoriously slow day on the web.
# o It was one of the first warm Sundays in the south (people
# outside, or with family).
# o Many schools are out on spring break (kids at home spending quality
# time in a loving environment with their family ;-)
# o Ads were not on the high volume pages such as the forum index list
# and the subforum index lists - and certainly not the mega active
# list.
# o Sunday had the lowest uniques we have seen since July of last
# year. (ouch)
#
#
# Ownership:
# ----------
# About 8 years ago, I shut down a program called Buddy Links. It was a
# program that involved over 4000 sites and 1000 people. In the final
# letter to all those in the program, I asked a simple question, "is
# that all there is"? I was extremely disillusioned with web life and
# some actions of my fellow webmasters at the time (slipping porn
# links on to children's sites in the program).
# I came very close to
# walking away from internet work entirely and getting a *real* job.
# Instead, that little community that had sprouted around
# SearchEngineWorld was still growing even after shutting down Buddy
# Links. Looking around at a lot of the forums and sites in the tech,
# webmaster, and search space, it was clear that a low key, low impact
# site had legs - had a niche.
#
# Our core is a group of independent publishers. Most of them have
# left "real jobs" to stake a claim of independence on their own. It
# isn't easy for these people to dive in with both feet and hope it
# all works ok. Like the real world, not everyone plays nice in
# cyberspace. While they are heads down working on their sites - who
# is looking out for them? The big corporates have big feet and don't
# always watch where they step on the web. We have all seen the big
# sites make changes that affect thousands of domains without so much as
# a warning, a whisper, and certainly no apologies. Who is keeping an
# eye on that stuff? Who is really looking out for the smaller sites?
#
# WebmasterWorld is a community site by Webmasters and for Webmasters.
# We wrote that almost 10 years ago on SearchEngineWorld (back when it
# was still on my ISP site). At the end of the day, that still means
# something. It is worth standing for - worth fighting for - no matter
# the outcome.
#
# Every time the VCs knock, a competitor calls, or people want us to
# sell out to a corporate advertiser, that old phrase "by webmasters,
# for webmasters" comes back to me. Simple tried and true questions
# that have no answers resonate: If not us - who? If not here -
# where? If not now - when? Somebody needs to do it.
#
# Some advertising proponents fail to understand what happens when you
# put advertising on your site and agree to a TOS - you are owned. Go
# home Ferris, it's over. Your site is no longer yours. You are only
# working for the man now.
#
# That may be fine on some sites that are designed to do that, but
# WebmasterWorld was designed for community members. Often, we discuss
# those very things (advertising programs), and no one OWNS US in those
# discussions.
#
# You don't think they own you? Go look around this space at other forum
# and community sites. Most are owned by their advertisers. Some
# even have forums that are "sponsored by" or "presented by" an actual
# advertiser. How can you possibly have an independent, intelligent
# conversation about Program X when the very forum is plastered with ads
# for Program X? You can't - and they don't. You can't trust that info
# to be remotely honest or accurate. 9 out of 10 of those "sponsored"
# forums have employees as the top poster.
#
# So there you have it - pretty much the same thoughts we had on the
# subject 10 years ago.
#
# That's what we do here. A place to gather and talk about stuff
# without all the noise of advertising in our face, or leverage
# in our wallets. That independence is paramount to continuing to
# serve the community as well as we have for so many years now.
#
# - bt
#
# ... I stayed in a really old hotel last night. They sent me
# a wakeup letter. - Steven Wright
#
#
# =========================================================================== #
#                                                                             #
#  PubCon 2007                                                                #
#                                                                             #
#  by Brett Tabke                                                             #
#  3/27/2007                                                                  #
#                                                                             #
# =========================================================================== #
#                                                                             #
# They can talk about sleep deprivation and its effects, but until you have
# your own newborn in the house, you really don't know what it is all about.
# http://www.webmasterworld.com/webmasterworld/3191522.htm
#
# That is part of the reason we are a bit delayed launching this year's
# PubCons. The other part is that we have some seriously special things
# in store to announce. Stay tuned on the blog on PubCon for more.
#
# We will start asking for and taking session and speaking proposals
# at that time.
# Everyone is welcome to send one in (even you Shoe!) lol
# Seriously, even if you have spoken at PubCon before, please fill one out so
# that we can make sure you are in the pipeline. We had over 500 last
# year so it is critical that you turn one in if you wish to speak at
# PubCon.
#
# -bt
#
# ... I went to a general store, but they wouldn't let me buy
# anything specific. - Steven Wright
#
# =========================================================================== #
#                                                                             #
#  Subscription Site - Five Years Later - Part 2                              #
#                                                                             #
#  by Brett Tabke                                                             #
#  3/12/2007                                                                  #
#                                                                             #
# =========================================================================== #
#
# Running a subscription site is a different proposition than running an
# advertising based revenue model site.
#
# Pros:
# - Increased visitor trust of site and information found there. No
#   advertising clearly means site trust is automatically higher.
# - No advertising to manage or staff to sell advertising!
# - Don't have to cater to an advertising network or be beholden to them.
# - Cleans the site look and feel by reducing clutter because of no ads.
# - Traffic from search engines becomes a secondary concern.
# - You get to focus on your visitors' experience and only on your visitors.
# - Your visitor *is* your customer. No 3rd parties with their own - often
#   hidden - agendas involved.
#
# Cons:
# - Some view advertising as an endorsement and automatically assume that
#   the site is not successful because it didn't acquire advertising.
# - Page views for page views' sake become a secondary concern. Which leads to
#   the site appearing smaller than it should be in the actual traffic data.
# - Traffic doesn't mean what it once did. So what if 1m page views a day
#   fly by - it doesn't translate into anything of value.
# - Protecting visitor rights as much as possible becomes an overriding,
#   almost debilitating concern over everything we do. There is much more
#   at stake than just your own concerns.
# - Advertising networks become extremely jealous and problematic. It becomes
#   a significant concern when the advertising network is also a search engine.
# - Affiliate networks feel the same way as the advertising networks.
#
# I will keep updating this list as I think of more.
#
# -bt
#
# ... There's a fine line between fishing and just standing on the
# shore like an idiot. - Steven Wright
# =========================================================================== #
#                                                                             #
#  First Click Free - Is it working? Part Two                                 #
#                                                                             #
#  by Brett Tabke                                                             #
#  3/11/2007                                                                  #
#                                                                             #
# =========================================================================== #
#                                                                             #
# I think a few people are under the misconception that we require visitors
# to log in as a blanket policy. This is simply not the case; it is
# misinformation, (imho) mostly from competitors.
#
# Over the last week, we have had 4 aggressive bots from multiple IPs spider
# WebmasterWorld to the point of saturating the box. The only salvation was
# rebooting, 8 times. These were launched from Europe. So we have temporarily
# rolled back the first click free status to about 2 clicks and increased the
# timeout to 10 mins (vs 3 mins), again in response to the attacks.
#
# How do we detail which ISPs/IPs are required to log in without opening
# ourselves up to much worse? The mere discussion
# of it has led to giving people nefarious ideas. At times, I have
# wondered if this wasn't all a ploy at attacking WebmasterWorld in the
# first place.
#
# To give a few highlights and remind people why bots are such an issue for us:
# - Potentially unlimited url space for spiders to get lost in.
# - Flat url structure. It is the only major forum with a flat url structure
#   that is crawlable by all of the offline browsers/spiders/crawlers
#   available for download on Tucows today.
# - Highly technical and advanced visitors.
# - Some visitors gravitate towards the black hat arena of the web and don't
#   necessarily see anything wrong with unleashing a bot off an OC3.
# - A highly visible site with thousands of references on the web.
# - A rich archive of messages that are (imho) the highest level of discussion
#   in the webmaster/tech space today.
# - We don't do advertisements on the site.
# - Dynamic site - every page is generated on-the-fly.
#
# So, first we have the full "first click free" thing, where visitors can view
# two+ pages within a certain time frame. Only those on ISPs/IPs mentioned below
# will be required to log in. We'll season that to taste as this latest round
# of spider attacks calms down.
#
#
# Requiring Login by Agent Name:
# ------------------------------
# The majority of agent names run by bots are bogus. Those that aren't
# and require login include:
#
# java,snoop,lwp,boitho.com,spiderman,downloader,Missigua,HTTrack,Fetch API Request,
# webstripper,Jakarta,IEAutoDiscovery,PHP_feedyes.com,Zehuti,bot.mainseek.net,TMCrawler,
# megite,Bottino,QihooBot,Hatena,Missigua,Bookdog,NewsGator,API Request,Macan,
# Java,Python,webmon,yoono,Boston,Jakarta,
#
# That cuts out much of the programmed stuff by newbies who just dl'd the latest
# offline browser and let it rip.
# There are about 10 other specific ones with trademarked names in the
# agent name that we will not name here.
#
# Also note that any search engine agent name that turns up in use by a
# known visitor is banned without hesitation.
# If it looks like a bot, and
# is really an imposter - then there is no reason whatsoever to let them
# on the site (which occasionally leads to bans of prominent SEOs that like
# to spoof agent names as a poor man's SEO detector).
#
# Banning by agent names:
# -----------------------
# "EmailSiphon","EmailWolf","autoemailspider","ExtractorPro","URLSpiderPro","Crescent"
# "CherryPicker","[Ww]eb[Bb]andit","WebEMailExtrac","WebCopier","NICErsPRO","Openfind",
# "grub-client","WebWhacker","gigabaz","PingALink","LexiBot","LinkSleuth","OfflineExplorer"
# "Teleport","Zeus.Webster","Microsoft.URL","Wget","WebCapture","Sweeper","Aide","larbin"
# "Szukacz","httpdown","MSIECrawler","LinkWalker","sitecheck.internetseer.com","ia_archiver",
# "Seeker","ASPSeek","DIIbot","IndyLibrary","psbot","almaden","MSProxy","SlySearch","WebStripper"
# "EasyDL","WebZip","b2w","HTTrack","InternetSeer","User-Agent","EmailCollector","Python"
# "Offline Explorer","LWP","Simple","sohu","Fetch","ichiro","production","libwww","Zehuti"
# "robot","httrack","Simpy","kinjabot","livedoorCheckers","Lite Bot","MFC","UltraWombat"
# "Hatena","WebStripper","grub","php","naver","loopimprovements","zao","links","Downloader"
# "Cache Content","almaden","IAArchiver","UrlDispatcher","Exabot","Java","deepindex.com"
# "WebZIP","EmailCollector","UltraUptime","avantgo","HTTrack","Hatena","TMCrawler"
# "QihooBot","Indy Library","EmailSiphon"
#
# Those are the more pro-level spiders we all recognize from days of yore.
# Visitors with those agent names are unable to view any threads. There may
# be some "white listed" IPs, agents, and referral strings that override any
# of those settings.
#
# Requiring Logins per IP:
# ------------------------
# There are approx. 800 IPs in the current list that are required to log in.
# There are an additional 800 IPs that are blocked from viewing threads or
# logging into the site. We can't name those specific IPs here because many
# will get recycled in dynamic ISP IP spaces.
# We cannot know who will or will
# not have that same IP for long.
#
# Requiring Logins per ISP:
# -------------------------
#
# This is where it gets a little less granular and casts a wider net to catch
# those in the dynamic ISP IP range.
#
# The TLDs for which we require login were not easy decisions. Each one has
# seen an unrelenting spider on a dynamic IP hit the site. Some notes:
# - we have very few visitors from China and WebmasterWorld is regularly blocked
#   in China.
# - the highest incidence of wholesale site ripping and republication has
#   been from (in this order): .cn, .ru, .cz, .de, and .jp.
# - the most aggressive spiders have been from: .tw and chello.nl
#
# As a lot, the worst ISP as far as spidering goes has been (of course) the mega
# Road Runner (rr.com). That's a reflection of size, as well as the nature
# of the user. However, Road Runner has never responded to a single report
# of abuse.
# The second worst have been comcast and pacbell.net.
# The list changes almost daily. We try to remove the "non essential" ones
# about once every six months.
#
# Aside from the big cable/DSL companies, we also run into many hosting companies
# that download scripts are hosted on. These are deceptively "violent" because
# most often, they come from a host with major bandwidth to match our own major
# bandwidth. I don't care what kind of box you have, in any server-vs-server war,
# an attacker can always flood the victim box.
#
# rr.com, comcast.net, pacbell.net, eatel.net, openaccess.org, filenet.com
# concordia.ca, verizon.net, progressive.com, chello.nl,
# razorservers.com
# inode.at, euskaltel.es, versanet.de, swbell.net, iol.cz, serverpronto.com
# dslextreme.com, ameritech.net, privatedns.com, telnor.net, mindspring.com
# alltel.net, brockport, dip.t-dialin.net, swipnet.se, touchtelindia.net
# svabuse.info, inhoster.com, ev1servers.net, net.ar, wnet.ua
# acn.gr, ucsfmedicalcenter.org, telepacific.net, rfhost.com, bellsouth.net
# ibm, saix.net, amazon.com, freedom2surf.net, astral.ro
# uni-muenster.de, charter.net, charter.com, earthlink.com, charter.com
# sbcglobal.net, net.tw, gavle.to, surfer.at, net.tr, charter.com, ne.jp, twtelecom.net
# co.th, netvigator.com, telia.com, ebuyer.com, net.ar, net.br, com.ar,
# whiterockdistilleries.com, phx.gbl, hostingpanama,
# chello.nl, bulldogdsl.com, layeredtech.com, swbell.net, b-one.net
# vsnl.net.in, wooph.com, lunarservers.com, megapath.net, moldtelecom.md
# mts.net, btopenworld.com, answers.com, optonline.net, astral.ro
# dslextreme.com, easynet.co.uk, hinet.net, layeredtech.com, isp-thailand.com
# ozerki.net, bluehost.com, markwatch.com, hispeed.ch, teleweb.at
# batelco.jo, direcpc.com, choopa.net, rdsnet.ro, gudzondns.net
# kiev.ua, pair.com, exatt.net, oati1.com, ironpath.net, webhosting.com
# newsgator.com, startdedicated.com, onshore.net, airtelbroadband.in
# webwarper.net, t-dialin.net, ev1servers.net, hawaiiantel.net, arcor-ip.net
# inode.at, inewsroom.com, daylife.com, svservers.com, .net.il
# swbell.net, .isu.net.sa, csccorporatedomains.com, enterhost.com, telecomitalia.it
# iwayafrica.com, yetnet.ch, isnet.net, shawcable.net, rogers.com
# freedom2surf.net, com.ph, tufts.edu, com.cn, entireweb.com
# outboundrequest.com, forthnet.gr, vivodi.gr, intercage.com
# airtelbroadband.in, globnet.md, storm.ca, bellsouth.net, live-servers.net
# hostrino.com, verizon.net, layeredtech.com, chello.sk, dappit.com
# themezoom2.com, buckeyecom.net, auna.net, databasemart.net, comcast.net
# .tds.net, snet.net, hostingstudio2.net, videotron.ca, sdnet.net
# telecomitalia.it,
# griffinsg.net, trancehosting.com, com.cn, comcastbusiness.net
# steephost.com, atrianetworks.com, theplanet.com, cox.net, telecomitalia.it
#
# Banned domains
# --------------
# avantgo.com, chello.nl, sccoast.net, slaskpost.se, googlealert.com
#
# All that is at the program level. NOT at the web server level.
#
# Bans at the Apache Level:
# -------------------------
# We currently have about 4k IPs in the flat out - access denied - file.
#
# The only one of note is the translate.google.com proxy. It has been a
# major source of spider attacks over the years. We keep trying to take
# it off the list from time to time when a member requests it, but the
# scripts/spiders keep coming back through. We hear Google has been doing
# some work on the translation proxy to clean up the stray traffic and pass
# the IP of the user in headers. Not sure how that works just yet, but it is
# on the list of things to investigate. The old wap.google.proxy has finally
# been taken off again.
#
# Auto IP bans:
# -------------
# Anyone logging into one of our planted bug)me-not logins is auto
# banned without exception.
#
# IP Spoofing
# -----------
# We have also had a couple of people work with us to try to get to the root
# of some DDoS style requests from spoofed IPs. For the most part, this is
# handled at the server/firewall level, however the volume of requests is
# still an issue.
#
#
# White Lists:
# ------------
# Yes, there are numerous people/things that are whitelisted. Those whitelists
# can trump everything mentioned above but the Apache level bans. I mention it
# because it can sometimes confuse people as to why they can view a thread.
#
#
# -bt
#
# ... It doesn't matter what temperature a room is, it's always
# room temperature. - Steven Wright
# =========================================================================== #
#                                                                             #
#  First Click Free - Is it working?                                          #
#                                                                             #
#  by Brett Tabke                                                             #
#  3/6/2007                                                                   #
#                                                                             #
# =========================================================================== #
#
# re: http://www.webmasterworld.com/webmasterworld/3270248.htm
#
#
# I took a long look at the current required login list and why those
# particular sets of ISP domains were there. I just don't think it is
# feasible for WebmasterWorld to allow unfettered downloading, scraping,
# attacks, copyright bots, and DDoS attempts from ISPs such as:
#
# rr.com, comcast.net, pacbell.net, bellsouth.net, btcentralplus.com,
# verizon.net, chello.nl, swbell.net, t-dialin.net, bellsouth.net, ibm,
# saix.net, charter.net, charter.com, btopenworld.com, bellsouth.net,
# verizon.net
#
# Those represent the worst of the problems we have seen. High speed
# cable users who think nothing of unleashing a bot at full speed off a
# 5meg line with 5-10 simultaneous connections.
#
#
# We currently block most of those until users from those ISPs log in.
# The ISPs - the ENTIRE lot - have never once responded to an abusive
# user report sent to them via ANY channel.
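#
# To make the idea concrete, here is a minimal sketch of holding a visitor
# at the login page based on the ISP domain of their reverse-DNS hostname.
# It is not WebmasterWorld's actual code: the function names, the shortened
# suffix tuple, and the sample hostnames are all assumptions made up for
# illustration. Matching is done on whole domain labels so that a lookalike
# such as "evilrr.com" is not treated as "rr.com".

```python
# Hypothetical sketch: require login for visitors whose reverse-DNS
# hostname falls under a listed ISP domain. The suffixes are a short
# sample taken from the list above; everything else is illustrative.
REQUIRED_LOGIN_SUFFIXES = (
    "rr.com",
    "comcast.net",
    "pacbell.net",
    "bellsouth.net",
    "verizon.net",
    "chello.nl",
)


def matches_suffix(hostname, suffixes=REQUIRED_LOGIN_SUFFIXES):
    """Return True if hostname equals, or is a subdomain of, a listed suffix."""
    host = hostname.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in suffixes)


def requires_login(hostname, logged_in):
    """Logged-in members always pass; listed ISPs must log in first."""
    return (not logged_in) and matches_suffix(hostname)


if __name__ == "__main__":
    # Made-up hostnames, used only to exercise the check.
    for name in ("cpe-1-2-3-4.nyc.rr.com", "evilrr.com", "example.org"):
        print(name, requires_login(name, logged_in=False))
```

# A suffix check like this casts the same wide net described above: it
# cannot tell a well-behaved cable user from a bot on the same ISP, which
# is why a login (or a whitelist) has to be the escape hatch.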
#
# Here is an example from a bot just 5 mins ago (as I type this):
#
# 145 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /profilev4.cgi?action=view&member=stuntdubl
# 146 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /profilev4.cgi?action=view&member=shri
# 147 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /profilev4.cgi?action=view&member=volatilegx
# 148 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /website_technology/
# 149 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /content_management/
# 150 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /profilev4.cgi?action=view&member=Xoc
# 151 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /profilev4.cgi?action=view&member=madmatt69
# 152 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /profilev4.cgi?action=view&member=dreamcatcher
# 153 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /profilev4.cgi?action=view&member=ferhanz
# 154 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /profilev4.cgi?action=view&member=zCat
# 155 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /profilev4.cgi?action=view&member=caine
# 156 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /profilev4.cgi?action=view&member=rocknbil
# 157 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /profilev4.cgi?action=view&member=wheel
# 158 11:09.13 pm Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) 203.244.131.17 /profilev4.cgi?action=view&member=Komodo_Tale
#
# That is a handful of lines from well over 1k dls in ten mins. That is
# a bot written specifically to scrape WebmasterWorld profiles
# and collect email addresses.
# We see at least one of those
# profile bots every day.
#
# Another one caused us to reboot the server 3 times today because of
# machine saturation due to spidering. (On a normal day with normal
# load, we rarely get above about 25% max load on the new server.) I
# have no doubt it was due to this blog entry.
#
# So, dialing back the required login is not an option to me. Regardless
# of what some of you guys running little sites think, you cannot sit
# by and let bots continue to damage your system. Our first duty is to
# the membership - not the bots - and not the drive by viewers.
#
# The other option I considered is starting to post the IPs, log files,
# and abuse reports to the ISPs on the HoneyNet project website. After
# discussing it with our corp attorneys, they advised me not to do that.
# I'm going to take that advice.
#
# I did take a look back at how well Matt's recommended "first click
# free" system had been working. I love the whole concept of it, but I
# don't think it is working well (as witnessed by some of the comments
# above). One problem is multi-paged threads that take 2 or more clicks
# to view in their entirety. Or someone visiting the home page, and then
# clicking on a link - two clicks. I just don't think "first click free"
# is enough. So, I bumped it up to about 4-5 clicks right now. I don't
# know if that is low enough to discourage someone cranking up a slow
# bot, but let's try that.
#
# The other thing that I did was switch the screen off the login page
# to this new error page:
# http://www.WebmasterWorld.com/error.htm
#
# I thought that was as good of a standard statement page as we will
# find on the net today. So, if you can't beat them…join them…
#
# Thanks
# - bt
#
# ... Why in a country of free speech, are there phone bills? - Steven Wright
# =========================================================================== #
#                                                                             #
#  Subscription Site - Five Years Later - Is it Really Possible?
#  Part 1                                                                     #
#                                                                             #
#  by Brett Tabke                                                             #
#  3/5/2007                                                                   #
#                                                                             #
# =========================================================================== #
#                                                                             #
# " ...[chuckle]...After reading this latest round of anti-WebmasterWorld
# crap -- this time concerning alleged "cloaking" -- I've come to the
# conclusion that there's a large number of people who are flat-ass jealous
# of what you've accomplished here." - (unsolicited member comments)
#
#
# Is it really possible to run a forum as a subscription site? That was
# the question we asked five years ago. Here is a sampling of responses
# from the moderators thread at the time:
#
#
# - ...you'll make about a hundred dollars and that will be it. Advertising
#   is the right thing to do. Banners, buttons, skyscrapers,
#   Interstitials. Load it up and let's go. - NFFC
# - ...Try it. I don't like it, I bet you end up with an advertising based
#   model in the end. - RCJordan.
# - ...that is really stupid, it will never work. - anon.
# - ...might as well try. I think you will be surprised. - Mike Mackin.
#
#
# As you can see, it was not a warm-n-fuzzy outpouring of support
# for the idea. At that time, we were still well into the era of the
# bust where no one was making money on advertising. All the networks had
# just lost about 80% of their ad revenue in the dot com meltdown. At
# that time, who would you have sold banners to? Burst, DoubleClick,
# and L90 had all failed to produce quality advertisements on our
# other sites. There just wasn't the advertising there to be had at
# the time. So it was not an "either advertising or subscription"
# premise. It was a "subscription or close doors" proposition.
#
# Aside from that, I don't hate advertising, but I do get tired of it
# pretty quick. I think part of it must be generational. When I was a
# kid, I was always amazed by the fact that any big Hollywood movie
# star that did advertising spots was lambasted as a sellout.
Even my # favorite rock band of all time (The Who) put out a parody album, # "Sell Out", with images of canned goods. By the time I was out of # those formative adolescent years, the cheeky classic by Dire Straits # made a mockery of the entire anti-advertising movement: "We gotta # move these refrigerators - We gotta move these colour TV's". # # In our space of webmastery, search, and site ownership, we get our # fair share of advertising thrown in our face. I know I quit going to # a lot of sites if they are heavy in advertising. I never know if I am # clicking on an ad, or clicking on an honest link somewhere. That # is one aspect that you always have to remember if you run an # advertising-heavy site - outlinking is everything! The more links # you have, the more you look like a resource site, and the more # likely people are to click on your ads. Whether they get there by # being tricked, or by an honest mistake is another question entirely. # # When I chose to try the subscription format, I honestly figured it # would last a three month test, earn about a K, and we'd call it # quits and do NFFC's plan B. # # But I underestimated - no, I vastly underestimated our # members' loyalty and willingness to support the site. The first # couple days of the subscription program caught me off guard. In the # first few days, we had 100 yearly subscriptions. We also had several # pure donations totaling over \$10k. # # I can't overemphasize the pure shock value of those actions by the # members. A week later a check showed up in the mail for \$1000. That # was our largest donation ever. It wasn't until a few years later that I # actually heard the rest of the story. A member had asked his wife to # donate \$100 to WebmasterWorld. She misunderstood and made it out for # \$1000. After it happened, he realized that he had gotten a # K's worth out of the site, and he went and subscribed for a year as # well. Talk about vindication for the model!
# # The donations, the subscriptions, the postings, the community # support? I was moved. I absolutely had to keep the site advertising # free. This had to be a place where webmasters, site owners, and even # search engine employees come to stay apart from it all. A global # coffee shop where we could talk business without actually doing # business. A place free from the constant intrusion and influence of # advertising. A place we could talk. # # That was the beginning of it all. No grand conspiracies, just a # willingness to try it, and a wake-up call about the caliber # of member that uses the site. # # In the next couple of segments, I am going to detail some of the # serious issues that we have faced because we are subscription based. # From advertising network threats, to bot attacks, we have seen a rare # group of issues few sites see. # # -bt # # ...I have an answering machine in my car. It says, "I'm home now, # but leave a message and I'll call when I'm out." - Stephen Wright # =========================================================================== # # # # Sleep Deprivation 101 # # # # by Brett Tabke # # 3/4/2007 # # # # =========================================================================== # # # # I was looking at this earlier tonight for the first time in a few # weeks. It has been so hard to keep motivated at it. So, sorry I # haven't updated this in a while. It has been a whirlwind bunch of # months in a row for me and my family. # # In December, Erika and I celebrated the arrival of our first daughter, # Eleanor. I essentially took an extended couple of months off. I logged # in to take care of some fires and make sure everyone was getting paid. # Other than that, a couple hours a day to keep the email churning was # about it. This was the longest work break I have taken since college # in the early 80's. # # http://www.webmasterworld.com/webmasterworld/3191522.htm # # What a change of life it has been.
From the sleep deprivation to the # complete restructuring of our lives, it has been a time of adjustment. # The biggest change has been that it really puts the internet and our # work here on it into a whole new perspective. It really reminded me # what was important. # # Before that, last fall's PubCon was a huge project as well. I have # never worked so hard as those last few months. It was 4am to 10pm most # days from August to early December. It was so well worth it, as we had # an awesome event. Thanks to everyone who participated and attended. # http://www.pubcon.com/vegas2006/ # # I am still digging out from a lot of email going back to November. # Please be patient as I am still working through it all. Delayed - not # forgotten - and thanks for understanding. # # The old blog was pretty big, so I moved the old blog entries over to # here: http://www.webmasterworld.com/robots-blog.txt # # I am off to TRAFFIC Vegas this week. # http://www.targetedtraffic.com/ # # Brett Tabke # # # ... I went for a walk last night and she asked me how long I was going # to be gone. I said, "The whole time." - Stephen Wright # # $ENV{'REMOTE_ADDR'} # (C) Copy and Copyright 2007 WebmasterWorld Inc. All Rights Reserved.