dog or higher: Catching web standards: "The developer I was working with initially built our site using tables, and when I pointed out that company policy was to use CSS, she got, shall we say, a little huffy. I knew I had to get my boss on side to influence her to do it over.
So I went in to him. "Boss", I said "you are not going to understand much of what I am about to say but you need to know that it's important and I will try to explain it to you as best I can."
He looked mildly alarmed.
I went on. "Imagine that you wanted me to send out a document on your behalf, and we have a lovely word processor there to use, but I created the document on a manual typewriter instead because I didn't know how to use the word processor." He nodded."
So now that I've found out that, yes, people are actually writing software that does comment spam and selling it for US$24 (wiki spam won't be far off), we really need to work out bulletproof solutions. By bulletproof I mean solutions that can't be cracked too easily on a large scale by determined coders.
Some ideas on battling commentspam, referrerspam, wikispam and such: (most are for battling robots)
- Randomize the scriptname that accepts the POST data for each installation, or even for each pageview. Not bulletproof, but makes finding your script harder for the software.
- Add a random ID to your form, valid for one post. If the ID isn't right, the post doesn't go through. This means that for every spam post, the spam software needs to download the page once. If you randomize the field name as well, it might even work better. Not bulletproof though.
- Generally randomize all field names. Create a table that maps your randomized field names to the real field names.
- Until a poster has proven they're human, make it really hard to machinespam.
- Find a way to penalize spammers, that doesn't make it easy to penalize others by faking them as spammers.
- Make sure you don't make it too hard to post.
- Keep a central list, vetted by some authority (maybe a community), of known spam URLs. Actively use it to scare the people who buy spamming software (find them!): "We'll make you lose PageRank!". Be aware of the problems with central lists - this list should only list clear, true and proven spammers, not may-be-spammers.
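Here's a rough sketch of the random-ID and randomized-field-name ideas from the list above, in Python. Everything in it (the FormGuard class, the "f_" prefix, keeping state in memory) is made up for illustration; a real install would need to store the pending forms somewhere persistent and expire old ones. Not bulletproof, as noted.

```python
import secrets


class FormGuard:
    """Hypothetical helper: one-time form IDs plus randomized field names."""

    def __init__(self, real_fields):
        self.real_fields = list(real_fields)
        self.pending = {}  # form_id -> {random_name: real_name}

    def issue_form(self):
        """Return (form_id, {real_name: random_name}) to render into the form."""
        form_id = secrets.token_urlsafe(16)
        to_random = {name: "f_" + secrets.token_hex(8) for name in self.real_fields}
        # Keep the reverse table so the POST can be mapped back later.
        self.pending[form_id] = {rand: real for real, rand in to_random.items()}
        return form_id, to_random

    def accept_post(self, form_id, posted):
        """Map a POST back to real field names, or return None if rejected."""
        mapping = self.pending.pop(form_id, None)  # pop: each ID is valid once
        if mapping is None:
            return None  # unknown or already used ID
        if set(posted) != set(mapping):
            return None  # field names don't match the ones we handed out
        return {mapping[name]: value for name, value in posted.items()}


guard = FormGuard(["author", "comment"])
form_id, names = guard.issue_form()
post = {names["author"]: "Ann", names["comment"]: "Nice post"}
first = guard.accept_post(form_id, post)   # accepted
replay = guard.accept_post(form_id, post)  # same ID again: rejected
print(first)
print(replay)
```

The point is just that a bot has to fetch the page for every single post, since both the ID and the field names change each time.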
More ideas?
From my referrers: www.php-soft.com looks like they sell comment spam software, used for comment spam, wiki spam and other stuff. It looks pretty efficient and somewhat advanced - enter a scriptname and it does a google search and spams all the scripts it finds.
I started a wiki page on Video Encoding Tools. Feel free to add :)
Jon's-video-Udell points to the Turbine Video Encoder, which encodes Flash video. There's a free version that places a small watermark. I will have to try this out.
Comments on my Video Blogging Timeline welcome!
Helping people start blogs
I've helped 2 friends of mine start blogs in the past year: Jay and Melina (I don't even remember her URL). They don't blog much though; I think they aren't getting that bloggin' feeling. Jay sometimes sends me URLs that I tell him he should put on his blog instead.
What are your experiences with helping people start blogs?
And here we have an illustration of why RDF is useful: "The bane of my existence is doing things I know the computer could do for me. When I got my proposed July 2001 travel itinerary in email, I just couldn't bear the thought of manually copying and pasting each field from the itinerary into my PDA calendar. I started putting the Semantic Web approach to application integration to work."
XML.com: Something Useful This Way Comes [Jun. 09, 2004]: "In other words, development of the Semantic Web requires a lot of work, but there's been a lot of work done. This raises an obvious question: when will all that work pay off?
There are only three ways to answer that question -- already, never, or somewhere in between."
In other words, the semantic web is kinda here and it's kinda useful. That's good enough for me.
One thing bothers me though. If it's the data model of RDF that's valuable, not the syntax (pretty much everybody hates the RDF-XML syntax), then really, what's the big deal? The RDF data model really isn't that complex. Why do so many people shy away from RDF if it's really just a useful data model? I suspect it might be because of all the stuff they build around that model, especially the syntax. Couldn't someone invent RSRDF (Real Simple RDF), implementing the same useful data model (and at the same time maybe explain exactly why it's so useful), but keeping the tools around it (OWL, syntax, ...) much simpler?
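To show how small the core model is: RDF statements are subject-predicate-object triples, and a toy version fits in a few lines of Python. This is only a sketch, not a real RDF library; the TripleStore class and the itinerary names (flight1, departsFrom, ...) are invented for the example, and real RDF would use URIs where I use plain strings.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the given parts; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("flight1", "departsFrom", "JFK")
store.add("flight1", "arrivesAt", "BRU")
store.add("flight1", "departureDate", "2001-07-14")

# Everything we know about the flight, then one specific question.
print(store.match(subject="flight1"))
print(store.match(predicate="departsFrom"))
```

Because everything is the same shape, data from an itinerary and data from a calendar can sit in one store and be queried the same way.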
Thinking a bit more about wiki spam avoidance: from a usability point of view, it's a bit annoying for a user to have to enter a code they see in an image every time they edit a wikipage. The system does pretty much completely block out bots though. That's a good start.
So why don't we set a cookie after someone has entered it once (and thus proven they're not a bot)? After that they won't be challenged again. Cons: a real person could still run a script after going through "the barrier" once manually. Maybe the cookie could be turned off every time the person enters a link to another site, thereby identifying themselves as a possible spammer. I assume maybe 80% of all wiki page edits don't add links to other sites.
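A minimal sketch of that rule in Python, assuming the wiki already knows whether the visitor carries the "proven human" cookie. The function names and the link regex are my own invention, and instead of literally deleting the cookie this version simply asks for the code again whenever an edit contains a link to another host.

```python
import re
from urllib.parse import urlparse

# Rough pattern for http(s) links in wiki text; an assumption, not a full URL parser.
LINK_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def has_external_link(text, own_host):
    """True if the text links to any host other than the wiki's own."""
    for url in LINK_RE.findall(text):
        try:
            host = urlparse(url).hostname or ""
        except ValueError:
            return True  # unparseable link: treat it as suspicious
        if host != own_host.lower():
            return True
    return False


def needs_challenge(proven_human, edit_text, own_host):
    """Decide whether this edit should get the image-code challenge."""
    if not proven_human:
        return True  # no cookie yet: always challenge
    return has_external_link(edit_text, own_host)


print(needs_challenge(True, "fixed a typo", "wiki.example.org"))
print(needs_challenge(True, "buy http://pills.example.com now", "wiki.example.org"))
```

If my 80% guess is right, most edits from returning humans would sail through without a challenge.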
Whatever the solution, avoiding spam is going to be important for any wiki. The potential for abuse is just too great. Until I come up with a solution (I am seriously thinking about working something out), I just monitor my recent changes by RSS and keep an eye out for wikispam.
Wiki sandbox - Google Blogoscoped Forum: "Pillipp, I have had to lock four of my wikis to control your trashy misuse of them. For many people, the internet is a production tool. Converting portions of it to your own little SEO litterbox is arrogant conceit, more suited to politicians than to internet technologists. I'm glad I've hunted you down."
The Art and Science of an Effective Link Building Campaign: "To achieve high PageRank, Silverstein [who works at Google] said you want the expert sites in your market linking to you and hopefully nobody else."
Google sure keeps their projects in beta for a long time: Google News
Wired News: A Contest to Outwit Google: "The owner of an online forum won the first round of a worldwide search-engine optimization competition Monday, by using a backlinking strategy that scored his site as the top Google result for a made-up term, "nigritude ultramarine.""
Google is considering RSS support (they already support ATOM, a similar standard).
Cyburbia: an urban planning portal. (via Worldchanging)
Jon Udell on portable usability labs. I saw a demo of VisualMark this year, and it is funky. It's Mac only, but worth buying a Mac laptop for, and you can test anything on Windows or even Linux PCs by simply connecting a cable. It works like this: get a user behind any (Windows, Mac, ...) PC or laptop. Connect your Mac running VisualMark. Connect Mac cameras to capture facial expressions. Run.
Knowledge Sharing at the World Bank (via the excellent Column Two)
XMLTV: "XMLTV is a set of utilities to manage your TV viewing. They work with TV listings stored in the XMLTV format, which is based on XML."
mod_torrent: "Mod_torrent is a drop in solution for Apache servers when deploying the BitTorrent file swarming technology. With mod_torrent your visitors share the bandwidth burden when distributing large files on your web site. The module transparently makes all, or optionally only certain types of files, retrievable by any client implementing the BitTorrent protocol."
LAMPPIX: Bootable webserver on a CD - SitePoint PHP Blog: "Here's a great idea - LAMPPIX, a LAMP web server, bootable from a CD. It uses KNOPPIX, a Linux distribution designed to boot from a CD, as well as XAMPP - a distribution of Apache and the usual suspects (e.g. mmCache).
For PHP solution providers seems like a great way to distribute your product to customers without requiring specialist knowledge from them - just put the CD in and reboot."
About Google's OS: What comes after GMail and Froogle?: "gGrid? Grid? gRid? ... Greed?"
InfoWorld: The joy of outsourcing: "Although most IT managers want increases, the dirty little secret of IT budgeting is that a lot of IT gets steadily cheaper even within a 12-month budget cycle. The time between the day when a budget process begins and the books are actually closed on that budget could be as long as 18 months. This means that the $5,000 server you put in your budget in July 2004 for 2005 might cost $3,000 when you actually buy it in December 2005"
WorldChanging: Another World Is Here: Chinese Wikipedia: "For the 10th anniversary of China going online, PCWorld has a fascinating report about the growth of the "Chinese Wikipedia." In short: no censoring yet."
BBC NEWS | Inside the Google search machine: "Blogs are not so much of a problem," says Mr Cutts. "They show up less often than you expect."
DevNetwork Forums :: View topic - Come to the PEAR-fest: "I am one of the core developers of the PEAR package who has joined the game relatively late. I see the strengths of the PEAR installer as the reason PEAR will become an important force. The base PEAR class is one monstrous mistake, and PEAR_Error is pretty close. Everyone knows this." Well, I know now.
Bluemountain's impressive (it's an e-card service), in a somewhat nasty, you-have-to-be-a-clever-consumer kinda way.
Before they let you send a card, they ask you to sign up for a 30-day free trial. In order to cancel it, you have to call their customer support (you can't cancel online), which is only open during business hours. They then "verify" your address, probably to sell it, before closing your account. (I forgot to ask about that.) Before canceling, they offer a discount as well, so that's worth doing even if you want to use them.
M$ and SAP were thinking about merging: "The companies initiated merger discussions late last year, but eventually broke off talks for reasons of complexity"
Went to see Super Size Me, a home-made documentary about a guy who eats nothing but McDonald's for a month. Thoroughly entertaining and informative, it's a great documentary, made with cheap gear. It encouraged me to finish the documentary I'm working on. It won't be this good, but it might be passable...
Everything TypePad!: TypePad vs. Comment Spammers: "According to our logs, most spammers try to cover their tracks by sending their posts through an "Open Proxy Server". An Open Proxy Server is a misconfigured or infected machine that forwards web requests for anyone on the entire Internet. The spammers use these proxies to avoid the one comment-per-minute restriction and the Blog Owner's IP address blocking.
So, we started blocking Open Proxies -- all 1.5 million of them. It immediately reduced the comment spam problem. In fact on our first day it blocked over 20,000 spam attempts!"
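This isn't TypePad's code (I have no idea what theirs looks like), but the two checks they describe, an open-proxy blocklist plus a one-comment-per-minute limit per IP, can be sketched in Python like this. The CommentGate class and the example addresses are made up, and the proxy list is assumed to come from somewhere else.

```python
import ipaddress


class CommentGate:
    """Hypothetical gate: reject known open proxies, then rate-limit per IP."""

    def __init__(self, open_proxies, min_interval=60.0):
        self.open_proxies = {ipaddress.ip_address(ip) for ip in open_proxies}
        self.min_interval = min_interval  # seconds between comments per IP
        self.last_seen = {}  # address -> time of last accepted comment

    def allow(self, ip, now):
        """Return True if a comment from `ip` at time `now` (seconds) is accepted."""
        addr = ipaddress.ip_address(ip)
        if addr in self.open_proxies:
            return False  # sender is on the open-proxy list
        last = self.last_seen.get(addr)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after this IP's previous comment
        self.last_seen[addr] = now
        return True


gate = CommentGate(open_proxies=["192.0.2.10"])
print(gate.allow("192.0.2.10", now=0))     # listed proxy
print(gate.allow("198.51.100.7", now=0))   # first comment from this IP
print(gate.allow("198.51.100.7", now=30))  # second comment 30 seconds later
```

It also shows why the proxies matter to spammers: without them, the per-IP limit alone would slow a single machine to one comment a minute.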
Why Wiki Works: "Deleting wiki pages is about as much fun as emptying the windows recycle bin."
Scene for Stargate with Teal'C on Flickr - Photo Sharing!. An example of the pretty f**ng fantastic photo annotation feature.
Life in NYC: "if you rave long enough, you'll arrive at something fairly reasonable eventually."
Calling the Apache htaccess masters from Ben Hammersley's Dangerous Precedent. With all the changing of blogging software lately, redirecting old URLs is a pain point. The best suggestion so far seems to be to use a RewriteMap.
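For the RewriteMap route, the tedious part is producing the map file. A plain-text map is one "key value" pair per line, so a small Python script could generate it from a list of old and new paths exported from the blogging tool. This is a sketch under that assumption; build_rewrite_map and the example paths are hypothetical. As far as I know, the RewriteMap directive itself has to be declared in the server or virtual host config rather than in .htaccess.

```python
def build_rewrite_map(redirects):
    """Return the text of a plain-text rewrite map: one 'old new' pair per line."""
    lines = []
    for old, new in sorted(redirects.items()):
        # Keys and values are whitespace-separated, so neither may contain spaces.
        if any(ch.isspace() for ch in old + new):
            raise ValueError(f"whitespace not allowed in map entry: {old!r} -> {new!r}")
        lines.append(f"{old} {new}")
    return "\n".join(lines) + "\n"


# Hypothetical old and new permalinks.
redirects = {
    "/archives/000123.html": "/2004/06/comment-spam-ideas",
    "/archives/000124.html": "/2004/06/wiki-spam-avoidance",
}
map_text = build_rewrite_map(redirects)
print(map_text, end="")
```

Write map_text to a file and point a txt: RewriteMap at it; the rewrite rule then looks up each old path in the map.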
Michael's selling: Domain (studioid.com) For Sale.
On the train today I saw a weathered sign saying "Welcome home to our soldiers! Great Job!".
Not So Simple Search - Metrics - CIO: "According to a survey of 300 companies by Boston-based Delphi Group, nearly 30 percent of business users spend more than eight hours per week searching for electronic information."
Where is the qualitative research (not surveys!) about search? Who is watching businesspeople closely for days, logging exactly what they do (as opposed to what they say they do in a survey) and finding out why?
For what is one of the hottest IT topics around these days, there is surprisingly little valuable research being done on search. (I don't think I consider the above valuable research. Sorry guys, a survey just doesn't cut it.)
What's worse, the typical research findings ("business users spend x hours a day searching") are almost always followed by dodgy, Jakob-ish ROI conclusions ("At x US$/hour, this costs companies worldwide y billion US$ a year!"). Useful for convincing some of the more stupid CIOs, maybe. Not for much else. I guess I'm just in a pissed off mood today.
I am starting a mailing list for videoblogging.
1&1 Root Server. Looks good. 500 GB/month of bandwidth for US$49.
Steves Digicams - Apacer Disc Steno: "The Apacer Disc Steno CP-100 is a portable, battery-operated, multi-session CompactDisc recorder. Its main use is to transfer image files or other data from flash memory cards to CD-R or CD-R/W discs without a computer. This eliminates the need to buy multiple flash memory cards when vacationing or working out in the field and allows you to leave your laptop at home."
The Inevitable "Day After Tomorrow" Review. I went to see it yesterday. Bad, yes, but entertaining. When you live in New York, a lot of movies featuring the city become much more entertaining. You recognize the buildings, the people. It's kinda fun.
(Spoilers ahead)
I think many people will especially appreciate the part where millions of Americans try to enter Mexico, only to be stopped at the border. Only after the president forgives all Latin-American debt do they let them in.
LLRX.com - Trends in Blog Searching: "Recently, most major search engines have altered their algorithms to push blogs down in the search results. Engines that only return two results from any one site use this feature to limit the impact of blogs on the search results."
Textdrive - Wikipedia, the free encyclopedia: "TextDrive is a managed webhosting company created by Dean Allen (creator of Textpattern, a weblog/content management system) and Jason Hoffman, designed primarily (but not exclusively) as a hosting service for Textpattern-managed sites, similar to Six Apart's Movable Type-based TypePad service. Allen was able to raise just under $40,000 in startup funding in just over three days. The investors, also known as the "VC 200," were an assortment of web developers, designers, and bloggers. Allen chose this means of raising capital as a community-funded alternative to traditional venture capital." A competitor for Typepad :) Differentiation: you get hosting as well (meaning you can do more things than just host your blog).
I hadn't noticed this page before: IAwiki: WhereInTheWorld: IA practitioners organized by country.
Peter Morville now owns every single link on the search results page for findability on Google.
If you're considering moving away from MT, the Drupal MT import code is being worked on actively. Wait a few weeks, and your wait should be well worth it!
LanguaL Food Description Home Page: a multilingual thesaural system using faceted classification for the food industry.
Help pay B&A's hosting fees: "Because you're donating directly through DreamHost, rest assured that your donation will only be used to pay for this site's hosting fees. The site owner won't be able to run off and spend your donation on DVDs, fine steak dinners, or anything else completely unrelated to their web hosting bill!"