December 15, 2024

Live Webcast: Future of News, May 14-15

We’re going to do a live webcast of our workshop on “The Future of News”, which will be held tomorrow and Thursday (May 14-15) in Princeton. Attending the workshop (free registration) gives you access to the speakers and other attendees over lunch and between sessions, but if that isn’t practical, the webcast is available.

Here are the links you need:

Sessions are scheduled for 10:45-noon and 1:30-5:00 on Wed., May 14; and 9:30-12:30 and 1:30-3:15 on Thur., May 15.

Comments

  1. I’m called Tinker

  2. The recommended hosting change has apparently taken place.

  3. Retry: the original failed to post, and its follow-ups are hanging there referring to it, consequently not making much sense.

    What the hell is going on with the site? It died sometime between Saturday morning and Sunday evening, and nobody got around to fixing it until late on the following Thursday, at which point, although it worked again, it acted as though it were on a very low-capacity server, frequently slow or timing out.

    Disturbingly, the several-days-long outage cannot have been due to an external factor, such as a severe weather event cutting off power or the network connection for an extended period. During the outage the server intermittently responded, which it would not have done if the cause had been external to the server premises; yet it never responded with the correct content for any URL during that time, nor even with legitimate error messages (e.g. where the requestor actually asked for a bad URL and got a 404 for his troubles).
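
    To spell out the distinction: a box that gives no answer at all looks like a power or network outage, while a box that answers but answers wrongly looks like a broken server or application. That can be checked from outside with a single probe; a minimal sketch, assuming Python (the hostname is taken from the nslookup transcript further down, and the timeout is an arbitrary choice):

    import socket
    import urllib.error
    import urllib.request

    URL = "http://www.freedom-to-tinker.com/"  # hostname from the nslookup transcript below

    def probe(url: str, timeout: float = 10.0) -> str:
        """Classify one request: a real answer, an error answer, or no answer at all."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                # The server answered and spoke HTTP; whether the content is *correct*
                # still takes a content check, but the box is clearly alive.
                return f"answered: HTTP {resp.status}"
        except urllib.error.HTTPError as e:
            # Still an answer: the server is up but returning an error page (404, 500, ...).
            return f"answered with an error: HTTP {e.code}"
        except (urllib.error.URLError, socket.timeout, OSError) as e:
            # No usable answer: DNS failure, refused connection, or timeout,
            # i.e. what an outage external to the server premises would look like.
            return f"no answer: {e}"

    print(probe(URL))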

    So the cause was internal to the place where the hosting server is located, and wasn’t a simple power or network outage. If it had been a hardware failure and your hosting company were of decent quality, they’d have swapped in a standby replacement part, restored (if necessary) Friday night’s backup tape, and had things fixed within an hour. If your hosting company were of questionable quality but not outright abysmal, they wouldn’t have gotten around to it until Monday, but it would have been up again well before noon that day.

    It was not up again until an additional three whole days, all of them business days, had elapsed.

    If it was a software error or misconfiguration, the news is even worse. It means that a) they do have weekend staff, because one of them was there monkeying with the system on Saturday afternoon or Sunday morning; and b) said staff doesn’t know WTF they’re doing, as evidenced by making an error and then compounding it, either by not testing the change they’d made to confirm the system still passed basic sanity checks afterward, or by discovering they’d hosed the system and just saying “the heck with it” and leaving it like that instead of promptly hitting “undo”. Most likely, if it took them over four whole days to get around to fixing it, they’re either terminally lazy bums not worthy of your money, or they seriously screwed things up and had no change management in place that would let a tweak that proved to cause problems be easily and swiftly rolled back, forcing them to rummage around for Friday night’s backup tape. That it took them four whole days to find said tape indicates either that their backup storage is terminally disorganized, or that their only backup storage is off-site and somewhere between Japan and the far side of the f*#!ing moon!

    Whichever scenario most accurately describes this debacle, it indicates that your current hosting company is horrible, God-awful, miserable, and sucky. I suggest finding a replacement posthaste, especially in light of this not being the first outage with similar symptoms (though thus far it is by far the longest).

    The server’s continued shaky and overloaded/underpowered behavior is further evidence that your hosting provider’s QA is inadequate. Clearly they have scrimped on keeping their servers upgraded to a capacity capable of meeting the demand placed upon them.

    The best case for them is if there was a hardware failure, and it was a drive in the DB box. Even then, one of four things must have been true, none of which is acceptable:

    1. The drive was replaced too slowly, because their technicians couldn’t isolate a fault with both hands and a map, and probably also don’t work weekends, perhaps even being outsourced. No large hosting provider serving professional sites like FTT should fail to have 24/7 in-house repair technicians — knowledgeable technicians — for when their own internal stuff needs fixing.
    2. The drive was replaced too slowly because they don’t keep a few spares lying around, had to wait until Monday to order one, and their supplier takes three days to ship so they only got it Thursday morning. Also unacceptable.
    3. The drive was replaced quickly, but the backup tape took days to locate and restore, either because backup storage was horribly disorganized or because they had only offsite backup storage and no way to access the backup data by network, necessitating physically shipping it. In which case what were they thinking?!
    4. They didn’t even become aware of the problem until Thursday morning, in which case their system monitoring leaves something to be desired.
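
    On point 4 in particular: even a trivial external check run on a schedule would have noticed the failure within minutes rather than days. A minimal sketch of such a monitor, assuming Python; the URL, the five-minute interval, and the alert action (here just a printout) are placeholders for whatever a competent provider would actually use:

    import time
    import urllib.error
    import urllib.request

    URL = "http://www.freedom-to-tinker.com/"   # placeholder target
    CHECK_INTERVAL_SECONDS = 300                # arbitrary five-minute cadence

    def site_is_up(url: str) -> bool:
        try:
            with urllib.request.urlopen(url, timeout=15) as resp:
                return 200 <= resp.status < 400
        except (urllib.error.URLError, OSError):
            return False

    while True:
        if not site_is_up(URL):
            # A real monitor would page an on-call technician here, not just print.
            print(f"{time.ctime()}: {URL} appears to be DOWN")
        time.sleep(CHECK_INTERVAL_SECONDS)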

    AND they clearly didn’t have RAID or back-end failover, or a single drive failure wouldn’t have killed the site outright; they could have replaced the drive without anyone in the outside world even noticing. That’s bad in itself. If a dead non-drive component was the culprit, that reduces it to their having no failover and taking four business days to replace the component, which is at least three days and 23 hours too long. If it was a software/configuration error, the failure itself was their fault for fixing what wasn’t broke, compounded by some unguessable chain of further stupidities, probably exacerbated by bad or absent change management and backup management.

    In any event, the most likely explanation, which also accounts for the overloaded behavior of their servers now and the high frequency (once every two or three months) of major downtime, is of course that they’re cheapskates.

    Please find a provider that will give you your money’s worth, instead of one that gives as little as possible and pockets the difference.

    Pronto.

    Sooner or later they’ll screw something up and lose their backups at the same time, and FTT will be completely hosed. You know they will, based on their track record for the past year (or just the past four days).

  4. Actually, it’s even worse. It says 5 days, 6 hours, not 6 days, 5 hours. That means the drive that blew on Sunday at the latest wasn’t noticed to be dead until Tuesday. Apparently they not only take weekends off, but Mondays as well! A far cry from the 24/7 technical support that should be the bare minimum standard. That the trouble report has had no updates in days, even though the problem was apparently fixed only to recur and then get fixed again, is also troubling.

  5. And it was just down again, this time for “merely” two days. Ludicrous. You deserve better hosting than this.

    C:\Documents and Settings\HP_Administrator>nslookup www.freedom-to-tinker.com
    Server: mtrlpq02dnsvp1.srvr.bell.ca
    Address: 207.164.234.129

    Non-authoritative answer:
    Name: www.freedom-to-tinker.com
    Address: 208.113.232.84

    C:\Documents and Settings\HP_Administrator>nslookup 208.113.232.84
    Server: mtrlpq02dnsvp1.srvr.bell.ca
    Address: 207.164.234.129

    Name: apache2-idea.idea.dreamhost.com
    Address: 208.113.232.84
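
    The same forward-then-reverse lookup, hostname to address and then address back to the provider’s machine name, can be reproduced with a couple of standard-library calls. A minimal sketch, assuming Python; the hostname comes from the transcript above, and the expected results are shown as comments rather than guaranteed by a fresh query:

    import socket

    hostname = "www.freedom-to-tinker.com"

    ip = socket.gethostbyname(hostname)                     # forward lookup: name -> address
    rev_name, _aliases, _addrs = socket.gethostbyaddr(ip)   # reverse lookup: address -> name

    print(f"{hostname} -> {ip}")      # 208.113.232.84 in the transcript above
    print(f"{ip} -> {rev_name}")      # apache2-idea.idea.dreamhost.com in the transcript above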

    Dreamhost obviously doesn’t live up to its name. Wikipedia indicates it has a mixed reputation. http://www.dreamhoststatus.com/ seems to confirm that the problems are indeed the result of their having lost a drive in the DB server and then royally f*$!ing things up: that message was posted six days ago, initially said the repair would be done within 30 minutes, then that it would instead take “hours”, and it is still not marked as resolved after nearly a week. (Read the comments and note how many others are jumping ship.)

    The problem began sometime between Saturday morning and Sunday afternoon a week ago; the time indicated on that post is about 11am last Monday, 24 to 48 hours after the drive really died. Obviously they don’t deign to fix things or even to discover problems on weekends. Strike 1. Then, either they lie about how long it will take to fix, or they are so bad at fixing things that they actually aren’t lying, merely mistaken and off by several whole orders of magnitude. I’m not sure which is worse, but … strike 2. Strike 3 is that this isn’t the first time, and these incidents haven’t even been all that infrequent.

    They’ve also had several recent security problems: server exploits injecting malicious iframes into pages they host, and leaks of passwords and other customer info.

    Nightmarehost doesn’t deserve another dime of your money.

  6. Hi Ed, I know this isn’t the right place to post this, so I hope you will accommodate me. I have a question I need to ask Alex Halderman, and it’s rather important. I decided to leave you this message since you have posted more recently than he has, in hopes that you will read it and pass it along to him. So will you please ask him to get in touch with me at my email address? Or, if you could, let me have his email so I can mail him directly.
    Thanks.