The horizon is not so far as we can see, but as far as we can imagine

Need help with finding old Bop News Articles (aka. bleg)

I’ve decided, after some prompting, to put together an anthology of my writings. Unfortunately, they’re scattered over multiple blogs, and some of those blogs don’t exist any more. I had assumed this would be relatively easy using the Wayback Machine, but they seem to have made some changes.

Going to a later bopnews front page works fine. What no longer works is the drop-down author box, which now goes to the Wayback Machine’s own “bopnews” site. It would probably be possible to find most articles by going through the archive day by day, but I’d really rather not.

Suggestions and/or solutions?

17 Comments

  1. If you use the Wayback machine to search for versions of the specific page:

    http://www.bopnews.com/a_ianwelsh.html

    I think that will work.

  2. Ian Welsh

    Sadly, I’m getting a consistent 404 on that.

  3. Formerly T-Bear

    Way back when, FDL had a search device which may have been format specific:

    site:(site to be searched).com(space)(search object)

    which seemed to pull up most of the items having that name. I haven’t used this in years and it may not work. A case of old dog, old trick.

  4. Ian Welsh

    KZK: odd, it was definitely 404 last night, but not today. Leaving page open for when I have time, praying. Thanks.

  5. KZK

    You can save whole webpages to your personal computer and then work off that. IE–>File–>Save As.

  6. Ian Welsh

    Thanks, that much I do know, I’m just about to head afk and I wrote a LOT of articles for BOP.

  7. Ian Welsh

    Yeah, get there and then a lot of the “more” links don’t work. Sigh. I’ll keep at it.

  8. Julien

    The Wayback Machine has a Web service API, I’ll see if I can cobble together a script to extract what they have saved.

  9. KZK

    Archive.org is slow at spidering/archiving and bopnews disappeared suddenly. Some of the pages never got spidered.

  10. Julien

    Indeed, I won’t be able to retrieve everything, but I’ll get the summary if the full post is not there.

    Turned out the API is more trouble than it’s worth, so it’s going to be old-school screen scraping.

  11. Julien

    Here’s a link to what I was able to pull out: https://dl.dropboxusercontent.com/u/59112939/ian_welsh_bopnews.zip

    You’ll find two directories: “full”, when the full article was available, and “partial”, when only the summary remained and the link to the full content was dead. There’s also a text file (retry.txt) under “partial”, listing the articles where I got 503 responses from the Archive. This means those articles might still be available, but the machine serving them was down when I ran my script.

    All the posts are in separate HTML files, with the post number as the file name. Unfortunately, the date format was only Month-Day, so the actual year of the post is lost. Hopefully, since they are numbered sequentially, you’ll be able to piece them back together.

    Let me know if that works for you.

  12. Ian Welsh

    Julien,

    you rock, thank you!

    KZK: yeah, the server went down and the decision was made that it wasn’t worth restoring/continuing. Pity, still (forgive me all the other lovely blogs I wrote for) the best blog I ever did write for.

    Though FDL had the best managing editor 😉

  13. Ian, try adding a trailing #more to your search term thus:

    URI/#more

    If the URI has a page number try that:

    URI/#more-1234

    Also if your browser is Firefox then Print pages to Pdf saves web pages as a PDF. I use it a lot and am very happy with it.

    Hope this helps.

    mfi

  14. barrisj

    Any luck in retrieving your contributions on The Agonist site? You and Newberry were amongst my favourites there several years ago.

  15. Peter Cowan

    Ian,

    Damn, I was under the impression that someone had the full archives of BOPNews on disk somewhere.

    Julien, did you write a script to do the scraping on the Wayback Machine? If so, would you mind sharing it?

    barrisj,

    Ian’s and Stirling’s writings on Agonist.org can still be retrieved by navigating the web site (for now, anyway).

    http://agonist.org/author/IanWelsh/

    http://agonist.org/author/StirlingNewberry/

    I had literally hundreds, maybe thousands, of blog posts by Ian and Stirling from every site they had written for since ~2006 archived in my Google Reader, until those fuckers at Google decided to shut it down. The export option didn’t export the full articles, and I didn’t have time to write a scraper before it was too late. Such a huge loss.

  16. Ian Welsh

    Stirling may have the full archives somewhere, but that’s not an option right now, though it may be again.
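
For anyone attempting a similar recovery, here is a minimal sketch of listing the archived copies of a single page, such as the author page suggested in the first comment. It is not Julien's script, which was never posted in this thread and which he says ended up as plain screen scraping. It assumes the Wayback Machine's CDX search endpoint and its JSON output, where the first row is a header naming the columns. The function names `build_cdx_query` and `snapshot_urls` are made up for illustration, and the code only builds the query URL and parses a response you have already fetched.

```python
import json
from urllib.parse import urlencode

# Assumption: the Wayback Machine's CDX search endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url, limit=50):
    """Build a query URL listing successful captures of page_url."""
    params = {
        "url": page_url,
        "output": "json",             # rows as a JSON array of arrays
        "filter": "statuscode:200",   # skip captures that were errors
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_urls(cdx_json_text):
    """Turn a CDX JSON response into a list of replayable snapshot URLs."""
    rows = json.loads(cdx_json_text)
    if not rows:
        return []
    # The first row is the header; the rest are capture records.
    header, *records = rows
    ts = header.index("timestamp")
    original = header.index("original")
    return [
        f"https://web.archive.org/web/{rec[ts]}/{rec[original]}"
        for rec in records
    ]


if __name__ == "__main__":
    # Prints the query URL; fetch it with a browser or any HTTP client.
    print(build_cdx_query("http://www.bopnews.com/a_ianwelsh.html"))
```

Fetching the printed URL and passing the response body to `snapshot_urls` should give one link per capture. As noted above, the Archive sometimes answers with a 503, so failed requests are worth retrying later rather than treating as missing.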
