I’ve decided, after some prompting, to put together an anthology of my writings. Unfortunately, they’re scattered over multiple blogs, and some of those blogs don’t exist anymore. I had assumed this would be relatively easy using the Wayback Machine, but they seem to have made some changes.
Going to a later bopnews front page works fine. However, what no longer works is the drop-down author box, which goes to the Wayback Machine’s own “bopnews” site. It would probably be possible to find most articles by going through the archive day by day, but I’d really rather not.
Suggestions and/or solutions?
Rich Puchalsky
If you use the Wayback machine to search for versions of the specific page:
http://www.bopnews.com/a_ianwelsh.html
I think that will work.
Ian Welsh
Sadly, I’m getting a consistent 404 on that.
Formerly T-Bear
Way back when, FDL had a search device, which may have been format-specific:
site:(site to be searched).com(space)(search object)
which seemed to pull up most of the items having that name. I haven’t used this in years, and it may not work. A case of old dog, old trick.
kzk
I’m not seeing the 404s:
http://web.archive.org/web/20060720193406/http://www.bopnews.com/a_ianwelsh.html
http://web.archive.org/web/20060720193344/http://www.bopnews.com/a_oldman.html
http://web.archive.org/web/20060720194200/http://www.bopnews.com/a_stirlingnewberry.html
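The snapshot links above all follow one pattern: http://web.archive.org/web/, a 14-digit capture timestamp (YYYYMMDDhhmmss), a slash, then the original URL. A minimal Python sketch of building such links, assuming the a_<name>.html author-page naming seen in these three URLs holds for other authors too (author_page is a hypothetical helper, not something bopnews or the Archive provides):

```python
from datetime import datetime

# Prefix shared by all Wayback Machine snapshot links.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def snapshot_url(original_url: str, captured: datetime) -> str:
    """Return <prefix><YYYYMMDDhhmmss>/<original URL> for one capture time."""
    timestamp = captured.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"


def author_page(slug: str) -> str:
    """Hypothetical helper: bopnews author pages appear to be a_<slug>.html."""
    return f"http://www.bopnews.com/a_{slug}.html"


if __name__ == "__main__":
    capture = datetime(2006, 7, 20, 19, 34, 6)
    for slug in ("ianwelsh", "oldman", "stirlingnewberry"):
        print(snapshot_url(author_page(slug), capture))
```

If the timestamp doesn’t match a capture exactly, the Wayback Machine usually redirects to the nearest capture it has, so an approximate date is often good enough.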
Ian Welsh
KZK: odd, it was definitely a 404 last night, but not today. Leaving the page open for when I have time, and praying. Thanks.
KZK
You can save whole webpages to your personal computer and then work off that. IE → File → Save As.
Ian Welsh
Thanks, that much I do know, I’m just about to head afk and I wrote a LOT of articles for BOP.
Ian Welsh
Yeah, I get there, and then a lot of the “more” links don’t work. Sigh. I’ll keep at it.
Julien
The Wayback Machine has a Web service API, I’ll see if I can cobble together a script to extract what they have saved.
KZK
Archive.org is slow at spidering/archiving and bopnews disappeared suddenly. Some of the pages never got spidered.
Julien
Indeed, I won’t be able to retrieve everything, but I’ll get the summary if the full post is not there.
Turned out the API is more trouble than it’s worth, so it’s going to be old-school screen scraping.
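For anyone who wants to try the API route anyway, the Wayback Machine’s CDX server can list every capture under a URL prefix. A rough Python sketch, assuming the endpoint and the parameters below (url, matchType, output, fl, filter, collapse) still behave as the CDX server documentation describes; it builds the query and parses the JSON reply, whose first row holds the field names. The function names are illustrative only; this is not Julien’s script:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine CDX server endpoint (assumed still current).
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query_url(prefix: str) -> str:
    """Build a CDX query listing successful captures whose URL starts with prefix."""
    params = {
        "url": prefix,
        "matchType": "prefix",       # everything under the prefix, not just one page
        "output": "json",            # rows as JSON arrays
        "fl": "timestamp,original",  # only the fields needed to rebuild snapshot links
        "filter": "statuscode:200",  # skip captured errors and redirects
        "collapse": "urlkey",        # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body: str) -> list[dict]:
    """Turn the CDX JSON reply (header row first) into a list of dicts."""
    if not body.strip():
        return []
    rows = json.loads(body)
    if not rows:
        return []  # an empty array also means no captures matched
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def list_captures(prefix: str) -> list[dict]:
    """Query the CDX server over the network and parse the result."""
    with urlopen(cdx_query_url(prefix), timeout=60) as response:
        return parse_cdx_json(response.read().decode("utf-8"))
```

For example, list_captures("www.bopnews.com/a_") should return one timestamp/original pair per captured author page, each of which can be turned into a snapshot link. Pages the Archive never spidered simply won’t appear.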
Julien
Here’s a link to what I was able to pull out: https://dl.dropboxusercontent.com/u/59112939/ian_welsh_bopnews.zip
You’ll find two directories: “full”, when the full article was available, and “partial”, when only the summary remained and the link to the full content was dead. There’s also a text file (retry.txt) under “partial”, listing the articles for which I got 503 responses from the Archive. This means those articles might still be available, but the machine serving them was down when I ran my script.
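Re-running the 503 list mostly comes down to retrying with a pause between attempts. A minimal Python sketch, assuming retry.txt holds one URL per line (its actual layout isn’t described above); http_fetch, read_retry_list, and fetch_with_retry are hypothetical names, and the fetch function is passed in so the back-off logic can be exercised without touching the network:

```python
import time
from typing import Callable, Optional
from urllib.error import HTTPError
from urllib.request import urlopen

Fetch = Callable[[str], tuple[int, bytes]]


def http_fetch(url: str) -> tuple[int, bytes]:
    """Fetch a URL and return (status code, body); HTTP errors become a status."""
    try:
        with urlopen(url, timeout=60) as response:
            return response.status, response.read()
    except HTTPError as err:
        return err.code, b""


def read_retry_list(path: str) -> list[str]:
    """Assumes one URL per line in retry.txt; blank lines are ignored."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def fetch_with_retry(url: str, fetch: Fetch = http_fetch, attempts: int = 4,
                     base_delay: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[bytes]:
    """Retry only on 503 (server temporarily down), doubling the pause each time."""
    for attempt in range(attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        if status != 503:
            return None  # e.g. 404: the page was never archived, retrying won't help
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 5 s, 10 s, 20 s, ...
    return None  # still 503 after every attempt; leave it for a later run
```

A driver would then loop over read_retry_list("partial/retry.txt"), calling fetch_with_retry on each URL and saving whatever comes back.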
All the posts are in separate HTML files, with the post number as the file name. Unfortunately, the date format was only Month-Day, so the actual year of the post is lost. Hopefully, since they are numbered sequentially, you’ll be able to piece them back together.
Let me know if that works for you.
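Because the files are named by post number, sorting them numerically rather than alphabetically (so 99.html comes before 100.html) should recover the original sequence, assuming the numbers really were assigned in posting order. A small Python sketch over the “full” and “partial” directories, assuming each file is named <number>.html; sort_post_names and posts_in_order are hypothetical names:

```python
from pathlib import Path


def sort_post_names(names: list[str]) -> list[str]:
    """Keep only <number>.html names and sort them by number, not alphabetically."""
    numbered = [n for n in names if n.endswith(".html") and n[:-5].isdigit()]
    return sorted(numbered, key=lambda n: int(n[:-5]))


def posts_in_order(root: str) -> list[tuple[int, str, Path]]:
    """List (post number, 'full' or 'partial', path) for every post under root."""
    posts = []
    for kind in ("full", "partial"):
        folder = Path(root) / kind
        if not folder.is_dir():
            continue  # tolerate a missing directory
        for name in sort_post_names([p.name for p in folder.iterdir()]):
            posts.append((int(name[:-5]), kind, folder / name))
    posts.sort(key=lambda post: post[0])  # interleave full and partial by number
    return posts
```

Non-post files such as retry.txt are skipped because they don’t match the <number>.html pattern.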
Ian Welsh
Julien,
you rock, thank you!
KZK: yeah, the server went down and the decision was made that it wasn’t worth restoring/continuing. Pity, still (forgive me all the other lovely blogs I wrote for) the best blog I ever did write for.
Though FDL had the best managing editor 😉
markfromireland
Ian, try adding a trailing #more to your search term, thus:
URI/#more
If the URI has a page number try that:
URI/#more-1234
Also, if your browser is Firefox, then Print Pages to PDF saves web pages as PDFs. I use it a lot and am very happy with it.
Hope this helps.
mfi
barrisj
Any luck in retrieving your contributions on The Agonist site? You and Newberry were amongst my favourites there several years ago.
Peter Cowan
Ian,
Damn, I was under the impression that someone had the full archives of BOPNews on disk somewhere.
Julien, did you write a script to do the scraping on the Wayback Machine? If so, would you mind sharing it?
barrisj,
Ian’s and Stirling’s writings on Agonist.org can still be retrieved by navigating the website (for now, anyway).
http://agonist.org/author/IanWelsh/
http://agonist.org/author/StirlingNewberry/
I had literally hundreds, maybe thousands, of blog posts by Ian and Stirling from every site they had written for since ~2006 archived in my Google Reader, until those fuckers at Google decided to shut it down. The export option didn’t export the full articles, and I didn’t have time to write a scraper before it was too late. Such a huge loss.
Ian Welsh
Stirling may have the full archives somewhere, but that’s not an option right now, though it may be again.