One of these days I may get round to blogging a list of all the whiskies I like, though it’s fair to say I have strong leanings toward sherry casks and regular strength whiskies; cask strength is mostly not for me. I’ve just opened a bottle of 16 year old Balvenie (triple cask) which is rather pleasant. My current favourite, however, is a newish one from Laphroaig that was matured in Pedro Ximenez casks and is 48% alcohol: Laphroaig’s PX Cask. Interestingly I bought a litre at the airport for about AU$110 but have seen it retailing locally for around $200.
I’m happy to report that full connectivity was restored to the house a day or two before we were due to fly to NZ. Returning home on Sunday, I was happy to discover that we still had net :) I’ll talk more about the NZ trip and tramping the Kepler Track once I’ve sorted out the photos and loaded them to flickr. I have about 130 photos that I need to weed though that should be relatively quick compared to weeding my photos from the European holiday over Dec/Jan. For the Europe set, I’ve managed to get it down to under 300 from around 700 but it still needs a couple more goes. I should have the Kepler set up this week at least.
While I’m in post NZ recovery, here’s 5 random things I’ve tweeted in recent months:
Been about two weeks since our home internet died; not even a dial tone. The lack of dial tone indicates the issue is either the socket or external. Took a couple of days to get through to TPG support who in turn contacted Telstra to send out a technician. The deadline of 4th April came and went. Turns out there’s issues at the moment logging requests for Telstra technicians. TPG have assigned a specific person to manage our issues. Fingers crossed.
On the other hand, this is a first world problem. I have good internet at work and on my phone. The house network itself is fine so we can still connect to the NAS and wifi to the printer. I have been very reliant on my Telstra network for mobile however. 4 days ago, I exceeded my data and was charged $10 for an additional gig. This is less scary than I anticipated as I’d built up in my head all sorts of scary scenarios for “exceeding my limit”. Then again, around lunchtime today, on the final day of my monthly data period, I exceeded the additional gig and had another gig added for another $10.
This means I have half a day to use a gig of mobile data. I am sitting at home hotspotting my mobile to the mac mini and to the win laptop. I am computering as much as I normally would with unlimited broadband. Care factor: 0. I have booked accommodation in Queenstown…at last, as we’re tramping the Kepler Track next week…we did at least book the huts a few weeks ago.
One of the things I’m interested in is working with data sets around web harvesting and archiving. I’ve spent a bit of time over the years exploring the Internet Archive and other web archives, and I’m hitting the point where I’d like to understand the sorts of information gathered when you harvest a bunch of websites. What can be discerned from a site’s structure, how does it change over time, are there any other useful directions to explore?
When you harvest web sites you end up with a bunch of files in the WARC format. So far, in my limited experience, a typical WARC file is about a gig and one harvest can contain lots of these files. Depending on how you set up your harvester, you can save all content on a site including office files, music, video and so on. A harvest captures that website at one moment in time, and with repeated harvests it’s possible to get a sense of how it might change over time. As part of learning how all this works, I’m using a small archive of 72 WARC files that roughly total 55GB.
Having successfully installed lots of software on my machine at home, I might actually be ready to start experimenting. I’ve been following the Getting Started guide for installing Warcbase (platform for managing web archives) and associated software on a mac mini. While time consuming, it’s actually been straightforward and installing software on the mac has seemed easier than installing similar stuff under windows a year or so back. Of that guide, I have completed steps 1, 2, 3, and 5. Step 4 involves installing Spark Notebook but the primary site seems to be down at the moment so I’ve installed gephi to handle data visualisation. As a result I am now running:
- Homebrew – MacOS package manager
- Maven3 – software project management tool
- Warcbase – built on hadoop and hbase
- Apache Spark – an engine for large-scale data processing
- Gephi – data visualisation
In other words a bunch of tools for dealing with really large data sets installed on a really small computer :-) I’d originally bought the mac mini to migrate my photo collection from a much older Mac Pro and hadn’t considered it as a platform for doing large scale data stuff. So far, it’s holding up though I am feeling the limits of having only 8GB of RAM.
All those tools can be used on really big systems and run across server clusters. Thankfully, they also work on a single system but you have to keep the data chunks small. I tried analysing the entire 55GB archive in one go but spark spat out a bunch of errors and crashed. Running it file by file, where each file is up to a gig, seems to be working so far.
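The file-by-file approach can be sketched in plain Python. This is only an illustration, not the Warcbase/Spark script itself: `find_warc_files` and `process_each` are made-up names, and `analyse` stands in for whatever per-file job you actually run. The idea is to hand over one WARC at a time and note any failures, so a single bad file doesn't take down the whole run.

```python
from pathlib import Path
from typing import Callable, Dict, List


def find_warc_files(archive_dir: str) -> List[Path]:
    """Return the WARC files directly inside archive_dir, sorted by name."""
    root = Path(archive_dir)
    return sorted(
        p for p in root.iterdir()
        if p.name.endswith((".warc", ".warc.gz"))
    )


def process_each(files: List[Path],
                 analyse: Callable[[Path], object]) -> Dict[str, object]:
    """Run analyse on one file at a time.

    The result (or the exception, if the job failed) is stored per file
    name, so a failure on one file does not stop the remaining files.
    """
    results: Dict[str, object] = {}
    for path in files:
        try:
            results[path.name] = analyse(path)
        except Exception as err:  # keep going; inspect failures afterwards
            results[path.name] = err
    return results
```

Something like `process_each(find_warc_files("/path/to/archive"), my_job)` would then work through the archive one file at a time, where the path and `my_job` are placeholders.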
There’s been no working internet at home for a couple of weeks so I’ve been hampered in what help I can look up but at least had all the software installed before we lost connection. Spark may have had issues for a different reason eg I may not have specified the directory path correctly but I couldn’t easily google the errors.
I’m trying out a script in spark to generate the site structure from each archive and this is typically producing a file of about 2-3k from a 1GB file of data. The script is able to write to gephi’s file format, GDF. Gephi supports the ability to load lots of files and merge them into one. That means I can run a file by file analysis and then combine them at the visualisation stage. I haven’t worked out the code to run the script iteratively for each file and am manually changing the file name each time. The ugly image below is my first data load into gephi showing the interlinking URL nodes. I haven’t done anything with it, it is literally the first display screen. However it does indicate that I might at last be heading in a useful direction.
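For anyone curious what ends up in a GDF file, here is a rough Python sketch of the "analyse per file, combine later" idea. It assumes each per-file analysis yields a dictionary of (source, target) link counts, which may not match what the Spark script really produces. The `nodedef>` / `edgedef>` header layout is my understanding of Gephi's GDF format, and the function names are hypothetical. Gephi can do the merging itself, so `merge_link_counts` is just an alternative that combines the counts before export.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]  # (source site, target site)


def merge_link_counts(per_file: Iterable[Dict[Edge, int]]) -> Counter:
    """Add up the link counts from several per-file analyses."""
    total: Counter = Counter()
    for counts in per_file:
        total.update(counts)  # Counter.update adds counts for matching keys
    return total


def to_gdf(link_counts: Dict[Edge, int]) -> str:
    """Render link counts as GDF text: a node section, then an edge section.

    Assumes node names contain no commas (true for plain domain names).
    """
    nodes = sorted({name for edge in link_counts for name in edge})
    lines = ["nodedef>name VARCHAR"]
    lines.extend(nodes)
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE")
    for (src, dst), weight in sorted(link_counts.items()):
        lines.append(f"{src},{dst},{weight}")
    return "\n".join(lines) + "\n"
```

Writing the string returned by `to_gdf` to a file ending in `.gdf` should give something Gephi can open, though I'd check it against a file the real script has produced.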
Next steps include learning how to write scripts myself and learning how to use gephi to produce a more meaningful visualisation.
So, VALA is running a tech camp in July and I wanna go. In fact, I’m fairly sure I will go. I can teach myself coding things and did study computer science a decade or two ago. Actually now I think about it, it was nearly 3 decades ago. Eep! I’m almost 50 and still pottering along and trying to work out what I want to do with my life. Anyway I can teach myself but do tend to learn better with other people around.
A year or so back, I was playing with code on my vaio (running Win8 then, win10 now) and trying to get stuff working to explore and analyse web harvesting stuff. Got caught in a neverending circle of installing software dependencies and eventually ran out of puff without getting to the playing-with-code stage. I did have docker running, virtualbox running linux, and got most of the way with maven2.
This year I’m trying again on my mac mini. Installations ran smoothly, I’ve had few issues with software dependencies…I now have docker and maven3 and apache spark installed and running. I have approached it differently this year, following a different guide. Also, the mac is easier as unix is fully integrated with the OS, whereas it’s a separate thang under windows.
I stalled a month ago as I couldn’t get the test example in spark/scala to work. I realised a few days later that it was probably an issue with pathnames. Finally got round to trying again last night, and it was indeed a pathname issue and I resolved it in minutes and got the test example to work.
So my current dev environment is a mac mini, not the windows laptop. But I wanna take it to tech camp. So I looked at connecting the mini to the laptop and it’s sorta doable but a little bit painful with reduced functionality.
Or I could apply what I’ve learnt from the mac install and revisit the windows install and get it all running there too. That’s the cheapest option and a happier one as I remain fond of my laptop and want to keep using it. I love the idea of a handheld projector but it is a wee bit excessive and possibly gratuitously so.
2017 is shaping up to be an interesting year for special editions of science fiction and fantasy novels. I mostly like to buy nice editions of books in these fields, partly because I like pretty books and partly so that I can have something with better lastability than some of my increasingly dodgy paperbacks. Mostly I like the pretty. Also, I like reading books that feel nice in the hand and are printed with good fonts.
In a comment on my privileged purchasing power (good job, no mortgage) I am starting to lose track of pre-orders for interesting things. I’m long past the point of waiting for stuff to appear in bookshops. I am on the mailing lists for several speciality publishers in my favoured fields. This means I hear about books they’re planning, and when they’re likely to release. They usually allow you to pre-order titles too, plus some of their stuff never actually makes it to bookshops these days, or at least not the special editions. I am a collector and an addict…I’m not sure which is the more prominent attribute.
Books I have pre-ordered:
- Reaper’s Gale by Steven Erikson – this is book 7 of 10 of the Malazan series. These are some of the best books I have ever read…and I’ve almost read the entire series twice. The Subterranean Press edition is due for release in August 2017 and I have the first 6, all with the same numbered edition.
- Abaddon’s Gate by James S. A. Corey – this is book 3 of The Expanse series, which has also been made into a TV show. Also published by Subterranean Press.
- I think I’m waiting for one of Perth writer Greg Egan’s books from Subterranean Press, but it may already have arrived. I think I have all of his Subterranean releases now.
Books that aren’t available for ordering, or pre-ordering, yet:
- The ‘Rynosseros Cycle‘ by Terry Dowling – I love this series much. Terry is a Sydney based writer and the Rynosseros books are based on a sort of futuristic dreamtime spanning across a couple of novels and groupings of short stories. PS Publishing, whom I rely on for special editions by Ian C. Esslemont, who is co-creator of the Malazan universe, have recently announced an Australian arm and, amongst other things, are planning to do a special edition of the Rynosseros Cycle.
- Dune by Frank Herbert – It looks like Centipede Press are going to release special editions of all 6 of the Dune novels. I already have a nice edition of Dune (and only Dune) by Easton Press but a Centipede Press edition is to die for.
- Centipede Press are also doing a Masters of Science Fiction Series, for around US$40 per book. Two released so far:
I think I love Centipede Press the most. They do a lot of horror which I’m not really into. As an aside, there is a significant stream of dark fantasy and horror running through small press publishers in science fiction/fantasy these days. Each Centipede book is approached in a different way and a lot of ideas have gone into the design and development of their titles. I’ve managed to score some very nice books, either at full price or discounted, including, in addition to the above:
- The Sheep Look Up – John Brunner
- Stand on Zanzibar – John Brunner
- The Anubis Gates – Tim Powers. This edition is utterly gorgeous
- Ender’s Game – Orson Scott Card. Again, gorgeous and includes a “… separate book of the author’s original typed manuscript from 1975”
…and it’s only March :)
I came across this post from Kotaku about trying to collect and preserve the context of the world of computer games ie getting the external materials, promotions, articles and so forth which provide a real world background to the development of the game itself.
This sort of ties into one of my ongoing concerns in game preservation, how do I convey the sense of “atari thumb”? As this link shows, the Atari joystick was fairly basic. I spent so many hours using that controller as a teenager, thumb on the red button, mashing it as hard as I could. Eventually, you’d have to stop playing as your thumb got too sore to continue hence “atari thumb”.
There’s plenty of options around for game emulation including the almighty Internet Archive’s Game Arcade, and MAME has just had its 20th birthday. However it’s one thing to be able to play the old games, it’s another thing entirely to talk about and understand the culture of gaming when the original systems existed. It’s nice to see, for example, that the Internet Archive is maintaining an archive of old computer magazines including one of my favourites from the 80s, the UK Computer + Video Games. I bought this magazine every month, usually for one column in particular, the Adventurer’s Helpline.
The Adventurer pages were full of hints and reviews of text adventures including the US Infocom, and the English Level 9. I have vague recollections of reader letters and responses too so it felt like there was an international community. There were also Oz based magazines including the Australian Commodore Review which morphed into the Australian Commodore & Amiga Review and included a dedicated text adventure section called “Adventurer’s Realm“. Capturing that external world of gaming is a tricky beast. Many years ago, I discarded most of my original copies of those magazines though did cut out all the adventure columns. I’m sorta hoping that I’ve retained that small archive somewhere in a box. On the other hand, it seems to be the case that more and more of this material is being digitised and made available online.