identifying data

Wednesday, and time to respond to an identity challenge from Paul :-) 4 questions about me and the computer gear I like, and I suspect question 1 and question 4 are going to be the hard ones. As this is a personal space, I tend not to talk about my work, or at least not directly. My about page provides hints of past and current jobs but that’s about it.

Who are you, and what do you do?

My name is snail. I use my real name at work though even there I’d prefer to use snail but all the systems are based around official names not nicknames. Sadly. Many folk know me as snail except security and the switchboard so turning up and asking for snail ain’t gonna work :-) I am the Online Resources Specialist Librarian at the State Library of NSW and I am responsible for working with eresources, dealing with vendors, contract management, budget management, EZproxy, eresource troubleshooting and support, eresource subscriptions and digital archive purchases…and stats…and more stats. I am the Library’s representative on the NSLA eResources Consortium. 3 years ago I implemented a project for whole of domain web harvesting of all government websites under *.nsw.gov.au and I’ve been running that ever since…I’ll be commencing the primary annual captures today. I may have been blogging about the web harvesting stuff recently :)

What hardware do you use?

At work, I have a basic laptop running Windows 7 plugged into a 24″ widescreen monitor, along with a Das Keyboard Professional 4 mechanical keyboard and a Logitech trackball. I have a Jabra bluetooth hub hooked up to the desk phone which is paired to my mobile hearing aid loop, enabling me to hear telephone calls through my hearing aids.

laptops, tablet, phone, ereader

I have a personal laptop, a 2013 11″ Sony Vaio running Windows 10, which I use occasionally at work for external testing. At home, I have a Mac mini connected to a 24″ widescreen monitor, with a Logitech G610 mechanical keyboard and a Logitech trackball. Behind the scenes I’m running a home server on a 4 bay QNAP TS-421 in RAID 5: each drive is 3TB for a raw total of 12TB (roughly 9TB usable once RAID 5 parity is taken out), which I’m primarily using for backing up all my machines, running my iTunes server, and storing my photo archive. I have a 7″ Nexus (2013) tablet, a Samsung Galaxy S5 phone, and a Sony PRS-T2 ereader. Even a Psion 5mx that still works! I have several old keyboards too, assorted external hard drives and lots of USB sticks. :-)

And what software?

The machine at work is on Windows 7 and has just migrated to Office 365. The personal laptop is running Windows 10 and tends to run Open Office variants, has a virtualbox running Linux Mint, and a few other odds and ends. The mac mini is running whatever is the current MacOS and the phone and tablet are running android. I’ve never been much good at this single operating environment malarkey :-) Some of my favourite software includes:

and more browser variants than I care to count including lynx.

What would be your dream setup?

I wish all my devices would talk better to each other, a universal standard for talking across different machines, operating systems and so on. More speed, more bandwidth and greater customisation options. I like things to look pretty, both the hardware and the software, and I don’t like it when fab looking customisations break things. I like working from home but like working near colleagues too and some way of merging the two environments would be fab. I want better ears to hear conversations and chit-chat.

why nls8?

In a few days time, I’ll pop into the car and drive down to Canberra for the 8th-ish New Librarians’ Symposium – I say “ish” as I recall there was at least a 1.5, and I don’t remember if there were other in-between events. I’d like to link to some of the earlier NLS websites but ALIA’s own conference page only links back to 2008, ignoring the earlier iterations of NLS, and even the ones listed are not available because ALIA are upgrading their conference website though I don’t really understand why “upgrade” means removing access altogether. Thankfully, I’ve found the NLS2006 site on the wayback machine, along with the 2004, and even the first in 2002.

It feels a bit odd going to NLS as I am very definitely not a new librarian by a long shot. I’m probably what is termed a mid-career professional, which doesn’t sit well either as I’ve never been career or goal focused, mostly just wanting to work with interesting people and occasionally do fun things. To be honest, mostly just wanting to work. I s’pose one could argue that I’m going to mentor newer members of the profession but that would be nonsense as I’ve never been much of a mentor-type. With that said, I remember one of the concerns in the early days was about ensuring there was a continuity of contact between different parts of the profession and avoiding that sense of cliques developing. I want to make sure I don’t end up in a clique myself and want to get to know people outside my usual circles. I’m also going because it’s always been a bloody good conference, with a good sense of engagement, a welcoming attitude and lots of fun.

So yeah, all my reasons for going are totes selfish and all about me :-)

Menu for conference dinner, NLS2006

My first NLS was in Adelaide in 2004 and I have found some of my thoughts on my previous blog iteration. I recall being blown away by it and making lots of new friends in the profession, many of whom I’m still in touch with. Alan Smith, State Librarian of SA, spoke on the importance of thinking two jobs ahead and working out what you need to do in-between to get there. I’ve tried to apply that thinking but keep failing and still have no idea what I want to do next, never mind after that. Post NLS3, I ended up on the committee for the next version, NLS2006 (we chose to use the year rather than number), 2 years later; it seemed to go pretty well and was a total blast.

I made it to one or two NLS since and I missed a few as life stuff intruded. I think the last one I attended was in Perth…which I may have gatecrashed :) I’ve been on organising committees for a few library camps and unconferences too, though I don’t think I’ve been on a full blown conference committee since NLS2006. Camps/unconferences are reasonably easy to organise; something like NLS, however, takes 2 years of commitment to make it happen. It is a rewarding experience and I have no regrets; likewise, I applaud the efforts of the NLS8 committee in making it happen.

using big data to create bad art

A few weeks back, I installed a lot of software on my computer at home with the plan to work out what to do with large data sets, particularly web archives. One of my roles at work is being responsible for managing and running the Library’s web archiving strategy and regularly harvesting publicly available government websites. That’s all fun and good but you end up with a lot of data and I think there’s close to 5TB in the collection now. The next tricks revolve around what you can use the data for and what sorts of data are worthwhile to make accessible. Under my current, non NBN, download speeds I estimate it would take a few months to download 5TB of data assuming a steady connection.
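For a rough feel of where an estimate like that comes from, here is a minimal Python sketch of the arithmetic. The speeds in the loop are hypothetical sustained rates, not measurements of any real connection, and the function name is just for illustration; it assumes decimal terabytes and a perfectly steady link.

```python
def download_days(size_tb, speed_mbps):
    """Days needed to transfer size_tb terabytes at a steady speed_mbps."""
    size_bits = size_tb * 1e12 * 8             # decimal terabytes -> bits
    seconds = size_bits / (speed_mbps * 1e6)   # megabits per second -> bits per second
    return seconds / 86_400                    # 86,400 seconds in a day


# Hypothetical sustained speeds, chosen only to show the spread.
for mbps in (4, 8, 25, 100):
    print(f"{mbps:>3} Mbps: {download_days(5, mbps):6.1f} days")
```

At a sustained 4 to 8 Mbps, 5TB works out to somewhere between about 58 and 116 days, which is squarely in "a few months" territory.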

The dataset I’m using currently is a cohesive collection of publicly available websites containing approximately 68GB of data in 61 files. Each file is a compressed WARC file, WARC being the standard for Web ARChive files. Following some excellent instructions, I ran the Scala code from step 1 in my local install of Spark shell and successfully extracted the site structure. The code needed to be modified slightly to work with the pathname of my data set. Roughly, the process was:

  • run Spark Shell with sufficient memory, I’m using 6 of my 8GB of RAM
  • run “:paste”
  • copy in scala code
  • hit “Control-D” to start the code analysing the data

I think that took around 20-30 minutes to run. The first time through, it crashed at the end as I’d left a couple of regular text files in the archive directory and the code sample didn’t handle those. Fair enough too, as it’s only sample code and not a full program with error detection and handling. I moved the text files out and ran it again. Second time through it finished happily.
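One way to sidestep the stray-file problem is to build the list of inputs explicitly rather than pointing a job at the whole directory. A minimal Python sketch, assuming the archives are named with a .warc.gz (or .warc) suffix; the function name and suffix list are illustrative, and this is not part of the tutorial's Scala code:

```python
from pathlib import Path


def warc_files(archive_dir):
    """Return the WARC files in archive_dir, sorted, ignoring everything else."""
    suffixes = (".warc.gz", ".warc")  # assumed naming convention
    return sorted(
        p for p in Path(archive_dir).iterdir()
        if p.is_file() and p.name.lower().endswith(suffixes)
    )


# Usage (hypothetical path):
#   inputs = warc_files("/path/to/archive")
#   print(len(inputs), "WARC files found")
```

Anything that doesn't match, such as leftover text files, is simply skipped instead of crashing the run at the end.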

The resultant file containing all the URLs and linkages was a total of 355KB, not bad for a starting data size of 68GB, and provides something a little more manageable to play with. Next step is to load the file into Gephi, which is an open source data visualisation tool for networks and graphs. I still have little idea how to use Gephi effectively and am mostly just pressing different buttons and playing with layouts to see how stuff works. I haven’t quite got to the point of making visually interesting displays like the one shown in the tutorial; however, I have managed to create some really ugly art:

ugly data analysis
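The big drop in size, from 68GB of archives to a few hundred KB of linkages, makes sense once you think of the output as counts of site-to-site links rather than page content. The sketch below shows that idea in plain Python, not Spark, and is not the tutorial's code: the function names and sample links are made up, and the Source/Target/Weight headers are what I understand Gephi's spreadsheet import to expect for an edge list.

```python
import csv
from collections import Counter
from urllib.parse import urlsplit


def host(url):
    """Lower-cased hostname of a URL, with any leading 'www.' removed."""
    name = (urlsplit(url).hostname or "").lower()
    return name[4:] if name.startswith("www.") else name


def site_edges(links):
    """Count links between distinct hosts from (source_url, target_url) pairs."""
    counts = Counter()
    for source, target in links:
        src, dst = host(source), host(target)
        if src and dst and src != dst:   # drop unparseable URLs and same-site links
            counts[(src, dst)] += 1
    return counts


def write_edge_csv(counts, path):
    """Write the counts as a Source,Target,Weight edge list."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Weight"])
        for (src, dst), weight in sorted(counts.items()):
            writer.writerow([src, dst, weight])


# Made-up links on reserved example domains, purely for illustration.
sample = [
    ("http://www.example.org/a", "http://example.net/x"),
    ("http://www.example.org/b", "http://example.net/y"),
    ("http://www.example.org/b", "http://www.example.org/c"),
]
print(site_edges(sample))
```

Thousands of page-level links collapse into one weighted edge per pair of sites, which is why the result is small enough to play with in a desktop tool.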

I hit the point a while back where it’s no longer sufficient to play with sample bits and pieces and I need to sit down and learn stuff properly. To that end I ordered a couple of books on Apache Spark, then ordered another book, Programming in Scala, and am wondering whether I should also buy The Scala Cookbook. Or perhaps I shouldn’t try and do everything at once. I am reading both the Spark books concurrently as they’re aimed at different audiences and take different approaches. However, after an initial spurt through the first couple of chapters, I haven’t touched them in a couple of weeks. I also need to learn how to use Gephi effectively and there are a few tutorials available for doing that. I should explore other visualisation tools as well and continue to look at what other sorts of tools can be used.

5 bits

Having started with 5 articles a week or so back, I thought it might be worth aiming for 5 articles each time. Was tempted to go with 7 but whittled it down to 5. This is a bunch of articles I’ve read and tweeted in the last week.

That’ll do.

shelf by shelf 12 – library stuff

Well some stuff, some bits and even a rock. A shiny rock. I can’t quite remember the origins of the rock though I’ve had it since I was a child. It’s not a particularly pretty rock but it does have lots of flecks of shiny, mirror-like bits. Maybe I found it in the bush, or at school or perhaps someone gave it to me. I’ve had it since I was very young.

At the far end of the shelf to the right, I have several years’ worth of the Australian Library Journal. I still need to add a couple of the recent editions to the shelf and I think the latest edition arrived in the last week. A few years ago I tried to track down a full set in print but didn’t have much luck. I always meant to hunt around some more but there always seem to be other books to pursue. Alongside the journals is a history of LIANZA I picked up at their annual conference a few years ago.

Also on the shelf is an old, old memory: Gareth Powell’s “My Friend Arnold’s Book of Personal Computers”. Many, many years ago in the 80s, Gareth Powell used to edit the computer section in the SMH. He’d actually been the travel editor or writer and somehow moved from there into computers. Lots of computer folk hated him as he was never sufficiently techie nor especially precise. I loved him as he knew how to write and was always throwing in cute affectations “…down in the potting shed I call my office..” and such. His approach was all about being accessible and interesting, and he was willing to take the piss out of himself. Around that time, he put out a book on how to use computers for folk who didn’t know much about them…like his fictional friend Arnold. I picked up a copy secondhand a few years ago.

My Friend Arnold’s Book of Personal Computers

in 5 years time…

After yesterday’s 2005 post, I’ve been revisiting some of my old posts from the era. I didn’t use a blogging platform in those days and only started to when I moved to wordpress in 2007. The site started out as a single page template that I grabbed from a template site. Over time, I modified it substantially, moved columns, and eventually achieved a separation of style and content. I learnt stuff around html, css, and even a little xml. The rss feed was painfully handcoded for each post in xml. No easy generation, no scripts; just chunks of reusable code. Even that site was a relocation of an older site that was little more than a basic weblog with occasional commentary.

Brixton Tube Station

The handcoded version lasted from 2002 to 2007, which brings me to this post from April 2004, i.e. just over 10 years ago. It was all about the old interview question of where do you see yourself in 5 years time? My answer at the time talked about how much my life had varied over the years and ultimately concluded:

Grab opportunities when they arise but I’ll be buggered if I can think beyond that.

10 years later and that still rings true for me. Not long after that post I ended up in a place I never expected to be: working on vendor-side for a digital content provider. Spent 7 years there and had lots of fun, one of the best jobs I’ve ever had. I was one of those librarians who thought vendorland was the dark side and to be avoided at all costs. I no longer think that.

Detroit fire hydrant

These days, I’m on library-side once more, working at the State Library of NSW again. I wouldn’t have predicted 5 years ago that I’d be at SLNSW, and I certainly couldn’t have predicted working for a vendor at all. I’ve recently had to re-apply for an updated version of my position and was successful. Oddly, this was something of an affirming result. I’m still doing mostly the same sort of job, though there’s room for it to broaden in interesting directions. I have a bit of the same sense I had when I initially got the position two years ago: that there are interesting things to be done, and new ways to go.

I still have no idea where I’ll be in 5 years time. As I said 10 years ago, I hope I’m still grabbing interesting opportunities as they arise.

snail i am

snail. A name I call myself…I’ll stop there lest I sing out loud.

I realised on Sunday that it’s probably been 25 years since I started calling myself snail, initially online and later offline. I think I started using it in 1989 and I’ve been online since I discovered electronic bulletin boards (Adventurer’s Realm, Viatel) in 1984…connecting via the wonderful tones of the 1200/75 baud modem that plugged into the cartridge port of my Commodore 64.

I tend to prefer being called “snail” in person though some prefer to use my real name “Sean”. At work, I’ve always defaulted to “Sean” but occasionally wonder whether it’s something I should, or could, change. I remain ever flexible.

snail and snail

There are big restructures at work (State Library of NSW) and I have recently had to apply for my own position; or rather an updated version of my position. In happy news I have made it through and as of Friday, my job title will change from “Online & Licensing Librarian” to “Online Resources Specialist Librarian”. I first got this job 2 years ago and I s’pose it’s something of an affirming experience that I have been successful in retaining it.

I first worked as a contractor for SLNSW around 10-12 years ago and while I did and learnt lots, always felt I hadn’t quite got the hang of the place. I seem to be doing better this time round, some of the time at least. I used to travel around NSW training librarians in how to search online. In the intervening years, I was an electronic solutions consultant on the vendor side. Best job I ever had and I loved it much. 7 years was enough and a good time to move on. Prior to that I was part of the reference team at Bankstown Public Library and have also worked in the NSW Parliamentary Library and a big law firm.

I’ve been blogging on one platform or another (previous versions were handcoded) for more years than I can remember and my posting has become, #blogjune aside, increasingly erratic. That’s ok, it’s still my space; I have other spaces too.