beyond the stream

…or at least the mainstream. I read yet another article today about the decline of newspapers and particularly regional newspapers. Many regional newspapers are owned by larger groups and when the owner strikes problems and advertising revenue dries up, particularly at the moment, then papers get cut. This seems to lead to an increasing domination of the city papers which in turn results in a reduction in awareness of local issues and local connection ie the local newspaper is one part of the glue that connects folk together and gives them a shared space of sorts.

Libraries are another part of that glue, providing a welcoming space for all, free from commercial demands. It’s a place that’s not trying to move you on to make space for a paying customer, or sell stuff to you. Libraries are a mix of spaces: some quiet, some noisy, places to meet, to relax, to read, to chat, to hang, even to snooze. They provide a community hub and remain one of the few free indoor spaces where people can gather and chat.

There are online hubs too, though they rely on the community having access to online material, and the digital divide remains ever present, with some communities having better access than others. Once again, libraries may well be the only place that folk are able to use a computer, or access content online.

Over the years, there has been a rise in “pay it forward” groups on facebook for example in communities across Oz eg Port Macquarie, Inner West of Sydney, or Perth. These groups provide on one hand an opportunity for folk to clear out stuff, and on the other, an opportunity for folk to get things they need. A sharing space for advice and tips, increasing reuse and recycling.

I recall years ago, when a colleague and I ran a minecraft session as part of International Games Day, we didn’t get great numbers. A parent who turned up commented that we should have promoted to some of the parenting groups on facebook. They’d only heard about the games day accidentally but were in a facebook group of several thousand parents in western Sydney. Sure enough, nationwide, there are millions of parents participating in such groups and finding folk to hang with.

In some respects, facebook groups remind me a little of usenet of old with a mix of general and specific. Some groups have strict rules for engagement and keeping on topic while others ebb and flow depending on where the commonality lies. The challenge with such groups is that facebook is a bit of a closed shop: you’ve got to be on it, with an account, to see many of the groups and participate. At the same time, it’s not quite like the AOL of old, when that was the only platform; facebook groups tend toward a gated feel rather than closed, though the latter exist too. They can be inclusive and exclusive.

bursts of inactivity

This was a comment elsewhere but I thought I might add it here as it’s a little bit meta and a little bit where I’m at.

Trampers on the Kepler Track

Where are we now? Some folk in the community are hitting a peak and I seem to be heading toward a trough, perhaps I am old…I am a decade or two older than quite a few that I am chatting to on twitter these days. I remember my uni days which stretched on forever…yes I was at uni for a decade or so. Every year or two, I needed to make new groups of friends to ensure I continued to have friends as others continued to graduate. I did finish eventually with a BA (Philosophy, History & Philosophy of Science) [and an unofficial major in Computer Science] and a Master’s in Librarianship. So nerr…I finished and people didn’t really expect me to finish…professional student, years on the dole…yet here I am…a senior librarian at one of the top libraries in the country.

I am not a manager, I have no staff reporting to me. Somehow I keep finding interesting projects in odd nooks and crannies. Imbuing whatever job I’m doing with some extension of who I am. Allegedly, my primary role is to look after eresources, manage contracts and budgets, deal with suppliers…and stats for usage…always stats. Yet somehow I keep squeezing a little bit of me in…I do more tech stuff than most, I have managed to grab some tech support into my role…tech support seems to be a natural home of sorts.

However, I manage to pull in other things…some years ago I was tasked with implementing a strategy to harvest web sites, which I did. I have, via my employer, been capturing NSW government websites for several years. That’s several terabytes of data now and I continue to experiment with tools for exploring that content and looking at ways for making it publicly available. I’ve recently taken over the Library’s capturing of social media…so I’ve set up a working group to take some of the weight. Meanwhile I’m exploring policy and looking at what’s possible with other platforms.

I can see the shape of me developing…I turn 50 this year and am happy to say that I keep seeing endless possibilities, so many directions to head, so many things to try. At 50 I want to work forever, actually I think the government wants me to work forever too. However right now, I don’t want to stop. I want to keep pushing. I want to keep doing.

At 50 I have more hope than I did at 20. My horizon is larger.

Hmmm…this post has not been a regurgitation of my comments elsewhere…I might have to squeeze them in another day, or not, and continue ever on.

harvest testing

One of the difficulties with working with web harvests is that it is but one of several priorities and not even the key one. A lot of my job is focussed on managing the Library’s eresources collection, dealing with suppliers and looking after budgets. In addition I’ve been running the Library’s web harvesting programme for about three and a half years now. The main crawl of NSW government websites was set up originally by Archive-It and these days I have it scheduled to run twice a year. There are other smaller crawls that are run throughout the year.

However there’s never been a lot of time for exploring the harvested content in detail and ensuring we’re getting the material we think we are. We do run some testing by searching for specific content within the archive, eg budget papers, and checking that it contains all relevant content including spreadsheets and documents. There’s only so much you can do to manually test when this particular archive is 3.5TB and contains around 74 million documents. The Archive-It software does provide some tools for checking crawl results and broadly indicating missed material.
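A spot check like the budget papers one can be sketched as a small script: take a list of URLs you expect to be in the archive and report the ones with no matching capture. This is only a rough illustration. The function names and the example URLs are made up, and it assumes you already have a plain list of captured URLs from somewhere (for instance a crawl report export), which is not something described above.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Reduce a URL to host + path so http/https, 'www.' and trailing
    slashes compare equal. Query strings are ignored (a simplification)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def missing_from_capture(expected, captured):
    """Return the expected URLs that have no match among the captured URLs."""
    captured_keys = {normalise(u) for u in captured}
    return [u for u in expected if normalise(u) not in captured_keys]


# Hypothetical example data, not real crawl output.
expected = [
    "https://www.example.org/budget/paper1.pdf",
    "https://www.example.org/budget/tables.xlsx",
]
captured = [
    "http://example.org/budget/paper1.pdf",
    "http://example.org/budget/",
]

for url in missing_from_capture(expected, captured):
    print("not captured:", url)
```

It won't tell you whether a captured page is complete, only whether something was captured at that address, so it complements rather than replaces looking at the pages themselves.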

However, as readers continue to explore the collection, they come across things where we haven’t fully captured the content we thought we had. A recent example is the Electoral Atlas of NSW 1856-2006 edited by Eamonn Clifford, Antony Green and David Clune. The State Library does hold it in print and until recently the digital content was hosted on the NSW Parliamentary website.

On initial inspection, it appeared that the content had been captured via the harvest both by SLNSW and the National Library (NLA). The NLA version doesn’t descend further while the NSW version does display the individual election results eg 1984:

Election details of the 1984 New South Wales state election

However, all the links in the 1984 Election Links section return a “Not in Archive” message, similarly for other years. In this example, there is some happy news in that the main wayback machine seems to have captured the site in full including those pages we’ve missed. The question that I need to explore, and may need to ask Archive-It about, is why their crawl captured that information and ours didn’t.

As a side note, I’ve found the Wayback browser plugin (Firefox, Chrome) rather useful for finding archived versions of pages that no longer exist on websites.
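The same sort of lookup can be scripted. The sketch below builds a query for the Internet Archive's availability endpoint and pulls the closest snapshot out of the JSON reply. The endpoint address and the response fields (`archived_snapshots`, `closest`, `available`, `url`) reflect my understanding of that API and should be checked against the current documentation; the function names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the Wayback availability lookup.
API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD-style string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the archived URL from a decoded JSON response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Query the endpoint over the network and return the snapshot URL, if any."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup("example.org")` needs a network connection; the other two functions can be tried offline with a hand-written dictionary.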

 

identifying data

Wednesday and time to respond to an identity challenge from Paul :-) 4 questions about me and computer gear I like and I suspect question 1 and question 4 are going to be the hard ones. As this is a personal space, I tend not to talk about my work, or at least not directly. My about page provides hints of past and current jobs but that’s about it.

Who are you, and what do you do?

My name is snail. I use my real name at work though even there I’d prefer to use snail but all the systems are based around official names not nicknames. Sadly. Many folk know me as snail except security and the switchboard so turning up and asking for snail ain’t gonna work :-) I am the Online Resources Specialist Librarian at the State Library of NSW and I am responsible for working with eresources, dealing with vendors, contract management, budget management, EZproxy, eresource troubleshooting and support, eresource subscriptions and digital archive purchases…and stats…and more stats. I am the Library’s representative on the NSLA eResources Consortium. 3 years ago I implemented a project for whole of domain web harvesting of all government websites under *.nsw.gov.au and I’ve been running that ever since…I’ll be commencing the primary annual captures today. I may have been blogging about the web harvesting stuff recently :)

What hardware do you use?

At work, I have a basic laptop running Windows 7 plugged into a 24″ widescreen monitor, along with a Das Keyboard Professional 4 mechanical keyboard and a Logitech trackball. I have a Jabra bluetooth hub hooked up to the desk phone which is paired to my mobile hearing aid loop, enabling me to hear telephone calls through my hearing aids.

I have a personal laptop, 2013 11″ Sony Vaio running Windows 10, which I use occasionally at work for external testing. At home, I have a mac mini connected to a 24″ widescreen monitor, with a Logitech G610 mechanical keyboard and a Logitech trackball. Behind the scenes I’m running a home server on a 4 bay QNAP TS-421 in RAID 5: each drive is 3TB for a total of 12TB, which I’m primarily using for backing up all my machines, running my itunes server, and photo archive. I have a 7″ Nexus (2013) tablet, a Samsung galaxy s5 phone, and a Sony PRS-T2 ereader. Even a Psion 5mx that still works! I have several old keyboards too, assorted external hard drives and lots of USB sticks. :-)
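A small aside on the RAID 5 numbers: 12TB is the raw total across the four drives, while RAID 5 sets aside one drive's worth of space for parity. A quick sketch of the arithmetic, ignoring filesystem overhead and the TB versus TiB difference (the function name is just for illustration):

```python
def raid5_usable_tb(drive_count, drive_size_tb):
    """Usable capacity of a RAID 5 array: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_size_tb


raw_tb = 4 * 3                      # four 3TB drives
usable_tb = raid5_usable_tb(4, 3)   # (4 - 1) * 3
print(f"raw: {raw_tb}TB, usable: {usable_tb}TB")  # raw: 12TB, usable: 9TB
```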

And what software?

The machine at work is on Windows 7 and has just migrated to Office 365. The personal laptop is running Windows 10 and tends to run Open Office variants, has a virtualbox running Linux Mint, and a few other odds and ends. The mac mini is running whatever is the current MacOS and the phone and tablet are running android. I’ve never been much good at this single operating environment malarkey :-) Some of my favourite software includes:

and more browser variants than I care to count including lynx.

What would be your dream setup?

I wish all my devices would talk better to each other, a universal standard for talking across different machines, operating systems and so on. More speed, more bandwidth and greater customisation options. I like things to look pretty, both the hardware and the software, and I don’t like it when fab looking customisations break things. I like working from home but like working near colleagues too and some way of merging the two environments would be fab. I want better ears to hear conversations and chit-chat.

why nls8?

In a few days time, I’ll pop into the car and drive down to Canberra for the 8th-ish New Librarians’ Symposium – I say “ish” as I recall there was at least a 1.5, and I don’t remember if there were other in-between events. I’d like to link to some of the earlier NLS websites but ALIA’s own conference page only links back to 2008, ignoring the earlier iterations of NLS, and even the ones listed are not available because ALIA are upgrading their conference website though I don’t really understand why “upgrade” means removing access altogether. Thankfully, I’ve found the NLS2006 site on the wayback machine, along with the 2004, and even the first in 2002.

It feels a bit odd going to NLS as I am very definitely not a new librarian by a long shot. I’m probably what is termed a mid-career professional, which doesn’t sit well either as I’ve never been career or goal focused, mostly just wanting to work with interesting people and occasionally do fun things. To be honest, mostly just wanting to work. I s’pose one could argue that I’m going to mentor newer members of the profession but that would be nonsense as I’ve never been much of a mentor-type. With that said, I remember one of the concerns in the early days was about ensuring there was a continuity of contact between different parts of the profession and avoiding that sense of cliques developing. I want to make sure I don’t end up in a clique myself and want to get to know people outside my usual circles. I’m also going because it’s always been a bloody good conference, with a good sense of engagement, a welcoming attitude and lots of fun.

So yeah, all my reasons for going are totes selfish and all about me :-)

My first NLS was in Adelaide in 2004 and I have found some of my thoughts on my previous blog iteration. I recall being blown away by it and made lots of new friends in the profession, many of whom I’m still in touch with. Alan Smith, State Librarian of SA, spoke on the importance of thinking two jobs ahead and working out what you need to do in-between to get there. I’ve tried to apply that thinking but keep failing and still have no idea what I want to do next, never mind after that. Post NLS3, I ended up on the committee for the next version, NLS2006 (we chose to use the year rather than number), 2 years later; it seemed to go pretty well and was a total blast.

I made it to one or two NLS since and I missed a few as life stuff intruded. I think the last one I attended was in Perth…which I may have gatecrashed :) I’ve been on organising committees for a few library camps and unconferences too though I don’t think I’ve been on a full blown conference committee since NLS2006. Camps/unconferences are reasonably easy to organise, however something like NLS takes 2 years of commitment to make it happen. It is a rewarding experience and I have no regrets, likewise I applaud the efforts of the NLS8 committee in making it happen.

using big data to create bad art

A few weeks back, I installed a lot of software on my computer at home with the plan to work out what to do with large data sets, particularly web archives. One of my roles at work is being responsible for managing and running the Library’s web archiving strategy and regularly harvesting publicly available government websites. That’s all fun and good but you end up with a lot of data and I think there’s close to 5TB in the collection now. The next tricks revolve around what you can use the data for and what sorts of data are worthwhile to make accessible. Under my current, non NBN, download speeds I estimate it would take a few months to download 5TB of data assuming a steady connection.
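For a sense of where "a few months" comes from, here is a back-of-envelope calculation. The connection speeds are assumptions picked for illustration, not measurements of my line, and the sums ignore protocol overhead and dropouts.

```python
def download_days(size_tb, speed_mbps):
    """Days needed to download size_tb terabytes at a steady speed_mbps."""
    bits = size_tb * 1e12 * 8               # decimal terabytes to bits
    seconds = bits / (speed_mbps * 1e6)     # megabits per second to bits per second
    return seconds / 86400                  # seconds in a day


# Assumed speeds in megabits per second, for comparison only.
for speed in (5, 8, 25, 100):
    print(f"{speed:>3} Mbps: {download_days(5, speed):.1f} days")
```

At a steady 5 Mbps, 5TB works out to roughly three months, which is in the same ballpark as my estimate.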

The dataset I’m using currently is a cohesive collection of publicly available websites containing approximately 68GB of data in 61 files. Each file is a compressed WARC file, WARC being the standard for Web ARChive files. Following some excellent instructions, I ran the Scala code from step 1 in my local install of Spark Shell and successfully extracted the site structure. The code needed to be modified slightly to work with the pathname of my data set, roughly:

  • run Spark Shell with sufficient memory, I’m using 6 of my 8GB of RAM
  • run “:paste”
  • copy in scala code
  • hit “Control-D” to start the code analysing the data

I think that took around 20-30 minutes to run. The first time through, it crashed at the end as I’d left a couple of regular text files in the archive directory and the code sample didn’t handle those. Fair enough too, as it’s only sample code and not a full program with error detection and handling. I moved the text files out and ran it again. Second time through it finished happily.
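Rather than moving stray files out by hand, one option is to list only the WARC files before handing anything to the analysis step. This is a small Python sketch of that idea, separate from the tutorial's Scala code; the suffixes it accepts and the directory name in the usage note are my assumptions.

```python
from pathlib import Path

# Assumed naming: plain or gzip-compressed WARC files.
WARC_SUFFIXES = (".warc", ".warc.gz")


def warc_files(directory):
    """Return the WARC files in a directory, skipping anything else."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.name.lower().endswith(WARC_SUFFIXES)
    )
```

Something like `warc_files("archive")` (a hypothetical directory name) would then return only the files worth processing, leaving notes and other text files alone.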

The resultant file containing all the URLs and linkages was a total of 355kb, not bad for a starting data size of 68GB, and it provides something a little more manageable to play with. Next step is to load the file into Gephi which is an open source, data visualisation tool for networks and graphs. I still have little idea how to use gephi effectively and am mostly just pressing different buttons and playing with layouts to see how stuff works. I haven’t quite got to the point of making visually interesting displays like the one shown in the tutorial, however I have managed to create some really ugly art:
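As an illustration of the kind of file Gephi can read, the sketch below collapses page-to-page links into host-to-host edges with a count and writes them out as a CSV edge list with Source, Target and Weight columns. The link pairs in the usage note are hypothetical, and this is not the format the tutorial code produces, just one simple shape for getting link data into Gephi.

```python
import csv
from collections import Counter
from urllib.parse import urlsplit


def host(url):
    """Lowercased host part of a URL ('' if there isn't one)."""
    return urlsplit(url).netloc.lower()


def host_edges(links):
    """Count links between distinct hosts from (source_url, target_url) pairs."""
    counts = Counter()
    for src, dst in links:
        s, d = host(src), host(dst)
        if s and d and s != d:      # skip malformed URLs and links within a site
            counts[(s, d)] += 1
    return counts


def write_gephi_csv(counts, path):
    """Write the edges as a CSV with Source, Target, Weight columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Weight"])
        for (s, d), weight in sorted(counts.items()):
            writer.writerow([s, d, weight])
```

For example, `write_gephi_csv(host_edges([("http://a.example/x", "http://b.example/y")]), "edges.csv")` writes a header row plus a single edge from a.example to b.example with weight 1.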

ugly data analysis

I hit the point a while back where it’s no longer sufficient to play with sample bits and pieces and I need to sit down and learn stuff properly. To that end I ordered a couple of books on Apache Spark, then ordered another book, Programming in Scala, and am wondering whether I should also buy The Scala Cookbook. Or perhaps I shouldn’t try and do everything at once. I am reading both the Spark books concurrently as they’re aimed at different audiences and take different approaches. However after an initial spurt through the first couple of chapters, I haven’t touched them in a couple of weeks. I also need to learn how to use Gephi effectively and there’s a few tutorials available for doing that. I should explore other visualisation tools as well and continue to look at what other sorts of tools can be used.

5 bits

Having started with 5 articles a week or so back, I thought it might be worth aiming for 5 articles each time. Was tempted to go with 7 but whittled it down to 5. This is a bunch of articles I’ve read and tweeted in the last week.

That’ll do.

shelf by shelf 12 – library stuff

Well some stuff, some bits and even a rock. A shiny rock. I can’t quite remember the origins of the rock though I’ve had it since I was a child. It’s not a particularly pretty rock but it does have lots of flecks of shiny, mirror-like bits. Maybe I found it in the bush, or at school or perhaps someone gave it to me. I’ve had it since I was very young.

At the far end of the shelf to the right, I have several years worth of the Australian Library Journal. I still need to add a couple of the recent editions to the shelf and I think the latest edition arrived in the last week. A few years ago I tried to track down a full set in print but didn’t have much luck. I always meant to hunt around some more but there always seems to be other books to pursue. Alongside the journals is a history of LIANZA I picked up at their annual conference a few years ago.

Also on the shelf is an old, old memory: Gareth Powell’s “My Friend Arnold’s Book of Personal Computers”. Many, many years ago in the 80s, Gareth Powell used to edit the computer section in the SMH. He’d actually been the travel editor or writer and somehow moved from there into computers. Lots of computer folk hated him as he was never sufficiently techie nor especially precise. I loved him as he knew how to write and was always throwing in cute affectations “…down in the potting shed I call my office..” and such. His approach was all about being accessible and interesting, and he was willing to take the piss out of himself. Around that time, he put out a book on how to use computers for folk who didn’t know much about them…like his fictional friend Arnold. I picked up a copy secondhand a few years ago.

My Friend Arnold’s Book of Personal Computers

in 5 years time…

After yesterday’s 2005 post, I’ve been revisiting some of my old posts from the era. I didn’t use a blogging platform in those days and only started to when I moved to wordpress in 2007. The site started out as a single page template that I grabbed from a template site. Over time, I modified it substantially, moved columns, and eventually achieved a separation of style and content. I learnt stuff around html, css, and even a little xml. The rss feed was painfully handcoded for each post in xml. No easy generation, no scripts; just chunks of reusable code. Even that site was a relocation of an older site that was little more than a basic weblog with occasional commentary.

Brixton Tube Station

The handcoded version lasted from 2002 to 2007 which brings me to this post from April 2004 ie just over 10 years ago.  It was all about the old interview question of where do you see yourself in 5 years time? My answer at the time talked about how much my life had varied over the years and ultimately concluded

Grab opportunities when they arise but I’ll be buggered if I can think beyond that.

10 years later and that still rings true for me. Not long after that post I ended up in a place I never expected to be: working on vendor-side for a digital content provider. Spent 7 years there and had lots of fun, one of the best jobs I’ve ever had. I was one of those librarians who thought vendorland was the dark side and to be avoided at all costs. I no longer think that.

Detroit fire hydrant

These days, I’m on library-side once more, working at the State Library of NSW again. I wouldn’t have predicted 5 years ago that I’d be at SLNSW, and I certainly couldn’t have predicted working for a vendor at all. I’ve recently had to re-apply for an updated version of my position and was successful. Oddly, this was something of an affirming result. I’m still doing mostly the same sort of job, though there’s room for it to broaden in interesting directions. I have a bit of the sense I had when I initially got the position two years ago, that there were interesting things to be done, and new ways to go.

I still have no idea where I’ll be in 5 years time. As I said 10 years ago, I hope I’m still grabbing interesting opportunities as they arise.

snail i am

snail. A name I call myself…I’ll stop there lest I sing out loud.

I realised on Sunday that it’s probably been 25 years since I started calling myself snail, initially online and later offline. I think I started using it in 1989 and I’ve been online since I discovered electronic bulletin boards (Adventurer’s Realm, Viatel) in 1984…connecting via the wonderful tones of the 1200/75 baud modem that plugged into the cartridge port of my Commodore 64.

I tend to prefer being called “snail” in person though some prefer to use my real name “Sean”. At work, I’ve always defaulted to “Sean” but occasionally wonder whether it’s something I should, or could, change. I remain ever flexible.

There are big restructures at work (State Library of NSW) and I have recently had to apply for my own position; or rather an updated version of my position. In happy news I have made it through and as of Friday, my job title will change from “Online & Licensing Librarian” to “Online Resources Specialist Librarian”. I first got this job 2 years ago and I s’pose it’s something of an affirming experience that I have been successful in retaining it.

I first worked as a contractor for SLNSW around 10-12 years ago and while I did and learnt lots, always felt I hadn’t quite got the hang of the place. I seem to be doing better this time round, some of the time at least. I used to travel around NSW training librarians in how to search online. In the intervening years, I was an electronic solutions consultant on the vendor side. Best job I ever had and I loved it much. 7 years was enough and a good time to move on. Prior to that I was part of the reference team at Bankstown Public Library and have also worked in the NSW Parliamentary Library and a big law firm.

I’ve been blogging on one platform or another (previous versions were handcoded) for more years than I can remember and my posting has become, #blogjune aside, increasingly erratic. That’s ok, it’s still my space; I have other spaces too.