a question of title

A week ago, Con sent me a bunch of questions to answer as there was talk at the time of interviewing folk and posting answers with each person interviewing another person. That seems to have segued into Kathryn’s list of questions, for which folk can nominate dates and questions and so on. I answered all of Con’s questions in an email back to her, noting that one or two may need essays :) Anyways, the plan is now to post them on my blog and I thank Con and Kathryn for the inspiration. A curious starting point for other conversations.

I commented yesterday that I had had 3 job titles in 10 years and they are:

  • Online & Licensing Librarian
  • Online Resources Specialist Librarian
  • Senior Librarian, Online Resources

Two words remain throughout: “Online” and “Librarian” which works for me rather well. All titles are reasonably accurate and all I deal with is online content: selection, access, and collecting. “Licensing” is a tricky one as I need to read, assess and recommend licensing agreements from suppliers for the content that we have. I am not a lawyer nor do I play one on TV. The only thing I play on TV is computer games. Yet I read and comment on license agreements. NSLA provides guidelines and it is valuable to read older decisions and ensure that licenses enable and support access to things like:

  • document delivery
  • walk-in users
  • remote access

Licenses set out the needs and obligations to ensure all parties are respected.

My most recent job title change is a minor one, the reintroduction of “Senior” and the removal of “Specialist” – the latter is being repurposed for librarians a grade below.

…and on we go

Time for things is a needed thing. Me making the time or finding time to engage is important. Following my post the other day, I belatedly remembered there are a bunch of folk who continue to blog and engage and do stuff. That would be the new cardigan/ausglam crowd, full of many interesting people with whom I keep not getting round to engaging much, though I am loosely listed with them. Based in Victoria but not limited to Victoria.

They even blog regularly and unite around a monthly topic which seems like an easy way to ensure you blog at least monthly. June’s topic is Exuberance though I’m not feeling it at the moment; neither youthful nor exuberant. Might try and write something this month. I have found myself referenced in the monthly summary when my thoughts align with the topic of the day, most recently in February and April.

A day or two on and Ruth’s quote on the value of time still resonates much.

i am a dodgy librarian

I don’t read academic papers.

There I’ve said it. For work stuff, it’s more fun doing than reading. Yet with so many things, it is the reading that is the fun part. I go to conferences and listen to folk talk about their papers but rarely read the paper itself. When I hit a thing at work that needs more thought, I’ll check online, do a google, find key blogs or forums in that area.

I don’t even use google scholar.

I’m not active in academic circles. I’m not part of the conversation nor the production around academic papers. I sometimes skim a paper looking for something specific or a key outcome.

Perhaps I am not sufficiently reflective of the work I do. I have access to lots of academic papers via my library. I do engage here and there, usually more in a community of interest sort of way. Conference papers are almost like a calling card of finding people of interest or relevance. I am curious about their insights and the stuff around their papers and what they do.

Perhaps there is some sort of middle career lethargy going on. I’m conscious that I need to share more of what I’m doing but a blog post means I can get to the nitty gritty quicker, academic papers require more work, bigger hurdles. Hmmm…the same could be said of reading…blogs you can get straight into, but academic papers demand more, have more hurdles. Blogs can be conversational and papers formal.

Perhaps I find formal environments challenging, of which academic papers are but one part. I like piecemeal, adhoc, adlib, playing with bits and pieces. Formal papers seem like another sort of space.

While writing this, I read a comment of Ruth’s on another post:

“Perhaps the value is in the dedicated time to share one’s thoughts, rather than the medium.”

That is a sentiment that I like a lot.

beyond the stream

…or at least the mainstream. I read yet another article today about the decline of newspapers and particularly regional newspapers. Many regional newspapers are owned by larger groups and when the owner strikes problems and advertising revenue dries up, particularly at the moment, then papers get cut. This seems to lead to an increasing domination of the city papers which in turn results in a reduction in awareness of local issues and local connection ie the local newspaper is one part of the glue that connects folk together and gives them a shared space of sorts.

Libraries are another part of that glue, providing a welcoming space for all, free from commercial demands. It’s a place that’s not trying to move you on to make space for a paying customer, or sell stuff to you. Libraries are a mix of spaces: some quiet some noisy, places to meet, to relax, to read, to chat, to hang, even to snooze. They provide a community hub and remain one of the few free indoor spaces that people can gather and chat.

There are online hubs too, though predicated on the basis that the community has access to online material, the digital divide remains ever prevalent with some communities having better access than others. Once again, libraries may well be the only place that folk are able to use a computer, or access content online.

Over the years, there has been a rise in “pay it forward” groups on facebook for example in communities across Oz eg Port Macquarie, Inner West of Sydney, or Perth. These groups provide on one hand an opportunity for folk to clear out stuff, and on the other, an opportunity for folk to get things they need. A sharing space for advice and tips, increasing reuse and recycling.

I recall years ago, when a colleague and I ran a minecraft session as part of International Games Day, we didn’t get great numbers. A parent who turned up commented that we should have promoted to some of the parenting groups on facebook. They’d only heard about the games day accidentally but were in a facebook group of several thousand parents in western Sydney. Sure enough, nationwide, there are millions of parents participating in such groups and finding folk to hang with.

In some respects, facebook groups remind me a little of usenet of old with a mix of general and specific. Some groups have strict rules for engagement and keeping on topic while others ebb and flow depending on where the commonality lies. The challenge with such groups is that facebook is a bit of a closed shop, you’ve got to be on it, with an account to see many of the groups, and participate. At the same time, it’s not quite like the AOL of old with that being the only platform, facebook groups tend toward a gated feel rather than closed though the latter exist too. They can be inclusive and exclusive.

bursts of inactivity

This was a comment elsewhere but I thought I might add it here as it’s a little bit meta and a little bit where I’m at.

Trampers on the Kepler Track

Where are we now? Some folk in the community are hitting a peak and I seem to be heading toward a trough, perhaps I am old…I am a decade or two older than quite a few that I am chatting to on twitter these days. I remember my uni days which stretched on forever…yes I was at uni for a decade or so. Every year or two, I needed to make new groups of friends to ensure I continued to have friends as others continued to graduate. I did finish eventually with a BA (Philosophy, History & Philosophy of Science) [and an unofficial major in Computer Science] and a Master’s in Librarianship. So nerr…I finished and people didn’t really expect me to finish…professional student, years on the dole…yet here I am…a senior librarian at one of the top libraries in the country.

I am not a manager, I have no staff reporting to me. Somehow I keep finding interesting projects in odd nooks and crannies. Imbuing whatever job I’m doing with some extension of who I am. Allegedly, my primary role is to look after eresources, manage contracts and budgets, deal with suppliers…and stats for usage…always stats. Yet somehow I keep squeezing a little bit of me in…I do more tech stuff than most, I have managed to grab some tech support into my role…tech support seems to be a natural home of sorts.

However, I manage to pull in other things…some years ago I was tasked with implementing a strategy to harvest web sites, which I did. I have, via my employer, been capturing NSW government websites for several years. That’s several terabytes of data now and I continue to experiment with tools for exploring that content and looking at ways for making it publicly available. I’ve recently taken over the Library’s capturing of social media…so I’ve set up a working group to take some of the weight. Meanwhile I’m exploring policy and looking at what’s possible with other platforms.

I can see the shape of me developing…I turn 50 this year and am happy to say that I keep seeing endless possibilities, so many directions to head, so many things to try. At 50 I want to work forever, actually I think the government wants me to work forever too. However right now, I don’t want to stop. I want to keep pushing. I want to keep doing.

At 50 I have more hope than I did at 20. My horizon is larger.

Hmmm…this post has not been a regurgitation of my comments elsewhere…I might have to squeeze them in another day, or not, and continue ever on.

harvest testing

One of the difficulties with working with web harvests is that it is but one of several priorities and not even the key one. A lot of my job is focussed on managing the Library’s eresources collection, dealing with suppliers and looking after budgets. In addition I’ve been running the Library’s web harvesting programme for about three and a half years now. The main crawl of NSW government websites was set up originally by Archive-It and these days I have it scheduled to run twice a year. There are other smaller crawls that are run throughout the year.

However there’s never been a lot of time for exploring the harvested content in detail and ensuring we’re getting the material we think we are. We do run some testing by searching for specific content within the archive, eg budget papers, and checking that it contains all relevant content including spreadsheets and documents. There’s only so much you can do to manually test when this particular archive is 3.5TB and contains around 74 million documents. The Archive-It software does provide some tools for checking crawl results and broadly indicating missed material.

However, as readers continue to explore the collection, they come across things where we haven’t fully captured the content we thought we had. A recent example is the Electoral Atlas of NSW 1856-2006 edited by Eamonn Clifford, Antony Green and David Clune. The State Library does hold it in print and until recently the digital content was hosted on the NSW Parliamentary website.

On initial inspection, it appeared that the content had been captured via the harvest both by SLNSW and the National Library (NLA). The NLA version doesn’t descend further while the NSW version does display the individual election results eg 1984:

Election details of the 1984 New South Wales state election

However, all the links in the 1984 Election Links section return a “Not in Archive” message, similarly for other years. In this example, there is some happy news in that the main wayback machine seems to have captured the site in full including those pages we’ve missed. The question that I need to explore, and may need to ask Archive-It about, is why their crawl captured that information and ours didn’t.

As a side note, I’ve found the Wayback browser plugin (Firefox, Chrome) rather useful for finding archived versions of pages that no longer exist on websites.
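For checking more than a handful of missed links, the lookup the plugin does by hand can be scripted. The following is a minimal Python sketch, assuming the Internet Archive's public Wayback availability endpoint and its documented JSON response shape; the function names are made up for illustration and this is not how the Library's own checking is done.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint (assumed to be reachable).
API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL for one page, optionally near a YYYYMMDD timestamp."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(response):
    """Return the archived URL from a parsed response, or None if not captured."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Ask the Wayback Machine whether it holds a capture of page_url."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup()` on each link that returns “Not in Archive” would give a quick list of which pages the main wayback machine holds; a result of `None` means no capture was reported.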

 

identifying data

Wednesday and time to respond to an identity challenge from Paul :-) 4 questions about me and computer gear I like and I suspect question 1 and question 4 are going to be the hard ones. As this is a personal space, I tend not to talk about my work, or at least not directly. My about page provides hints of past and current jobs but that’s about it.

Who are you, and what do you do?

My name is snail. I use my real name at work though even there I’d prefer to use snail but all the systems are based around official names not nicknames. Sadly. Many folk know me as snail except security and the switchboard so turning up and asking for snail ain’t gonna work :-) I am the Online Resources Specialist Librarian at the State Library of NSW and I am responsible for working with eresources, dealing with vendors, contract management, budget management, EZproxy, eresource troubleshooting and support, eresource subscriptions and digital archive purchases…and stats…and more stats. I am the Library’s representative on the NSLA eResources Consortium. 3 years ago I implemented a project for whole of domain web harvesting of all government websites under *.nsw.gov.au and I’ve been running that ever since…I’ll be commencing the primary annual captures today. I may have been blogging about the web harvesting stuff recently :)

What hardware do you use?

At work, I have a basic laptop running Windows 7 plugged into a 24″ widescreen monitor, along with a Das Keyboard Professional 4 mechanical keyboard and a Logitech trackball. I have a Jabra bluetooth hub hooked up to the desk phone which is paired to my mobile hearing aid loop, enabling me to hear telephone calls through my hearing aids.

laptops, tablet, phone, ereader

I have a personal laptop, 2013 11″ Sony Vaio running Windows 10, which I use occasionally at work for external testing. At home, I have a mac mini connected to a 24″ widescreen monitor, with a Logitech G610 mechanical keyboard and a Logitech trackball. Behind the scenes I’m running a home server on a 4 bay QNAP TS-421 in RAID 5: each drive is 3TB for a total of 12TB which I’m primarily using for backing up all my machines, running my itunes server, and photo archive. I have a 7″ Nexus (2013) tablet, a Samsung galaxy s5 phone, and a Sony PRS-T2 ereader. Even a Psion 5mx that still works! I have several old keyboards too, assorted external hard drives and lots of USB sticks. :-)
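A side note on the RAID 5 numbers: four 3TB drives add up to 12TB of raw disk, but RAID 5 gives up roughly one drive's worth of space to parity. A tiny Python sketch of that arithmetic follows; the function name is made up for illustration and it ignores filesystem overhead.

```python
def raid5_usable_tb(drives, size_tb):
    """Approximate usable capacity of a RAID 5 array of equal-sized drives."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    # One drive's worth of capacity is spent on parity.
    return (drives - 1) * size_tb


if __name__ == "__main__":
    print(raid5_usable_tb(4, 3))  # 9 (TB usable out of 12TB raw)
```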

And what software?

The machine at work is on Windows 7 and has just migrated to Office 365. The personal laptop is running Windows 10 and tends to run Open Office variants, has a virtualbox running Linux Mint, and a few other odds and ends. The mac mini is running whatever is the current MacOS and the phone and tablet are running android. I’ve never been much good at this single operating environment malarkey :-) Some of my favourite software includes:

and more browser variants than I care to count including lynx.

What would be your dream setup?

I wish all my devices would talk better to each other, a universal standard for talking across different machines, operating systems and so on. More speed, more bandwidth and greater customisation options. I like things to look pretty, both the hardware and the software, and I don’t like it when fab looking customisations break things. I like working from home but like working near colleagues too and some way of merging the two environments would be fab. I want better ears to hear conversations and chit-chat.

why nls8?

In a few days’ time, I’ll pop into the car and drive down to Canberra for the 8th-ish New Librarians’ Symposium – I say “ish” as I recall there was at least a 1.5, and I don’t remember if there were other in-between events. I’d like to link to some of the earlier NLS websites but ALIA’s own conference page only links back to 2008, ignoring the earlier iterations of NLS, and even the ones listed are not available because ALIA are upgrading their conference website, though I don’t really understand why “upgrade” means removing access altogether. Thankfully, I’ve found the NLS2006 site on the wayback machine, along with the 2004, and even the first in 2002.

It feels a bit odd going to NLS as I am very definitely not a new librarian by a long shot. I’m probably what is termed a mid-career professional which doesn’t sit well either as I’ve never been career or goal focused, mostly just wanting to work with interesting people and occasionally do fun things. To be honest, mostly just wanting to work. I s’pose one could argue that I’m going to mentor newer members of the profession but that would be nonsense as I’ve never been much of a mentor-type. With that said, I remember one of the concerns in the early days was about ensuring there was a continuity of contact between different parts of the profession and avoiding that sense of cliques developing. I want to make sure I don’t end up in a clique myself and want to get to know people outside my usual circles. I’m also going because it’s always been a bloody good conference, with a good sense of engagement, a welcoming attitude and lots of fun.

So yeah, all my reasons for going are totes selfish and all about me :-)

Menu for conference dinner, NLS2006

My first NLS was in Adelaide in 2004 and I have found some of my thoughts on my previous blog iteration. I recall being blown away by it and made lots of new friends in the profession, many of whom I’m still in touch with. Alan Smith, State Librarian of SA, spoke on the importance of thinking two jobs ahead and working out what you need to do in-between to get there. I’ve tried to apply that thinking but keep failing and still have no idea what I want to do next, never mind after that. Post NLS3, I ended up on the committee for the next version, NLS2006 (we chose to use the year rather than number), 2 years later; it seemed to go pretty well and was a total blast.

I made it to one or two NLS since and I missed a few as life stuff intruded. I think the last one I attended was in Perth…which I may have gatecrashed :) I’ve been on organising committees for a few library camps and unconferences too though I don’t think I’ve been on a full blown conference committee since NLS2006. Camps/unconferences are reasonably easy to organise, however something like NLS takes 2 years of commitment to make it happen. It is a rewarding experience and I have no regrets, likewise I applaud the efforts of the NLS8 committee in making it happen.

using big data to create bad art

A few weeks back, I installed a lot of software on my computer at home with the plan to work out what to do with large data sets, particularly web archives. One of my roles at work is being responsible for managing and running the Library’s web archiving strategy and regularly harvesting publicly available government websites. That’s all fun and good but you end up with a lot of data and I think there’s close to 5TB in the collection now. The next tricks revolve around what you can use the data for and what sorts of data are worthwhile to make accessible. Under my current, non NBN, download speeds I estimate it would take a few months to download 5TB of data assuming a steady connection.
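As a rough sanity check on that estimate, here is a back-of-envelope sketch in Python. The 5 Mbit/s line speed is an assumption picked for illustration (no actual speed is given above), the function name is made up, and it takes 1TB as 10^12 bytes with no protocol overhead.

```python
def download_days(size_tb, speed_mbps):
    """Days needed to download size_tb terabytes at a steady speed_mbps megabits/s."""
    bits = size_tb * 1e12 * 8             # terabytes -> bits (decimal units)
    seconds = bits / (speed_mbps * 1e6)   # megabits per second -> bits per second
    return seconds / 86400                # seconds -> days


if __name__ == "__main__":
    # Assumed 5 Mbit/s connection: roughly 93 days, i.e. about three months.
    print(round(download_days(5, 5), 1))
```

At that assumed speed, "a few months" looks about right; doubling the speed halves the time.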

The dataset I’m using currently is a cohesive collection of publicly available websites containing approximately 68GB of data in 61 files. Each file is a compressed WARC file, WARC being the standard for Web ARChive files. Following some excellent instructions, I ran the scala code from step 1 in my local install of spark shell and successfully extracted the site structure. The code needed to be modified slightly to work with the pathname of my data set. The process was roughly:

  • run Spark Shell with sufficient memory, I’m using 6 of my 8GB of RAM
  • run “:paste”
  • copy in scala code
  • hit “Control-D” to start the code analysing the data

I think that took around 20-30 minutes to run. The first time through, it crashed at the end as I’d left a couple of regular text files in the archive directory and the code sample didn’t handle those. Fair enough too, as it’s only sample code and not a full program with error detection and handling. I moved the text files out and ran it again. Second time through it finished happily.
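One way to avoid that sort of crash is to check the archive directory for strays before starting a long run. The following is a small Python sketch of that idea, not part of the tutorial's scala code; the function names are made up and it assumes WARC files end in `.warc` or `.warc.gz`.

```python
from pathlib import Path

# Filename endings treated as WARC files (an assumption for this sketch).
WARC_SUFFIXES = (".warc", ".warc.gz")


def is_warc_name(name):
    """True if a filename looks like a plain or gzip-compressed WARC file."""
    return name.lower().endswith(WARC_SUFFIXES)


def split_archive_dir(directory):
    """Return (warc_files, other_entries) for everything directly in directory."""
    warcs, others = [], []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file() and is_warc_name(entry.name):
            warcs.append(entry)
        else:
            others.append(entry)
    return warcs, others
```

Anything that lands in the second list (text files, notes, subdirectories) can be moved out before pointing the analysis at the directory.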

The resultant file containing all the URLs and linkages was a total of 355KB, not bad for a starting data size of 68GB, and provides something a little more manageable to play with. Next step is to load the file into Gephi which is an open source, data visualisation tool for networks and graphs. I still have little idea how to use gephi effectively and am mostly just pressing different buttons and playing with layouts to see how stuff works. I haven’t quite got to the point of making visually interesting displays like the one shown in the tutorial, however I have managed to create some really ugly art:

ugly data analysis

I hit the point a while back where it’s no longer sufficient to play with sample bits and pieces and I need to sit down and learn stuff properly. To that end I ordered a couple of books on Apache Spark, then ordered another book, Programming in Scala, and am wondering whether I should also buy The Scala Cookbook. Or perhaps I shouldn’t try and do everything at once. I am reading both the Spark books concurrently as they’re aimed at different audiences and take different approaches. However after an initial spurt through the first couple of chapters, I haven’t touched them in a couple of weeks. I also need to learn how to use Gephi effectively and there are a few tutorials available for doing that. I should explore other visualisation tools too and continue to look at what other sorts of tools can be used.

5 bits

Having started with 5 articles a week or so back, I thought it might be worth aiming for 5 articles each time. Was tempted to go with 7 but whittled it down to 5. This is a bunch of articles I’ve read and tweeted in the last week.

That’ll do.