bits of online history

I discovered today that tucows is retiring/disappearing/perhaps mostly gone. Admittedly I don’t think I’ve used it since my windows XP days. Was a fab source for shareware/freeware utilities to enhance XP. There used to be another site, now long gone, called DownloadSquad that would have regular reviews/announcements of new software and I’d usually try out something new every other week. Over time, some of those things have been incorporated into operating systems.

I remember there used to be a tool so that I’d hit the space and start typing to launch documents and software. These days, that’s built into windows via the windows key and windows indexing has improved lots. There are still things I like to install on new systems like cygwin and text adventure interpreters but a lot less than I used to.

Meanwhile I came across a timeline of web browsers dating back to the early 90s and of course my old favourite text only browser, lynx, is still kicking about – I usually have it installed as part of my cygwin setup. The downside of using a text viewer to browse webpages is that you usually have to scroll through a bunch of pages to get to the content as per below:

Sydney Morning Herald via lynx

There’s been a European case around geoblocking game purchases ie forcing folk to buy games in their country rather than from whatever country they can get it from cheaper. It’s an interesting result as gaming isn’t the only area that’s done this sort of thing, books being another good case. I remember when I started buying via amazon how much cheaper the same edition of a book was that way than locally. These days, I don’t buy much from Amazon tending to either buy from local distributors like Booktopia or direct from publishers.

In other news, the Alta-Vista URL still works but redirects to Yahoo…which still works.

sff sorta

Yesterday was the official start of the Sydney filmfest…online version. Wednesday night is also the night of our weekly online trivia with friends so we decided to stick with that and watch some shorts afterwards. Famous last words.

I was curious how it was going to work and the FAQ supplied by the fest was a little light on detail. I initially tried casting from my new laptop to the TV and could not find the TV on the network. Then tried running it on my android phone and still couldn’t cast and the phone browser for the SFF site wanted me to download the SHIFT72 app which I did. Except that the SHIFT72 app didn’t have any SFF content and SFF have confirmed this morning they’re not using SHIFT72 to deliver festival content.

We then tried using the built in browser on the TV itself (the TV is running full android) without success and were again asked to download the SHIFT72 app. Last resort was to try a cabled HDMI connection from my laptop to the TV. Of course my new laptop only supports USB-C so I dug out my old laptop (which mostly works) and used its HDMI connection. Took a bit for the TV to recognise it but we got there in the end and were able to watch a couple of good shorts.

Failure points: me trying to troubleshoot tech issues after a whisky or two meant I got increasingly frazzled. Thinking about it in the light of day, I have a vague recollection that I need to switch on casting on the TV itself prior to casting from the laptop. Will test that this evening. A further thought this morning re USB-C, I do have a USB-C to USB-A converter which means I potentially could plug the new laptop into the USB-A port on the TV. I may even have a USB-A to HDMI converter, if so that could work too though curious as to the effect of having two chained converters.

Troubleshooting network stuff is a little tricky at the moment as I’m running wifi channels on both the old router and the new Orbi mesh setup. The TV is on the old, the laptops are on the new which shouldn’t be a problem as other things work. The new setup is piped off the old router and the plan will be to switch off wifi broadcasting on the old one eventually. I need to keep the old router in play as the Orbi gear doesn’t have a plug option for landline telephones and the old router does.

a little order

When lockdown commenced, my workload increased and for a few weeks there was a lot more email too. Things did settle down to a normal of sorts eventually. I am fortunate that my entire job can be done online and that I have worked from home before. My last couple of years at Gale were spent exclusively working from home though interspersed with visits to the Melbourne office and libraries around Australia and New Zealand. This time round, there are no visits anywhere…other than to the takeaway up the road and brekky on the weekend.

My days have been much the same minus the commute, in fact the commute has transitioned to an extra hour sleeping in. Curiously I am starting work more refreshed both from the increase in sleep and the lack of travelling. This will be hard to give up though I have experimented in the past with going to bed earlier. One thing I found was that regardless of what time I started, I still tended to finish around the same time at the end of the day.

Finally moving on from milk crates. Cables are somewhat sorted and accessible.

This does mean I haven’t done much beyond what I normally do. I’ve not commenced writing the great Australian novel, nor learnt a new language; I’ve not taken up further study nor undertaken any large projects. Thanks to Ms19 I have taken up roller skating again and I have been walking occasionally.

One project I did take on was organising all my cables and this was a task I’ve been putting off for some years. They’ve primarily been kept in an overflowing milk crate, maybe even some in a second crate. Several years ago I did go through and spool each cable tying them off either with a wire tie or rubber bands. Sadly the rubber bands have not stood the test of time and in many cases were a nasty, sticky mess to clean up. An initial task was to replace all the rubber bands with new ties and clean them. I did get rid of a few cables though I suspect the household had hoped, somewhat optimistically, that I would get rid of a lot more :-)

I ordered a deck of coloured drawers from Officeworks and separated my cables into 5 loose categories: ethernet, power, audio, USB (various types), hardware (including my old Eees). There were a few other odds and ends that ended up in the top couple of drawers eg a couple of HDMI cables have ended up in the top drawer with the ethernet cables as that’s tidier than the drawer with the different types and sizes of audio cables which includes speaker cables, various playstation cables, and sound systems I don’t remember. There’s even several metres of telephone cable which I originally used to run a dial up connection in a terrace house in Newtown with the cable running from the phone port in the loungeroom, up the stairs and through a door into a study space. I suspect I could possibly get rid of that cable at least though remain reluctant…just in case.

zip is dead…really this time

It finally happened. Zip, my old, old ISP, is well and truly dead; connections started failing early June and none of the URLs work anymore. Email doesn’t work and no matter what URL variations I use I can no longer reach the old blog. They announced they were killing it off a year or so back and I haven’t been charged since but some things continued to work. No more.

Thankfully I still have a couple of backups of my offline development environment which includes a full copy of the blog and the wayback machine has grabbed a copy too. Sadly, this post of mine from 2010 still needs to be done. Perhaps while film remains on my mind I could at least migrate over all my movie ratings. Or radically, add ratings for the films I’ve seen this year.

New things to do now include updating the email profiles on a couple of my devices to remove zip altogether. That should stop the error messages I get every time I start up. Proving yet again I prefer shiny and pretty over any work of depth, I have instead updated the theme and changed the banner pic :-)

zip is gone…almost

For many, many years…decades even, my main email/ISP etc was hosted on an outfit called Zip, or even zipworld. It was progressively swallowed up by larger and larger companies, till in 2015 it ended up with Telstra. Telstra recently announced that they were shutting down the smaller networks though I could seek an account with them if I liked.

a shipping crane by the water

Admittedly, the last few years I’ve been maintaining my zip account primarily as an email forwarder for sending/receiving email. At home, my partner has connectivity with another provider. My old website no longer works though I do have full backups (on my PC, external hard drive, and NAS), plus you can find it on the wayback machine.

Update: it’s not dead yet. Curiously, if I use “my.zipworld.com.au” instead of “www.zipworld.com.au”, my old site is still accessible :-) Of course, all the links I have that point to it are broken.

My primary email address (not zip), which was pointing to my zip account, now points to my gmail account. My old zip account is mostly used by a couple of elists, the odd family missive, and a lot of spam. The mail server hasn’t died yet though I expect that will happen one day but I’m still successfully using it to send email…and spammers continue to use it successfully to send me email.

Anyways, I am a little sad to say goodbye to dear old zip. The big advantage in the early days was the work they did in maintaining a local usenet server and it was why I signed up in the first place. Of course, it’s been a long time since I used usenet either. Usenet was replaced by other things, and eventually there was twitter and facebook, which picked up some sense of community that I was missing.

harvest testing

One of the difficulties with working with web harvests is that it is but one of several priorities and not even the key one. A lot of my job is focussed on managing the Library’s eresources collection, dealing with suppliers and looking after budgets. In addition I’ve been running the Library’s web harvesting programme for about three and a half years now. The main crawl of NSW government websites was set up originally by Archive-It and these days I have it scheduled to run twice a year. There are other smaller crawls that are run throughout the year.

However there’s never been a lot of time for exploring the harvested content in detail and ensuring we’re getting the material we think we are. We do run some testing by searching for specific content within the archive eg budget papers and check that it contains all relevant content including spreadsheets and documents. There’s only so much you can do to manually test when this particular archive is 3.5TB and contains around 74 million documents. The Archive-It software does provide some tools for checking crawl results and broadly indicating missed material.

However, as readers continue to explore the collection, they come across things where we haven’t fully captured the content we thought we had. A recent example is the Electoral Atlas of NSW 1856-2006 edited by Eamonn Clifford, Antony Green and David Clune. The State Library does hold it in print and until recently the digital content was hosted on the NSW Parliamentary website.

On initial inspection, it appeared that the content had been captured via the harvest both by SLNSW and the National Library (NLA). The NLA version doesn’t descend further while the NSW version does display the individual election results eg 1984:

Election details of the 1984 New South Wales state election

However, all the links in the 1984 Election Links section return a “Not in Archive” message, similarly for other years. In this example, there is some happy news in that the main wayback machine seems to have captured the site in full including those pages we’ve missed. The question that I need to explore and may need to ask Archive-It about, is why their crawl captured that information and ours didn’t.

As a side note, I’ve found the Wayback browser plugin (Firefox, Chrome) rather useful for finding archived versions of pages that no longer exist on websites.
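For checking a batch of links rather than one page at a time, the Internet Archive also has an availability endpoint that answers in JSON. What follows is a minimal Python sketch, assuming that endpoint and its `archived_snapshots.closest` response shape. The function names are my own, and it only asks the main wayback machine, not the Library’s Archive-It collection.

```python
import json
import urllib.parse
import urllib.request

API = "https://archive.org/wayback/available"  # public availability endpoint


def closest_snapshot(payload):
    """Return the URL of the closest capture in an API response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url, timeout=30):
    """Ask the availability endpoint about one URL (needs network access)."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(f"{API}?{query}", timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup()` on each link from a page that reports “Not in Archive” would give a rough list of which ones the main wayback machine holds.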

 

bits and whiskies

Sat down at the computer today for the first time in a while and installed docker. I have it installed on most of my machines and got round to it on the vivomini today. Was a simple matter to run:

sudo apt install docker.io

enter my password and off it went. Docker containers include everything you’re likely to need to run a particular batch of software. Installing software is rarely simple and may rely on the presence of other packages which leads into a vicious circle of finding all the dependencies and installing them. In this case, I wanted to try the new-ish docker container for the Archives Unleashed Toolkit which, in earlier days, had been a little challenging in a non-docker environment. Whereas this version was dead simple via docker on a linux command line:

Step 1 sudo docker pull archivesunleashed/docker-aut
Step 2 sudo docker run --rm -it archivesunleashed/docker-aut

Both steps took a while but I think it was around 15-20 minutes altogether on my ADSL2 house wifi (my NBN option is HFC and that’s been delayed several months). When the second step finished I was greeted with the opening screen for the spark shell and ready to work. Very nice and will have more of a play later.

For now, I’m currently downloading Horizon Zero Dawn: The Frozen Wilds and rather looking forward to revisiting my favourite game of 2017, and possibly even my favourite game since Skyrim. Actually, I’m not sure on the latter and I haven’t actually stopped playing Skyrim. I have been playing a lot of Assassin’s Creed: Origins over the last couple of months and it feels like there’s still so much to explore. Some of it is a bit repetitive yet it’s wonderful exploring such a well realised version of Egypt, in the time of Cleopatra, and its surrounds. With that said, I’m at the point where I’m going to ease back and pop into it occasionally rather than have it as my primary game.

Then there was whisky. All the bottles I had opened in early November are now finished. Back then I had 9 bottles altogether with 5 open, now 9 bottles and 4 open. Actually I have an additional 7 bottles but they’re each 50ml and combined are equivalent to a single bottle. My partner bought me a box of 4 peated malts for christmas, and I picked up a taster pack of 3 Loch Lomond whiskies. Whiskies opened include:

  • Hellyers Road 10 year old (46.2%) – a nice, soft dram from Tasmania. Usually retails around $90 and I think I’m on my second bottle.
  • Ben Nevis 18 year old (single cask, 54.7%) – strong but delish, loving this one and on to the second bottle. This was $240 and is part of a fund raiser for a new distillery in Corowa, NSW.
  • BenRiach Peated Cask Strength Single Malt (56%) – also strong and also delish. This was $150 and I have a suspicion that BenRiach is turning out to be one of my favourite distilleries after Highland Park and Overeem. I have also enjoyed their 17 year old PX cask.
  • Glenmorangie: The Duthac (43%) – more yum. This was a christmas present and was released for travel retail and is primarily available at duty free places at airports, Singapore in this instance. Part finished in Pedro Ximinez casks. Sherry casks are my preferred and the Pedro Ximinez (PX) seems to raise that a notch or two.

Speaking of Pedro, I rather like sherry straight too. I used to prefer ports and muscats, and even had a port barrel maturing at one stage. I suspect if I ever do another barrel it will be for sherry. Of sherries, the Pedro Ximinez or PX (though it seems irreverent to shorten it such) is turning out to be my favourite. I have been trying out various releases from cheap to expensive, the most expensive being around $55 for 350ml! My favourite, while a little pricey, seems to be the Cardenal Cisneros at $56/750ml, though cheap compared to whisky.

knuth

I often say professionally that I did a compsci major (though can never claim it officially) yonks ago but decided against becoming a programmer. That’s not a decision I regret mostly, though it must be said I continue to have strong leanings that direction. Scarily, it’s been over 25 years since those compsci days. Still, I learnt good stuff.

I recall in the second half of first year compsci, we had an older lecturer at the time who was actually a maths lecturer who seemed to have come across into computers. I can say “older” as I’ve just found this bio which sums up very briefly a rather fascinating career. He may even have been one of my favourite lecturers as he liked to play with new ideas and introduced stuff he knew about from maths into computing. I was a very rare beast in compsci in that I was enrolled under BA and not directly in Compsci and I did no math. I had done first year math but it wasn’t quite my bag. Doherty was very big on mathematical ideas and assessing efficiencies of algorithms.

I recall him talking about some weird algorithm for encrypting data and he worked through the basic idea in a lecture, I think it was based on some sort of fractional encoding model. At the end of the lecture, he said the next assignment would be to implement it. I found the idea of it fascinating. The next assignment came out and sure enough it was on encryption so I implemented the algorithm in Pascal that he’d talked about based on my lecture notes. The idea was you’d write code to encrypt a paragraph of text, and code to decrypt the text. I was mostly successful but because it relied on decimal conversion of larger numbers, it rapidly lost accuracy on the 8 bit macs we were using at the time. Out of a sentence of 10 words, it started losing letters by the end of the first word.

Turns out, I should have read the back page of the assignment. Doherty had decided that the technique was a little too experimental for first year compsci and had instead instructed everyone to use a hashing technique. I handed my assignment in and discussed with the class tutor what I’d done. He wasn’t familiar with the algorithm at all but was impressed that it worked and understood why it failed where it did. I got full marks and first year compsci was one of my few high distinctions at uni.
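I no longer have the lecture notes, so the following is not Doherty’s algorithm. It is a hypothetical stand-in in Python that shows the same failure: pack a sentence into one fraction between 0 and 1, one base-256 digit per character, then peel the digits back off. The names `encode` and `decode` are made up for the illustration, and it assumes plain ASCII text. With exact fractions the sentence survives; with ordinary floating point the number runs out of bits after roughly six characters.

```python
from fractions import Fraction

BASE = 256  # one "digit" per 8-bit character


def encode(text, exact):
    """Pack text into a single number in [0, 1), one base-256 digit per char."""
    total = Fraction(0) if exact else 0.0
    for i, ch in enumerate(text):
        digit = Fraction(ord(ch), BASE ** (i + 1))
        total += digit if exact else float(digit)
    return total


def decode(number, length):
    """Peel the digits back off by repeatedly multiplying by the base."""
    chars = []
    for _ in range(length):
        number *= BASE
        digit = int(number)  # integer part is the next character code
        chars.append(chr(digit))
        number -= digit
    return "".join(chars)


message = "the quick brown fox"
exact_copy = decode(encode(message, exact=True), len(message))
float_copy = decode(encode(message, exact=False), len(message))
print(repr(exact_copy))  # round-trips intact
print(repr(float_copy))  # starts correctly, then degrades
```

The Pascal reals on those old macs would have had a different precision to a modern 64-bit float, so the exact point of failure would differ, but the shape of the problem is the same.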

mini computers on top of computer books.

Anyway, Doherty would often quote Knuth as the foundation of modern computing. Knuth was all about the development of algorithms and understanding their efficiencies. Algorithms are really important as they represent techniques for solving particular sorts of problems eg what is the best way to sort a random string of numbers? The answer varies depending on how many numbers are in the string, or even whether you can know the number of numbers. For very small sets, a bubble sort is sufficient, and from there you move on to binary searches, binary trees, and so on. I wasn’t always across the math but really appreciated the underlying thinking around assessing approaches to problem solving. Plus Doherty was a fab lecturer with a bit of character.
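As a reminder of why bubble sort only suits small sets, here is a minimal Python sketch that sorts a list and counts the comparisons it makes along the way. The function name and the comparison counter are my own additions for illustration. In the worst case, a list in reverse order, the count is n(n-1)/2, so doubling the list roughly quadruples the work.

```python
def bubble_sort(values):
    """Return a sorted copy of values and the number of comparisons made."""
    items = list(values)
    comparisons = 0
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            comparisons += 1
            if items[i] > items[i + 1]:
                # Neighbours out of order: swap so the larger value moves right.
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:  # a full pass with no swaps means we are done
            break
    return items, comparisons


print(bubble_sort([5, 4, 3, 2, 1]))
print(bubble_sort([1, 2, 3]))
```

An already sorted list stops after a single pass, which is about the only case where bubble sort looks good.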

So Knuth. He is best known for his series, The Art of Computer Programming, which has gone through a few editions and I wonder if it will ever be actually finished; the fourth volume is actually labeled 4A: Combinatorial Algorithms Part 1. Volume 4 is eventually expected to cover 4 volumes: 4A, 4B, 4C, 4D. 4B has been partially released across several fascicles of which 6 have been released. Volume 3 seems to be the most relevant for where I’m at today and where I’m looking to play; #3 is around 750 pages devoted specifically to sorting and searching. So much of what we do online is reliant on being able to find stuff and to find stuff well, it helps if the data has been ordered.

Knuth has been this name in my head though my life has gone in other directions. A few years ago, I did a google and found that not only were his books on Amazon, there was even a box set of Volumes 1-4A. I bit the bullet about 3 years ago and bought the set, cost around US$180 at the time and looks really, bloody good on the shelf. I haven’t read a great deal yet but dipped in a few times and planning to get into volume 3 properly at some point. I’ve recently been moving stuff around at home and don’t have a lot of space for books next to where my computer gear is these days. However, it turns out, the mac mini sits nicely on top of the set, and my newest computer, the VivoMini sits nicely on top of the mac. I sorta like the idea of these small computers sitting on Knuth’s foundation.

threading delights

I’ve had the new machine a few days and I’m starting to get the hang of it, but learning, lots of learning. Finding linux equivalents of windows tools and then working out how to install them. Troubleshooting unexpected java errors trying to get spark shell to compile properly – turns out I had the JRE but not the full JDK which means I had to download more stuff and update some config files as well as pathname references so the system knows where to find stuff.

As it turns out I completely misread the new pages for Archives Unleashed and didn’t see the black menu bar at the top of the screen for all of the docs. Was a little too tired methinks. I installed stuff using old versions of the docs I found on the wayback machine and other bits. Consequently I’ve ended up with a more recent version of Archives Unleashed (a bit of a mouthful after the easier “warcbase”) with 0.10.1 instead of 0.9.0 and I’m running a current version of Spark Shell, 2.2.0, instead of 1.6.1. Anyway it all works…I think.

The next headache was that my harvest test data was still on the mac mini. I wasn’t sure how to get the data across as I couldn’t write to a windows hard drive from the mac. Then had the bright idea of copying the data, 56 files for a total of 80GB, to my home server via wifi. That took 6 hours…to the server, so I went away and did other things. Towards the end of that process I had a bit of time so I worked out that if I formatted a drive for the mac in exFAT format, I could install some utilities in linux to read it. That took an hour, half hour to copy to the drive, half an hour from the drive to linux. Phew.
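Out of curiosity, the rough transfer rates behind those times can be worked out in a few lines of Python. This is only back-of-envelope arithmetic on the figures above, taking 1GB as 1000MB; the function name is made up for the illustration.

```python
def throughput_mb_per_s(gigabytes, hours):
    """Average transfer rate in megabytes per second (1 GB taken as 1000 MB)."""
    return gigabytes * 1000 / (hours * 3600)


wifi = throughput_mb_per_s(80, 6)     # copy to the home server over wifi
drive = throughput_mb_per_s(80, 0.5)  # one leg of the exFAT drive shuffle
print(f"wifi:  {wifi:.1f} MB/s")
print(f"drive: {drive:.1f} MB/s")
```

So each leg via the external drive ran about twelve times faster than the wifi copy.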

Then I tried running the SCALA code for extracting the site structure and ran into a few errors as about 15% of the files have developed an error somewhere along the way. I removed all the broken files leaving me with 47 usable ones. All up, it took 18 minutes to process the data, not quite as fast as I was hoping. On the other hand, the advantage of having lots of ram is that there was plenty of space to do other things. Running the same job on the mac mini with dual core CPU and 8GB RAM brought it to a grinding halt and nothing else was possible. On the new machine, I could run everything else normally including web browsing and downloads.

htop2

Regardless of whether I allocated 5gb, 10gb, 24gb, or even 28gb of RAM, time taken to process still hovered around 18 minutes. With 28gb allocated it only needed around 15gb to process, as can be seen in the above screenshot of htop. The other nice thing about htop is that it demonstrated that all 8 CPU threads were in use. Where I think I saved some time is that swap doesn’t seem to have been required which would have reduced some overheads. Either that, or I haven’t worked out how to use swap memory yet.

Still very early days.

some new tech

Following my fun in July when I hit a bit of a wall in playing with large data sets and brought my mac mini to a grinding halt, I ruminated on next steps. Wall aside, it was a wee bit frustrating that running experiments on larger data sets took a long time to run and that’s been a bit off-putting to further progress. So I decided that I really did need a new machine and was going to get an intel NUC skull canyon as it was small and fast. I waited for Intel to announce their new 8th generation CPUs which they did recently. Unfortunately the upgrade to the current 6th generation Skull isn’t due till Q2 2018.

On the other hand, prices have been dropping on the barebones Skull and you can pick one up for around AUD$700. However a retailer pointed out to me recently that the ASUS VivoMini, while pricier, uses 7th generation CPUs. Plus it’s a cuter box. After some umming and ahhing, I ordered the vivomini with 32GB RAM and an additional 1TB drive (it includes a 256GB SSD in the m.2 port). The CPU is a 7th generation quad core i7. Total cost was around AUD$1,700 whereas a similarly set up Skull would have been around $1,400-500. It has a small footprint and sits nicely on top of the mac mini.

Picked it up yesterday and it booted straight into windows. Today, somewhat trepidatiously, I had a go at setting it up to dual boot with linux. The last few years I’ve been running linux via virtualbox on windows and that’s been sufficient. It’s been a long, long, long time since I set up a dual boot machine and that was using debian which was a wee bit challenging at the time.

This time round it was all easy as. I followed some straightforward instructions carefully and tested initially on a live boot via USB and then used that USB to install it properly. I’ve booted back and forth between windows and linux several times just to be sure and so far so good. I’m currently writing this blog via firefox in ubuntu. My next step was going to be to set up warcbase however that’s been deprecated as Ian Milligan and his team have received a new grant and are working on building an updated environment under their Archives Unleashed Toolkit. So I’ll play with that instead :) Regardless I’ll still need to get Apache Spark up and running which is likely my next step.