I’m in my early days of playing with big data stuff, though I’m finding the small systems I’m using a little on the slow side. Stuff takes a while to load, a while to run, a while to die. Lots of waiting. In Monday’s post, I mentioned running code to extract site structures from my data set: it took 20-30 minutes before crashing when it hit an unexpected file, then another 20-30 minutes to make a successful run. That did at least mean I had time to catch up on Masterchef :-) Interestingly, Gephi seemed to run faster on my work laptop (Windows 7) than it did on my Mac mini. Mind you, neither system was built with this sort of processing in mind.
A friend recently introduced me to a new term, “NUC clusters”. These are based on Intel NUC (Next Unit of Computing) mini computers, which have been getting more and more impressive each year. They’re tiny computers, smaller than a Mac mini but a little chunkier. The high-end versions are close to AUD$1,000, and then you need to add high-speed RAM and solid state drives (SSDs). On the other hand, that includes a quad-core Intel i7 chip, space for 32GB RAM and 2 high-speed M.2 sockets for SSDs. For around $1,500 you end up with a “basic” system with some serious power on a really tiny footprint. Some enterprising folk have taken that a wee bit further and set up desktop server racks, networking multiple NUCs together into server clusters. There are naked versions that fit in a shoebox, where they’ve removed the casings and built mini racks for the motherboards, a briefcase version, and someone has even constructed a server rack out of Lego.
In other words, while it’s a little pricey, you end up with some serious computing power that doesn’t take over your desk space. A 4 board system sounds really awesome, effectively running 16 cores, 128GB RAM and 4-8TB of SSD (there are now 2TB SSDs but they’re not cheap), but a starting price of $6,000 or so starts to sound scary. I had a look around at desktop tower-based systems, and a decently powered one probably starts around $3,000-4,000. Intel have recently announced a new CPU with 18 cores and 36 threads for around US$2,000, and that’s just for the CPU. Eep! Reading a few Whirlpool boards, I discovered that there’s a big second-hand market for servers and racks. Prices are pretty good, though the gear usually comes out of server farms and takes up a fair bit of physical space. On top of that, I’m not especially confident with hardware stuff, so getting one of these up and running may be a little more challenging.
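For what it’s worth, the 4 board numbers above are just the single NUC figures multiplied out. Here’s a rough Python sketch of that sum. The names (Node, cluster_totals) are made up for illustration, and the drive size per M.2 socket is my own assumption: 0.5TB to 1TB drives in both sockets is one way to land on the 4-8TB range.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One NUC, using the rough figures quoted in this post."""
    cores: int
    ram_gb: int
    ssd_slots: int
    price_aud: int


def cluster_totals(node, boards, tb_per_ssd):
    """Multiply a single node's specs across a number of boards.

    tb_per_ssd is an assumption (drive size fitted to each M.2 socket),
    not a figure from any spec sheet.
    """
    return {
        "cores": node.cores * boards,
        "ram_gb": node.ram_gb * boards,
        "ssd_tb": node.ssd_slots * tb_per_ssd * boards,
        "price_aud": node.price_aud * boards,
    }


# Quad core, 32GB RAM, 2 M.2 sockets, roughly $1,500 once RAM and SSDs are added.
nuc = Node(cores=4, ram_gb=32, ssd_slots=2, price_aud=1500)

# Assumed 1TB drive in every socket (the top of the 4-8TB range).
totals = cluster_totals(nuc, boards=4, tb_per_ssd=1.0)
print(totals)
```

Swapping tb_per_ssd to 0.5 gives the 4TB end of the range. The price is only a floor, since it ignores networking gear, racks and anything else needed to tie the boards together.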
Then of course, there are cloud-based servers and clusters, including Amazon Web Services (AWS), which even has an EMR (Elastic MapReduce) platform for running all the tools I’m currently playing with, and then some. Ian Milligan and co have even developed instructions for running warcbase in an AWS environment. I haven’t looked into these too much yet. Longer term I may need to give all this some more serious attention, but for now the Mac mini is at least adequate. Just.