Ludicity

Breaking My University's Machine Learning Competition

A few weeks ago, I popped open the Y-Combinator application form and saw that one of the only things they ask for is a story about hacking a non-technical system. Well, I don't really have a reason to beg for money, lacking an unprofitable idea to beeline for an IPO1 with, but I did find myself really wanting to write about this. Starting with all the hacking I did at university.

I have many, mostly vaguely negative, feelings towards university, which many people find odd coming from someone that did exceedingly well at it. The majority of the experience can best be summarized as occasionally having the privilege of being near extremely smart teachers and students, but most of the time I was being lavishly praised for doing very easy things, while watching the majority of my peers desperately flounder, and from time-to-time being forced to appease a confused authority figure.

That was my experience through the entirety of my psychology degree, and I thought I.T would be much more serious. In retrospect, the most valuable thing that I learned at university - though it took me a while to figure it out - is that the world outside of academia behaves approximately as stupidly.

Let us explore this dysfunction, starting with the lessons I learned from getting a perfect score on a machine learning competition then being asked to bury having done so to save everyone face.

As you read in horror, remember that I attended Monash University2, which is one of the best in Australia.

I.

We must begin with introducing my adversary, The Great Lecturer, The Beast With A Thousand Slides Who Writes Code That Does Not Compile.

In 2018, I was one year into my Masters degree in data science. Coming from a pure psychology background, this was invigorating in some ways, but mostly stressful because of the poor quality of the course material. This culminated in the unit everyone was most excited for - an introduction to machine learning. The course was run by someone who would eventually become a Very Important Person within the faculty, and boy oh boy, do I have some questions about how that happened.

To begin with, The Great Lecturer, The Swaying Professor Whose Papers All Read Death, opened the introductory lecture with a certain amount of charisma. I remember laughing and thinking "Hey, this guy's all right!". Oh, how I laughed and laughed, unaware of the horror that he was going to visit upon me for the next 12 months.

Then he does something which has stuck with me ever since. There are so many ways of looking at it. It was a little pebble before the avalanche, it was the chink in his armor, and everywhere I go I look for this event, because I know that I must find it then decide if I am running from the crushing snow or slaying Smaug.

He displays about 200 lines of code, invites us to run it when we leave the lecture, and upon arriving home I realize that it can't possibly run. There is a hardcoded zero-division error, which means he wrote 200 lines of code then just assumed it would work. I don't know how the rest of you work, but I assume that I have probably introduced some sort of terrible bug into my code unless I test it every five lines. And frankly, when the tests pass, I become more nervous because it might mean my tests are bad.

At this point, I reflect.

Would I ever show code to hundreds of students without testing out the ol' VSCode3 run button? ... No, I really don't think so. The only way that would happen would be if I was in a rush and made one last edit which I thought was safe... but I'd never divide a number by zero like a fucking dork.

What's my working hypothesis? Maybe he's actually great, for I am a lowly dweeb with minimal work experience, and he is a professor. But I also remembered the charisma and thought "Hey! Maybe this guy shouldn't have the job he has."

I file this away, and some arachnid part of my brain hunkers down in wait for six months, a hundred multifaceted eyes glimmering with malice and, most importantly, a burning desire to know the truth.

II.

Finally, months later, the opportunity arises to find out if he could really be that incompetent. As every university seems to do, there is a group assignment, presumably to improve our collaboration skills. There are so, so many layers to this.

Firstly, I suspect that almost everyone reading this blog will have the experience of being the person in the group that does all the work at university. I ran an ad-hoc poll at a party filled with my friends the other day, and I realized they were all that person. University tasks aren't actually complex enough at most places to warrant division of labor, and eventually you get tired of trying to wrangle people who are still working out things like "starting assignments before the day they're due" and "doing their own laundry". We've all been there, I've been there, everyone is absolved of their sins but I'm just going to go finish the work, okay?

Secondly, the university insists that this is actually preparation for the workplace, because you've simply got to collaborate and get the work done, or there will be consequences. If you are a student with a sense of justice, it is at this point that you have probably become incensed as you've actually noticed the lack of consequences for the slackers. If you complain, either you will receive nothing or, if you're a real persistent jerk about it, they will bribe you by assuring you a minimum of a pass4 as long as you report on progress regularly.

Thirdly, it turns out this is all actually brilliant because that's exactly what the real world is like. A handful of assignments (read: good teams) where everyone's effort actually matters, but it's mostly sort of just blundering around until everyone limps past the finish line, declares mission accomplished, then all the savvy people leave before they must reap what they have sown. If only people had comforted me by saying this was an accurate simulation of most organizations instead of telling me it would be better after I left! Think of all the pain that could have been averted!

Why do the universities actually do it? Probably because they only have to pay tutors to mark ten teams instead of forty students, honestly.

In any case, The Great Lecturer, Whose H-Index Crushes Third World Nations, decides that it would be fun if the final assignment is a machine learning competition. Graded on a bell curve. With no marking criteria other than "How accurate was your answer?"

For the blissfully unaware of the whole machine learning space, here is a one sentence explanation which will allow you to understand this story, and is also probably one of the lowest-bullshit definitions available on the internet in 2024. Ready? Braced? Okay:

A human being goes through some records and categorizes them, then a computer looks at those categorized records and uses nerd math to figure out if it can approximate the rules the human was using.

Phew, that was a lot. You're a thought leader now, congratulations on your ascension.
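For readers who want to see that sentence as code, here is a minimal sketch. It is a toy word-counting classifier, not anything used in the assignment; the records, category names, and the `train` and `predict` functions are all made up for illustration.

```python
from collections import Counter, defaultdict


def train(labelled_records):
    """Count how often each word appears under each human-assigned category."""
    word_counts = defaultdict(Counter)
    for text, category in labelled_records:
        word_counts[category].update(text.lower().split())
    return word_counts


def predict(word_counts, text):
    """Guess the category whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {
        category: sum(counts[word] for word in words)
        for category, counts in word_counts.items()
    }
    return max(scores, key=scores.get)


# Hypothetical hand-labelled records (not from the assignment's dataset).
labelled = [
    ("the striker scored a late goal", "sport"),
    ("the keeper saved a penalty", "sport"),
    ("the central bank raised interest rates", "finance"),
    ("shares fell as interest rates rose", "finance"),
]

model = train(labelled)
guess = predict(model, "interest rates and shares")
print(guess)
```

The "nerd math" here is just counting words, which is about as crude as it gets, but the shape is the same as the real thing: labelled records go in, an approximate rule comes out, and the rule gets applied to records nobody has labelled yet.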

So we have an assignment where I'm given a big pile of categorized records, a big pile of uncategorized records, and told "the top four teams of the sixty-four teams enrolled will score a High Distinction". I don't know how it works in American universities, but maybe that's called an A+, yeehaw5. This is an extremely stupid assignment for many reasons, but I will leave figuring out the issues as an exercise to the reader, and instead merely kick off the brainstorming with "Do you think having to jealously guard your work fosters a learning environment which sparks joy?"

The assignment is here. The zero-division error is floating around my brain. I must score as highly as possible. We must do two things.

One: We do what we always do after studying for years - I tell my team I'm going to do the whole assignment. They turned out later to be actually competent but I had learned not to waste time betting on this for an assignment where one person is the optimal team size anyway.

Two: We must defeat The Great Lecturer, The Academic Whose Citations Span The Sky As A Million Screaming Stars.

III.

I begin work with one thought blazing at the forefront of my mind, and without it I would not have found what I did. That thought was this:

Surely a man that fails to run his code before taking it to a lecture is careless everywhere.

The dataset we have been provided is strange. It is hundreds of thousands of records that look like this, with the categories having been masked by the staff with alphabetical codes (A - Z):

Record Category
Text A
Text A
Text ...
Text A
Text A
Text B
Text B
Text ...
Text B
Text B
Text C
Text C
Text ...
Text C
Text C

And the text itself is stranger still. Even trying to guess what the link is between some randomly sampled A-categorized stuff, I come up with nothing. It all reads like:

A new type of mosquito has been discovered in South Africa.

Which is in the same category as:

It is a special day in the Mayan calendar.

What on earth is the link between these things? Remember, the goal of the assignment is to get our computers to learn some rules connecting the text to the categories, then run those rules against a dataset that looks thus:

Record Category
Text ?
Text ?
Text ?
Text ?
Text ?
Text ?
Text ?
Text ?

Any team that figures out what the rule is, even though they can't just label the records by hand, will be able to manually verify their algorithm's correctness and gain a huge advantage over the competition. Most people don't even try, but the savvier students at least try to cheat a little bit.

Unfortunately, The Great Lecturer, Whose Weighty Soul Is Borne Aloft By A Cacophony Of Weeping Doctoral Candidates, has assured us that he has gone to great lengths to find this data set, and that we will be unable to figure out where it is coming from. A preliminary Google search reveals this to be true. Everyone gives up.

Apart from me, because I remember his zero-division error.

I grab a record and run an exact search, meaning that I wrapped the text in quotation marks.

IV.

Jackpot, you bastard. I've cracked your puzzle, you sick fuck, and no God can save you from me now that I know you are weak. They're some sort of weird articles posted on BBC webpages, which I now suspect were generated from an RSS feed. I note that there's a tiny label saying china at the bottom of the page. I do some checking and yes, he has somehow scraped a million of these, and the labels at the bottom are what the categories are. The reason that mosquito discovery and the Mayan calendar are both labelled as China is... low-quality data entry at the BBC.

But we aren't even getting started. Remember, I broke the competition, I didn't just score slightly higher than the other nerds, who were also presumably on track to become some of Australia's leading data science practitioners6.

My arachnid-mind is triumphant, and thinks "If someone is going to be this silly... no. He wouldn't have. But yes, he would."

You see, there is one major issue with simply looking up all the records in what we call the "test" data set - the one that I need to label and get graded on. It's huge and I'd have to repeatedly run searches to try to dig up the records that I'm looking for. It's unachievable, but if it was achievable then I'd just have all the answers. Unless...

Allow me to draw your attention to something. Recall the dataset looks like this.

Record Category
Text A
Text A
Text ...
Text A
Text A
Text B
Text B
Text ...
Text B
Text B
Text C
Text C
Text ...
Text C
Text C

All the As, then all the Bs, then all the Cs.

What if The Great Lecturer, Whose Heart Is A Cannibal Inferno Burning Taxpayer Money, didn't shuffle the test data? Because if that was the case, then I could take this:

Record Category
Text ?
Text ?
Text ?
Text ?
Text ?
Text ?
Text ?
Text ?

And select a record in the middle:

Record Category
Text ?
Text ?
Text ?
Text ?
Text B
Text ?
Text ?
Text ?

Then select another record in the middle of that split:

Record Category
Text ?
Text ?
Text A
Text ?
Text B
Text ?
Text ?
Text ?

Then repeat that process between A and B until I find the exact spot where A becomes B7. By finding the boundary between A and B, then the boundary between B and C, I can just assume everything between those points is B. The number of comparisons is something I'm too lazy to work out or look up, but it would have a pleasingly large value in the denominator, or maybe a log somewhere. You know, one of the happy complexity symbols, not the sad ones.
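For the technically inclined, the boundary hunt can be sketched roughly as below. This is a reconstruction under assumptions, not the code I ran at the time: `label_sorted_records` and `expensive_lookup` are hypothetical names, the lookup stands in for manually searching a record's text online, and the whole thing only works if each category sits in one contiguous block. Under that assumption, k categories across n records need on the order of k * log2(n) lookups rather than n.

```python
def label_sorted_records(lookup, n):
    """Label n records that are assumed to be grouped in contiguous blocks.

    `lookup(i)` returns the true category of record i and is assumed to be
    expensive (for example, a manual web search), so it is called sparingly.
    """
    labels = [None] * n
    start = 0
    while start < n:
        label = lookup(start)
        # Binary search for the last index that still carries `label`.
        lo, hi = start, n - 1
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if lookup(mid) == label:
                lo = mid  # boundary is at mid or further right
            else:
                hi = mid - 1  # boundary is strictly left of mid
        for i in range(start, lo + 1):
            labels[i] = label
        start = lo + 1
    return labels


# Simulated, unshuffled test set (hypothetical sizes) and a lookup counter.
hidden = ["A"] * 5000 + ["B"] * 3000 + ["C"] * 4000
calls = [0]


def expensive_lookup(i):
    calls[0] += 1
    return hidden[i]


recovered = label_sorted_records(expensive_lookup, len(hidden))
print(recovered == hidden, calls[0])
```

The counter is the happy part: twelve thousand records get labelled with a few dozen lookups, which is the difference between "unachievable" and "done by the next day".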

V.

I am right. No one has ever been more right, and I am not open to arguments to the contrary. Please leave them in the Complaints box, yes, the one next to the raging bonfire and paper ash. By the next day, I have achieved a perfect score on the assignment.

Of course, the thing that we all expected to happen in the real world happens. In a good book, I would have been given an amazing grade for my slick work, but this wasn't a book. The university states that this exploit isn't allowed as a submission. The superficial reason is that it doesn't demonstrate any understanding of machine learning, but the real reason is that it's embarrassing. I would probably go extremely far out of my way to hire a student that did the same thing (but of course I'd say that, having done it). But a more concrete, less masturbatory data point is that one of my team members, Peter Ince, who actually had industry experience, offered to help me find employment. Peter is in the crypto space so I never took him up on this8 but still, the person that was actually employed on real engineering projects was impressed!

However, I'm left in a slightly awkward position. Between fencing training, my other assignments, and messing around with this, I don't have a good solution prepared for the assignment. I was smart enough to know this was going to happen, but I just had to know if my read was right on The Great Lecturer, Whose P-Hacking Rends Reality Asunder, being clueless.

Miraculously, things get stupider.

With a few days left, the first thing I do is ensure that I have some solution that performs adequately, as the bell curve means that I can instantly guarantee myself a pass since I know that most of the teams are probably just starting anyway. One of our lecturers had spoken about the H2O platform, which does AutoML (read: you put the data in and it just kinda tries stuff until something works). So I start by using H2O and save that solution, which takes ten minutes.

Then I Google similar problem domains, and perform some actual work, setting up some simple training stuff to run on my laptop overnight using XGBoost. I put a lot of work into this as I'd sure like a high grade, if only so that my Southeast Asian surgeon dad is not embarrassed to have produced such a mediocre son, but the solution feels lackluster at the end. It was really just churning through tons of different options in XGBoost, and there simply didn't seem to be any obvious improvements to try. Years later, with actual work experience and knowledge, I still have no idea what else would really be worth trying, as most of the problem probably does degenerate into a complex word search.

I submit to grading and score 4th with the XGBoost solution, being approximately 0.025% off from the 1st place score! High Distinction! Let's go! Astute machine learning specialists will notice that, doing the assignment fairly, your grade is almost certainly determined by variance during the model building process rather than effort, which is probably also a metaphor for life outside of university.

Wait. What was the 5th place score? It was still lower than what I got from just pushing everything into H2O and clicking run? What the fuck were all the other students doing?

[Image: AI_ROI.png]

Are we not seeing return-on-investment because AI is hard, or is it because, of those sixty-four teams, only four rolled out an answer better than using the lowest effort solution in the world as a benchmark, and those are the only people that are worth hiring? Or is it, perhaps, because The Great Lecturer, Whose Hideous Form Belies His Extensive Cross-Disciplinary Research, has a counterpart, The Great Executive, Whose Rolex Clicks Through Gears Of Human Suffering? We may never know.

Intermezzo:

During my first programming class (may I remind you, at the postgraduate level) I arrive to find that the lecture hall, normally packed, sits half-vacant, almost a third of my peers missing.

Our first assignment was to write a Python script that takes a user's name and age from the terminal and prints them. A full third of the students plagiarized because they were unable to figure out how to run a variation of print(input("What is your name?")), but they made the tactical error of being in noted gigachad lecturer Gavin Kroeger's class, Whose Luminous Being Suffuses My IDE, so they all got suspended instead of a rap on the knuckles.

They also all still graduated eventually, somehow, so please never, ever feel bad for not having a degree. I think they might increase the odds that a randomly chosen person is incompetent.

VI.

The Great Lecturer, Whose Forked Tongue Tastes The Air Of LinkedIn, runs another unit on data wrangling, which is such a spectacular disaster that the final assignment stretches two weeks into the exam period because students keep discovering new issues that make the assignment literally impossible. It took some of the students over six months to recover from the sheer stress.

Eventually I am in a room with his head tutor, who remarks that thanks to another issue with the assignment, I would score 75%, but am only allowed to score a maximum of 50% because The Great Lecturer, Whose Tweets About Research Impact Arrive In Satan's Ear, decided that scoring less than 50% on the flawed part of the assignment removes all your marks from the second part. Almost every student had crashed in at 49% for technical reasons I can't even remember.

The other students have sadly accepted their low grades - many of them were caught out on the same error, but we do not accept that authority figures can make mistakes here (also a useful life lesson). The head tutor is having this conversation with the entire cohort.

However, I am Malaysian, and I have at least a passing understanding of dealing with morally compromised midwits via cultural osmosis.

I remark that I am extremely distressed that I have to raise such a formal complaint with the dean, or possibly the ombudsman, over one mark. However, if my grades were 75%, then I would probably be happy, and it's just one mark. If only we could decide that the subjectively graded parts of my assignment were worth one mark more.

He nods cautiously, changes my grade on the first section from 49% to 50%, gives me a total grade of 75%, and we never speak of the matter again.

The real funny thing about this is that there was a point in my life where I thought grades mattered outside of applying for courses with academic gatekeepers.

VII. Reflections

Reflecting on the experience, I wish I had been more open to the idea that much of my work experience would consist of people not taking things terribly seriously - which is not even incompatible with having fun. I could have learned a lot from university and spared myself a lot of pain, instead of learning some things the hard way. It's true that university is unusually silly in most domains, but in many ways the exaggeration of the worst traits in large, impersonal organizations (no accountability, many authority figures, those lowest in the rungs lacking perspective, only having remediation mechanisms where authority judges itself) would have prepared me for things like my eventual navigation of the immigration machinery in a way that my young, inexperienced brain could have actually processed.

Indeed, many of the lessons from my university experience stem not from lecture material, but from simply paying attention to what I struggled with and what others struggled with (and similarly watching to see what people didn't struggle with and copying them).

Take my psychology studies. The smartest psychologists I know, such as Melbourne University's Charles Malpas9, were absolute machines, but the majority of practitioners were very lackluster, with even PhD students frequently demonstrating that they lacked any serious foundation in statistics, epistemology, and scientific thinking. The real value and test of a psychology qualification is whether you can discern the bullshit from the useful material, and weaponize that against the system.

Psychology degrees are obsessed with two thousand word essays. When I tutored, I would always instruct my students to seek feedback from their tutors, and specifically ask "What would I need to do in order to score 80%? It is very important to me."

The truth of the matter is that because the majority of psychological research is silly10, especially the type that goes into undergraduate courses, you're really being marked on writing ability, which is actually vibes-based if you want to score higher than 65%. For some reason, many students believe the fiction that their grades are based on some merit intrinsic to the writing, but once you have passed the basic hurdles of formatting and grammar, the difference between 75% and 90% is style and, in the case of mediocre lecturers, pandering to their biases. If I started grading random fiction novels from 0% to 100% and insisting that various reviewers would consistently land within even 5% of each other, you would call me insane. If I do this for student essays, now I'm just a normal tutor.

Most of the advice my students got was nonsense. Anything from "paragraphs should be less than a hundred words" to "why are your paragraphs shorter than a hundred words?". The real reason that you always ask tutors for advice is because you are trying to make them have to overcome cognitive dissonance if they want to grade you lower than the amount you were aiming for. At the very least, how could a student do worse after seeking advice? What would that say about the advice giver? Heck, this would work on me and I'm the main distributor of this trick! The course is actually filled with material saying "Humans are great at coming up with post-hoc justifications for things", and the real test is to see if you're clever enough to act on that information even if the people setting the test don't realize that's what it's about.

Is that calculating and possibly awful? Yes, but people were paying me for grades. Trade offer: You give me money. I give you an unfair advantage.

Similarly, I'd advise them to:

  1. Actually use spaced repetition for studying, because the Australian system is based on multiple choice questions and this is probably objectively the fastest way to rack up infinite points in those formats
  2. Leave all social media groups with other students, because they tend to experience massive groupthink and produce homogeneous work, then wonder why they all only got 65%
  3. Finish their work two weeks before it was due or I wouldn't look at it, so they'd stop suffering from perpetual Parkinson's Law

My overwhelming experience both during and after university has been that many people struggle to do very basic things, in a way that is perplexing and interesting in equal parts. I've said it before and I'll say it again, the reason I keep piano playing in my life is that I am profoundly untalented at it, and I refine my playing approximately as stupidly as The Great Lecturer, Whose Grants Are Unrealized Cancer Cures, produces his lecture material. Painfully stumbling across the keys every week in front of my clearly-perplexed teacher11 helps keep me grounded even as I occasionally accomplish something that might otherwise go to my head.

It also brings to mind the simple joy of being a recalcitrant malcontent who doesn't take kindly to undeserved authority, which is not the best self-promotion but probably honest. There's something healthy about seeing clearly and trusting your intuition over social scripts, at least sometimes.

When Morpheus says:

Come on! Stop trying to hit me and hit me!

Or Yoda comes out with the classic:

Try not. Do or do not. There is no try.

If we briefly put aside that the authorial intent was probably to say something that sounds profound but is actually silly12, there are many symbolic ways to tackle problems, such as studying very hard, talking about how tech debt prevents systemic improvement, and juggling Jira cards which are completely unrelated to solving the actual problems. I think that's what causes people to have this inkling that there's something there, even if they forget by the time the movie is over. Symbolic efforts are frequently a failure of imagination, a failure of genuine attention, or a way to simply have a woe-is-me story when things go wrong - and I've done all three at different points in my life, absolutely. Seeing clearly is a joyous and enriching skill, and is incidentally the part of the LessWrong/Rationalist movement that I like, when all the weirdness is excised.

Out of it all, the great lessons I took were that you can usually tell if you're trying by looking at whether you're not doing what the struggling masses are doing, that everyone is going to be clueless in a few areas of their life (hopefully not the ones they charge society huge amounts of money for), and most entertainingly, that incompetence is widely exploitable for fun and profit.

In any case, I wish all of that veil-piercing somehow waived those enormous university fees13.


PS:

I'm going to be in Fiji for three weeks, starting on May 18th. If there are any readers in Nadi who would like to get coffee, let me know! If I don't write while I am there, I will freeze my Liberapay and Patreon, which by the way, did you know I have those?


  1. lmao get owned, VCs. Proceed with the light of the bridges you're burning, right? 

  2. lmao get owned, alma mater. 

  3. Actually, I was probably using Pycharm because universities really don't seem to teach the important things - I don't know anyone that uses Pycharm in the data space. 

  4. I've only seen this once, and in retrospect it should have been a valuable life lesson. 

  5. lmao get owned, Americans. 

  6. lmao get owned, Australians. 

  7. This long-winded explanation was because I don't write much on technical things, and we have several non-technical readers with us. However, I thought it was interesting that this only came to me because I had recently skimmed Use The Index, Luke! which was my first introduction to binary search algorithms. Weird how almost everything ends up being useful! 

  8. In fact, he is the only person in the space that I respect enough to bother wondering whether there's some tiny sane pocket in there. 

  9. Who will be no doubt thrilled to see that one of his highest profile students, who credits him as a statistics mentor, mentions him in an article referring to good time complexity as having "happy symbols". 

  10. Probably most research in every field, honestly. My Honours and Masters theses were complicated and stressful, but even I don't believe in the results. 

  11. lmao get owned, Dime. 

  12. Although I will note that a common error with new sabre fencers is that they will attack your blade instead of you because they think that's what fencing is! You can stand totally immobile and they will frequently bounce off your guard anyway. 

  13. lmao get owned, me.