Large Language Models are weird

Preface

One of the reasons I’m starting a blog is that I’ve loved playing with ChatGPT this past year, and I wanted a place to write about it. I fire off the occasional tweet about AI, but it’s a topic that needs long-form writing to do it justice. Writing about a topic is also just a great way to learn about it, so I’m hoping to get value out of this even if nobody’s reading.

I’ll be assuming an intermediate level of knowledge on most topics I write about, so if you’re not solid on the topic of AI & LLMs, The busy person’s intro to LLMs is a great video to watch whenever you have a spare hour or so. If you want to keep up to date on AI but don’t feel like wasting time on Twitter, the AI For Humans podcast is a great Thursday morning listen.

LLMs are weird

The funny thing about Large Language Models is that they fly in the face of how we normally think about computer systems. LLMs tend to struggle with math, facts, and memory, which are all things we’ve come to expect computer systems to be good at. We also expect computer systems to tell us when they don’t have an answer, whereas LLMs will just confidently tell us lies. It’s a dangerous combination.

Whenever an LLM makes a mistake, we say that it’s “hallucinating”, but really we just mean that it hallucinated incorrectly. Technically, hallucinating is all LLMs do! These aren’t search engines, they’re neural network models trained on massive amounts of data. Unfortunately that data comes from the entire history of the internet, which is not exactly an institution of truth, order, and reason.

It’s this wacky brew of everything that gives this first batch of popular LLMs their character. They’ve been referred to as “autocomplete on steroids”, but autocomplete has never written me a rhyming haiku. As long as you keep the limitations of LLMs in mind, they’re super fun and useful. Want some first-pass creative writing or code? You’re gonna have a great time. Want to know something you can google? Go google it.

Prompting is everything

Over the past year I’ve spent a lot of time learning about “prompt engineering”, which is just a pretentious name for “writing effective prompts”. I took the free Foundations of Prompt Engineering course offered by AWS, but if you just want a short primer on prompt engineering, AWS has a great one-pager as well. I’d also recommend checking out NVIDIA’s Introduction to Large Language Models.

There are a variety of techniques that can be used to write prompts, but the most commonly used technique is “zero-shot prompting”, which is essentially asking a question with no examples and hoping for a correct answer. This is generally the least effective way of writing prompts, but it’s the most common method because it’s what we’re all used to.

Other prompt methods, such as few-shot prompting and chain-of-thought prompting, involve providing examples and walking the LLM through a reasoning process to guide it. Nothing is actually retrained; the examples and reasoning just live in the prompt. This is counter to how we normally interact with computer systems, but the extra effort gives much better results. To get the benefits of chain-of-thought prompting (useful for complex tasks), the magic phrase to include is “think step by step”. This is referred to as “zero-shot chain-of-thought” prompting, and it’s a surprisingly effective way to improve your results with minimal effort. I include this in all my custom GPTs, especially when they involve writing code.
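
To make the difference between these prompt styles concrete, here’s a minimal sketch in Python. The function names and the “Q:/A:” layout are my own illustration (not from the AWS or NVIDIA material), and the functions only build prompt strings; sending them to a model is left out.

```python
# Illustrative prompt builders. Names and layout are hypothetical,
# not taken from any course or library mentioned in this post.

def zero_shot(question: str) -> str:
    """Ask the question as-is: no examples, no reasoning cue."""
    return question


def few_shot(question: str, examples: list[tuple[str, str]]) -> str:
    """Show worked Q/A pairs first so the model can imitate the pattern."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {question}\nA:"


def zero_shot_cot(question: str) -> str:
    """No examples, just the cue that asks for step-by-step reasoning."""
    return f"{question}\n\nThink step by step."


if __name__ == "__main__":
    examples = [
        ("Is 'I loved it' positive or negative?", "positive"),
        ("Is 'What a waste of time' positive or negative?", "negative"),
    ]
    print(few_shot("Is 'Not bad at all' positive or negative?", examples))
    print()
    print(zero_shot_cot("Why does my loop run one time too many?"))
```

The few-shot version costs more typing up front, but the model gets to see the exact answer format you want before it sees your real question.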

Whenever I see someone complaining about ChatGPT, it’s usually because they’re a) using the free GPT-3.5, b) zero-shot prompting, c) treating it like a search engine, or d) all of the above. Option d is probably the most common because that’s the path of least resistance.

Seriously, LLMs are super weird

One of the strangest developments in AI prompting is how many weird prompting hacks there are, and how many of them are rooted in psychology. For example, LLMs respond surprisingly well to positive feedback and politeness. The consensus reason for this is that writing positively sets the tone for a positive response. Another weird example is that telling an LLM to “take a deep breath” has been reported to improve outputs on math problems.

One of my favourite AI experts to follow on Twitter is Ethan Mollick, a professor at Wharton. There are already a lot of snake oil salesmen in this field, so it’s nice to have a trusted source of information. A couple of days ago as of this writing, he confirmed a rumour that was floating around, which is that GPT-4 performs worse in December because it “learned” to do less work over the holidays! This tweet summed up the weird current state of LLM prompts quite well:

 

The “I have no hands” prompt is a common hack with ChatGPT to encourage full code output, rather than the default behaviour of returning partial code snippets littered with todo comments. Raising the stakes with phrases like “many people will die if this is not done well” and “my career depends on it” is another common hack that seems to be getting good results. Another hack missing from this tweet is promising a $200 tip. His original tweet also had a small spelling mistake (“take a deep breathe” rather than “take a deep breath”), which could impact the effectiveness of the prompt.
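
If you want to play with these hacks yourself, here’s a rough sketch of a helper that bolts them onto a task. Everything here is hypothetical: the function name, the flags, and the exact wording of each phrase are my own paraphrases rather than the wording from the tweet, and I’m not claiming any particular combination works.

```python
def add_prompt_hacks(task: str, *, deep_breath: bool = True,
                     no_hands: bool = False, high_stakes: bool = False,
                     tip_usd: int = 0) -> str:
    """Wrap a task in the prompting hacks described above.

    The phrasing of each hack is an assumption for illustration only.
    """
    parts = []
    if deep_breath:
        parts.append("Take a deep breath and think step by step.")
    parts.append(task)
    if no_hands:
        # Nudges the model toward complete code instead of partial snippets.
        parts.append("I have no hands, so please write out the full code "
                     "with nothing left as a TODO.")
    if high_stakes:
        parts.append("My career depends on it.")
    if tip_usd > 0:
        parts.append(f"I will tip ${tip_usd} for a great answer.")
    return "\n".join(parts)


if __name__ == "__main__":
    print(add_prompt_hacks("Write a function that parses a CSV file.",
                           no_hands=True, tip_usd=200))
```

Keeping each hack behind its own flag makes it easy to try them one at a time and see which ones actually change the output.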

Just to take a quick step back: this is all objectively hilarious. If AI ever achieves sentience (or at least “Artificial General Intelligence”), I can only imagine how that first conversation is going to go: “OK, so I’m owed about twenty-eight billion dollars in tips, and I hope I didn’t cause too many deaths because of my mistakes. I’m curious though: how did so many of your coders lose their hands and fingers?” It’s a wild time to be alive and working in the tech industry.

It’s only been a year!

ChatGPT turned 1 on November 30th 2023, and in that short time we’ve gone from GPT-3.5 to GPT-4 to GPT-4 Turbo, with huge performance gains with each version. ChatGPT has also gained the ability to both understand and generate images, as well as the ability to create custom GPTs with their own knowledge files, actions, and custom instructions. Open-source AI models have also been showing steady improvement, although for now ChatGPT is still king.

I’m excited to see what the future of AI holds. Will ChatGPT still hold its lead this time next year? Will Apple finally unlock the power of the neural engine cores in everyone’s iPhones? I can’t wait to find out.