I finally got curious enough, and had rendered a particular Google account unimportant enough, to give ChatGPT a try. I’ll leave out the obvious goofs – asking it to deliver the Gettysburg Address in the style of Jeff Lebowski – and write up a few slightly more complex requests. Trying to get Eliza to say swears in eighth grade got boring fast, so nothing I asked it to do was born of a spirit of malice toward it. Just curiosity.
Overall, it was a land of contrasts. It did some things really well, or got to “really well” with a few followup requests. But on the other end of the spectrum it simply invented non-existent functionality then documented how to configure it.
I don’t have any particularly original takeaways. Put me in the broad camp of “human venality is going to be the real problem here,” in ways it already is where other kinds of automation are concerned. We live in a society where people not only, like, read a newspaper in their Tesla, but literally crawl into the back seat and take naps.
The trivial tasks I fed it were limited, so the errors it made were obvious. I wasn’t trying to prove or disprove its … quality? If you’ve ever gotten into an argument with a word processor’s grammar checker, you’ll sort of understand what you’re up against here, too. The improvement over that scenario is that instead of staring at the alleged bad grammar and ultimately learning you’re either a “leave the blue squiggles” person or a “can’t tolerate any blue squiggles” person, you can tell it you didn’t care for the answer and it often takes the hint and fixes the problem.
But you have to know that you don’t like the answer. So when it wrote a Puppet module for me and did so with insecure code, I noticed that and told it to do better and it did. Nothing got into production. When it gave me unreadable output for a train schedule, its first correction was at least obviously and intuitively wrong. Nothing got into production.
When I think about more complex code I’ve written, the human dimension of the problem stands out more. I once modeled website revenue for a Rails app, which involved a lot of sorting out when data was sampled vs. when it wasn’t, recursive reasoning around costs, and so on, and I remember the many ways I could lose my train of thought and introduce stuff that looked right even to my experienced eye as a domain expert who knew the problem space as a practitioner and was describing that expertise in code. No misunderstood requirements, no senior dev fighting with the product owner because reality is stupid, no UX designer arguing that beauty is truth. Just me, passionately invested in the problem, and still introducing errors I couldn’t spot on review … errors I could only spot with tedious testing.
Anyhow. Nothing new.
Some of the things I tried:
Create an org capture template for daily health logging
It did what I asked, if slightly idiosyncratically for a log format (it stuck the date in the :DRAWER: instead of putting it in the heading as an inactive date label). Subsequent conversational-style prompts (“okay, but could you include weight and hours slept?”) caused it to add prompts for those to the template, and “could you make that information append to an org table instead” generated a capture template appropriate for appending to a table.
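For reference, here’s a minimal sketch of the shape I was after, reconstructed after the fact rather than copied from its output; the file path and the prompted fields are placeholders. The later table-appending version swaps the entry type for table-line.

```elisp
;; A minimal sketch, not ChatGPT's exact output. The file path and the
;; prompted fields are placeholders; %U expands to an inactive timestamp
;; in the heading, which is where I wanted the date to land.
(setq org-capture-templates
      '(("h" "Daily health log" entry
         (file+olp+datetree "~/org/health.org")
         "* %U Health\n- Weight :: %^{Weight}\n- Hours slept :: %^{Hours slept}\n")))
```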
A similar request to create an org-capture template for Hugo blogging was mostly correct, but had a few glitches and the verbose instructions left out a key variable. It was debuggable in a few minutes.
I’d score it pretty highly, and using it for that task is just straightforward “get to the point of the tool, not the labor involved in the tool” utility. Mostly I appreciated that it matched all the parens correctly.
Tell me how to configure OfflineIMAP for use with mutt on a Mac
Did I say yesterday I don’t believe in that? I did. But it was on my mind.
It did this pretty well, delivering instructions tailored to the specific “mutt + offlineimap” use case that were as good as any tutorial, missing only the things that are idiosyncratic to Fastmail, which I forgot to mention to it. I should have thought to feed it the errors I was getting to see how it handled that. Instead I just searched for them on DuckDuckGo and got unstuck.
Interestingly, and this happened one other time, I lost the original request to a glitch in the web app. When I restated it only slightly differently … not in a way you’d think would materially affect the output … it came up with something slightly different that included a key detail the original output had left out.
I’d score it highly again, minus maybe its willingness to make message deletion live by default without warning. Other examples and tutorials I found mention that.
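For the curious, the offlineimaprc half of what it walked me through was roughly this shape. This is my reconstruction with placeholder account names; the Fastmail host and the readonly note are my additions, not part of its output.

```ini
# A rough reconstruction, not ChatGPT's literal output. Account names are
# placeholders; the Fastmail host and the readonly note are my additions.
[general]
accounts = Personal

[Account Personal]
localrepository = Personal-Local
remoterepository = Personal-Remote

[Repository Personal-Local]
type = Maildir
localfolders = ~/Mail/Personal

[Repository Personal-Remote]
type = IMAP
remotehost = imap.fastmail.com
remoteuser = user@example.com
remotepassfile = ~/.offlineimap-pass
ssl = yes
sslcacertfile = /etc/ssl/cert.pem
# readonly = True would be one way to keep deletions from propagating while testing
```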
I followed up by asking it to show me how to capture that configuration with Puppet and Hiera, and it produced a serviceable OfflineIMAP module. It would have had me storing my credentials in plaintext in the Hiera YAML. I responded that I preferred not to do that, so it provided me with an example that used eyaml.
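The fix looked roughly like this, as best I can reconstruct it; the key names are my own placeholders and the encrypted blob is elided rather than real output.

```yaml
# A rough sketch of the eyaml approach, not the exact example it produced.
# Key names are placeholders; generate the real value with something like
# `eyaml encrypt -s 'the-app-password'` and paste the ENC[...] block here.
offlineimap::remote_user: 'user@example.com'
offlineimap::remote_pass: ENC[PKCS7,...]
```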
Tell me how to use mutt with a Bayesian filter
It went completely off the rails, inventing filtering functionality for mutt and offering configuration examples that looked – mutt-like? – but that hinged on a configuration variable that doesn’t exist, near as I can tell, pointing at a configuration file mutt wouldn’t look for, all in support of the non-existent functionality.
Maybe the interesting thing that came out of the interaction was the way it cooked up a story about how a mutt-like filtering setup could work, in a way that seemed idiomatically correct for mutt. It just did the technical equivalent of adding a sixth finger to the left hand by assuming a generic Bayesian filter of some kind and taking the plumbing to connect it for granted.
How would I go about adding a second RSS feed with a different template for a hugo site?
Another miss, with very reasonable-looking instructions that simply didn’t work as proposed. I’m not sure how close it actually got. That’s a problem I spent some time trying to solve myself yesterday; its answer seemed close but missed some connections between content and template.
It gave its configuration examples in TOML, and responded correctly to a conversational “could I have those examples in YAML” prompt.
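For what it’s worth, my sense is that the real answer runs through Hugo’s custom output formats, something in this direction. This is a hedged sketch of where I’d look next, not its output; “filtered” is a placeholder name and I may have the template lookup slightly off.

```toml
# A hedged sketch, not ChatGPT's answer; "filtered" is a placeholder name.
[outputFormats.filtered]
  mediaType = "application/rss+xml"
  baseName  = "filtered"

[outputs]
  section = ["HTML", "RSS", "filtered"]

# The second feed then gets its own template, something like
# layouts/_default/section.filtered.xml, if I have the lookup order right.
```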
Tell me how to write a Sinatra app to download an rss feed, filter it for keywords and save another version of the feed
Prompted by a Mastodon conversation yesterday about RSS readers that could be used to filter sponsored content posts.
Up front, it’s sort of a weird request on my part: I was just lazily typing in part of an idea, including Sinatra as a dependency but not explaining why (my idea would involve creating a sort of RSS proxy with dynamic filtering during each client request). I just wanted to see what it would do without putting a lot of thought into it, or sticking around to narrow things down. So I got a Ruby script wrapped in a Sinatra route, honoring the request whether it made a ton of sense or not.
I’ve had it do a couple of Ruby scripts before, and it remembered to include `require` lines for gems. It didn’t do that in this case. I added them, tweaked a few other things to get it running without error, and found that the filtering didn’t work.
I might goof around with this one a bit more to see where it’s going wrong.
This lined up with things I’ve read from others over the past few months: It produced a scaffold of mostly working code. If I decide to mess around with it more, I’ll have been saved digging around for examples of the basic syntax for the assorted parts of the workflow that I know from experience are correct.
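The shape I had in mind is something like the sketch below. This is my own minimal take on the RSS-proxy idea rather than what ChatGPT handed me; the feed URL and the keyword list are placeholders.

```ruby
# A minimal sketch of the RSS-proxy idea, not the code ChatGPT produced.
# The feed URL and the keyword list are placeholders.
require 'sinatra'
require 'rss'
require 'open-uri'

FEED_URL = 'https://example.com/feed.xml'
BLOCKED  = ['sponsored', 'advertisement']

get '/feed.xml' do
  content_type 'application/rss+xml'

  # Fetch and parse the upstream feed on every request.
  feed = RSS::Parser.parse(URI.open(FEED_URL).read, false)

  # Drop items whose title or description mention a blocked keyword.
  feed.items.reject! do |item|
    text = "#{item.title} #{item.description}".downcase
    BLOCKED.any? { |word| text.include?(word) }
  end

  feed.to_s
end
```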
“What’s a good strategy for switching between regular and ortholinear keyboards?”
Useless. It provided very reasonable instructions for how to learn to use an ortholinear keyboard, but didn’t address the actual request.
“Tell me how to write a ruby script to download train times for a given stop in Portland oregon”
It produced working code that showed me the arrival times for the next train at my neighborhood stop. It provided the correct link to sign up with TriMet to get an API key.
“A-” because it didn’t translate epoch time to human time in the output. When I asked it to do that, it tried to comply and used the obvious syntax to convert the integer it got back from the API, but got confused by the millisecond format.
I tried the code and replied with:
“That time format is still incorrect. I think the timestamps include milliseconds.”
It replied with:
“You’re correct, I apologize for the oversight. It appears that the TriMet API returns timestamps with milliseconds.”
Then it produced working code that did a simple operation on the timestamp and passed it along to `Time` and `strftime` with correctly formatted output.
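The final conversion amounted to something like this; it’s my reconstruction rather than its exact code, and the sample value just stands in for the millisecond timestamps TriMet returns.

```ruby
# My reconstruction of the "simple operation", not ChatGPT's exact code.
# The sample value stands in for the millisecond timestamps TriMet returns.
ms = 1_672_603_320_000                 # epoch timestamp in milliseconds
arrival = Time.at(ms / 1000)           # Time.at expects seconds, so divide out the milliseconds
puts arrival.strftime('%-I:%M %p')     # human-readable local time, e.g. "5:42 PM"
```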