End-user Programming of Flappy Bird with ChatGPT: A Reality Check

DiverSE

Posted on September 19, 2023

Is it possible to program non-trivial applications and customize code without knowing much about programming?
Impressive showcases of ChatGPT suggest a positive answer.
Some people have even claimed that programmers will be replaced or disappear.
So, can end-users create working and non-trivial applications, like games, just using instructions in natural language with ChatGPT?

We have considered the game Flappy Bird, using Python.
We have employed various methods and strategies to interact with ChatGPT versions 3.5 and 4, using different prompts and trying to play the role of an "end-user".


Method, prompts, code, observations, and results

Specifically, the five (prompting) methods we tried are as follows:

  • A short prompt. The prompt is given as a simple sentence that provides the overall (general) goal of the task. No specific information or detailed functionality is given. A lot has to be inferred by the Large Language Model (LLM) at generation time. For instance: > Write a Flappy Bird game in Python.
  • Providing a list of characteristics (this time written by a human). The user comes up with a list of functions and/or characteristics that describe the game. The list is given to ChatGPT all at once and is used as the prompt.
  • A short description of the main features. This can be seen as a mix of the previous methods, as the prompt is written in natural language and not as a list of functionalities or characteristics. Yet, the text gives more information than the prompt proposed in the second method. For instance: > Write the code for a Flappy Bird game in python. I have a folder "assets" with "background.png", "pipe.png" and "bird.png". I would like to know my score and keep track of my highscore in a separate file. I would like to have a start screen when I first open the game, where I can see my highscore, and a game over screen where I can see my score, highscore and play again. (A minimal sketch of the highscore persistence requested here appears right after this list.)
  • Giving ChatGPT an example of finished code doing what we want (in our case a complete Flappy Bird game written in Python), and asking it to return a prompt. The prompt returned was never the same (probably because of the temperature parameter), but was usually in the form of a list of characteristics.
  • A list of prompts (without having to look at the code in between the requests). This last method mimics the behavior of an end-user who has carefully thought about what they need. They do not really interact with the LLM and simply give the instructions one after another without caring about the provided output. Only the final output is used and potentially asked to be improved; everything before that is the user building, little by little, what will become their codebase.
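
As an aside on the third method's prompt: keeping track of the highscore in a separate file boils down to a few lines of Python. Below is a minimal sketch, with a hypothetical file name rather than one taken from our sessions:

```python
from pathlib import Path

HIGHSCORE_FILE = Path("highscore.txt")  # hypothetical file name

def load_highscore() -> int:
    """Return the stored highscore, or 0 if no score has been saved yet."""
    try:
        return int(HIGHSCORE_FILE.read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0

def save_highscore(score: int) -> None:
    """Overwrite the stored highscore only when the new score beats it."""
    if score > load_highscore():
        HIGHSCORE_FILE.write_text(str(score))
```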

We obtained different code and games, of varying quality (let's say).
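
To give an idea of the shape of the generated programs, here is a heavily simplified sketch of a pygame-based Flappy Bird loop, similar in spirit to what ChatGPT tends to produce; all constants, names and values are illustrative and not taken verbatim from any of our sessions:

```python
import random
import sys

import pygame

# Illustrative constants -- not the exact values ChatGPT generated for us.
WIDTH, HEIGHT = 400, 600
GRAVITY = 0.4
FLAP_STRENGTH = -7.0
PIPE_GAP = 180
PIPE_SPEED = 3
FPS = 60

pygame.init()
screen = pygame.display.set_mode((WIDTH, HEIGHT))
clock = pygame.time.Clock()

bird = pygame.Rect(80, HEIGHT // 2, 30, 30)
bird_y = float(bird.y)
bird_movement = 0.0
pipes = []  # list of (top_rect, bottom_rect) pairs

SPAWN_PIPE = pygame.USEREVENT
pygame.time.set_timer(SPAWN_PIPE, 1500)  # spawn a new pair of pipes every 1.5 s

def spawn_pipe_pair():
    """Return a top and a bottom pipe separated by PIPE_GAP pixels."""
    gap_top = random.randint(100, HEIGHT - 100 - PIPE_GAP)
    top = pygame.Rect(WIDTH, 0, 60, gap_top)
    bottom = pygame.Rect(WIDTH, gap_top + PIPE_GAP, 60, HEIGHT - gap_top - PIPE_GAP)
    return top, bottom

while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            pygame.quit()
            sys.exit()
        if event.type == pygame.KEYDOWN and event.key == pygame.K_SPACE:
            bird_movement = FLAP_STRENGTH  # flap on the space bar
        if event.type == SPAWN_PIPE:
            pipes.append(spawn_pipe_pair())

    # Physics: gravity pulls the bird down, pipes scroll to the left.
    bird_movement += GRAVITY
    bird_y += bird_movement
    bird.y = int(bird_y)
    pipes = [(t.move(-PIPE_SPEED, 0), b.move(-PIPE_SPEED, 0)) for t, b in pipes]

    # Game over on any collision with a pipe or the screen edges.
    if bird.top < 0 or bird.bottom > HEIGHT or any(
        bird.colliderect(t) or bird.colliderect(b) for t, b in pipes
    ):
        pygame.quit()
        sys.exit()

    # Drawing: plain rectangles instead of the "assets" sprites.
    screen.fill((135, 206, 235))
    pygame.draw.rect(screen, (255, 230, 0), bird)
    for t, b in pipes:
        pygame.draw.rect(screen, (0, 128, 0), t)
        pygame.draw.rect(screen, (0, 128, 0), b)
    pygame.display.flip()
    clock.tick(FPS)  # cap the frame rate so the physics stays controllable
```

The issues discussed below (gravity and gap constants, pipe spawning, collision detection, clock handling) all concern variations of these few pieces.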

Observation and synthesis

Do claims generalize? (Use the prompt and you'll get a game)

No, we did not find a method (or a magic prompt) that systematically works and would generate a feature-rich game.
Though we have been able to generate interesting and playable games without technical intervention, there are also several cases and sessions that led to a dead-end situation, far from the ideal scenario usually promised.

First, we sometimes reached a dead end-state with a non-working or unusable game (mainly due to the inability to fix an issue); see for instance:

  • 2nd method, 2nd session (it is possible to continue, but only with the skills to debug the code: "Pipes only came 1 by 1 and not by pair. Trying to implement that only led to more problems that ChatGPT couldn't solve.")
  • 2nd method, 4th session (10 interactions; too many problems: no top pipes, pipes always at the same height, wrong collision detection, ...)

Second, despite the same original prompt (and the same "method"), we can get very different generated code.
The differences are in terms of:

  • issues in the code, requiring more or less fixing and debugging effort
  • features supported: ChatGPT takes the liberty of implementing (or not) some functionalities, and the resulting game might be very different... This again forces the user to interact with ChatGPT, specifically with regard to what has been generated.

The consequence is that several specific interactions are needed, some leading to the worst situation (no working game), far from the ideal scenario.

Is it possible to program without expertise?

Yes, but again, it is not systematic.

Direct interventions in the code are sometimes needed (to fix issues!).
For example, we had to [...] change the values of the gravity, bird_movement and pipe_gap to make the game easier and more controllable.
ChatGPT does not seem to figure out how to fix the code by itself (when we just point out the problems).
Besides, it is more challenging to interact as an end-user when the game is in very bad shape from the start, basically unplayable or missing critical features. In this case, it is not possible to get visual observations that would help formulate proper feedback.
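
Concretely, the intervention amounted to hand-editing a few constants of the kind shown in the earlier sketch. The before/after values below are purely illustrative, not the exact numbers from our sessions:

```python
# Values in the spirit of what was generated (illustrative, not our exact numbers):
# gravity = 1.0        # the bird fell far too fast
# bird_movement = -12  # each flap sent the bird off the screen
# pipe_gap = 100       # the gap between the pipes was too narrow

# Hand-tuned replacements that made the game easier and more controllable:
gravity = 0.4
bird_movement = -7
pipe_gap = 180
```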

The style of ChatGPT is sometimes to decompose the problem and put placeholders in the code... but without the implementation!
From a developer perspective, it is an interesting style that forces one to consider a step-by-step implementation.
However, from an end-user perspective, the game is left in an incomplete state.
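
Concretely, such generated code may look like the following sketch (the function names are hypothetical): syntactically valid Python, but with stubs that an end-user cannot fill in.

```python
def draw_start_screen(screen, highscore):
    # TODO: display the highscore and a "press SPACE to start" message
    pass

def check_collision(bird_rect, pipe_rects):
    # TODO: return True when the bird touches a pipe or leaves the screen
    pass

def update_score(bird_rect, pipe_rects, score):
    # TODO: increase the score each time the bird passes a pair of pipes
    return score
```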

ChatGPT sometimes gives explanations related to a "feature" that does not exist, leading to time-consuming effort.
For example, it implemented a feature that has no meaning: "There was a 'FLAP' event that made the bird jump every 200ms. That's not part of the game!"

Some examples of issues that needed to be fixed:

  • clock handling (pygame specifics?) => the bird crashes immediately
  • gap placement between the pipes
  • collision detection
  • breaking of older functionality
  • missing a bit of diversity

Such fixes are time-consuming. It is possible to help ChatGPT with visual, end-user observations about the game.
Sometimes, there is a need to orient the discussions and interactions towards the code.
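
Collision detection is a typical example of an issue that ends up being discussed at the code level. A plausible fix, sketched below with hypothetical coordinates (not code from our sessions), is to rely on pygame's rectangle-based test rather than hand-written coordinate arithmetic:

```python
import pygame

# Pipes as rectangles; pygame.Rect works without opening a window.
top_pipe = pygame.Rect(200, 0, 60, 250)       # from the top of the screen down to y=250
bottom_pipe = pygame.Rect(200, 430, 60, 170)  # a 180-pixel gap, then down to y=600

def bird_hits_pipes(bird, pipes):
    """Rectangle-based collision test instead of error-prone coordinate comparisons."""
    return any(bird.colliderect(pipe) for pipe in pipes)

print(bird_hits_pipes(pygame.Rect(210, 300, 30, 30), [top_pipe, bottom_pipe]))  # False: inside the gap
print(bird_hits_pipes(pygame.Rect(210, 100, 30, 30), [top_pipe, bottom_pipe]))  # True: overlaps the top pipe
```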

How are games different? What are the features?

In our Git repository, we share the sessions and code.
Our observation is that you can well get a feature-rich Flappy Bird, but also a game with closed pipes that limits the progression of the game.

See the YouTube video for an excerpt of the games you can get.

Conclusion and perspectives

Formulating a prompt and systematically getting a comprehensive and playable game is not yet a reality.
ChatGPT can provide impressive results, but not every time. A magic prompt that makes ChatGPT reliable is still missing.
Rather, many interactions are needed to fix issues or expand the features, sometimes in non-technical terms and with instructions derived from observations of the current game. But taking control of the code is never far away and seems inevitable.

Throughout our experiments, we noticed several positive aspects of using ChatGPT:

  • inspiration, funny variants
  • discovering new features
  • good starting point for developers (or end-users)
  • sometimes it can work and end-users can create interesting games

There are several interesting directions to consider:

  • change the targeted programming language and/or framework: instead of Python and pygame, it is possible to use JavaScript and p5, for instance... A hypothesis is that the targeted technological space can help ChatGPT fill the gap between the prompts and the intentions.
  • find a super prompt or a language (a domain-specific language, anyone?) that leads to more determinism and control over ChatGPT
  • improve the usability and the integration of ChatGPT outputs into development environments. The back and forth between the IDE and ChatGPT is time-consuming. Moreover, some (informal) instructions from ChatGPT could be automatically applied to the code base, limiting the user effort or the technical expertise required. The hope is to have a better feedback loop!

If you are interested, we are sharing 35 sessions, with prompts, code, observations, and results (videos) with ChatGPT-3.5 and ChatGPT-4 in this repository: https://github.com/diverse-project/enduserprogrammingLLM/

And we have open positions (for internships, engineers, PhDs, post-docs, etc.) about the use of ChatGPT in software engineering at DiverSE!
