British geneticist interested in splicing, RNA decay, and synthetic biology. This is my blog focusing on my adventures in computational biology. 

CompBio 027: Is AI generated code here to replace computational biologists?

So this probably isn’t the first click-bait title about AI generated code you have seen recently. I previously wrote about suggested terminal commands from Warp, which is powered by GPT-3. Many others have been writing or vlogging on YouTube about chatGPT (also powered by GPT-3) and its ability to do all sorts of creative work, including writing code. So I wanted to test it out: how well can it write Python code to solve a problem from my hobby (painting small models)? Apologies for this not being a strict CompBio problem, but I think it will reveal the potential and the current limitations of chatGPT for generating code for any sort of problem, whether biological or not. Below is a model of my wife’s World of Warcraft character designed in HeroForge.

A 3D printed mini of a dragon person with wings, primed in grey.

I often want to paint a model a colour that I have seen in real life and photographed, or have seen in an image online. I have a limited selection of paints (from the Citadel paint range), so I need to find the closest colour within my limited set. Luckily, you can grab a hex code from an image very easily with various tools (such as the Procreate drawing app on the iPad). Here is a screenshot of what I want the character to look like when painted vs test paints on a spare base and my annotations of the colours in Procreate:

Image of a greenish dragon person from World of Warcraft and a partly painted base with handwritten hex codes pointed from where I grabbed the colours from. Please note that the Ork Flesh hex code is different in this diagram compared to the code below, but this does not matter for the example code purposes.

An obvious limitation is that it is hard to define a paint as a hex code. Hex codes represent Red-Green-Blue values, and the values I grab are biased by the lighting in the room, the camera used for the image, how well the paint was mixed and the undercoat colour the paint was painted over. Despite these limitations, I wanted a hex code matcher to at least give me a reasonable best match (yes I could just look at the colours and use my own neural net to figure it out, but I prefer having code to do it for me in many more steps, don’t @ me).

So I asked chatGPT to write me some code from the natural language input below (I specified Python as my favourite language so I could evaluate its output):

Write me a python function that takes one input hex value and and finds the closest value within in a list. The list is called hex_lookup_list and contains: 85e1ba, bbcc88, 31c8c1 and 537869

So it got back to me with the below:

Sure! Here is a function that takes a hex value and a list of hex values as input and returns the closest hex value in the list:

def find_closest_hex(hex_value, hex_lookup_list):
    # Convert the hex values to integers
    int_value = int(hex_value, 16)
    int_lookup_list = [int(x, 16) for x in hex_lookup_list]
    # Find the difference between the input value and each value in the list
    diffs = [abs(int_value - x) for x in int_lookup_list]
    print(diffs)
    # Find the minimum difference
    min_diff = min(diffs)
    # Find the index of the value with the minimum difference
    min_diff_index = diffs.index(min_diff)
    # Return the hex value at the index of the minimum difference
    return hex_lookup_list[min_diff_index]

From chatGPT:

You can use this function like this:

$ hex_lookup_list = ['85e1ba', 'bbcc88', '31c8c1', '537869']
$ closest_hex = find_closest_hex('99cc77', hex_lookup_list)
$ print(closest_hex)
[1305277, 2228241, 6816694, 4609038]
85e1ba

From chatGPT:

This will print bbcc88, which is the closest hex value to 99cc77 in the list.

OK so first off, wow, it gave me code WITH comments to explain what it is doing. That is pretty cool. But chatGPT made a mistake (well I did, but see more about this at the end of the blog post). It specifically told me that the output would be bbcc88 as that is the closest to the inputted value, but its own code printed out a different value from the list: 85e1ba. At first I was not sure whether the problem was the code or its assumption about which value was closest. As a colour, the test hex value should be closest to the suggested bbcc88. But the code it wrote was not looking for the closest hex colour value, but for the closest absolute hex value.

So here is the issue: a hex colour value is made up of three pairs of hex digits, representing how red, how green and how blue the colour is (RRGGBB; for more details, see here). So the first two digits represent Red, the next two Green and the last two Blue, and each pair encodes a number from 0 to 255. But the Python function that chatGPT wrote for me wasn’t treating the input as a hex colour value, which is three hex values stitched together. Instead it was treating the whole six-digit string as a single hex value, which is a very large number (in the millions, rather than three numbers of 0-255, one after another). So its Python function compared absolute values, which strictly is what I asked for, but what I wanted was a comparison of hex colour values. Oddly, the example output it gave assumed that its function did work on hex colour values, and as the function did not, that stated output was wrong. It made a mistake (or lied to me). So I asked chatGPT to re-write the Python function to take this into consideration.
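To make the two readings of a hex code concrete, here is a minimal sketch of my own (not chatGPT output) that scores the original lookup list both ways. The helper name hex_to_rgb is just something I made up for illustration, and summing the per-channel differences is only one simple way of scoring a colour match.

```python
def hex_to_rgb(hex_value):
    """Split a six-digit hex colour such as '99cc77' into (red, green, blue) integers."""
    # Hypothetical helper for illustration; assumes exactly six hex digits and no leading '#'.
    return tuple(int(hex_value[i:i + 2], 16) for i in (0, 2, 4))


target = "99cc77"
hex_lookup_list = ["85e1ba", "bbcc88", "31c8c1", "537869"]

# Reading 1: the whole string as one large integer
as_int = int(target, 16)
int_diffs = {code: abs(as_int - int(code, 16)) for code in hex_lookup_list}

# Reading 2: three separate channels, each between 0 and 255
target_rgb = hex_to_rgb(target)
channel_diffs = {
    code: sum(abs(t - v) for t, v in zip(target_rgb, hex_to_rgb(code)))
    for code in hex_lookup_list
}

print(as_int)      # 10079351
print(target_rgb)  # (153, 204, 119)
print(min(int_diffs, key=int_diffs.get))          # 85e1ba
print(min(channel_diffs, key=channel_diffs.get))  # bbcc88
```

The single-integer reading is dominated by the red digits at the front of the string, which is why it picks 85e1ba, while the per-channel reading picks bbcc88.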

Me to chatGPT:

The above code does not work as it does not seperate the hex code into the three seperate colours before finding the difference. Can you re-write the function to accurately find the closest hex value?

chatGPT:

Certainly! Here is an updated version of the function that takes into account the individual red, green, and blue values of the hex codes:

def find_closest_hex(hex_value, hex_lookup_list):
    # Split the input hex value into its red, green, and blue components
    r_in, g_in, b_in = int(hex_value[:2], 16), int(hex_value[2:4], 16), int(hex_value[4:], 16)
    # Initialize variables to store the minimum difference and the closest hex value
    min_diff = float('inf')
    closest_hex = None
    # Iterate through the lookup list and find the hex value with the minimum difference
    for hex_code in hex_lookup_list:
        # Split the hex code into its red, green, and blue components
        r, g, b = int(hex_code[:2], 16), int(hex_code[2:4], 16), int(hex_code[4:], 16)
        # Calculate the difference between the input value and the current hex code
        diff = sum([abs(r_in - r), abs(g_in - g), abs(b_in - b)])
        # If the difference is smaller than the current minimum difference, update the minimum difference and the closest hex value
        if diff < min_diff:
            min_diff = diff
            closest_hex = hex_code
    # Return the closest hex value
    return closest_hex

chatGPT:

You can use this function in the same way as before:

$ hex_lookup_list = ['85e1ba', 'bbcc88', '31c8c1', '537869']
$ closest_hex = find_closest_hex('99cc77', hex_lookup_list)
$ print(closest_hex)
bbcc88

chatGPT:

This will now correctly find the closest hex value to 99cc77 in the list.

This is amazing. I confronted chatGPT about its error and it politely gave me an updated version of the code that appears to work. I am very impressed. But now I want to stress test it with a new hex lookup list with different values and an input hex value that I have chosen and for which I know the closest match.

$ hex_lookup_list2 = ['aaa9ae', '85e1ba', 'bbcc88', '31c8c1', '537869', "8a898d", "373738"]
#I added aaa9ae to the start of the list as it should be the closest to my inputted value, and some other fairly similar at the end of the list.
$ closest_hex = find_closest_hex('aaa9ad', hex_lookup_list2)
#I am looking up aaa9ad, which is a silver colour.
$ print(closest_hex)
#It should print out aaa9ae, as this is a 25% saturated version of the input silver aaa9ad. I also extended the list to contain some more similar shades of silver
aaa9ae

Now that I have updated the lookup list to contain a few shades of silver and passed silver (aaa9ad) in as the input hex code, the function found what I expected to be the closest shade: aaa9ae. Amazing.
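If you want to check a result like this rather than trust a single returned value, one option is to rank every candidate by its distance from the input. The sketch below is my own and not something chatGPT wrote. The function names hex_to_rgb and rank_by_channel_distance are made up for illustration, and it uses the same sum of per-channel differences as the corrected function above.

```python
def hex_to_rgb(hex_value):
    """Split a six-digit hex colour into (red, green, blue) integers."""
    # Hypothetical helper; assumes exactly six hex digits and no leading '#'.
    return tuple(int(hex_value[i:i + 2], 16) for i in (0, 2, 4))


def rank_by_channel_distance(hex_value, hex_lookup_list):
    """Return (distance, hex_code) pairs sorted from closest to furthest."""
    target = hex_to_rgb(hex_value)
    scored = [
        (sum(abs(t - v) for t, v in zip(target, hex_to_rgb(code))), code)
        for code in hex_lookup_list
    ]
    # Tuples sort by distance first; ties fall back to the hex string itself
    return sorted(scored)


hex_lookup_list2 = ["aaa9ae", "85e1ba", "bbcc88", "31c8c1", "537869", "8a898d", "373738"]

for distance, code in rank_by_channel_distance("aaa9ad", hex_lookup_list2):
    print(code, distance)
# aaa9ae 1
# bbcc88 89
# 8a898d 96
# 85e1ba 106
# 31c8c1 172
# 537869 204
# 373738 346
```

Seeing the whole ranking shows how far ahead the winner is: here aaa9ae is only one unit away from the input, while the next candidate is 89 units away.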

You can download the HTML output of my Jupyter notebook here (link to blog post about the Jupyter notebook here). For the notebook file itself, find it here.

Now I am very impressed with this and now I have the code I wanted. I am excited to experiment with this for my actual CompBio work and not just with Python, but with bash terminal commands and R code.

But my first prompt to chatGPT gave me a wrong answer. This was likely because I did not give chatGPT the context that I was looking for the closest hex colour value, rather than just a plain hex value. So that was my bad. But chatGPT then stated an expected output that its own code did not produce, one that was consistent with matching hex colour values rather than one long hex numerical value. So did it lie or just make a mistake? Is there a difference for something without sentience? I do not know, that is a job for a philosopher. But there are cases of AI lying, and this great YouTube video has some good discussion of language AI models getting things wrong or “lying”. chatGPT also takes feedback on its answers, so over time it will be interesting to see just how much better it can get.

For me, the worst thing a student can do when questioned about their research field is to confidently make up an answer. As an examiner, I always want an honest answer, even if it is “I don’t know”. If the student is speculating, I want them to say “I am not sure, but perhaps…”, but currently chatGPT does not do anything like this. It is not impossible for AI models to give confidence levels with their output, though. AlphaFold2, used for generating protein structures from the primary sequence, gives a score of how confident it is for each position and, from what I have seen of it, this is accurate and fair. So perhaps a simple addition to chatGPT that could help would be metadata indicating whether it is very confident about its answer or not. Or this might be impossible with this sort of model; I am not a machine learning expert.

Regardless, I think that AI generated code is going to play a big role in the future of computational biology and bioinformatics. How big a role? Only time will tell.

Also, I highly recommend this video from 2020 on GPT-3 and how it seems to figure out first principles like multiplication from looking at examples in its training set and then (somehow) figuring out the answers to questions that were not in its training set.
