British geneticist interested in splicing, RNA decay, and synthetic biology. This is my blog focusing on my adventures in computational biology. 

Compbio 026: Using the terminal at Warp speed

I have always just used a regular terminal, such as the macOS Terminal app, or PuTTY on Windows when accessing a server. They have served me well. But apparently there is a world of more advanced terminals out there. I have stayed away from them, preferring something very simple, but a friend (Christian Pflüger, @BlauerPlums on Twitter) recently told me about the Warp terminal for macOS (sorry, Windows users) and I had to give it a try!

The thing that really sold me on the Warp terminal was this: AI-generated command suggestions from requests written in natural language. When programming, I still think in terms of “I want to filter this file so I can count only the exons on chromosome 1”, for example. Then I need to translate that into code, whether that is Python, R or even a simple AWK command.

At this point I am pretty fluent in doing such simple tasks in Python, but Python is a verbose language, and such a step usually takes many lines of code. In R and AWK, you can do the same thing in one or a few lines. But I am personally less fluent in these, perhaps because of my poor brain function and the density of information packed into each R or AWK command. Sometimes I end up Googling the same simple command-line usage over and over again because I keep forgetting it (my partner mocks me for not having the command to unzip a tar.gz file memorised, but at this point I doubt that I ever will).

Warp uses GPT-3 to generate the commands it suggests in response to a natural-language prompt. GPT-3 is from OpenAI, the same team behind the now famous ChatGPT that you will likely have seen around the internet, which can generate stories, summaries or conversations from a user's prompt. ChatGPT is based on GPT-3, so let’s test it out with some simple commands. To ask Warp for an AI-suggested command from a natural-language prompt, simply start with a hash symbol (#, also known as an octothorpe):

The prompt “#list all the file” led to the suggested command ls -a.

ls -a was entered and the files (including hidden files) were listed, as expected.

The prompt “#tar gzip the example_file.txt” led to the suggested command “tar -zcvf example_file.txt.tar.gz example_file.txt”.

The above tar -zcvf command was entered, and then ls -a was run to check whether a compressed version of the file had been created, and it had.
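If you keep forgetting what those letters mean, here is a minimal sketch of the compression step. It runs in a throwaway temporary directory with a made-up example_file.txt. Only the tar flags come from the suggested command.

```shell
#!/usr/bin/env bash
# Sketch only: the file and its contents are made-up stand-ins.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf 'hello\n' > example_file.txt

# -z: compress with gzip, -c: create an archive, -v: list files as they are
# added, -f: archive name. f is last in the group because it takes the next
# argument (example_file.txt.tar.gz) as its value.
tar -zcvf example_file.txt.tar.gz example_file.txt

# -t lists what is inside the archive without extracting anything
tar -tzf example_file.txt.tar.gz
```

Listing with -t is a quick way to check an archive's contents before unpacking it.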

I then used rm to delete the original, uncompressed example file. That way, after the next command, we could see whether an uncompressed version had been recreated.

I then ran the tar -xvzf command followed by ls -a, and saw that the uncompressed version of the file had been created.

Not only did I not need to leave the terminal to look up how to uncompress this tar.gz file, but Warp also lets me enter several commands at once and runs each one in order. So I added ls -a to immediately tell me whether the uncompressed file had been produced.
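The same round trip can be sketched as a plain bash script, assuming a temporary directory and a made-up example file. Joining the last two commands with && is one way to run ls -a only if extraction succeeded. That is slightly stricter than typing the two commands on separate lines.

```shell
#!/usr/bin/env bash
# Sketch of compress -> delete original -> decompress -> list, on a made-up file.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
printf 'hello\n' > example_file.txt
tar -zcf example_file.txt.tar.gz example_file.txt

# Remove the uncompressed original so we can tell whether extraction worked
rm example_file.txt

# -x: extract, -v: verbose, -z: gunzip, -f: archive name.
# ls -a only runs if tar exits successfully.
tar -xvzf example_file.txt.tar.gz && ls -a
```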

Now let’s see if it can handle some of the more complex filtering I want, without my having to look up how to do it in my blog post on how to use AWK (link here).

The prompt “#count all lines in a file where the third column equals exon” led to the suggestion awk '{if($3=="exon"){print $0}}' file.txt | wc -l

I asked for a generic command that would count all lines where the third column equals “exon”, and it gave me this AWK command. It is not formatted the way I would write it, but it looks like it would work.
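For comparison, here is a sketch of how the same count might be written more compactly. The three-line GTF is made up so that the example runs anywhere. The -F'\t' option is an addition that is not in the suggested command. It makes AWK split only on tabs, which is how GTF columns are separated.

```shell
#!/usr/bin/env bash
# Made-up, tab-separated stand-in for a GTF file
set -euo pipefail

gtf=$(mktemp)
printf 'Chr01\tsrc\tgene\t1\t900\t.\t+\t.\tgene_id "g1";\n' >  "$gtf"
printf 'Chr01\tsrc\texon\t1\t300\t.\t+\t.\tgene_id "g1";\n' >> "$gtf"
printf 'Chr01\tsrc\texon\t500\t900\t.\t+\t.\tgene_id "g1";\n' >> "$gtf"

# A pattern with no action prints matching lines, so the if/print is implicit
awk -F'\t' '$3 == "exon"' "$gtf" | wc -l

# Or count inside AWK itself; n+0 prints 0 (not a blank line) if nothing matches
awk -F'\t' '$3 == "exon" { n++ } END { print n+0 }' "$gtf"
```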

I ran the command awk '{if($3=="exon"){print $0}}' Ppatens_318_v3.3.gene_exons.gtf | wc -l and it gave me an answer of 589994.

But notice that, because I asked for a generic command, I had to replace file.txt with the actual file name. When I asked for the same command with this specific GTF file, this happened:

The prompt “#count all lines in Ppatens_318_v3.3.gene_exons.gtf where the third column equals exon” led to the suggested command grep -c "exon" Ppatens_318_v3.3.gene_exons.gtf.

Here it does not suggest an AWK command but a simple grep (with -c to output a count rather than piping the output into wc -l, a simple trick I had not known). Perhaps this is because it knows that this file does not contain “exon” outside of column 3, so this grep command would work instead of the more specific AWK command it gave for the generic request? If so, very exciting.
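The grep -c trick is easy to check on a toy input (the file below is a made-up stand-in). grep -c counts matching lines, so it agrees with piping grep's output into wc -l.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp=$(mktemp)
printf 'exon\nintron\nexon exon\n' > "$tmp"

# Both report 2: a line is counted once even if "exon" appears on it twice
grep -c "exon" "$tmp"
grep "exon" "$tmp" | wc -l
```

There is one small difference in behaviour. When nothing matches, grep -c still prints 0 but exits with status 1, and that can stop scripts that use set -e.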

Running grep -c "exon" Ppatens_318_v3.3.gene_exons.gtf gave the same number of exon lines as the AWK command above: 589994. So far, this seems like a great assistant for working at the command line without constantly leaving to Google something new.
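One caveat is worth knowing. grep -c "exon" counts every line with “exon” anywhere on it, while the AWK command only checks column 3. The two therefore agree only when “exon” never appears in another column. Some GTF files (Ensembl-style ones, for example) carry an exon_number attribute, and grep would count those lines too. Here is a sketch with a made-up two-line GTF:

```shell
#!/usr/bin/env bash
# Made-up GTF lines; the exon_number attribute is there to trip up grep
set -euo pipefail

gtf=$(mktemp)
printf 'Chr01\tsrc\texon\t1\t300\t.\t+\t.\tgene_id "g1"; exon_number "1";\n' >  "$gtf"
printf 'Chr01\tsrc\tCDS\t1\t300\t.\t+\t0\tgene_id "g1"; exon_number "1";\n'  >> "$gtf"

grep -c "exon" "$gtf"                                        # 2: also matches the attribute
awk -F'\t' '$3 == "exon" { n++ } END { print n+0 }' "$gtf"    # 1: column 3 only
```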

Next, the prompt “#Count all lines in Ppatens_318_v3.3.gene_exons.gtf that contain exon in column 3 and Chr02 in column 1” generated a command so long that it was clipped off on screen (the full command is given below).

The output of head -20, and the command grep -c "exon" Ppatens_318_v3.3.gene_exons.gtf | grep "Chr02" from the AI suggestion above.

In this case I asked it for something more complex: I wanted to search for two things at once, in an AND statement. The suggested command runs one grep and pipes its output into another, which makes sense. EXCEPT that the first of the two grep commands has -c, so its output is just a line count. The second grep therefore only ever sees that number, never the lines of the file, and can never match “Chr02”. Simply removing the -c flag from the first grep and adding it to the second fixes this AI-generated command. Phew, computational biologists are not out of a job...yet! Let’s try the AI suggestion vs my simple fix:

grep -c "exon" Ppatens_318_v3.3.gene_exons.gtf | grep "Chr02" gave an error, as expected, whereas grep "exon" Ppatens_318_v3.3.gene_exons.gtf | grep -c "Chr02" gave an answer of 34480.
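Both behaviours can be reproduced on a made-up three-line GTF. The sketch also adds an AWK version, which comes from neither the post nor the AI. It checks column 1 and column 3 explicitly, whereas the grep fix matches “exon” and “Chr02” anywhere on a line.

```shell
#!/usr/bin/env bash
# Made-up, tab-separated stand-in for the real GTF
gtf=$(mktemp)
printf 'Chr01\tsrc\texon\t1\t300\t.\t+\t.\tgene_id "g1";\n' >  "$gtf"
printf 'Chr02\tsrc\texon\t1\t300\t.\t+\t.\tgene_id "g2";\n' >> "$gtf"
printf 'Chr02\tsrc\tgene\t1\t900\t.\t+\t.\tgene_id "g2";\n' >> "$gtf"

# AI suggestion: the first grep outputs only "2", so the second grep finds
# no "Chr02", prints nothing and exits with status 1
grep -c "exon" "$gtf" | grep "Chr02" || echo "no match"

# The fix from the post: count only at the end of the pipeline (prints 1)
grep "exon" "$gtf" | grep -c "Chr02"

# Column-exact alternative: Chr02 in column 1 AND exon in column 3 (prints 1)
awk -F'\t' '$1 == "Chr02" && $3 == "exon" { n++ } END { print n+0 }' "$gtf"
```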

Yep, it went exactly as I expected: the -c flag on the first grep broke it, and moving it to the second fixed it. As someone with experience of grep and the command line, I was able to spot this error before running it. But what if someone completely new was using the Warp terminal to make their life easier? Would this be a nice teachable moment, or a frustrating moment of annoyance and confusion? I do not know.

My personal opinion so far is that AI-generated commands from Warp (via GPT-3) seem useful but are not (yet) a replacement for a basic level of knowledge and experience. I am excited to see how this could improve the speed at which I work, and how it and similar tools will advance in the future. But for now it must come with a warning to anyone new to compbio: while it may help you at times, it could also lead to a lot of frustration. That frustration is all too familiar to anyone learning to code for the first time, but in this case it will be aimed not at yourself but at the tool. To me, part of the learning experience is remembering “I really hate this darn machine; I wish that they would sell it. It won't do what I want it to, but only what I tell it” (I know it from my wife; the original source is unknown). But in this case, it is the AI telling the machine what to do. Of course, the AI is only acting as a suggestion tool, but I think this distinction is a subtle one.
