Wednesday, May 19, 2010

A simple Compilable Programming Language using Python and Assembler

I don't really blog much, but I thought I'd pass on what I did today and last night to try to understand how languages like C/C++, and even some higher-level languages like Python and PHP, work. To reproduce what I have done here, you're going to need the following (or the equivalent and some smarts):
  1. Ubuntu 9.10 (or some other flavor)
  2. Python
  3. as and ld (from the binutils package)
  4. a text editor, and some patience
First I wanted to compile a helloworld program in assembler, just to ensure that I had a system capable of compiling, linking and executing it.

Enter the following code into your text editor (preferably vi or nano). I found this sample program on Linux Experiences. You can check out his blog; it is what this entire post and project are based on.

Start your text editor, and copy everything between the cut markers below into it, and save it as helloworld.s.

filename: helloworld.s
-=[copy everything below this line]=-
.text
.global _start
_start:
    movl $len,%edx    # third argument: message length
    movl $msg,%ecx    # second argument: pointer to message to write
    movl $1,%ebx      # first argument: file handle (stdout)
    movl $4,%eax      # system call number (sys_write)
    int $0x80         # call kernel

    # and exit
    movl $0,%ebx      # first argument: exit code
    movl $1,%eax      # system call number (sys_exit)
    int $0x80         # call kernel

.data                 # section declaration
msg:
    .ascii "Hello, world!\n"    # our dear string
    len = . - msg               # length of our dear string

-=[copy everything above this line]=-

Once you have the above saved into a file, you are going to want to compile it. To do this, just type the following into your shell...

$ as helloworld.s -o helloworld.o

Hopefully everything went well, and you now have an object file, which you now need to link and convert into an executable file. To do this, type the following into your shell...

$ ld helloworld.o -o helloworld

You should now be able to run your first program! Go ahead, and run it...

$ ./helloworld
Hello, world!
$

That's it! I was happy I could compile and run it, but I wanted more...

What I figured I could do is create a simple Python program that would parse a simple text file of minimal instructions or commands and generate assembler code from it. I could then compile that code, giving me my own mock "programming language" that compiles down to machine code.

So, I created this Python program... copy and paste it into 'newlang.py'. I actually hard-coded the program file it will read, as I was too lazy to figure out how to pass in a program. Ideally, one would rewrite this in C and create a proper parser for the language using Bison, etc.

filename: newlang.py
-=[copy everything below this line]=-
staticdata = []

filename = "prog1.lc"

# open program file
f = open(filename, 'r')

# loop through program file
for line in f:
    # skip comments
    if (line.startswith('#')):
        continue

    # handle echo: drop the leading 'echo "' (6 chars) and the trailing '"' plus newline (2 chars)
    if (line.startswith('echo ')):
        staticdata.append(line[6:-2])

f.close()

# open our new asm output file
o = open(filename+".as", 'w')

# add text block
o.write(".text\n")
o.write(" .global _start # tell it where our main function is\n")
o.write("\n")
o.write("_start: # this is the main function\n")

# emit one sys_write call per echo statement
counter = 0
for vardata in staticdata:
    o.write(" # echo static data to stdout\n")
    o.write(" movl $len"+str(counter)+",%edx # move var length into register\n")
    o.write(" movl $dat"+str(counter)+",%ecx # move pointer to data in register\n")
    o.write(" movl $1,%ebx # move file handle to stdout into register \n")
    o.write(" movl $4,%eax # instruct kernel to use sys_write\n")
    o.write(" int $0x80 # call the kernel\n")
    o.write("\n")
    counter = counter + 1

o.write("# exit\n")
o.write(" movl $0,%ebx # exit code 0\n")
o.write(" movl $1,%eax # instruct kernel to use sys_exit\n")
o.write(" int $0x80 # call the kernel\n")
o.write("\n")

# emit one labelled string (and its length) per echo statement
counter = 0
o.write(".data # start the data block\n")
for vardata in staticdata:
    o.write(" dat"+str(counter)+":\n")
    o.write(" .ascii \""+vardata+"\\n\" # a string \n")
    o.write(" len"+str(counter)+" = . - dat"+str(counter)+" # string length\n")
    o.write("\n")
    counter = counter + 1

o.close()
-=[copy everything above this line]=-
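
For anyone less lazy than I was, here is a minimal sketch of how the program file could be passed in on the command line instead of being hard-coded. The names pick_source and DEFAULT_SOURCE are made up for this example and are not part of newlang.py above; the idea is only that sys.argv[1], when present, replaces the fixed "prog1.lc".

```python
import sys

DEFAULT_SOURCE = "prog1.lc"  # the same file newlang.py hard-codes


def pick_source(argv):
    """Return the program file named on the command line, or the default."""
    # argv[0] is the script name; argv[1], if present, is the source file.
    if len(argv) > 1:
        return argv[1]
    return DEFAULT_SOURCE


if __name__ == "__main__":
    filename = pick_source(sys.argv)
    print("compiling " + filename + " -> " + filename + ".as")
```

With something like this in place, newlang.py could set filename = pick_source(sys.argv) and be run as "python newlang.py other.lc".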

Well, now that we have a very simple (and fragile) program to read in programs in our new language, we just need a program for it to compile! I started and ended with the following program. I gave it the file extension ".lc" for no reason at all except to differentiate it from all of the other files in the directory.

The language is simple, but very picky, as I did not take the time to ensure proper program structure, etc. I do not handle inline remarks, nor do I handle the fact that one could forget to close quotes, or leave them out completely.

So here it is...

filename: prog1.lc
-=[copy everything below this line]=-
# this is a comment
echo "Hello Hacker News!"
echo "Hello Reddit!"
echo "Look ma! No hands!"
-=[copy everything above this line]=-

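You'll notice that I was careful to close all quotes and ensure there were no trailing spaces!! (The parser simply chops a fixed number of characters off each end of an echo line, so anything extra ends up in the output.)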
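
If you wanted the language to be a little less picky, a stricter line parser might look something like the sketch below. This is not what newlang.py does; parse_line is a hypothetical helper, and the choice to reject quotes and backslashes inside the string (rather than escape them for the .ascii directive) is just an assumption to keep the example short. Inline remarks are still not handled.

```python
def parse_line(line, lineno):
    """Return the text of an echo statement, or None for blanks and comments.

    Raises SyntaxError for anything else.
    """
    text = line.strip()  # tolerate trailing spaces and a missing final newline
    if not text or text.startswith("#"):
        return None
    if not text.startswith("echo "):
        raise SyntaxError("line %d: unknown statement: %s" % (lineno, text))
    arg = text[len("echo "):].strip()
    if len(arg) < 2 or not (arg.startswith('"') and arg.endswith('"')):
        raise SyntaxError("line %d: echo needs a double-quoted string" % lineno)
    body = arg[1:-1]
    if '"' in body or "\\" in body:
        # These characters would need escaping before going into .ascii.
        raise SyntaxError("line %d: quotes and backslashes are not supported" % lineno)
    return body
```

The loop over the program file could then call parse_line(line, n) for each line and append every result that is not None to staticdata.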

Ok, so I wanted to automate this a little more, so it seemed streamlined... so I created this shell file...

filename: compile.sh
-=[copy everything below this line]=-
#!/bin/sh
rm -f prog1
rm -f prog1.lc.o
rm -f prog1.lc.as
python newlang.py
as prog1.lc.as -o prog1.lc.o
ld prog1.lc.o -o prog1
-=[copy everything above this line]=-

One more thing to do...
$ chmod +x compile.sh

Now, what we have been waiting for... compile your program!!!
$ ./compile.sh

If everything went well, you will have an assembler version of your program, an object file, and an executable... run it!

$ ./prog1
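
Since the rest of the project is already in Python, the same three steps could also be driven from Python instead of a shell file. Below is a rough sketch under that assumption; build_commands and build are names invented for this example, and the file naming simply mirrors what newlang.py and the steps above use. Note that newlang.py as written still reads its hard-coded prog1.lc, so this only makes sense for that file unless newlang.py is changed.

```python
import subprocess
import sys


def build_commands(source):
    """Return the translate, assemble and link commands for one .lc file."""
    # prog1.lc -> prog1 for the final executable name
    base = source[:-len(".lc")] if source.endswith(".lc") else source
    asm = source + ".as"  # same naming scheme as newlang.py
    obj = source + ".o"
    return [
        [sys.executable, "newlang.py"],  # generate the assembler file
        ["as", asm, "-o", obj],          # assemble into an object file
        ["ld", obj, "-o", base],         # link into an executable
    ]


def build(source="prog1.lc"):
    """Run each step in order, stopping at the first failure."""
    for cmd in build_commands(source):
        subprocess.check_call(cmd)  # raises CalledProcessError on failure
```

Calling build() from the directory that holds newlang.py and prog1.lc would then run the steps in order and stop as soon as one of them fails.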

I am sure with some time, love and patience, one could make this into something cool!

1 comment:

Jim said...

Consider using a Makefile to 'streamline compilation'

In a plain text file called Makefile, put something like the following (the indented lines must start with a tab, not spaces):

-- cut --
EXE=prog1
all: $(EXE)

prog1: $(EXE).lc
	python newlang.py
	as $(EXE).lc.as -o $(EXE).lc.o
	ld $(EXE).lc.o -o $(EXE)

clean:
	rm -f $(EXE) $(EXE).lc.o $(EXE).lc.as
-- cut --

Then, to compile just issue the command
$ make
which will only do a recompilation if the prog1.lc file has changed.

To remove the intermediate files, do
$ make clean