MiniBasic

How to write a BASIC Interpreter By Malcolm McLean

© Copyright all rights reserved (except for permission to use source code as described in text)

Introduction

MiniBasic is designed as a simple programming language, based on BASIC. If you already know BASIC then you are well on your way to learning MiniBasic; if you don’t, then MiniBasic is one of the simplest programming languages to learn.

MiniBasic programs are written in ASCII script. They are then interpreted by the computer. This is in contrast to most “serious” languages, which are compiled, that is, translated into machine instructions and then run. Interpreted languages are slower than compiled languages, but they have several advantages. One major one is that they are portable – a MiniBasic script will run on any computer that has a MiniBasic interpreter installed. Another advantage, especially for beginners, is that errors are much easier to identify.

Finally, MiniBasic is not really intended as a standalone program, except for teaching purposes. It is meant for incorporation into other products, where the user is expected to provide functions in a general-purpose programming language. An example might be a desk calculator which can be extended to provide user-defined functions like the Fibonacci series, or an adventure game for which the user can design his own levels. For technical reasons, this is much easier to implement as an interpreted rather than a compiled language.

One design goal of MiniBasic was that it should be easy to learn. Millions of people already know some BASIC from school or through having a microcomputer in the 1980s. The second design goal was that it should be easy to implement. The interpreter is written in portable ANSI C, and is freely available. It is in a single, reasonable-length source, and is available for incorporation into user programs. The final goal is that the interpreter should be what is technically known as “Turing equivalent”. This means that it is possible to implement any algorithm in MiniBasic. This required one major extension to common BASIC – the ability to redimension arrays.

It is impossible to implement graphics commands in portable ANSI C, so sound, graphics, and mice are not supported in MiniBasic. Interaction with the user in the standalone model is via the console. However, where MiniBasic is incorporated into another program, generally there will not be direct interaction with the user. The caller will create temporary files for input and output.

The first program

You are now ready to write your first program in MiniBasic. Traditionally, this is “Hello World”. Firstly you need to install the MiniBasic interpreter. On a PC this is done by copying the executable MiniBasic.exe to your hard drive. Then you open a text editor and type

10 PRINT "Hello World"

Remember to terminate with a newline. Save as “Hello.mb” (the extension is optional). You then call the interpreter by typing “MiniBasic Hello.mb” in a command prompt. You should see the output

Hello World

All MiniBasic programs have line numbers. Execution begins with the first line and ends with the last line. Lines must be in order and every statement must have a number. The number must be the first character in the line. However, we can spread long strings (sequences of characters) over several lines. This second program

10 PRINT "In the beginning God created the heavens and the Earth"
 "and the Earth was without form and void "

will output a string too long to fit easily on one line. Note that the second line must begin with a space character, to indicate that it is a continuation of the first line.

The second program

There is very little point in a program that outputs something but has no input. So for our second program we will use the command INPUT.

10 PRINT "Input first number "
20 INPUT x
30 PRINT "Input second number"
40 INPUT y
50 PRINT "X + Y is", x + y

The two INPUT statements read the two numbers that you type in the command prompt. INPUT ignores any non-numeric characters, and translates the first number that it sees. The comma separates items to print, and also tells the computer to insert a space.

It is also possible to input strings of characters. To do this, we use what is called a “string variable”. A string variable always ends with the dollar character ($), and contains text rather than numbers.

10 PRINT "What is your name?"
20 INPUT n$
30 PRINT "Hello", n$

When inputting a string, INPUT reads up to the newline (which it discards). We can use the ‘+’ operator, but not any others, on string variables.

10 PRINT "What is your first name?"
20 INPUT fname$
30 PRINT "What is your second name?"
40 INPUT sname$
50 PRINT "Hello", fname$ + sname$

Notice that this program has a bug. The ‘+’ operator doesn’t insert a space, so unless you inadvertently added a space the program prints “FREDBLOGGS”. Try modifying the program by inserting a space between the two names.
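One possible fix for the bug is sketched below. It only changes line 50, concatenating a one-space string literal between the two names; other solutions (such as printing the names as separate comma-separated items) would work just as well.

```basic
10 PRINT "What is your first name?"
20 INPUT fname$
30 PRINT "What is your second name?"
40 INPUT sname$
45 REM The " " literal supplies the space that + does not insert
50 PRINT "Hello", fname$ + " " + sname$
```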

The third program

Now we need to get to the core of MiniBasic, the “LET” statement. MiniBasic will evaluate arbitrarily complicated arithmetical expressions. The operators allowed are the familiar ‘+’ ‘-’ ‘*’ and ‘/’, and also MOD (modulus). Use parentheses ‘(’ ‘)’ to disambiguate the order of evaluation.

10 PRINT "Enter temperature in Fahrenheit"
20 INPUT f
30 LET c = (f - 32) / 1.8
40 PRINT "=", c, "Celsius"

As well as these, there are a large number of mathematical functions built into MiniBasic, for example POW(x,y), which does exponentiation, SQRT(x) (square root), SIN(x), COS(x) and TAN(x), sine, cosine and tangent. All the trigonometric functions take or return radians. The logarithm function, LN(x), takes a natural logarithm. There are also two mathematical constants, PI and e (Euler’s number). Be careful not to use these as variable names. To convert radians to degrees, divide by 2 * PI and multiply by 360. To convert a natural, base e log to log10, divide by LN(10).

LET also works on string variables. String variables always end with the character ‘$’, as do string functions.

10 PRINT "What is your name?"
20 INPUT name$
30 LET name$ = "SIR" + " " + name$
40 PRINT "Arise,", name$

Note that expressions such as

LET x = x + 1

are legal and are in fact very useful. Variable names must be shorter than 31 characters, and mustn’t duplicate any MiniBasic keywords.
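The two conversions mentioned in this section (radians to degrees, and natural logarithm to base 10) can be sketched as a short program. The variable names rad, deg and log10 are chosen for illustration only, and the check on line 90 is added because LN needs an argument greater than zero.

```basic
10 REM Convert an angle in radians to degrees
20 PRINT "Enter an angle in radians"
30 INPUT rad
40 LET deg = rad / (2 * PI) * 360
50 PRINT rad, "radians is", deg, "degrees"
60 REM Convert a natural logarithm to a base 10 logarithm
70 PRINT "Enter a number greater than zero"
80 INPUT x
90 IF x <= 0 THEN 70
100 LET log10 = LN(x) / LN(10)
110 PRINT "Log base 10 of", x, "is", log10
```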

The fourth program

Programs often need to make branching decisions. In MiniBasic this is provided by the IF ... THEN statement.

10 REM Square root program.
20 PRINT "Enter a number"
30 INPUT x
40 REM Square root of negative is not allowed
50 IF x < 0 THEN 20
60 PRINT "Square root of", x, "is", SQRT(x)

We have also introduced the REM statement. This simply adds comments to make the program easier for a human to understand. REM statements are ignored by the interpreter.

This program is actually not very effective. Really we should tell the user what is wrong. For this, we need the GOTO statement. A GOTO simply executes an unconditional jump.

10 REM Improved square root program
20 PRINT "Enter a number"
30 INPUT x
40 REM Prompt user if negative
50 IF x >= 0 THEN 80
60 PRINT "Number may not be negative"
70 GOTO 30
80 PRINT "Square root of ", x, "is", SQRT(x)

An IF ... THEN statement always takes a line number as an argument. This can be of the form IF x < y THEN b, as long as b holds a valid line number. The operators recognised by the IF ... THEN statement are ‘=’, ‘<>’ (not equals), ‘>’, ‘<’, ‘>=’, ‘<=’. We can also use AND and OR to build up complex tests

IF age >= 18 AND age < 65 THEN x

Use parentheses to disambiguate lengthy tests.

IF age >= 18 AND (age < 65 OR job$ = "Caretaker") THEN x

The IF ... THEN operators can also be applied to string variables. In this case the strings are ordered alphabetically.
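As a small illustration of comparing strings, the sketch below reads two words and prints them in alphabetical order, swapping them if needed. The names a$, b$ and temp$ are made up for the example. How upper and lower case letters compare is not stated in the text, so treat mixed-case input as untested territory.

```basic
10 REM Put two words into alphabetical order
20 PRINT "Enter first word"
30 INPUT a$
40 PRINT "Enter second word"
50 INPUT b$
60 REM Already in order? Then skip the swap
70 IF a$ <= b$ THEN 110
80 LET temp$ = a$
90 LET a$ = b$
100 LET b$ = temp$
110 PRINT a$, b$
```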

The fifth program

To enable MiniBasic to compute complicated functions we need access to arbitrary amounts of memory. For this we have the DIM statement. It creates a special type of variable known as an array.

10 REM Calendar program.
20 DIM months$(12) = "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
30 PRINT "Day of birth?"
40 INPUT day
50 PRINT "Month?"
60 INPUT month
70 REM Make sure day and month are legal
80 IF day >= 1 AND day <= 31 AND month >= 1 AND month <= 12 THEN 110
90 PRINT "That's impossible"
100 GOTO 30
110 IF day <> INT(day) OR month <> INT(month) THEN 90
120 PRINT "Your birthday is", day, months$(month)

You might want to modify this program to contain another array, this time a numerical one, containing the lengths of the months. For the ambitious, you could also input the year, and check for February 29th.

Arrays can have up to five dimensions. For instance you might want to hold a chessboard in a 2d array

DIM board(8,8)
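The suggested modification to the calendar program might look something like the sketch below. The array name length is an assumption for the example, and February is simply given 29 days because no year is input. The month is validated before it is used as an index, since an out-of-range index is an error.

```basic
10 REM Calendar program with a day-of-month check
20 DIM months$(12) = "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
25 DIM length(12) = 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
30 PRINT "Day of birth?"
40 INPUT day
50 PRINT "Month?"
60 INPUT month
70 REM Check the month first, so that length(month) is a legal index
80 IF month >= 1 AND month <= 12 AND month = INT(month) THEN 110
90 PRINT "That's impossible"
100 GOTO 30
110 IF day < 1 OR day <> INT(day) THEN 90
115 IF day > length(month) THEN 90
120 PRINT "Your birthday is", day, months$(month)
```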

It is possible to redimension arrays. For a 2d or higher array this effectively scrambles the contents, but one dimensional arrays are preserved. For instance this program will enter an arbitrary number of values into an array

10 REM Median program.
20 LET N = 0
30 DIM array(N+1)
40 PRINT "Enter a number, q to quit"
50 INPUT line$
60 IF line$ = "q" THEN 100
70 LET N = N + 1
80 LET array(N) = VAL(line$)
90 GOTO 30
100 PRINT N, "numbers entered"
105 IF N = 0 THEN 1000
106 IF N = 1 THEN 210
110 REM Bubble sort the numbers
120 LET flag = 0
130 LET i = 1
140 IF array(i) <= array(i+1) THEN 190
150 LET flag = 1
160 LET temp = array(i)
170 LET array(i) = array(i+1)
180 LET array(i+1) = temp
190 LET i = i + 1
195 IF i < N THEN 140
200 IF flag = 1 THEN 120
210 REM print out the middle
220 IF N MOD 2 = 0 THEN 250
230 LET mid = array( (N + 1) / 2)
240 GOTO 270
250 LET mid = array(N/2) + array(N/2+1)
260 LET mid = mid/2
270 PRINT "Median", mid
1000 REM end

The sixth program

It is possible to manipulate arrays of values just using IF ... THEN and GOTO, but it soon becomes very clumsy. For this reason MiniBasic includes FOR NEXT loops. Say we want to print out an array

10 REM Prints the months of the year
20 DIM months$(12) = "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
30 FOR I = 1 TO 12
40 PRINT "Month", I, months$(I)
50 NEXT I
60 PRINT "Done"

FOR loops can be nested. The inner loop must be closed before the outer one.

10 FOR I = 1 TO 8
20 FOR J = 1 TO 8
30 PRINT I, J
40 NEXT J
50 NEXT I

It is also possible to provide a STEP value other than one.

FOR I = 100 TO 0 STEP -2

If you specify a null loop, such as FOR I = 10 TO 0, control will pass over the loop body and go to the next matching NEXT statement. The control variable must always be in the NEXT statement. FOR ... TO loops can take complex expressions, such as FOR I = x TO x * x; in these cases the initial value, the end value, and the step value are calculated once and then never modified. FOR ... NEXT loops can be nested up to 32 deep.

If you attempt to jump out of a loop then you are likely to trigger errors.

10 REM Bad use of a for loop
20 LET X = 0
30 FOR I = 1 TO 10
40 INPUT Y
50 IF Y < 0 THEN 10
60 NEXT I

However you may alter the counting variable within the loop. This can be used to force premature loop termination.
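A minimal sketch of that technique follows. It reads up to ten numbers and stops early at the first negative one by setting the counter to the TO value and then falling through to the matching NEXT, rather than jumping out of the loop. The variable names total and y are illustrative only.

```basic
10 REM Read up to ten numbers, stopping early at a negative one
20 LET total = 0
30 FOR I = 1 TO 10
40 INPUT y
50 IF y >= 0 THEN 80
60 REM Negative: set the counter to the TO value so the loop ends at NEXT
70 LET I = 10
75 GOTO 90
80 LET total = total + y
90 NEXT I
100 PRINT "Total", total
```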

The seventh program

We are now ready to put everything together. MiniBasic has facilities for input, output, mathematical and lexical calculation, flow control, and multi-dimensional arrays.

String handling may be different to what you are used to in other versions of BASIC. In MiniBasic all functions that return a string end with the character ‘$’. These include CHR$(), STR$(), LEFT$(), RIGHT$(), MID$() and STRING$(). There is no limit to string length other than the computer’s memory. Some functions take a string argument, but return a numerical value. These include LEN(), the length of the string, and ASCII() – the ASCII code of the first character of the string, also VAL() – the numerical value of a string of digits. Internally, the NUL character, ASCII value 0, is used to terminate strings. The empty string “” consists of a single NUL.

Here is a program which inputs a full name, checks for validity, and stores it in an array of variables.

10 REM String-handling program
20 REM Inputs a name, tests for validity
30 REM and breaks up into parts.
40 PRINT "Enter your full name"
50 INPUT name$
60 REM First check for non-English characters
70 LET flag = 0
80 FOR I = 1 TO LEN(name$)
90 LET ch$ = MID$(name$, I,1)
100 IF (ch$ >= "A" AND ch$ <= "z") OR ch$ = " " THEN 140
110 LET flag = 1
120 REM This forces the loop to stop
130 LET I = LEN(name$)
140 NEXT I
150 IF flag = 0 THEN 180
160 PRINT "Non-English letter,", ch$
170 GOTO 40
180 REM Jump to subroutine
190 LET return = 210
200 GOTO 1000
210 IF name$ = "" THEN 280
220 LET return = 240
230 GOTO 2000
240 LET N = N + 1
250 DIM out$(N)
260 LET out$(N) = word$
270 GOTO 180
280 REM Print out the name
285 PRINT "Name accepted"
290 FOR I = 1 TO N
300 PRINT out$(I) + " ";
310 NEXT I
320 PRINT ""
330 GOTO 3000
1000 REM strips the leading space
1010 IF LEFT$(name$, 1) <> " " THEN return
1020 LET name$ = MID$(name$, 2, -1)
1030 GOTO 1010
2000 REM get the leading word and put it in word$
2010 LET word$ = ""
2020 LET ch$ = LEFT$(name$, 1)
2030 IF ch$ < "A" OR ch$ > "z" THEN return
2040 LET word$ = word$ + ch$
2050 LET name$ = MID$(name$, 2, -1)
2060 GOTO 2020
3000 REM END

Keywords by type

Arithmetical operators
+ - / * () ! MOD

Mathematical constants
PI, e

Mathematical functions
SIN, COS, TAN, ASIN, ACOS, ATAN, LN, POW, SQRT, INT, RND

String functions that return a numerical value
LEN, VAL, ASCII, INSTR, VALLEN

Statements
PRINT, LET, DIM, IF, GOTO, INPUT, REM, FOR, NEXT

Auxiliary keywords
THEN, AND, OR, TO, STEP

Functions that return a string
CHR$, STR$, LEFT$, RIGHT$, MID$, STRING$

Keywords alphabetically

e ACOS AND ASCII ASIN ATAN CHR$ COS DIM FOR GOTO IF INPUT INSTR INT LEFT$ LEN LET LN MID$ MOD NEXT OR PI POW PRINT REM RIGHT$ RND SIN SQRT STEP STR$ STRING$ TAN THEN TO VAL VALLEN

1) Expressions.

All expressions are evaluated using floating-point arithmetic. The + and – operators have lower precedence than *, / and MOD (modulus), which have equal precedence and are evaluated left to right. ! (factorial) has the highest precedence. There is no exponentiation operator (use the POW() function instead). There are two mathematical constants, e, Euler’s number, 2.71828183, and PI, 3.14159265. It is possible to use variables or dimensioned variables in expressions. Typical expressions are

10 - absolute value 10
x - value of variable x
(x + y) * 2 - add x to y and multiply by two
array(1,2) - value of array element 1, 2
POW( x + y, 2) + array(1, LEN(A$)) - raise x + y to the power 2 and add an element of the array “array” given by 1 and the length of A$.

MOD calculates the floating point modulus of a number. Both sides of the expression should be of the same sign. x MOD 0 is an error. Division by zero is also an error.

Floating point arithmetic is not exact, so expressions such as SQRT(2.0) * SQRT(2.0) may not be exactly whole numbers. Using the function INT(x + 0.5) you can force an expression to be the nearest exact integer.

Arrays are stored with the x dimension in the first column. This matters when initialising a 2d array with a list of values.

10 DIM a(4,4) = 1, 2, 3, 4, 5, 6, 7, 8,
 9, 10, 11, 12, 13, 14, 15, 16

will create an ascending list in a(x,y) order. Array indices start from 1 and finish at the dimension size. Thus a(1,1) is the first element of the array, and a(4,4) the highest. Note that there must be no space between an array name and the first parenthesis.
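To see the fill order for yourself, a short test program along the following lines may help. It initialises a small 2d array from a list and prints each element with x varying fastest. If the ordering is as described above, the values should come out in ascending order. The array size and the loop variable names x and y are chosen just for this sketch.

```basic
10 REM Show the order in which an initialiser list fills a 2d array
20 DIM a(2,3) = 1, 2, 3, 4, 5, 6
30 FOR y = 1 TO 3
40 FOR x = 1 TO 2
50 PRINT "a(", x, ",", y, ") =", a(x,y)
60 NEXT x
70 NEXT y
```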

2) String expressions.

All strings are stored internally in ASCII format, as NUL-terminated arrays. Use of extremely large strings is likely to slow down the program, since most operations involve internal copying of strings.

A string literal consists of one or more concatenated quotes. A string can be spread over several lines, but the newline character is not allowed inside quotes. To enclose a quotation mark in a string, use double quotes.

10 LET A$ = "And God said ""Let there be light"" "
 "and there was light."
 "And God saw the light, that it was good."

is an example of a legal string. Note that the second and third lines begin with white space, to tell the interpreter that each is a continuation of the previous line.

To add a newline or other control character, use the CHR$() function. Note that CHR$(0) will prematurely terminate the string. Use the ASCII() function to perform numerical manipulation on characters, e.g.

LET B$ = CHR$( ASCII(B$) + 1)

will set B$ to the next letter of the alphabet.

The ‘+’ operator will concatenate strings.

10 PRINT "Fred" + "Bloggs" + CHR$(42) + x$

will print FredBloggs* followed by the contents of x$.

Functions with names ending in ‘$’ always return strings. Parentheses are not optional.

3) Relational expressions.

Relational expressions are used only in IF ... THEN statements to make conditional jumps. A relational expression evaluates to either true or false. The allowed operators are =, <> (doesn’t equal), >, >=, <, <=. With numerical expressions the comparison is numerical, and with strings it is alphabetical. Both sides of a relational operator must be of the same type.

Relational expressions can contain the keywords AND and OR. Order of evaluation is left to right, but parentheses should always be used to disambiguate mixed expressions.

Examples of use

10 IF (x <= 5 AND x > 0) OR x = 10 THEN 100

Alphabetical list of keywords

Each keyword in MiniBasic is listed alphabetically, with a brief description of its function and how to use it.

e – Euler’s number.
The mathematical constant e, 2.71828182... The base of natural logarithms, and used in many formulae.

Usage
e
10 LET y = POW(e, -x)

ACOS – arc-cosine.
Calculates the inverse cosine of a number. The result is in radians. Input must be between –1.0 and 1.0.

Usage
num = ACOS(numeric)
10 LET rad = ACOS(x)

AND – logical operator.
Used in IF ... THEN statements to perform two tests. If both are true then the test succeeds. Note it cannot be used as a bitwise AND operator, as in some programming languages.

Usage
IF relational AND relational THEN numeric
10 IF name$ = "Fred" AND age > 18 AND age <= 65 THEN 100

ASCII – get the numerical code for a character.
Calculates the computer’s internal code for the first character in a string, or 0 if the empty string is passed. It is useful for performing direct manipulations on the representation, for instance testing for carriage returns (code 13) or line feeds (code 10).

Usage
num = ASCII( string )
LET x = ASCII("*abc")

x now contains the code for an asterisk, or 42.

ASIN – arc-sine.
Calculates the inverse sine of a number. The output is in radians. The input must be between –1.0 and 1.0.

Usage
num = ASIN(numeric)
10 LET rad = ASIN(x)

ATAN – arc-tangent
Calculates the inverse tangent of a number. The output is in radians. Note that for very extreme values accuracy may be lost.

Usage
num = ATAN(numeric)
10 LET rad = ATAN(x)

CHR$ – convert ASCII value to a string.
Converts the computer’s internal numerical character code to a MiniBasic string of one letter. It is useful for performing numerical manipulations with the code. For instance, to insert a line break call CHR$(10), the ASCII line feed. (CHR$(13) is the carriage return, which some systems use as a line ending.)

Usage
str = CHR$(numeric)
10 LET X$ = "Line 1" + CHR$(10) + "Line 2"

X$ now contains “Line 1” and “Line 2” separated by a line feed character.

COS – cosine
Calculates the cosine of an angle. The input must be in radians.

Usage
num = COS(numeric)
10 LET x = COS(degrees/180 * PI)

DIM – dimension an array.
Use DIM to create a named list of numbers or strings. This is extremely useful when dealing with large amounts of data. For instance, if you are writing a program for a company with several employees, you can DIM an array to hold all their names. Arrays can have up to five dimensions. In practice, even on modern computers, memory fills up very fast with big arrays, and three dimensions is the maximum recommended. There must be no space between the name of the dimensioned variable and the opening parenthesis.

10 DIM name$(100)

Creates an array of 100 names.

10 DIM map(width, height)

Creates a 2d array of width * height entries, maybe representing grid squares on a map. Array elements are in the range 1 – maximum, so

20 LET map(1,1) = 2.0

sets the top left element to 2.0. map(width, height) is the bottom right element. If you try to access out-of-range elements the computer will throw an error.

MiniBasic allows you to resize an array at any point by calling DIM on it again. If the array is one-dimensional, elements will be preserved. If the array has higher dimensions then the elements will be scrambled. Resizing an array is useful if, say, you are inputting a list of employee names and don’t know how many there will be. Arrays of zero dimensions may not be declared.

MiniBasic also allows you to initialise arrays when you dimension them. This

10 DIM days$(7) = "Mon", "Tue", "Wed", "Thur", "Fri", "Sat", "Sun"

will declare an array of days of the week. This method is useful for defining data. For 2d arrays, the first dimension is the lowest (x) dimension, so

DIM name$(2, 4) = "Fred", "Bloggs",
 "Joe", "Sixpack",
 "Homer", "Simpson",
 "John", "Doe"

is the correct order.

Dimensioned variables are intimately connected with FOR ... NEXT loops. Use the loop counter to index into your array.

Usage
DIM id(numeric, numeric)

10 DIM array(10)
Creates a single-dimensioned array of 10 numerical elements

10 DIM dictionary$(2, N)
Creates a 2-dimensional array of N * 2 strings

10 DIM factorial(10) = 1!, 2!, 3!, 4!, 5!, 6!, 7!, 8!, 9!, 10!
Creates a list of the first ten factorials

FOR – start a for loop
FOR ... NEXT loops are extremely useful in programming. The FOR statement consists of three parts, the initial set-up value, a TO value, and an optional STEP value.

10 DIM array(100)
20 FOR I = 1 TO 100
30 INPUT array(I)
40 NEXT I

will input a hundred values into the array. The variable in the NEXT statement must be the same as that in the matching FOR. FOR loops may be nested to a maximum depth of 32.

10 DIM chess(8,8)
20 FOR I = 1 TO 8
30 FOR J = 1 TO 8
40 LET chess(J,I) = 1.0
50 NEXT J
60 NEXT I

The step value does not need to be 1, and may be negative. For instance

10 FOR I = 1 TO 10 STEP 2
20 PRINT I
30 NEXT I
40 FOR I = 10 TO 1 STEP -0.3
50 PRINT I
60 NEXT I

The initial, to, and step values are calculated once on entering the FOR loop; they are then constant.

10 LET x = 10
20 FOR I = 1 TO x STEP x/5
30 PRINT I
35 REM Next line has no effect
40 LET x = x + 1
50 NEXT I

In MiniBasic, if the TO value is lower than the initial value (or higher if the STEP value is negative) then the loop does not execute. Control passes to the first matching NEXT.

It is important not to jump out of FOR ... NEXT loops, or get the nesting order wrong, otherwise MiniBasic’s control flow will become confused. The “Too many FORs” error is likely to be caused by jumping out of a loop. To terminate a loop prematurely, set the counter to the TO value, and jump to the matching NEXT.

Usage
FOR id = numeric TO numeric STEP numeric
...
NEXT id

GOTO – unconditional jump
GOTO executes a jump to another line. The line number is usually a constant, but GOTO x is supported. GOTO is not considered good programming practice, but is essential in MiniBasic because flow control is so simple.

Usage
GOTO numeric
10 GOTO 100
10 GOTO x

IF – conditional jump
The IF ... THEN construct allows a MiniBasic program to make decisions. It can emulate any other control structure. If the test condition is true, then control jumps to the line indicated after the THEN keyword. If false, control passes to the next line. No statements other than a line number may appear after the THEN keyword, though the form

IF y < 10 THEN x

is supported.

The relational operators are =, <> (not equal), >, >=, < and <=. They can be applied to strings or to numerical expressions. The AND and OR logical operators can also be used. MiniBasic does not perform lazy evaluation – all expressions will be evaluated, so all array indices etc. must be legal.

Usage
IF relational THEN numeric
10 IF x < 10 THEN 100
10 IF a$ <> "OK" AND a$ <> "YES" THEN x
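As an illustration of both points, here is a hedged sketch. The first part emulates an IF ... ELSE using IF ... THEN and GOTO. The second part checks an array index on its own line before the array is touched, because a combined test such as i >= 1 AND table(i) > 25 would still evaluate table(i) when i is out of range. The names table and i are invented for the example.

```basic
10 REM Emulating IF ... ELSE with IF ... THEN and GOTO
20 PRINT "Enter a number"
30 INPUT x
40 IF x < 0 THEN 70
50 PRINT x, "is zero or positive"
60 GOTO 80
70 PRINT x, "is negative"
80 REM All parts of a test are evaluated, so check the index first
90 DIM table(5) = 10, 20, 30, 40, 50
100 PRINT "Enter an index from 1 to 5"
110 INPUT i
120 IF i < 1 OR i > 5 OR i <> INT(i) THEN 100
130 IF table(i) > 25 THEN 160
140 PRINT "Small value"
150 GOTO 170
160 PRINT "Large value"
170 REM end
```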

INPUT – input a number or a string.
To input data, use the INPUT statement. In a test environment the data will usually be typed by the user; if MiniBasic is a component of another program, input will be provided by the caller.

INPUT x

inputs a number into the variable x. Any non-numerical characters are skipped over until a number appears.

INPUT n$

inputs a string. Characters are read up to the first newline, which is discarded.

Most consoles provide data a line at a time, so input you type will not be available until you press ENTER. If the input stream comes to an end, the program will fail with an error message.

Usage
INPUT id
10 INPUT x
20 PRINT x

INSTR – in string
Looks for occurrences of a substring within a string. The first argument is the string to search, the second argument the string to search for, and the third argument the position at which to start. The return value is 0 if the string is not found, or else the offset of the first occurrence. For instance

LET x = INSTR("zigzag", "zag", 1)

will return 4.

It is very useful for string manipulation. For instance, to test if the one-character string ch$ is a digit we could write

10 IF INSTR("0123456789", ch$, 1) <> 0 THEN 100

Usage
num = INSTR(string, string, numeric)
10 LET x = INSTR(sentence$, "and", 1)

INT – convert real to integer.
All MiniBasic numerical variables are stored as floating point. INT() returns the lower integer portion of the number. Thus INT(1.9) = 1. To round, call INT(x + 0.5). INT() is also useful for getting rid of small errors caused by floating-point calculation.

Usage
num = INT(numeric)
10 LET x = INT(x/2)
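The rounding idiom can be extended to a fixed number of decimal places by scaling before and after the call. The sketch below is illustrative only: the variable names nearest and rounded are invented, and input is restricted to non-negative numbers because the text does not spell out how INT treats negative values.

```basic
10 REM Round to the nearest whole number and to two decimal places
20 PRINT "Enter a number (zero or greater)"
30 INPUT x
35 IF x < 0 THEN 20
40 LET nearest = INT(x + 0.5)
50 LET rounded = INT(x * 100 + 0.5) / 100
60 PRINT "Nearest whole number", nearest
70 PRINT "To two decimal places", rounded
```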

LEFT$ – returns the left portion of a string.
To take the leftmost characters of a string, call LEFT$. If the string is too short to contain that number of characters, it returns the whole string.

Usage
str = LEFT$(string, numeric)
10 LET hello$ = LEFT$("Hello World", 5)

LEN – returns the length of a string.
To find the length of a string in characters, call LEN(). The empty string “” returns 0. It is often necessary to examine each character of a string for processing.

5 REM PRINT A$, omitting the letter "x"
6 INPUT A$
10 FOR I = 1 TO LEN(A$)
20 LET ch$ = MID$(A$, I, 1)
30 IF ch$ = "x" THEN 50
40 PRINT ch$;
50 NEXT I
60 PRINT ""

Usage
num = LEN(string)
10 LET x = LEN("This is a string")

LET – assignment
The LET statement assigns a variable a value. If the variable does not exist it is created.

10 LET x = 10
10 LET name$ = fname$ + " " + sname$

Plain variables like x or length are always numerical; string variables like name$ always end with a dollar sign. It is illegal to try to assign a variable of the wrong type. LET will not create or increase the size of a dimensioned variable.

10 DIM array(2,2)
15 REM Legal
20 LET array(1,2) = 10
25 REM Illegal out of bounds
30 LET array(1,3) = 0

The form

10 LET x = x + 1

is legal and is often very useful. It is even legal if x has not been created (it is initialised to zero).

Usage
LET id = numeric
LET id$ = string
10 LET x = 10
10 LET x = x + 1
10 LET a$ = CHR$(13)

LN – natural logarithm
Computes the natural logarithm of a number, which must be greater than zero. Natural logarithms are to the base e. To convert to a base 10 logarithm,

LET log10 = LN(x)/LN(10)

To convert to a base 2 logarithm

LET log2 = LN(x)/LN(2)

Usage
num = LN(numeric)
10 LET log = LN(x)

MID$ – middle string function.
Use this function to obtain a string from the middle of another string. The first argument is the target string, the second argument the offset (1-based) and the third argument the length of the substring to extract.

LET x$ = MID$("Distraction", 4, 5)

sets x$ to “tract”.

If the length is too long for the target string, the answer is truncated. By passing –1 for the length, we tell the function to extract the remainder of the string.

LET x$ = MID$("Distraction", 4, -1)

sets x$ to “traction”.

Usage
str = MID$(string, numeric, numeric)
10 LET x$ = MID$(y$, 3, 4)
10 LET x$ = MID$(y$, 3, -1)

MOD – modulus.
Modulus is not a function but an arithmetical operator. It calculates the remainder after division.

LET x = 12 MOD 5

sets x to 2. Both sides of the MOD operator should be of the same sign. MOD 0 is an error. MOD also works for fractional values. 0.75 MOD 0.5 equals 0.25.

Usage
num = numeric MOD numeric
10 LET X = Y MOD 10
10 LET X = Y MOD -0.1
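A common use of MOD is testing divisibility. The sketch below checks whether a positive whole number is even or odd. The variable name n and the input check on line 40 are additions for the example, keeping both operands positive as advised above.

```basic
10 REM Use MOD to test whether a whole number is even or odd
20 PRINT "Enter a positive whole number"
30 INPUT n
40 IF n < 1 OR n <> INT(n) THEN 20
50 IF n MOD 2 = 0 THEN 80
60 PRINT n, "is odd"
70 GOTO 90
80 PRINT n, "is even"
90 REM end
```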

NEXT – terminates a FOR ... NEXT loop
For description see FOR.

Usage
NEXT id
10 FOR I = 1 TO 10
20 PRINT I
30 NEXT I

OR – logical operator.
Used in IF ... THEN statements to perform two tests. If either one is true, then the test succeeds and the jump is taken.

IF job$ = "caretaker" OR age < 65 THEN 100

Note that lazy evaluation is not performed. Both sides of the expression will always be evaluated. When used in conjunction with AND use parentheses to disambiguate.

Usage
IF relational OR relational THEN numeric
10 IF x = y OR x = z OR x < 0 THEN 100

PI – mathematical constant.
The mathematical constant PI, or 3.14159265... This is the ratio of a circle’s circumference to its diameter and is used in many mathematical formulae.

Usage
PI
10 LET area = radius * radius * PI

POW – exponentiation function.
POW() raises x to the power y. Fractional and negative powers are supported.

LET x = POW(10, 2)

will set x to 100. By passing 1/y as the exponent, we can obtain the yth root of x. For example, to obtain the cube root of two pass

LET x = POW(2, 1/3)

Passing a negative power calculates the reciprocal. For example

LET y = POW(x, -3)

sets y to 1/(x^3). Some values are illegal. For instance, POW(-1,1/2) will produce an indefinite result.

Usage
num = POW(numeric, numeric)
10 LET a = POW(x,y)

PRINT – output statement
All of MiniBasic’s output is via the PRINT statement. It is used to print both numbers and strings.

PRINT "Hello World"

will output the string “Hello World”, followed by a newline.

PRINT x

will print the value of x in a human-readable format. It is possible to print many values in one line by separating them with commas.

PRINT "Your salary is", x, "Mr", name$

The comma will automatically insert a space. To suppress the newline, terminate the PRINT statement with a semicolon.

10 LET x = 10
20 LET y = 25
30 PRINT x;
40 PRINT y

will output the string “1025”. To print a bare newline, use the empty string

10 PRINT ""

Usage
PRINT numeric or string, numeric or string ; (optional)
10 PRINT "Hello World"
10 PRINT x
10 PRINT "Hello", name$
10 PRINT "Enter your telephone number";
10 PRINT ""

REM – remarks
This statement is purely for adding comments to programs so that a human reader can understand them. It is also frequently used for “commenting out” code – prefixing with a REM so it is not executed. MiniBasic allows for multi-line comments, as long as the first character of every continued line is a space.

Usage
REM any comments
10 REM Demonstration program by Malcolm McLean
10 REM This is an extremely long comment, which is spread
 over two lines.
10 REM PRINT "This PRINT statement is commented out"

RIGHT$ – get rightmost characters of a string.
This is the twin to LEFT$. It takes the rightmost characters of a string. For instance

LET A$ = RIGHT$("Beholden", 3)

would set A$ to “den”. If the target string isn’t long enough, all of the string is copied.

Usage
str = RIGHT$(string, numeric)
10 LET A$ = RIGHT$(B$, 10)

RND – Random number generator
Many applications need random numbers. RND() provides a pseudorandom number generator. The argument N, which should be an integer, tells RND() to generate a random integer in the range 0 to N-1.
10 FOR I = 1 TO 100
20 PRINT RND(10)
30 NEXT I

will output a stream of random digits in the range 0 to 9. If we pass RND() the value 1, the number generated is a floating point value in the range 0 to (slightly below) 1.
The random number generator is deterministic. To force a certain behaviour, call RND() with a negative argument. This will “seed” the random number generator.
LET dummy = RND(-10)

will give us numbers based on the seed 10. Calling RND(0) will always return 0.
Usage
num = RND(numeric)
10 LET die = RND(6) + 1
10 LET dummy = RND(-10)
10 LET p = RND(1)
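The four cases described for RND() (a positive integer, 1, 0 and a negative seed) can be sketched in C. This only illustrates the documented behaviour on top of the standard rand() and srand() calls. The function name rnd(), the value returned when seeding, and the use of rand() at all are assumptions, not a description of MiniBasic's own code.

```c
#include <stdlib.h>

/* Hypothetical sketch of RND's documented behaviour; not MiniBasic's source. */
double rnd(int n)
{
    if (n < 0) {
        srand(0u - (unsigned)n);  /* a negative argument seeds the generator */
        return 0.0;               /* assumption: the result of seeding is a dummy value */
    }
    if (n == 0)
        return 0.0;               /* RND(0) always returns 0 */
    if (n == 1)
        return rand() / (RAND_MAX + 1.0);  /* 0 <= result < 1 */
    return (double)(rand() % n);           /* integer in the range 0 .. n-1 */
}
```

Seeding with rnd(-10) and then calling rnd(6) + 1 repeatedly gives the same run of die rolls every time the same build is executed, which is what "deterministic" means here. Note that rand() % n is very slightly biased when n does not divide RAND_MAX + 1 evenly.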

SIN – sine
Returns the sine of a number. The argument must be in radians.
Usage
num = SIN(numeric)
10 LET s = SIN(theta)
10 LET s = SIN(degrees/180 * PI)


SQRT – square root
Calculates the square root of its argument, which must not be negative.
Usage
num = SQRT(numeric)
10 LET root2 = SQRT(2)
10 LET dist = SQRT( (x1-x2) * (x1-x2) + (y1-y2) * (y1-y2) )

STEP – increment for a FOR loop.
STEP is by default 1, but can be any value, positive or negative. It is evaluated once when the FOR ... NEXT loop is entered. For further details see FOR.
Usage
FOR id = numeric TO numeric STEP numeric
10 FOR i = 1 TO 100 STEP 10
10 FOR i = 100 TO 1 STEP -1
10 FOR i = min TO max STEP delta

STR$ - convert numerical value to string.
The numerical value x = 10 and the string value x$ = “10” are two different things. To convert a number into a human-readable string use STR$.
Usage
str = STR$(numeric)
10 LET reg$ = "ABC" + STR$(num)


STRING$ - tandem string.
If we need to create a string that consists of a shorter string duplicated many times, use STRING$.
LET stars$ = STRING$("*", 20)

will create a string of twenty asterisks. A common use is creating variable numbers of spaces for output formatting.
Usage
str = STRING$(string, numeric)
10 PRINT STRING$(" ", 10), out$

TAN - tangent
Calculates the tangent of an angle, which must be in radians.
Usage
num = TAN(numeric)
10 LET x = TAN(theta)
10 LET x = TAN(degrees/180 * PI)

THEN – component of IF statement.
THEN introduces the jump destination which is taken if the expression in the IF statement is true. The expression is always evaluated, even if the branch is not taken. For further details see IF.
Usage
IF relational THEN numeric


TO – component for FOR ... NEXT loop
In a FOR ... NEXT loop, TO introduces the terminal value. When it is exceeded, the loop terminates at the next NEXT statement. For further details see FOR.
Usage
FOR id = numeric TO numeric

VAL – calculate the numerical value of a string.
This function converts a human-readable string containing numbers to a numerical value. The string may contain numbers in scientific notation, e.g. 1.5e20. The string is read up until the first non-numerical character is encountered. If the string does not start with a number, 0 is returned.
Usage
num = VAL(string)
10 LET x = VAL("1024")
10 LET x = VAL("1.5e20")

VALLEN – length of value
This function is designed for use with VAL() to tell the caller how many numerical characters were translated. This is useful if stepping through a string containing many numbers. It also tells the caller whether a string is numerical or not: if it is non-numerical, VALLEN returns 0.
10 INPUT a$
20 IF VALLEN(a$) <> 0 THEN 50
30 PRINT "You must enter a number"
40 GOTO 10
50 LET x = VAL(a$)
60 LET a$ = MID$(a$, VALLEN(a$) + 1, -1)

This code will read a number from the input, and prompt again if valid input is not entered. Line 60 then strips the characters that were translated, leaving the rest of the string in a$.
Usage
num = VALLEN(string)
10 LET slen = VALLEN("121 dalmations")
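The relationship between VAL() and VALLEN() resembles the one between the return value and the end pointer of the standard C function strtod(). The following is a minimal sketch under that assumption. The names val() and vallen() are hypothetical, and details such as strtod() skipping leading white space, or accepting forms like hexadecimal in newer C standards, may not match what MiniBasic accepts.

```c
#include <stdlib.h>

/* Hypothetical C analogue of VAL(): value of the leading number, or 0 if none. */
double val(const char *str)
{
    return strtod(str, NULL);
}

/* Hypothetical C analogue of VALLEN(): how many characters that number used. */
int vallen(const char *str)
{
    char *end;

    strtod(str, &end);        /* end is left pointing just past the number */
    return (int)(end - str);  /* 0 when no number could be read */
}
```

With these definitions vallen("121 dalmations") is 3, so a caller stepping through a string of numbers can advance by that many characters after each call to val().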


Errors
Sometimes MiniBasic will terminate with an error message. Usually these are due to typing mistakes or logic errors in the BASIC program. Occasionally they may be caused by the computer running out of resources, by illegal input, or by internal errors in the MiniBasic interpreter.

Can’t read program
You have called MiniBasic with something it cannot recognise as a MiniBasic program at all, for instance with a text file containing a nursery rhyme.

Program lines not in order
Lines have to be in numerical order. If lines are out of order, you will receive this error.

Line not found
You have tried to jump to a non-existent line.

Syntax error line
This means that the computer has encountered a line it cannot understand. It is a catch-all error, incorporating things such as identifiers starting with digits, or lines not terminated with a newline.

Out of memory
The computer has run out of memory. This may occur when you try to dimension a huge array, or it may occur at any time if the computer is low on resources, since MiniBasic uses memory internally. Be particularly careful when dimensioning arrays with variables.


Identifier too long
An identifier (variable name) is allowed to be only 31 characters long, including the $ for a string identifier. For dimensioned variables the number is one less.

No such variable
You have attempted to use a variable that has not been initialised.

Bad subscript
You have tried to access a dimensioned array beyond its dimensioned size.

Too many dimensions
You have tried to dimension an array with more than five dimensions.

Too many initialisers
In initialising a dimensioned array, you have tried to list more values than you have space for.

Illegal type
You have tried to use a string variable as the counter for a FOR loop.

Too many nested fors line
Maximum depth of FOR ... NEXT loops is 32. Exceeding this limit is probably due to problems with jumping out of FOR ... NEXT loops.

For without matching next
You have declared a FOR statement but not a matching NEXT.

Next without matching for
You have declared a NEXT statement without a matching FOR.


Divide by zero
You have attempted to divide by zero. This is a mathematical error.

Negative logarithm
You have attempted to take the logarithm of zero or a negative number. This is a mathematical error.

Negative square root
You have tried to take the square root of a negative number. This is a mathematical error.

Sine or Cosine out of range
You have attempted to pass a value not in the range -1.0 to 1.0 to the ASIN() or ACOS() functions.

End of input file
An INPUT statement has encountered an end of file condition. This could be due to some problem with the computer’s system.

Illegal offset
A string function has received an illegal value for a string offset, such as a negative second argument to LEFT$().

Type mismatch
You have entered a string expression where MiniBasic was expecting a numeric expression, or a numeric expression where it was expecting a string.

Input too long
Input lines can be a maximum of 1023 characters long. Lines longer than this are almost certainly either errors or malicious attempts to exploit the system, so they are rejected.


Bad value
There has been an internal overflow. Usually this is caused by trying to calculate with a ridiculously large value, such as a very large power of 10.

Not an integer
A non-integer was used as an array index, or as an argument to a function (like RND()) which naturally expects an integer. Note that floating point arithmetic is not exact, so expressions like SQRT(3.0) * SQRT(3.0) may not return exactly 3.0. Use the INT() function to force a number to an exact integer.

ERROR
Unspecified error has occurred. This probably represents some internal problem.
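The kind of test behind the “Not an integer” error can be sketched in C. The helper names is_integer() and nearest_int() are hypothetical and this is not MiniBasic's own check. It only shows why a value that looks like 3 when printed can still fail an exactness test, and how rounding to the nearest integer recovers the intended value.

```c
#include <limits.h>

/* Returns 1 if x is exactly an integer that fits in an int, otherwise 0. */
int is_integer(double x)
{
    if (!(x >= (double)INT_MIN && x <= (double)INT_MAX))
        return 0;                 /* out of range, or not a number */
    return x == (double)(int)x;   /* survives a round trip through int? */
}

/* Rounds to the nearest integer; the caller must keep x within int range. */
int nearest_int(double x)
{
    return (int)(x < 0.0 ? x - 0.5 : x + 0.5);
}
```

A value such as 2.9999999999999996, which is the sort of result an inexact product can leave behind, fails is_integer() and truncates to 2, whereas nearest_int() gives 3. Whether truncation or rounding is wanted depends on the program.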


How to write a BASIC interpreter
Take out your pocket calculator and type in
1+2x3
Unless you have a really good one, the output will probably be 9. The calculator moves from left to right, evaluating the expression. A mathematician, on the other hand, uses the rule BODMAS (brackets, of, divide, multiply, add, subtract). So the result of the expression should be 7, or
1 + (2 * 3)
This is the basic problem in writing an expression parser to interpret human-meaningful programming languages, like BASIC. The interpreter cannot simply bolt through the input. The secret is to store the state of the expression on a stack.

Consider this simple problem. We have a programming language that uses the symbols ‘(’, ‘)’, ‘[’, ‘]’, ‘{’ and ‘}’. The brackets have to match each other. So
{ fred[15] * (x+1) }
would be legal.
{ fred[15 *)x + 1(}
would not be, because the square bracket is unclosed and the round brackets are the wrong way round.
{ fred[15 * (x+1])}
would also be illegal, because the square bracket has closed whilst the round brackets are still open. Every opening bracket has to be matched with a corresponding closing bracket, in order.

The solution is to use a stack. When you hit an opening bracket, push it onto the stack. When you hit a closing bracket, pop the stack. If the stack is empty, or the symbol on the top doesn’t match, you know you have an error. Finally, if the stack is not empty at the end of the expression, you have an unclosed bracket.
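The bracket-matching procedure just described is short enough to write out. This is a minimal sketch. The function name brackets_match() and the fixed stack depth MAXDEPTH are choices made for the illustration and do not come from the MiniBasic source.

```c
#define MAXDEPTH 64   /* arbitrary nesting limit for this sketch */

/* Returns 1 if every bracket in str is closed in the right order, else 0. */
int brackets_match(const char *str)
{
    char stack[MAXDEPTH];
    int top = 0;          /* number of brackets currently open */
    const char *p;

    for (p = str; *p; p++) {
        switch (*p) {
        case '(': case '[': case '{':
            if (top == MAXDEPTH)
                return 0;             /* nested too deeply for the stack */
            stack[top++] = *p;        /* push the opening bracket */
            break;
        case ')': case ']': case '}': {
            char want = (*p == ')') ? '(' : (*p == ']') ? '[' : '{';
            if (top == 0 || stack[--top] != want)
                return 0;             /* nothing open, or the wrong kind */
            break;
        }
        default:
            break;                    /* every other character is ignored */
        }
    }
    return top == 0;                  /* anything left over is unclosed */
}
```

For the three examples above, brackets_match() returns 1 for the legal line and 0 for both illegal ones. A recursive parser does the same bookkeeping implicitly, using the C call stack in place of the explicit array.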

Now one of the nice things about C is that it allows recursive functions. Mathematical expressions are naturally recursive. If we have the expression
y=x+(…)
anything can go inside the parentheses, as long as it is a legal expression. So we could have
y=x+(2*3)
y=x+(x*3)
y = x + ( x + (2 * 3) )
We can nest as deeply as we like.

So if we are writing a parser, the algorithm is to parse the expression from left to right, until we hit an opening parenthesis. Then we parse another expression. Then we check for a closing parenthesis. If we find it, we calculate the result and pass it to the enclosing expression. If we don’t, there must be an error.
y=x+(3
would be an error. So would
y=x+(3 +)
because the contents of the bracket are not a full expression.

There are three basic levels of arithmetical expression: the factor, the term, and the expression. An expression consists of one or more terms, which are added or subtracted. A term consists of one or more factors, which are multiplied or divided.

A factor consists of either a number, or an opening bracket, an expression, and a closing bracket. The subroutine that evaluates the factor therefore has to call the subroutine that evaluates the expression, if it hits an opening bracket. Hence expression interpreters are naturally mutually recursive. Generally it is a bad thing to have mutually recursive functions in your programs, but here is an exception.

Another thing that is normally a bad idea is a global variable. However in MiniBasic you will notice several global variables. This is for several reasons, but mainly because each function needs to keep track of the state of the input.

We read the input from left to right, one token at a time. The token that we have just read is stored, and is available for reference. Once we process it, it is done: we never backtrack in our reading. This method is recursive descent parsing with one token of lookahead, known formally as LL(1) predictive parsing (LALR, strictly speaking, names a different, table-driven bottom-up method). The vast majority of programming languages can be parsed this way. There are just a few human-meaningful expressions which don’t lend themselves to this treatment. This becomes a serious problem if you are writing a natural language parser, which is a much more difficult proposition than a BASIC interpreter.

So we want to write a master function
double expr()
which parses the input, and returns a number as an answer. If it hits an error it sets some flag somewhere. It also updates the input globally. Intuitively, it might seem nice to take the input as a parameter
double expression(char *str)
This is OK for the highest level function, but not for the functions that are called recursively. The reason is that, if expression() is called recursively, the caller needs to know how many tokens it has consumed. So we keep the input, and the number of tokens read, global.

This leads us to the question of what is a token. It would be perfectly workable to say that each ASCII character is a token. In fact this leads to a bit of a nuisance, because it is easier to treat numbers separately rather than writing a grammar to build them up from their digits. When we come to add keywords, it will also be a lot easier to say that each keyword is a separate token. It is also convenient to just ignore spaces.


So for now, let’s define our tokens as the arithmetical symbols ‘+’, ‘-’, ‘*’, ‘/’, the parentheses ‘(’ and ‘)’, and VALUE, meaning any sequence of digits. Finally we also need an EOS token, to tell us when we have reached the end of the input.

The tokenizer is called the lexical analyser. The important function is called
int gettoken(const char *str)
str is a const, because gettoken() never actually consumes any input. It simply tells us which token is waiting for us in the input stream.

At this simple level, gettoken() is very easy to write. Look at the input, and skip leading spaces. If the character waiting is an operator, return that. If it is a digit, return VALUE; if it is the terminating NUL, return EOS; otherwise flag an error.

So we call gettoken() to set up the first token, and then call expr(). The key is that we suppress error parsing in this high-level function. An expression consists of one or more terms, held together by pluses or minuses. When we run out of plus or minus tokens, we stop.

/* parses an expression */
static double expr(void)
{
  double left;
  double right;

  left = term();
  while(1)
  {
    switch(token)
    {
    case PLUS:
      match(PLUS);
      right = term();
      left += right;
      break;
    case MINUS:
      match(MINUS);
      right = term();
      left -= right;
      break;
    default:
      return left;
    }
  }
}

The match() function is the other part of the lexical analyser. It flags an error if the variable token doesn’t match whatever is passed to it, then it discards that token and calls gettoken() to read another one. In this function, the check is redundant because we are already checking token. Note that we do not discard any symbols the expression parser doesn’t understand.

The term() function is almost the same as the expr() function. An expression is a series of terms, whilst a term is a series of factors. I have removed the modulus keyword for simplicity.

/* parses a term */
static double term(void)
{
  double left;
  double right;

  left = factor();
  while(1)
  {
    switch(token)
    {
    case MULT:
      match(MULT);
      right = factor();
      left *= right;
      break;
    case DIV:
      match(DIV);
      right = factor();
      if(right != 0.0)
        left /= right;
      else
        seterror(ERR_DIVIDEBYZERO);
      break;
    default:
      return left;
    }
  }
}
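The two halves of the lexical analyser described so far, gettoken() and match(), might be sketched as follows for the simple token set. This is an illustration under stated assumptions rather than the MiniBasic source: the token values, the BADTOKEN code, the startlexer() helper and the exact behaviour of match() on a mismatch are choices made here. The globals string and token, and the functions tokenlen() and seterror(), are named after those the text refers to.

```c
#include <ctype.h>

/* Token codes named after the text; the numeric values are arbitrary. */
enum { EOS, VALUE, PLUS, MINUS, MULT, DIV, OPAREN, CPAREN, BADTOKEN };
enum { ERR_CLEAR, ERR_SYNTAX };

const char *string = "";    /* unread input, kept global as the text explains */
int token = EOS;            /* token waiting at the front of the input */
int errorflag = ERR_CLEAR;  /* sticky: only the first error is recorded */

void seterror(int err)
{
    if (errorflag == ERR_CLEAR)
        errorflag = err;
}

/* Reports which token is waiting in str; consumes nothing. */
int gettoken(const char *str)
{
    while (isspace((unsigned char)*str))
        str++;
    if (isdigit((unsigned char)*str))
        return VALUE;
    switch (*str) {
    case '+':  return PLUS;
    case '-':  return MINUS;
    case '*':  return MULT;
    case '/':  return DIV;
    case '(':  return OPAREN;
    case ')':  return CPAREN;
    case '\0': return EOS;
    default:   return BADTOKEN;
    }
}

/* Number of characters the token occupies, leading spaces included. */
int tokenlen(const char *str, int tok)
{
    int len = 0;

    while (isspace((unsigned char)str[len]))
        len++;
    if (tok == EOS)
        return len;                 /* never step past the NUL */
    if (tok != VALUE)
        return len + 1;             /* operators are a single character */
    while (isdigit((unsigned char)str[len]))
        len++;
    if (str[len] == '.')            /* optional fractional part */
        for (len++; isdigit((unsigned char)str[len]); len++)
            ;
    return len;
}

/* Checks the waiting token, discards it and reads the next one. */
void match(int tok)
{
    if (token != tok) {
        seterror(ERR_SYNTAX);
        return;
    }
    string += tokenlen(string, token);
    token = gettoken(string);
    if (token == BADTOKEN)
        seterror(ERR_SYNTAX);
}

/* Hypothetical helper: points the analyser at fresh input. */
void startlexer(const char *input)
{
    string = input;
    errorflag = ERR_CLEAR;
    token = gettoken(string);
    if (token == BADTOKEN)
        seterror(ERR_SYNTAX);
}
```

After startlexer("12 + (3.5)"), repeated calls to match() walk the input as VALUE, PLUS, OPAREN, VALUE, CPAREN and finally EOS, with token always holding the one symbol of lookahead that expr(), term() and factor() switch on.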


Note that terms can contain errors: it is illegal to divide by zero. When we hit an error, we want to flag it and terminate. This could be achieved by setjmp() and longjmp(), but that gets ugly. It is better to use a sticky error. seterror() keeps a note of the first error it is informed of. We then allow the parser to trigger other calls to seterror(), which are ignored, until it returns control to the top level function.

/* parses a factor */
static double factor(void)
{
  double answer = 0;
  int len;

  switch(token)
  {
  case OPAREN:
    match(OPAREN);
    answer = expr();
    match(CPAREN);
    break;
  case VALUE:
    answer = getvalue(string, &len);
    match(VALUE);
    break;
  case MINUS:
    match(MINUS);
    answer = -factor();
    break;
  case SQRT:
    match(SQRT);
    match(OPAREN);
    answer = expr();
    match(CPAREN);
    if(answer >= 0.0)
      answer = sqrt(answer);
    else
      seterror(ERR_NEGSQRT);
    break;
  default:
    seterror(ERR_SYNTAX);
    break;
  }
  return answer;
}

This function, which is much simplified from the actual code, is a bit different. A factor consists of a number, most simply. So if we have the token VALUE, we examine the digits to obtain the number, match it to move the lexical analyser on, and return the result.

However a factor can also be an opening bracket, an expression, followed by a closing bracket. Therefore we have to call expr() recursively.

Another complication is unary minus. A factor can be a minus sign, followed by another factor. We are disallowing unary plus, but it could be added in the same way.

Finally, I have allowed for another complication, a function call to SQRT(). All the other mathematical functions can be added to the factor() function in a similar way.

With the expr() function, we have our basic logic for an expression parser. The high-level function,
double expression(const char *str)
would clear the error state, and set up the lexical analyser with the first token. It then calls expr() to get the result, and match(EOS) to check that the whole input has been consumed. It then checks the error state, and if everything is correct, returns the result. Otherwise, it reports the error.

The expression parser is the skeleton round which MiniBasic is built. It is trivial enough to add the factorial and MOD operators, more functions, and a few bells and whistles like e and PI.

The next major complication comes when we allow the user to add variables. We need to allow programs of the form
LET x = 1 + 2
LET y = x * x

To implement this, we need the concept of the lvalue. An lvalue is something which can be assigned. Since BASIC, unlike C, does not require declaration of variables, an lvalue can be either a pre-existing variable, or one we have not encountered before.
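The find-or-create behaviour of an lvalue can be sketched in C. This is a minimal illustration, assuming the scalar variables live in one growable array that is searched linearly. The names VARIABLE, findvariable() and lvalue(), and the layout of the structure, are chosen for the sketch and should not be read as the actual MiniBasic definitions. The 32-byte name buffer reflects the documented limit of 31 characters per identifier.

```c
#include <stdlib.h>
#include <string.h>

#define IDLEN 32   /* 31 characters plus the terminating NUL */

typedef struct {
    char id[IDLEN];   /* variable name */
    double dval;      /* its numerical value */
} VARIABLE;

VARIABLE *variables = NULL;   /* list of every scalar seen so far */
int nvariables = 0;

/* Linear search; returns NULL if the name has not been seen before. */
VARIABLE *findvariable(const char *id)
{
    int i;

    for (i = 0; i < nvariables; i++)
        if (strcmp(variables[i].id, id) == 0)
            return &variables[i];
    return NULL;
}

/* Returns the address to assign through, creating the variable if needed.
   Returns NULL if the name is too long or memory runs out. The pointer is
   valid only until the next variable is created, since realloc() may move
   the whole list. */
double *lvalue(const char *id)
{
    VARIABLE *var = findvariable(id);
    VARIABLE *grown;

    if (var)
        return &var->dval;
    if (strlen(id) >= IDLEN)
        return NULL;
    grown = realloc(variables, (nvariables + 1) * sizeof *grown);
    if (!grown)
        return NULL;
    variables = grown;
    var = &variables[nvariables++];
    strcpy(var->id, id);
    var->dval = 0.0;
    return &var->dval;
}
```

Executing LET x = 1 + 2 then amounts to evaluating the right hand side and storing the result through lvalue("x"). When x later appears inside an expression, only findvariable() is used, because reading a variable that was never assigned is an error rather than a reason to create it.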


If the lexical analyser hits an alphanumerical string that is not a keyword, it reports it as a FLTID (floating point identifier). We maintain a list of all scalar variables in the system, in the array variables. For convenience, string variables share the same space. When control reaches a LET statement, we check the identifier to see if it is already in use. If not, we add it. Then we assign it the value on the right hand side of the equals sign. The structure of a LET statement is therefore
LET lvalue = expression
To allow for expansion into dimensioned variables, the LVALUE structure contains a pointer to the data item to assign.

Once we have assigned the variable, it becomes available as a part of an expression. Therefore the factor() routine has to be expanded to accommodate a FLTID. The function variable() matches a float identifier. Failure to find it in the context of an expression is an error. It will be noted that the interpreter searches the variable list in a linear fashion. This could easily be the focus for algorithmic improvements.

String expressions are basically simpler than arithmetical expressions. They do not introduce any new concepts, except that the parser has to know whether it is parsing a string or an arithmetical expression. This is where the one-token-lookahead model could break down.
PRINT x
could require us to parse x as a numerical expression, or as a string expression. MiniBasic, like normal BASIC, gets round this by requiring all string variables to end with the character ‘$’. We therefore know whether we are dealing with a string or a numerical expression. Strings are allocated using malloc(). This is slow, but it allows for arbitrary length strings without gobbling too much memory.

The next problem is flow control. Flow control is what distinguishes a programming language from a calculator. The BASIC method is to use line numbers. When the function is called, we do an initial pass through the script, to index all the line numbers.
This is easy because every line must begin with a line number and end with a newline character, except that MiniBasic allows for continuation lines, which begin with a blank. If the lines are not in ascending order, we reject the program. Indexing lines this way allows for reasonably efficient jumps; otherwise we would have to read through the whole script in order to find the destination.

Execution starts at the first line. We store the line number, in internal consecutive numbering, in curline. We parse one line at a time, and return the destination line number, or zero in the normal case of control simply incrementing. An expression can be converted to an internal line number by doing a binary search on the line list. Therefore the GOTO statement consists of
GOTO expression
and the expression is simply evaluated and returned. It is actually simpler to allow for arbitrary numerical expressions in jump destinations, including ones computed at run time, though it wouldn’t be if we were writing a compiler rather than an interpreter.

IF is the slightly more complicated form of the GOTO statement. I chose to use the canonical BASIC form of IF … THEN linenumber, because it is familiar to microcomputer programmers, though in fact it is a pain to use and the more modern IF … ENDIF is a lot more intuitive.

The IF statement requires the introduction of relational expressions. Similarly to numerical expressions, these have precedence of ANDs and ORs, together with relational operators like ‘>’ and ‘=’. They also have to allow nested parentheses. A relational expression parser can be built in exactly the same way as an expression parser. In fact it needs to call the numerical expression parser:
IF expression > expression
is a perfectly legitimate and unexceptional form of use.

The other essential for a programming language is the use of vector variables. A huge number of operations, such as calculations of the mean, or sorting, rely on lists. It is of course easy to emulate any multi-dimensional array with a one-dimensional array.
In fact in C most data which is inherently two dimensional, such as image rasters, has to be treated as one-dimensional because of limitations in the language. However BASIC programmers expect to be able to use multi-dimensional arrays.

The DIM statement, of course, just calls malloc() internally. To simplify coding I restrict arrays to at most five dimensions, which is about the maximum that can be written out by hand. Even a big computer will quickly run out of memory if arrays get much bigger than this anyway. Allowing arbitrary dimensions forces you to roll the indexing into a loop over the dimensions, which gets headachy.

This does complicate the variable system, because we have both scalar and dimensioned variables. However they can be distinguished by requiring all dimensioned variables to end with an opening parenthesis. This is done at the level of the lexical analyser. The use of the LVALUE structure helps to keep things under control: it simply points into the array.

The whole point of dimensioning arrays is to iterate over them. This can be done using IF statements and keeping a counter, but it is clumsy. Unfortunately FOR … NEXT loops introduce other problems into the interpreter. They can be nested, so a stack of control structures has to be maintained. Then the loop is exited at the terminating NEXT statement, but the control is in the FOR, which leads to other problems, particularly because the user may not enter a script with nicely nested loops. Finally, there is the issue of what to do if a user jumps out of a loop.

The solution, which is simple but not the most elegant, is to allow the user to mess around with flow control, but keep the FOR stack relatively small, and insist on the matching NEXT being labelled with the control variable. So bad control will rapidly either overflow the stack or cause a mismatch error. The problem is that there then is no way to break out of the FOR loop legitimately.
The FOR loop searches for a matching NEXT if the loop is null. The advantage is that it allows for cleaner user scripts, though the fiddliness is probably more effort than it is worth. The step size and terminating expression are evaluated once only, on loop entry. The cost of changing this and providing C-style fors is that you then need two FOR-evaluation routines, one for loop entry and one for each update.

Finally, every program needs IO. Microcomputers didn’t have any sort of worthwhile backing store, so the canonical file-handling functions were not very good. In the UNIX world, it is quite common to take everything from standard input and direct everything to standard output, redirecting by means of pipes. In the PC world, users expect graphical interfaces to their file system, which of course is way beyond our capacity to provide.

The PRINT statement is built on top of ANSI fprintf(). The form is that the user can specify either a string or a numerical variable, so we need a function, isstring(), to distinguish between the two.

The INPUT statement suffers from the problem of what to do if input doesn’t match. The solution for inputting numbers is to ignore non-numerical input until a number is found. It is implemented in terms of fscanf(). For string input, the string is defined as the line. This is then limited to 1024 characters to allow for the call to fgets().

In practice I suspect that most users of MiniBasic would want to provide their own IO extensions. For instance, if you want to control a plotter, you would provide instructions like PENUP, PENDOWN, PENMOVE and functions to query the pen position. However you could use the program as it is: statements printed to stdout would be interpreted by a calling function and translated to pen commands, whilst changes to the pen state would arrive on standard input.

MiniBasic is yours to use as you want. I would like to be acknowledged if you find the code useful, either as is or as the basis for redevelopment, but I don’t insist on this. If it makes your boss happy to think that the code was developed all by yourself in the five minutes it took to download this book, then that is fine by me. You can incorporate it in free or commercial products without charge. The only thing I insist on is that you do not try to restrict my rights in the code in any way, which means that I can make use of any enhancements, bug fixes, or derivative products as I see fit.


The design of MiniBasic
MiniBasic is designed to allow non-programmers to add arbitrary functions to programs. Imagine we are attempting to SPAM-shield an email program. Different users have different ideas of what constitutes SPAM. By providing checkboxes we can go so far, but if a user wants, say, to regard as SPAM everything with an attachment, unless it is under a certain size, unless it has come from a trusted list of addresses, then we are stuck. However by providing a MiniBasic interface, the user can input the relevant values and provide the logic.

The calling program would do this by setting up the input stream with, say, the email address of the sender, the title of the email, and the length of any attachment. The calling program then presents half of a MiniBasic program to the user, with the input set up, e.g.
10 REM SPAM filter
20 REM sender's email
30 INPUT address$
40 REM title of email
50 INPUT title$
60 REM length of attachment
70 INPUT attachlen
80 REM PRINT "Accept" to accept the email or "Reject" to reject

The user then provides the logic for his choice.
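The calling side of this arrangement might be sketched in C as follows. Only the prototype of basic() is taken from the MiniBasic interface. The name filter_email() is hypothetical, and tmpfile() is assumed as the way of obtaining temporary files for the input and output streams. The basic() defined here is a placeholder that always prints Accept, included only so that the sketch compiles on its own; in a real program it would be deleted and the MiniBasic source linked in its place.

```c
#include <stdio.h>
#include <string.h>

/* PLACEHOLDER, not the interpreter: it ignores the script and accepts
   everything. Remove it and link the real MiniBasic source instead. */
int basic(const char *script, FILE *in, FILE *out, FILE *err)
{
    (void)script; (void)in; (void)err;
    fputs("Accept\n", out);
    return 0;
}

/* Runs the user's filter script on one email.
   Returns 1 to accept, 0 to reject, -1 if the script could not be run. */
int filter_email(const char *script, const char *sender,
                 const char *title, long attachlen)
{
    FILE *in = tmpfile();
    FILE *out = tmpfile();
    char verdict[64] = "";
    int result = -1;

    if (in && out) {
        /* One value per line, in the order the INPUT statements expect. */
        fprintf(in, "%s\n%s\n%ld\n", sender, title, attachlen);
        rewind(in);
        if (basic(script, in, out, stderr) == 0) {
            rewind(out);
            if (fgets(verdict, sizeof verdict, out))
                result = (strncmp(verdict, "Accept", 6) == 0);
        }
    }
    if (in)
        fclose(in);
    if (out)
        fclose(out);
    return result;
}
```

The host program never needs to understand the user's logic. It only fixes the order of the input values and the meaning of the first line printed, which is exactly the contract that the REM lines in the half-written program spell out for the user.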

MiniBasic can of course also be used as a stand-alone console programming language. This is useful for teaching purposes, for testing MiniBasic programs, or if you simply want to write a “filter” program that accepts from standard input and writes to standard output.

It was important that MiniBasic be simple to learn, and simple to implement. For this reason the syntax of the language has been kept as close as possible to the type of BASIC used on microcomputers in the 1980s. Millions of people know a BASIC of this kind.

Because of advances in computer power since the 1980s, array initialisation was allowed. This allows us to eliminate the difficult to use READ and DATA statements. Re-dimensioning of arrays was also allowed, largely for theoretical reasons (it turns MiniBasic into a Turing machine).

GOSUB was not included. It is not of much practical use without local variables and parameters, and a functional language isn’t very useful without some mechanism for passing and returning vectors. Adding these would have complicated the design of the interpreter considerably, and moved the language away from original BASIC.

PEEK and POKE are obviously hardware-dependent, add potential security risks, and were also not included.

The C-language source to MiniBasic is included. The interface is
int basic(const char *script, FILE *in, FILE *out, FILE *err)
In the standalone program the three streams are stdin, stdout and stderr. In a component environment, these will usually be temporary memory files, and the input will be set up with the parameters to the function the user is to write. The function returns 0 on success or non-zero on failure.

The source code is portable ANSI C, with the exception of the CHR$() and ASCII() functions, which rely on the execution character set being ASCII. The relational operators for strings also call the function strcmp() internally, which may have implications on non-ASCII systems.

An interpreted language is obviously not a particularly efficient way of running functions. Variables are stored in linear lists, with an O(N) access time, so big programs are O(N*N). However because of the lack of support for subroutines MiniBasic is not very suitable for complex programs anyway.
If you were to extend the scope of the program to run very large scripts, it would be necessary to replace the variable list with a hash table, binary tree, or other structure that supports fast searching.

All MiniBasic keywords, with the exception of e, start with uppercase letters. This fact is exploited to allow faster recognition of identifiers starting with lower case. Users can use this feature to gain some performance advantage.

On a fast computer the efficiency of MiniBasic shouldn’t be a major problem unless users run very processor-intensive scripts, or if the function is in a time-critical portion of code. In these cases the answer would be to move to a pseudo-compiler system, where the MiniBasic scripts are translated into an intermediate bytecode that is similar to machine language. This is a project for a later date.

Since MiniBasic is available as C source, it is possible to extend the language. Where possible extensions should be in the form of functions rather than new statements, to avoid changing the grammar of the language.

To add a numerical function, foo, which takes a numerical and a string argument, write the function like this.

foo – check the first character of a string
FOO(85, "Useless function")

double foo(void)
{
  double answer;
  double x;
  char *str;
  int ch;

  match(FOO);             /* match the token for the function */
  match(OPAREN);          /* opening parenthesis */
  x = expr();             /* read the numerical argument */
  match(COMMA);           /* comma separates arguments */
  str = stringexpr();     /* read the string argument */
  match(CPAREN);          /* match close */
  if(str == NULL)         /* computer can run out of memory */
    return 0;             /* stringexpr() will have signalled the error so no point in generating another */
  ch = integer(x);        /* signal error if x isn't an integer */
  if( !isalpha(ch) )
    seterror(ERR_BADVALUE); /* signal an error of your own if ch isn't valid */
  if(str[0] == ch)        /* function logic */
    answer = 1.0;
  else
    answer = 2.0;
  free(str);              /* str is allocated with malloc(), so free */
  return answer;
}

Once you have your function, add an identifier for it, FOO, to the token list. Then add the appropriate lines to the functions gettoken() and tokenlen(). Finally, add the code for calling your function to the function factor().

For string functions, the procedure is similar, except that they must return an allocated string. The convention is that they end with a dollar sign, and the token id ends with the sequence “STRING”. Add the call to the function stringexpr(), and add your symbol to the function isstring() so that statements like PRINT know that it generates a string expression.

To change the input and output model, you need only change the functions doprint() and doinput(). If you wish to change the error system then you need to look at the functions setup(), reporterror() and the top-level function basic(). Currently the program takes FILE pointers, which should be flexible enough for most uses, but not if, say, you want to provide for interactive scripts.


Hello World
10 REM Hello World program
20 PRINT "Hello World"


Name Handling
10 REM String-handling program
20 REM Inputs a name, tests for validity
30 REM and breaks up into parts.
40 PRINT "Enter your full name"
50 INPUT name$

60 REM First check for non-English characters
70 LET flag = 0
80 FOR I = 1 TO LEN(name$)
90 LET ch$ = MID$(name$, I,1)
100 IF (ch$ >= "A" AND ch$ <= "z") OR ch$ = " " THEN 140
110 LET flag = 1
120 REM This forces the loop to stop
130 LET I = LEN(name$)
140 NEXT I
150 IF flag = 0 THEN 180
160 PRINT "Non-English letter,", ch$
170 GOTO 40

180 REM Jump to subroutine
190 LET return = 210
200 GOTO 1000
210 IF name$ = "" THEN 280
220 LET return = 240
230 GOTO 2000
240 LET N = N + 1
250 DIM out$(N)
260 LET out$(N) = word$
270 GOTO 180

280 REM Print out the name
285 PRINT "Name accepted"
290 FOR I = 1 TO N
300 PRINT out$(I) + " ";
310 NEXT I
320 PRINT ""
330 GOTO 3000

1000 REM strips the leading space
1010 IF LEFT$(name$, 1) <> " " THEN return
1020 LET name$ = MID$(name$, 2, -1)
1030 GOTO 1010

2000 REM get the leading word and put it in word$
2010 LET word$ = ""
2020 LET ch$ = LEFT$(name$, 1)
2030 IF ch$ < "A" OR ch$ > "z" THEN return
2040 LET word$ = word$ + ch$
2050 LET name$ = MID$(name$, 2, -1)
2060 GOTO 2020

3000 REM END


ROT13

10 REM ROT13 CODE
15 LET CODE$ = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz"
20 INPUT A$
30 FOR I = 1 TO LEN(A$)
40 LET B$ = MID$(A$,I, 1)
50 LET TAR = INSTR(CODE$, B$, 1)
60 IF TAR = 0 THEN 90
70 LET TAR = (TAR + 25) MOD 52 + 1
80 LET B$ = MID$(CODE$, TAR, 1)
90 PRINT B$;
100 NEXT I
110 PRINT ""
120 GOTO 20
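For comparison, here is a minimal C sketch of the same table-lookup idea, using the interleaved alphabet from CODE$. The names rot13_char and rot13 are illustrative and do not appear in the MiniBasic source. Because C strings are indexed from 0, the wrap-around is a plain modulo; with 1-based BASIC positions the result has to stay in the range 1 to 52.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same interleaved alphabet as CODE$ in the BASIC listing. */
static const char rot13_code[] =
  "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz";

/* Illustrative helper: rotate one character, leaving non-letters alone. */
char rot13_char(char c)
{
  const char *p;
  size_t pos;

  if(c == 0)                 /* strchr() would match the terminator */
    return c;
  p = strchr(rot13_code, c);
  if(!p)
    return c;
  pos = (size_t) (p - rot13_code);       /* 0-based position, 0..51 */

  return rot13_code[(pos + 26) % 52];    /* 26 slots = 13 letters on */
}

/* Illustrative helper: rotate a whole string in place. */
void rot13(char *s)
{
  assert(s != NULL);
  for(; *s; s++)
    *s = rot13_char(*s);
}
```

Applying rot13() twice restores the original text, since each letter moves 26 slots in a table of 52.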


Median

10 REM Median program.
20 LET N = 0
30 DIM array(N+1)
40 PRINT "Enter a number, q to quit"
50 INPUT line$
60 IF line$ = "q" THEN 100
70 LET N = N + 1
80 LET array(N) = VAL(line$)
90 GOTO 30
100 PRINT N, "numbers entered"
105 IF N = 0 THEN 1000
106 IF N = 1 THEN 210
110 REM Bubble sort the numbers
120 LET flag = 0
130 LET i = 1
140 IF array(i) <= array(i+1) THEN 190
150 LET flag = 1
160 LET temp = array(i)
170 LET array(i) = array(i+1)
180 LET array(i+1) = temp
190 LET i = i + 1
195 IF i < N THEN 140
200 IF flag = 1 THEN 120
210 REM print out the middle
220 IF N MOD 2 = 0 THEN 250
230 LET mid = array( (N + 1) / 2)
240 GOTO 270
250 LET mid = array(N/2) + array(N/2+1)
260 LET mid = mid/2
270 PRINT "Median", mid
1000 REM end
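The median rule used in lines 210 to 270 (middle element for odd N, mean of the two middle elements for even N) can be sketched in C as follows. This is only an illustration: the function name median is an assumption, and it uses the standard library qsort() in place of the bubble sort in the listing. Indices are 0-based here, whereas MiniBasic arrays start at 1.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort() comparison for doubles, ascending order */
static int compare_doubles(const void *a, const void *b)
{
  double x = *(const double *) a;
  double y = *(const double *) b;

  return (x > y) - (x < y);
}

/* Illustrative helper: sorts values in place and returns the median. */
double median(double *values, int n)
{
  assert(values != NULL && n > 0);
  qsort(values, (size_t) n, sizeof(double), compare_doubles);
  if(n % 2 == 1)
    return values[n / 2];                          /* odd: middle element */

  return (values[n / 2 - 1] + values[n / 2]) / 2.0; /* even: mean of two */
}
```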


Lander

10 REM Lunar lander program.
20 LET dist = 100
30 LET v=1
40 LET fuel = 1000
50 LET mass = 1000
60 PRINT "You are in control of a lunar lander."
70 PRINT "You are drifting towards the surface of the moon."
80 PRINT "Each turn you must decide how much fuel to burn."
90 PRINT "To accelerate enter a positive number, to decelerate a negative"
100 PRINT "Distance", dist, "km", "velocity", v, "km/s", "Fuel", fuel
110 INPUT burn
115 IF ABS(burn) <= fuel THEN 120
116 PRINT "You don't have that much fuel"
117 GOTO 100
120 LET v = v + burn * 10 / (fuel + mass)
130 LET fuel = fuel - ABS(burn)
140 LET dist = dist - v
150 IF dist > 0 THEN 100
160 PRINT "You have hit the surface"
170 IF v < 3 THEN 210
180 PRINT "Hit surface too fast (", v,")km/s"
190 PRINT "You Crash"
200 GOTO 220
210 PRINT "Well done"
220 REM END


/*
  driver file for MiniBasic
  by Malcolm Mclean
  Leeds University
*/
#include <stdio.h>
#include <stdlib.h>
#include "basic.h"

char *loadfile(char *path);

/* here is a simple script to play with */
char *script =
"10 REM Test Script\n"
"20 REM Tests the Interpreter\n"
"30 REM By Malcolm Mclean\n"
"35 PRINT \"HERE\" \n"
"40 PRINT INSTR(\"FRED\", \"ED\", 4)\n"
"50 PRINT VALLEN(\"12a\"), VALLEN(\"xyz\")\n"
"60 LET x = SQRT(3.0) * SQRT(3.0)\n"
"65 LET x = INT(x + 0.5)\n"
"70 PRINT MID$(\"1234567890\", x, -1)\n"
;

void usage(void)
{
  printf("MiniBasic: a BASIC interpreter\n");
  printf("usage:\n");
  printf("Basic <script>\n");
  printf("See documentation for BASIC syntax.\n");
  exit(EXIT_FAILURE);
}

/*
  call with the name of the Minibasic script file
*/
int main(int argc, char **argv)
{
  char *scr;

  if(argc == 1)
  {
    /* comment out usage call to run test script */
    usage();
    basic(script, stdin, stdout, stderr);
  }
  else
  {
    scr = loadfile(argv[1]);
    if(scr)
    {
      basic(scr, stdin, stdout, stderr);
      free(scr);
    }
  }

  return 0;
}

/*
  function to slurp in an ASCII file
  Params: path - path to file
  Returns: malloced string containing whole file
*/
char *loadfile(char *path)
{
  FILE *fp;
  int ch;
  long i = 0;
  long size = 0;
  char *answer;

  fp = fopen(path, "r");
  if(!fp)
  {
    printf("Can't open %s\n", path);
    return 0;
  }

  fseek(fp, 0, SEEK_END);
  size = ftell(fp);
  fseek(fp, 0, SEEK_SET);

  answer = malloc(size + 100);
  if(!answer)
  {
    printf("Out of memory\n");
    fclose(fp);
    return 0;
  }

  while( (ch = fgetc(fp)) != EOF)
    answer[i++] = ch;

  answer[i++] = 0;

  fclose(fp);

  return answer;
}


#ifndef basic_h
#define basic_h

/*
  Minibasic header file
  By Malcolm Mclean
*/

int basic(const char *script, FILE *in, FILE *out, FILE *err);

#endif


/********************************************************
*                       Mini BASIC                      *
*                   by Malcolm McLean                   *
*                      version 1.0                      *
********************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <assert.h>
#include <limits.h>
#include <ctype.h>
#include <math.h>

/* tokens defined */
#define EOS 0
#define VALUE 1
#define PI 2
#define E 3

#define DIV 10
#define MULT 11
#define OPAREN 12
#define CPAREN 13
#define PLUS 14
#define MINUS 15
#define SHRIEK 16
#define COMMA 17
#define MOD 200

#define ERROR 20
#define EOL 21
#define EQUALS 22
#define STRID 23
#define FLTID 24
#define DIMFLTID 25
#define DIMSTRID 26
#define QUOTE 27
#define GREATER 28
#define LESS 29
#define SEMICOLON 30

#define PRINT 100
#define LET 101
#define DIM 102
#define IF 103
#define THEN 104
#define AND 105
#define OR 106
#define GOTO 107
#define INPUT 108
#define REM 109
#define FOR 110
#define TO 111
#define NEXT 112
#define STEP 113

#define SIN 5
#define COS 6
#define TAN 7
#define LN 8
#define POW 9
#define SQRT 18
#define ABS 201
#define LEN 202
#define ASCII 203
#define ASIN 204
#define ACOS 205
#define ATAN 206
#define INT 207
#define RND 208
#define VAL 209
#define VALLEN 210
#define INSTR 211

#define CHRSTRING 300
#define STRSTRING 301
#define LEFTSTRING 302
#define RIGHTSTRING 303
#define MIDSTRING 304
#define STRINGSTRING 305

/* relational operators defined */
#define ROP_EQ 1    /* equals */
#define ROP_NEQ 2   /* doesn't equal */
#define ROP_LT 3    /* less than */
#define ROP_LTE 4   /* less than or equals */
#define ROP_GT 5    /* greater than */
#define ROP_GTE 6   /* greater than or equals */

/* error codes (in BASIC script) defined */
#define ERR_CLEAR 0
#define ERR_SYNTAX 1
#define ERR_OUTOFMEMORY 2
#define ERR_IDTOOLONG 3
#define ERR_NOSUCHVARIABLE 4
#define ERR_BADSUBSCRIPT 5
#define ERR_TOOMANYDIMS 6
#define ERR_TOOMANYINITS 7
#define ERR_BADTYPE 8
#define ERR_TOOMANYFORS 9
#define ERR_NONEXT 10
#define ERR_NOFOR 11
#define ERR_DIVIDEBYZERO 12
#define ERR_NEGLOG 13
#define ERR_NEGSQRT 14
#define ERR_BADSINCOS 15
#define ERR_EOF 16
#define ERR_ILLEGALOFFSET 17
#define ERR_TYPEMISMATCH 18
#define ERR_INPUTTOOLONG 19
#define ERR_BADVALUE 20
#define ERR_NOTINT 21

#define MAXFORS 32    /* maximum number of nested fors */

typedef struct
{
  int no;             /* line number */
  const char *str;    /* points to start of line */
}LINE;

typedef struct
{
  char id[32];        /* id of variable */
  double dval;        /* its value if a real */
  char *sval;         /* its value if a string (malloced) */
} VARIABLE;

typedef struct
{
  char id[32];        /* id of dimensioned variable */
  int type;           /* its type, STRID or FLTID */
  int ndims;          /* number of dimensions */
  int dim[5];         /* dimensions in x y order */
  char **str;         /* pointer to string data */
  double *dval;       /* pointer to real data */
} DIMVAR;

typedef struct
{
  int type;           /* type of variable (STRID or FLTID or ERROR) */
  char **sval;        /* pointer to string data */
  double *dval;       /* pointer to real data */
} LVALUE;

typedef struct
{
  char id[32];        /* id of control variable */
  int nextline;       /* line below FOR to which control passes */
  double toval;       /* terminal value */
  double step;        /* step size */
} FORLOOP;

static FORLOOP forstack[MAXFORS];   /* stack for for loop control */
static int nfors;                   /* number of fors on stack */

static VARIABLE *variables;         /* the script's variables */
static int nvariables;              /* number of variables */

static DIMVAR *dimvariables;        /* dimensioned arrays */
static int ndimvariables;           /* number of dimensioned arrays */

static LINE *lines;                 /* list of line starts */
static int nlines;                  /* number of BASIC lines in program */

static FILE *fpin;                  /* input stream */
static FILE *fpout;                 /* output stream */
static FILE *fperr;                 /* error stream */

static const char *string;          /* string we are parsing */
static int token;                   /* current token (lookahead) */
static int errorflag;               /* set when error in input encountered */

static int setup(const char *script);
static void cleanup(void);

static void reporterror(int lineno);
static int findline(int no);

static int line(void);
static void doprint(void);
static void dolet(void);
static void dodim(void);
static int doif(void);
static int dogoto(void);
static void doinput(void);
static void dorem(void);
static int dofor(void);
static int donext(void);

static void lvalue(LVALUE *lv);

static int boolexpr(void);
static int boolfactor(void);
static int relop(void);

static double expr(void);
static double term(void);
static double factor(void);
static double instr(void);
static double variable(void);
static double dimvariable(void);

static VARIABLE *findvariable(const char *id);
static DIMVAR *finddimvar(const char *id);
static DIMVAR *dimension(const char *id, int ndims, ...);
static void *getdimvar(DIMVAR *dv, ...);
static VARIABLE *addfloat(const char *id);
static VARIABLE *addstring(const char *id);
static DIMVAR *adddimvar(const char *id);

static char *stringexpr(void);
static char *chrstring(void);
static char *strstring(void);
static char *leftstring(void);
static char *rightstring(void);
static char *midstring(void);
static char *stringstring(void);
static char *stringdimvar(void);
static char *stringvar(void);
static char *stringliteral(void);

static int integer(double x);

static void match(int tok);
static void seterror(int errorcode);
static int getnextline(const char *str);
static int gettoken(const char *str);
static int tokenlen(const char *str, int token);

static int isstring(int token);
static double getvalue(const char *str, int *len);
static void getid(const char *str, char *out, int *len);

static void mystrgrablit(char *dest, const char *src);
static char *mystrend(const char *str, char quote);
static int mystrcount(const char *str, char ch);
static char *mystrdup(const char *str);
static char *mystrconcat(const char *str, const char *cat);
static double factorial(double x);

/*
  Interpret a BASIC script
  Params: script - the script to run
          in - input stream
          out - output stream
          err - error stream
  Returns: 0 on success, 1 on error condition.
*/
int basic(const char *script, FILE *in, FILE *out, FILE *err)
{
  int curline = 0;
  int nextline;
  int answer = 0;

  fpin = in;
  fpout = out;
  fperr = err;

  if( setup(script) == -1 )
    return 1;

  while(curline != -1)
  {
    string = lines[curline].str;
    token = gettoken(string);
    errorflag = 0;
    nextline = line();
    if(errorflag)
    {
      reporterror(lines[curline].no);
      answer = 1;
      break;
    }

    if(nextline == -1)
      break;

    if(nextline == 0)
    {
      curline++;
      if(curline == nlines)
        break;
    }
    else
    {
      curline = findline(nextline);
      if(curline == -1)
      {
        if(fperr)
          fprintf(fperr, "line %d not found\n", nextline);
        answer = 1;
        break;
      }
    }
  }

  cleanup();

  return answer;
}

/*
  Sets up all our globals, including the list of lines.
  Params: script - the script passed by the user
  Returns: 0 on success, -1 on failure
*/
static int setup(const char *script)
{
  int i;

  nlines = mystrcount(script, '\n');
  lines = malloc(nlines * sizeof(LINE));
  if(!lines)
  {
    if(fperr)
      fprintf(fperr, "Out of memory\n");
    return -1;
  }
  for(i=0;i


nlines--; } script = strchr(script, '\n'); script++; } if(!nlines) { if(fperr) fprintf(fperr, "Can't read program\n"); free(lines); return -1; } for(i=1;i


{ if(dimvariables[i].type == STRID) { if(dimvariables[i].str) { size = 1; for(ii=0;ii


      fprintf(fperr, "Syntax error line %d\n", lineno);
      break;
    case ERR_OUTOFMEMORY:
      fprintf(fperr, "Out of memory line %d\n", lineno);
      break;
    case ERR_IDTOOLONG:
      fprintf(fperr, "Identifier too long line %d\n", lineno);
      break;
    case ERR_NOSUCHVARIABLE:
      fprintf(fperr, "No such variable line %d\n", lineno);
      break;
    case ERR_BADSUBSCRIPT:
      fprintf(fperr, "Bad subscript line %d\n", lineno);
      break;
    case ERR_TOOMANYDIMS:
      fprintf(fperr, "Too many dimensions line %d\n", lineno);
      break;
    case ERR_TOOMANYINITS:
      fprintf(fperr, "Too many initialisers line %d\n", lineno);
      break;
    case ERR_BADTYPE:
      fprintf(fperr, "Illegal type line %d\n", lineno);
      break;
    case ERR_TOOMANYFORS:
      fprintf(fperr, "Too many nested fors line %d\n", lineno);
      break;
    case ERR_NONEXT:
      fprintf(fperr, "For without matching next line %d\n", lineno);
      break;
    case ERR_NOFOR:
      fprintf(fperr, "Next without matching for line %d\n", lineno);
      break;
    case ERR_DIVIDEBYZERO:
      fprintf(fperr, "Divide by zero line %d\n", lineno);
      break;
    case ERR_NEGLOG:
      fprintf(fperr, "Negative logarithm line %d\n", lineno);
      break;
    case ERR_NEGSQRT:
      fprintf(fperr, "Negative square root line %d\n", lineno);
      break;
    case ERR_BADSINCOS:


fprintf(fperr, "Sine or cosine out of range line %d\n", lineno); break; case ERR_EOF: fprintf(fperr, "End of input file %d\n", lineno); break; case ERR_ILLEGALOFFSET: fprintf(fperr, "Illegal offset line %d\n", lineno); break; case ERR_TYPEMISMATCH: fprintf(fperr, "Type mismatch line %d\n", lineno); break; case ERR_INPUTTOOLONG: fprintf(fperr, "Input too long line %d\n", lineno); break; case ERR_BADVALUE: fprintf(fperr, "Bad value at line %d\n", lineno); break; case ERR_NOTINT: fprintf(fperr, "Not an integer at line %d\n", lineno); break; default: fprintf(fperr, "ERROR line %d\n", lineno); break; } } /* binary search for a line Params: no - line number to find Returns: index of the line, or -1 on fail. */ static int findline(int no) { int high; int low; int mid; low = 0; high = nlines-1; while(high > low + 1) { mid = (high + low)/2; if(lines[mid].no == no) return mid; if(lines[mid].no > no) high = mid; else


low = mid; } if(lines[low].no == no) mid = low; else if(lines[high].no == no) mid = high; else mid = -1; return mid; } /* Parse a line. High level parse function */ static int line(void) { int answer = 0; const char *str; match(VALUE); switch(token) { case PRINT: doprint(); break; case LET: dolet(); break; case DIM: dodim(); break; case IF: answer = doif(); break; case GOTO: answer = dogoto(); break; case INPUT: doinput(); break; case REM: dorem(); return 0; break; case FOR: answer = dofor(); break; case NEXT:


answer = donext(); break; default: seterror(ERR_SYNTAX); break; } if(token != EOS) { /*match(VALUE);*/ /* check for a newline */ str = string; while(isspace(*str)) { if(*str == '\n') break; str++; } if(*str != '\n') seterror(ERR_SYNTAX); } return answer; } /* the PRINT statement */ static void doprint(void) { char *str; double x; match(PRINT); while(1) { if(isstring(token)) { str = stringexpr(); if(str) { fprintf(fpout, "%s", str); free(str); } } else { x = expr(); fprintf(fpout, "%g", x);


} if(token == COMMA) { fprintf(fpout, " "); match(COMMA); } else break; } if(token == SEMICOLON) { match(SEMICOLON); fflush(fpout); } else fprintf(fpout, "\n"); } /* the LET statement */ static void dolet(void) { LVALUE lv; char *temp; match(LET); lvalue(&lv); match(EQUALS); switch(lv.type) { case FLTID: *lv.dval = expr(); break; case STRID: temp = *lv.sval; *lv.sval = stringexpr(); if(temp) free(temp); break; default: break; } }


/* the DIM statement */ static void dodim(void) { int ndims = 0; double dims[6]; char name[32]; int len; DIMVAR *dimvar; int i; int size = 1; match(DIM); switch(token) { case DIMFLTID: case DIMSTRID: getid(string, name, &len); match(token); dims[ndims++] = expr(); while(token == COMMA) { match(COMMA); dims[ndims++] = expr(); if(ndims > 5) { seterror(ERR_TOOMANYDIMS); return; } } match(CPAREN); for(i=0;i


dimvar = dimension(name, 2, (int) dims[0], (int) dims[1]); break; case 3: dimvar = dimension(name, 3, (int) dims[0], (int) dims[1], (int) dims[2]); break; case 4: dimvar = dimension(name, 4, (int) dims[0], (int) dims[1], (int) dims[2], (int) dims[3]); break; case 5: dimvar = dimension(name, 5, (int) dims[0], (int) dims[1], (int) dims[2], (int) dims[3], (int) dims[4]); break; } break; default: seterror(ERR_SYNTAX); return; } if(dimvar == 0) { /* out of memory */ seterror(ERR_OUTOFMEMORY); return; }

  if(token == EQUALS)
  {
    match(EQUALS);

    for(i=0;i<dimvar->ndims;i++)
      size *= dimvar->dim[i];

    switch(dimvar->type)
    {
    case FLTID:
      i = 0;
      dimvar->dval[i++] = expr();
      while(token == COMMA && i < size)
      {
        match(COMMA);
        dimvar->dval[i++] = expr();
        if(errorflag)
          break;
      }
      break;
    case STRID:


i = 0; if(dimvar->str[i]) free(dimvar->str[i]); dimvar->str[i++] = stringexpr(); while(token == COMMA && i < size) { match(COMMA); if(dimvar->str[i]) free(dimvar->str[i]); dimvar->str[i++] = stringexpr(); if(errorflag) break; } break; } if(token == COMMA) seterror(ERR_TOOMANYINITS); } } /* the IF statement. if jump taken, returns new line no, else returns 0 */ static int doif(void) { int condition; int jump; match(IF); condition = boolexpr(); match(THEN); jump = integer( expr() ); if(condition) return jump; else return 0; } /* the GOTO satement returns new line number */ static int dogoto(void) { match(GOTO); return integer( expr() ); }


/* The FOR statement. Pushes the for stack. Returns line to jump to, or -1 to end program */ static int dofor(void) { LVALUE lv; char id[32]; char nextid[32]; int len; double initval; double toval; double stepval; const char *savestring; int answer; match(FOR); getid(string, id, &len); lvalue(&lv); if(lv.type != FLTID) { seterror(ERR_BADTYPE); return -1; } match(EQUALS); initval = expr(); match(TO); toval = expr(); if(token == STEP) { match(STEP); stepval = expr(); } else stepval = 1.0; *lv.dval = initval; if(nfors > MAXFORS - 1) { seterror(ERR_TOOMANYFORS); return -1; } if(stepval < 0 && initval < toval || stepval > 0 && initval > toval)


{ savestring = string; while(string = strchr(string, '\n')) { errorflag = 0; token = gettoken(string); match(VALUE); if(token == NEXT) { match(NEXT); if(token == FLTID || token == DIMFLTID) { getid(string, nextid, &len); if(!strcmp(id, nextid)) { answer = getnextline(string); string = savestring; token = gettoken(string); return answer ? answer : -1; } } } } seterror(ERR_NONEXT); return -1; } else { strcpy(forstack[nfors].id, id); forstack[nfors].nextline = getnextline(string); forstack[nfors].step = stepval; forstack[nfors].toval = toval; nfors++; return 0; } } /* the NEXT statement updates the counting index, and returns line to jump to */ static int donext(void) { char id[32]; int len; LVALUE lv; match(NEXT);


if(nfors) { getid(string, id, &len); lvalue(&lv); if(lv.type != FLTID) { seterror(ERR_BADTYPE); return -1; } *lv.dval += forstack[nfors-1].step; if( (forstack[nfors-1].step < 0 && *lv.dval < forstack[nfors-1].toval) || (forstack[nfors-1].step > 0 && *lv.dval > forstack[nfors-1].toval) ) { nfors--; return 0; } else { return forstack[nfors-1].nextline; } } else { seterror(ERR_NOFOR); return -1; } }

/* the INPUT statement */ static void doinput(void) { LVALUE lv; char buff[1024]; char *end; match(INPUT); lvalue(&lv); switch(lv.type) { case FLTID: while(fscanf(fpin, "%lf", lv.dval) != 1) { fgetc(fpin); if(feof(fpin)) {


seterror(ERR_EOF); return; } } break; case STRID: if(*lv.sval) { free(*lv.sval); *lv.sval = 0; } if( fgets(buff, sizeof(buff), fpin) == 0) { seterror(ERR_EOF); return; } end = strchr(buff, '\n'); if(!end) { seterror(ERR_INPUTTOOLONG); return; } *end = 0; *lv.sval = mystrdup(buff); if(!*lv.sval) { seterror(ERR_OUTOFMEMORY); return; } break; default: return; } } /* the REM statement. Note is unique as the rest of the line is not parsed */ static void dorem(void) { match(REM); return; }


/* Get an lvalue from the environment Params: lv - structure to fill. Notes: missing variables (but not out of range subscripts) are added to the variable list. */ static void lvalue(LVALUE *lv) { char name[32]; int len; VARIABLE *var; DIMVAR *dimvar; int index[5]; void *valptr = 0; int type; lv->type = ERROR; lv->dval = 0; lv->sval = 0; switch(token) { case FLTID: getid(string, name, &len); match(FLTID); var = findvariable(name); if(!var) var = addfloat(name); if(!var) { seterror(ERR_OUTOFMEMORY); return; } lv->type = FLTID; lv->dval = &var->dval; lv->sval = 0; break; case STRID: getid(string, name, &len); match(STRID); var = findvariable(name); if(!var) var = addstring(name); if(!var) { seterror(ERR_OUTOFMEMORY); return; }


lv->type = STRID; lv->sval = &var->sval; lv->dval = 0; break; case DIMFLTID: case DIMSTRID: type = (token == DIMFLTID) ? FLTID : STRID; getid(string, name, &len); match(token); dimvar = finddimvar(name); if(dimvar) { switch(dimvar->ndims) { case 1: index[0] = integer( expr() ); if(errorflag == 0) valptr = getdimvar(dimvar, index[0]); break; case 2: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); if(errorflag == 0) valptr = getdimvar(dimvar, index[0], index[1]); break; case 3: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); match(COMMA); index[2] = integer( expr() ); if(errorflag == 0) valptr = getdimvar(dimvar, index[0], index[1], index[2]); break; case 4: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); match(COMMA); index[2] = integer( expr() ); match(COMMA); index[3] = integer( expr() ); if(errorflag == 0) valptr = getdimvar(dimvar, index[0], index[1], index[2], index[3]); break; case 5: index[0] = integer( expr() );


        match(COMMA);
        index[1] = integer( expr() );
        match(COMMA);
        index[2] = integer( expr() );
        match(COMMA);
        index[3] = integer( expr() );
        match(COMMA);
        index[4] = integer( expr() );
        if(errorflag == 0)
          valptr = getdimvar(dimvar, index[0], index[1], index[2], index[3], index[4]);
        break;
      }
      match(CPAREN);
    }
    else
    {
      seterror(ERR_NOSUCHVARIABLE);
      return;
    }
    if(valptr)
    {
      lv->type = type;
      if(type == FLTID)
        lv->dval = valptr;
      else if(type == STRID)
        lv->sval = valptr;
      else
        assert(0);
    }
    break;
  default:
    seterror(ERR_SYNTAX);
  }
}

/*
  parse a boolean expression
  consists of expressions or strings and relational operators,
  and parentheses
*/
static int boolexpr(void)
{
  int left;
  int right;

  left = boolfactor();

  while(1)
  {


switch(token) { case AND: match(AND); right = boolexpr(); return (left && right) ? 1 : 0; case OR: match(OR); right = boolexpr(); return (left || right) ? 1 : 0; default: return left; } } } /* boolean factor, consists of expression relop expression or string relop string, or ( boolexpr() ) */ static int boolfactor(void) { int answer; double left; double right; int op; char *strleft; char *strright; int cmp; switch(token) { case OPAREN: match(OPAREN); answer = boolexpr(); match(CPAREN); break; default: if(isstring(token)) { strleft = stringexpr(); op = relop(); strright = stringexpr(); if(!strleft || !strright) { if(strleft) free(strleft); if(strright) free(strright); return 0; }


      cmp = strcmp(strleft, strright);
      switch(op)
      {
      case ROP_EQ:
        answer = cmp == 0 ? 1 : 0;
        break;
      case ROP_NEQ:
        answer = cmp == 0 ? 0 : 1;
        break;
      case ROP_LT:
        answer = cmp < 0 ? 1 : 0;
        break;
      case ROP_LTE:
        answer = cmp <= 0 ? 1 : 0;
        break;
      case ROP_GT:
        answer = cmp > 0 ? 1 : 0;
        break;
      case ROP_GTE:
        answer = cmp >= 0 ? 1 : 0;
        break;
      default:
        answer = 0;
      }
      free(strleft);
      free(strright);
    }
    else
    {
      left = expr();
      op = relop();
      right = expr();
      switch(op)
      {
      case ROP_EQ:
        answer = (left == right) ? 1 : 0;
        break;
      case ROP_NEQ:
        answer = (left != right) ? 1 : 0;
        break;
      case ROP_LT:
        answer = (left < right) ? 1 : 0;
        break;
      case ROP_LTE:
        answer = (left <= right) ? 1 : 0;
        break;
      case ROP_GT:
        answer = (left > right) ? 1 : 0;
        break;
      case ROP_GTE:
        answer = (left >= right) ? 1 : 0;


break; default: errorflag = 1; return 0; } } } return answer; } /* get a relational operator returns operator parsed or ERROR */ static int relop(void) { switch(token) { case EQUALS: match(EQUALS); return ROP_EQ; case GREATER: match(GREATER); if(token == EQUALS) { match(EQUALS); return ROP_GTE; } return ROP_GT; case LESS: match(LESS); if(token == EQUALS) { match(EQUALS); return ROP_LTE; } else if(token == GREATER) { match(GREATER); return ROP_NEQ; } return ROP_LT; default: seterror(ERR_SYNTAX); return ERROR; } }


/* parses an expression */ static double expr(void) { double left; double right; left = term(); while(1) { switch(token) { case PLUS: match(PLUS); right = term(); left += right; break; case MINUS: match(MINUS); right = term(); left -= right; break; default: return left; } } } /* parses a term */ static double term(void) { double left; double right; left = factor(); while(1) { switch(token) { case MULT: match(MULT); right = factor(); left *= right; break; case DIV: match(DIV);


right = factor(); if(right != 0.0) left /= right; else seterror(ERR_DIVIDEBYZERO); break; case MOD: match(MOD); right = factor(); left = fmod(left, right); break; default: return left; } } } /* parses a factor */ static double factor(void) { double answer = 0; char *str; char *end; int len; switch(token) { case OPAREN: match(OPAREN); answer = expr(); match(CPAREN); break; case VALUE: answer = getvalue(string, &len); match(VALUE); break; case MINUS: match(MINUS); answer = -factor(); break; case FLTID: answer = variable(); break; case DIMFLTID: answer = dimvariable(); break; case E: answer = exp(1.0);


match(E); break; case PI: answer = acos(0.0) * 2.0; match(PI); break; case SIN: match(SIN); match(OPAREN); answer = expr(); match(CPAREN); answer = sin(answer); break; case COS: match(COS); match(OPAREN); answer = expr(); match(CPAREN); answer = cos(answer); break; case TAN: match(TAN); match(OPAREN); answer = expr(); match(CPAREN); answer = tan(answer); break; case LN: match(LN); match(OPAREN); answer = expr(); match(CPAREN); if(answer > 0) answer = log(answer); else seterror(ERR_NEGLOG); break; case POW: match(POW); match(OPAREN); answer = expr(); match(COMMA); answer = pow(answer, expr()); match(CPAREN); break; case SQRT: match(SQRT); match(OPAREN); answer = expr(); match(CPAREN); if(answer >= 0.0)


answer = sqrt(answer); else seterror(ERR_NEGSQRT); break; case ABS: match(ABS); match(OPAREN); answer = expr(); match(CPAREN); answer = fabs(answer); break; case LEN: match(LEN); match(OPAREN); str = stringexpr(); match(CPAREN); if(str) { answer = strlen(str); free(str); } else answer = 0; break; case ASCII: match(ASCII); match(OPAREN); str = stringexpr(); match(CPAREN); if(str) { answer = *str; free(str); } else answer = 0; break; case ASIN: match(ASIN); match(OPAREN); answer = expr(); match(CPAREN); if(answer >= -1 && answer <= 1) answer = asin(answer); else seterror(ERR_BADSINCOS); break; case ACOS: match(ACOS); match(OPAREN); answer = expr();


match(CPAREN); if(answer >= -1 && answer <= 1) answer = acos(answer); else seterror(ERR_BADSINCOS); break; case ATAN: match(ATAN); match(OPAREN); answer = expr(); match(CPAREN); answer = atan(answer); break; case INT: match(INT); match(OPAREN); answer = expr(); match(CPAREN); answer = floor(answer); break; case RND: match(RND); match(OPAREN); answer = expr(); match(CPAREN); answer = integer(answer); if(answer > 1) answer = floor(rand()/(RAND_MAX + 1.0) * answer); else if(answer == 1) answer = rand()/(RAND_MAX + 1.0); else { if(answer < 0) srand( (unsigned) -answer); answer = 0; } break; case VAL: match(VAL); match(OPAREN); str = stringexpr(); match(CPAREN); if(str) { answer = strtod(str, 0); free(str); } else answer = 0; break;


case VALLEN: match(VALLEN); match(OPAREN); str = stringexpr(); match(CPAREN); if(str) { strtod(str, &end); answer = end - str; free(str); } else answer = 0.0; break; case INSTR: answer = instr(); break; default: if(isstring(token)) seterror(ERR_TYPEMISMATCH); else seterror(ERR_SYNTAX); break; } while(token == SHRIEK) { match(SHRIEK); answer = factorial(answer); } return answer; } /* calcualte the INSTR() function. */ static double instr(void) { char *str; char *substr; char *end; double answer = 0; int offset; match(INSTR); match(OPAREN); str = stringexpr(); match(COMMA); substr = stringexpr(); match(COMMA);


offset = integer( expr() ); offset--; match(CPAREN); if(!str || ! substr) { if(str) free(str); if(substr) free(substr); return 0; } if(offset >= 0 && offset < (int) strlen(str)) { end = strstr(str + offset, substr); if(end) answer = end - str + 1.0; } free(str); free(substr); return answer; } /* get the value of a scalar variable from string matches FLTID */ static double variable(void) { VARIABLE *var; char id[32]; int len; getid(string, id, &len); match(FLTID); var = findvariable(id); if(var) return var->dval; else { seterror(ERR_NOSUCHVARIABLE); return 0.0; } }


/* get value of a dimensioned variable from string. matches DIMFLTID */ static double dimvariable(void) { DIMVAR *dimvar; char id[32]; int len; int index[5]; double *answer; getid(string, id, &len); match(DIMFLTID); dimvar = finddimvar(id); if(!dimvar) { seterror(ERR_NOSUCHVARIABLE); return 0.0; } if(dimvar) { switch(dimvar->ndims) { case 1: index[0] = integer( expr() ); answer = getdimvar(dimvar, index[0]); break; case 2: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); answer = getdimvar(dimvar, index[0], index[1]); break; case 3: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); match(COMMA); index[2] = integer( expr() ); answer = getdimvar(dimvar, index[0], index[1], index[2]); break; case 4: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); match(COMMA);


      index[2] = integer( expr() );
      match(COMMA);
      index[3] = integer( expr() );
      answer = getdimvar(dimvar, index[0], index[1], index[2], index[3]);
      break;
    case 5:
      index[0] = integer( expr() );
      match(COMMA);
      index[1] = integer( expr() );
      match(COMMA);
      index[2] = integer( expr() );
      match(COMMA);
      index[3] = integer( expr() );
      match(COMMA);
      index[4] = integer( expr() );
      answer = getdimvar(dimvar, index[0], index[1], index[2], index[3], index[4]);
      break;
    }

    match(CPAREN);
  }

  if(answer)
    return *answer;

  return 0.0;
}

/*
  find a scalar variable in variables list
  Params: id - id to get
  Returns: pointer to that entry, 0 on fail
*/
static VARIABLE *findvariable(const char *id)
{
  int i;

  for(i=0;i


/* get a dimensioned array by name Params: id (includes opening parenthesis) Returns: pointer to array entry or 0 on fail */ static DIMVAR *finddimvar(const char *id) { int i; for(i=0;i 5) return 0; dv = finddimvar(id); if(!dv) dv = adddimvar(id); if(!dv) { seterror(ERR_OUTOFMEMORY); return 0; } if(dv->ndims) { for(i=0;indims;i++) oldsize *= dv->dim[i];


} else oldsize = 0; va_start(vargs, ndims); for(i=0;itype) { case FLTID: dtemp = realloc(dv->dval, size * sizeof(double)); if(dtemp) dv->dval = dtemp; else { seterror(ERR_OUTOFMEMORY); return 0; } break; case STRID: if(dv->str) { for(i=size;istr[i]) { free(dv->str[i]); dv->str[i] = 0; } } stemp = realloc(dv->str, size * sizeof(char *)); if(stemp) { dv->str = stemp; for(i=oldsize;i<size;i++) dv->str[i] = 0; } else { for(i=0;istr[i]) { free(dv->str[i]); dv->str[i] = 0; } seterror(ERR_OUTOFMEMORY); return 0;


} break; default: assert(0); } for(i=0;i<5;i++) dv->dim[i] = dimensions[i]; dv->ndims = ndims; return dv; } /* get the address of a dimensioned array element. works for both string and real arrays. Params: dv - the array's entry in variable list ... - integers telling which array element to get Returns: the address of that element, 0 on fail */ static void *getdimvar(DIMVAR *dv, ...) { va_list vargs; int index[5]; int i; void *answer = 0; va_start(vargs, dv); for(i=0;indims;i++) { index[i] = va_arg(vargs, int); index[i]--; } va_end(vargs); for(i=0;indims;i++) if(index[i] >= dv->dim[i] || index[i] < 0) { seterror(ERR_BADSUBSCRIPT); return 0; } if(dv->type == FLTID) { switch(dv->ndims) { case 1: answer = &dv->dval[ index[0] ]; break; case 2:

106

answer = &dv->dval[ index[1] * dv->dim[0] + index[0] ]; break; case 3: answer = &dv->dval[ index[2] * (dv->dim[0] * dv>dim[1]) + index[1] * dv->dim[0] + index[0] ]; break; case 4: answer = &dv->dval[ index[3] * (dv->dim[0] + dv->dim[1] + dv->dim[2]) + index[2] * (dv->dim[0] * dv->dim[1]) + index[1] * dv->dim[0] + index[0] ]; case 5: answer = &dv->dval[ index[4] * (dv->dim[0] + dv->dim[1] + dv->dim[2] + dv->dim[3]) + index[3] * (dv->dim[0] + dv->dim[1] + dv>dim[2]) + index[2] * (dv->dim[0] + dv->dim[1]) + index[1] * dv->dim[0] + index[0] ]; break; } } else if(dv->type = STRID) { switch(dv->ndims) { case 1: answer = &dv->str[ index[0] ]; break; case 2: answer = &dv->str[ index[1] * dv->dim[0] + index[0] ]; break; case 3: answer = &dv->str[ index[2] * (dv->dim[0] * dv>dim[1]) + index[1] * dv->dim[0] + index[0] ]; break; case 4: answer = &dv->str[ index[3] * (dv->dim[0] + dv>dim[1] + dv->dim[2]) + index[2] * (dv->dim[0] * dv->dim[1]) + index[1] * dv->dim[0] + index[0] ]; case 5:


      answer = &dv->str[ index[4] * (dv->dim[0] * dv->dim[1] * dv->dim[2] * dv->dim[3])
                         + index[3] * (dv->dim[0] * dv->dim[1] * dv->dim[2])
                         + index[2] * (dv->dim[0] * dv->dim[1])
                         + index[1] * dv->dim[0]
                         + index[0] ];
      break;
    }
  }

  return answer;
}

/*
  add a real variable to our variable list
  Params: id - id of variable to add.
  Returns: pointer to new entry in table
*/
static VARIABLE *addfloat(const char *id)
{
  VARIABLE *vars;

  vars = realloc(variables, (nvariables + 1) * sizeof(VARIABLE));
  if(vars)
  {
    variables = vars;
    strcpy(variables[nvariables].id, id);
    variables[nvariables].dval = 0;
    variables[nvariables].sval = 0;
    nvariables++;
    return &variables[nvariables-1];
  }
  else
    seterror(ERR_OUTOFMEMORY);

  return 0;
}


/*
  add a string variable to table.
  Params: id - id of variable to add (including trailing $)
  Returns: pointer to new entry in table, 0 on fail.
*/
static VARIABLE *addstring(const char *id)
{
  VARIABLE *vars;

  vars = realloc(variables, (nvariables + 1) * sizeof(VARIABLE));
  if(vars)
  {
    variables = vars;
    strcpy(variables[nvariables].id, id);
    variables[nvariables].sval = 0;
    variables[nvariables].dval = 0;
    nvariables++;
    return &variables[nvariables-1];
  }
  else
    seterror(ERR_OUTOFMEMORY);

  return 0;
}

/*
  add a new array to our symbol table.
  Params: id - id of array (including the opening parenthesis)
  Returns: pointer to new entry, 0 on fail.
*/
static DIMVAR *adddimvar(const char *id)
{
  DIMVAR *vars;

  vars = realloc(dimvariables, (ndimvariables + 1) * sizeof(DIMVAR));
  if(vars)
  {
    dimvariables = vars;
    strcpy(dimvariables[ndimvariables].id, id);
    dimvariables[ndimvariables].dval = 0;
    dimvariables[ndimvariables].str = 0;
    dimvariables[ndimvariables].ndims = 0;
    dimvariables[ndimvariables].type = strchr(id, '$') ? STRID : FLTID;
    ndimvariables++;
    return &dimvariables[ndimvariables-1];


  }
  else
    seterror(ERR_OUTOFMEMORY);

  return 0;
}

/*
  high level string parsing function.
  Returns: a malloced pointer, or 0 on error condition.
  caller must free!
*/
static char *stringexpr(void)
{
  char *left;
  char *right;
  char *temp;

  switch(token)
  {
  case DIMSTRID:
    left = mystrdup(stringdimvar());
    break;
  case STRID:
    left = mystrdup(stringvar());
    break;
  case QUOTE:
    left = stringliteral();
    break;
  case CHRSTRING:
    left = chrstring();
    break;
  case STRSTRING:
    left = strstring();
    break;
  case LEFTSTRING:
    left = leftstring();
    break;
  case RIGHTSTRING:
    left = rightstring();
    break;
  case MIDSTRING:
    left = midstring();
    break;
  case STRINGSTRING:
    left = stringstring();
    break;
  default:
    if(!isstring(token))
      seterror(ERR_TYPEMISMATCH);
    else


      seterror(ERR_SYNTAX);
    return mystrdup("");
  }

  if(!left)
  {
    seterror(ERR_OUTOFMEMORY);
    return 0;
  }

  switch(token)
  {
  case PLUS:
    match(PLUS);
    right = stringexpr();
    if(right)
    {
      temp = mystrconcat(left, right);
      free(right);
      if(temp)
      {
        free(left);
        left = temp;
      }
      else
        seterror(ERR_OUTOFMEMORY);
    }
    else
      seterror(ERR_OUTOFMEMORY);
    break;
  default:
    return left;
  }

  return left;
}

/*
  parse the CHR$ token
*/
static char *chrstring(void)
{
  double x;
  char buff[6];
  char *answer;

  match(CHRSTRING);
  match(OPAREN);
  x = integer( expr() );
  match(CPAREN);


  buff[0] = (char) x;
  buff[1] = 0;
  answer = mystrdup(buff);

  if(!answer)
    seterror(ERR_OUTOFMEMORY);

  return answer;
}

/*
  parse the STR$ token
*/
static char *strstring(void)
{
  double x;
  char buff[64];
  char *answer;

  match(STRSTRING);
  match(OPAREN);
  x = expr();
  match(CPAREN);

  sprintf(buff, "%g", x);
  answer = mystrdup(buff);
  if(!answer)
    seterror(ERR_OUTOFMEMORY);
  return answer;
}

/*
  parse the LEFT$ token
*/
static char *leftstring(void)
{
  char *str;
  int x;
  char *answer;

  match(LEFTSTRING);
  match(OPAREN);
  str = stringexpr();
  if(!str)
    return 0;
  match(COMMA);
  x = integer( expr() );
  match(CPAREN);

  if(x > (int) strlen(str))
    return str;


  if(x < 0)
  {
    seterror(ERR_ILLEGALOFFSET);
    return str;
  }
  str[x] = 0;
  answer = mystrdup(str);
  free(str);
  if(!answer)
    seterror(ERR_OUTOFMEMORY);
  return answer;
}

/*
  parse the RIGHT$ token
*/
static char *rightstring(void)
{
  int x;
  char *str;
  char *answer;

  match(RIGHTSTRING);
  match(OPAREN);
  str = stringexpr();
  if(!str)
    return 0;
  match(COMMA);
  x = integer( expr() );
  match(CPAREN);

  if( x > (int) strlen(str))
    return str;

  if(x < 0)
  {
    seterror(ERR_ILLEGALOFFSET);
    return str;
  }

  answer = mystrdup( &str[strlen(str) - x] );
  free(str);
  if(!answer)
    seterror(ERR_OUTOFMEMORY);
  return answer;
}


/*
  parse the MID$ token
*/
static char *midstring(void)
{
  char *str;
  int x;
  int len;
  char *answer;
  char *temp;

  match(MIDSTRING);
  match(OPAREN);
  str = stringexpr();
  match(COMMA);
  x = integer( expr() );
  match(COMMA);
  len = integer( expr() );
  match(CPAREN);

  if(!str)
    return 0;

  if(len == -1)
    len = strlen(str) - x + 1;

  if( x > (int) strlen(str) || len < 1)
  {
    free(str);
    answer = mystrdup("");
    if(!answer)
      seterror(ERR_OUTOFMEMORY);
    return answer;
  }

  if(x < 1.0)
  {
    seterror(ERR_ILLEGALOFFSET);
    return str;
  }

  temp = &str[x-1];

  answer = malloc(len + 1);
  if(!answer)
  {
    seterror(ERR_OUTOFMEMORY);
    return str;
  }
  strncpy(answer, temp, len);


  answer[len] = 0;
  free(str);

  return answer;
}

/*
  parse the STRING$ token
*/
static char *stringstring(void)
{
  int x;
  char *str;
  char *answer;
  int len;
  int N;
  int i;

  match(STRINGSTRING);
  match(OPAREN);
  x = integer( expr() );
  match(COMMA);
  str = stringexpr();
  match(CPAREN);

  if(!str)
    return 0;

  N = x;

  if(N < 1)
  {
    free(str);
    answer = mystrdup("");
    if(!answer)
      seterror(ERR_OUTOFMEMORY);
    return answer;
  }

  len = strlen(str);
  answer = malloc( N * len + 1 );
  if(!answer)
  {
    free(str);
    seterror(ERR_OUTOFMEMORY);
    return 0;
  }
  for(i=0; i < N; i++)
  {
    strcpy(answer + len * i, str);
  }


  free(str);

  return answer;
}

/*
  read a dimensioned string variable from input.
  Returns: pointer to string (not malloced)
*/
static char *stringdimvar(void)
{
  char id[32];
  int len;
  DIMVAR *dimvar;
  char **answer;
  int index[5];

  getid(string, id, &len);
  match(DIMSTRID);
  dimvar = finddimvar(id);

  if(dimvar)
  {
    switch(dimvar->ndims)
    {
    case 1:
      index[0] = integer( expr() );
      answer = getdimvar(dimvar, index[0]);
      break;
    case 2:
      index[0] = integer( expr() );
      match(COMMA);
      index[1] = integer( expr() );
      answer = getdimvar(dimvar, index[0], index[1]);
      break;
    case 3:
      index[0] = integer( expr() );
      match(COMMA);
      index[1] = integer( expr() );
      match(COMMA);
      index[2] = integer( expr() );
      answer = getdimvar(dimvar, index[0], index[1], index[2]);
      break;
    case 4:
      index[0] = integer( expr() );
      match(COMMA);
      index[1] = integer( expr() );
      match(COMMA);
      index[2] = integer( expr() );
      match(COMMA);


      index[3] = integer( expr() );
      answer = getdimvar(dimvar, index[0], index[1], index[2], index[3]);
      break;
    case 5:
      index[0] = integer( expr() );
      match(COMMA);
      index[1] = integer( expr() );
      match(COMMA);
      index[2] = integer( expr() );
      match(COMMA);
      index[3] = integer( expr() );
      match(COMMA);
      index[4] = integer( expr() );
      answer = getdimvar(dimvar, index[0], index[1], index[2], index[3], index[4]);
      break;
    }

    match(CPAREN);
  }
  else
    seterror(ERR_NOSUCHVARIABLE);

  if(!errorflag)
    if(*answer)
      return *answer;

  return "";
}

/*
  parse a string variable.
  Returns: pointer to string (not malloced)
*/
static char *stringvar(void)
{
  char id[32];
  int len;
  VARIABLE *var;

  getid(string, id, &len);
  match(STRID);
  var = findvariable(id);
  if(var)
  {
    if(var->sval)
      return var->sval;
    return "";
  }


  seterror(ERR_NOSUCHVARIABLE);
  return "";
}

/*
  parse a string literal
  Returns: malloced string literal
  Notes: newlines aren't allowed in literals, but blind
         concatenation across newlines is.
*/
static char *stringliteral(void)
{
  int len = 1;
  char *answer = 0;
  char *temp;
  char *substr;
  char *end;

  while(token == QUOTE)
  {
    while(isspace(*string))
      string++;

    end = mystrend(string, '"');
    if(end)
    {
      len = end - string;
      substr = malloc(len);
      if(!substr)
      {
        seterror(ERR_OUTOFMEMORY);
        return answer;
      }
      mystrgrablit(substr, string);
      if(answer)
      {
        temp = mystrconcat(answer, substr);
        free(substr);
        free(answer);
        answer = temp;
        if(!answer)
        {
          seterror(ERR_OUTOFMEMORY);
          return answer;
        }
      }
      else
        answer = substr;
      string = end;
    }
    else


    {
      seterror(ERR_SYNTAX);
      return answer;
    }

    match(QUOTE);
  }

  return answer;
}

/*
  cast a double to an integer, triggering errors if out of range
*/
static int integer(double x)
{
  if( x < INT_MIN || x > INT_MAX )
    seterror( ERR_BADVALUE );

  if( x != floor(x) )
    seterror( ERR_NOTINT );
  return (int) x;
}

/*
  check that we have a token of the passed type
  (if not set the errorflag)
  Move parser on to next token. Sets token and string.
*/
static void match(int tok)
{
  if(token != tok)
  {
    seterror(ERR_SYNTAX);
    return;
  }

  while(isspace(*string))
    string++;

  string += tokenlen(string, token);
  token = gettoken(string);
  if(token == ERROR)
    seterror(ERR_SYNTAX);
}


/*
  set the errorflag.
  Params: errorcode - the error.
  Notes: ignores error cascades
*/
static void seterror(int errorcode)
{
  if(errorflag == 0 || errorcode == 0)
    errorflag = errorcode;
}

/*
  get the next line number
  Params: str - pointer to parse string
  Returns: line no of next line, 0 if end
  Notes: goes to newline, then finds
         first line starting with a digit.
*/
static int getnextline(const char *str)
{
  while(*str)
  {
    while(*str && *str != '\n')
      str++;
    if(*str == 0)
      return 0;
    str++;
    if(isdigit(*str))
      return atoi(str);
  }
  return 0;
}

/*
  get a token from the string
  Params: str - string to read token from
  Notes: ignores white space between tokens
*/
static int gettoken(const char *str)
{
  while(isspace(*str))
    str++;

  if(isdigit(*str))
    return VALUE;

  switch(*str)
  {
  case 0:


    return EOS;
  case '\n': return EOL;
  case '/': return DIV;
  case '*': return MULT;
  case '(': return OPAREN;
  case ')': return CPAREN;
  case '+': return PLUS;
  case '-': return MINUS;
  case '!': return SHRIEK;
  case ',': return COMMA;
  case ';': return SEMICOLON;
  case '"': return QUOTE;
  case '=': return EQUALS;
  case '<': return LESS;
  case '>': return GREATER;
  default:
    if(!strncmp(str, "e", 1) && !isalnum(str[1]))
      return E;
    if(isupper(*str))
    {
      if(!strncmp(str, "SIN", 3) && !isalnum(str[3]))
        return SIN;
      if(!strncmp(str, "COS", 3) && !isalnum(str[3]))
        return COS;
      if(!strncmp(str, "TAN", 3) && !isalnum(str[3]))
        return TAN;
      if(!strncmp(str, "LN", 2) && !isalnum(str[2]))
        return LN;
      if(!strncmp(str, "POW", 3) && !isalnum(str[3]))
        return POW;
      if(!strncmp(str, "PI", 2) && !isalnum(str[2]))
        return PI;
      if(!strncmp(str, "SQRT", 4) && !isalnum(str[4]))
        return SQRT;
      if(!strncmp(str, "PRINT", 5) && !isalnum(str[5]))
        return PRINT;
      if(!strncmp(str, "LET", 3) && !isalnum(str[3]))


        return LET;
      if(!strncmp(str, "DIM", 3) && !isalnum(str[3]))
        return DIM;
      if(!strncmp(str, "IF", 2) && !isalnum(str[2]))
        return IF;
      if(!strncmp(str, "THEN", 4) && !isalnum(str[4]))
        return THEN;
      if(!strncmp(str, "AND", 3) && !isalnum(str[3]))
        return AND;
      if(!strncmp(str, "OR", 2) && !isalnum(str[2]))
        return OR;
      if(!strncmp(str, "GOTO", 4) && !isalnum(str[4]))
        return GOTO;
      if(!strncmp(str, "INPUT", 5) && !isalnum(str[5]))
        return INPUT;
      if(!strncmp(str, "REM", 3) && !isalnum(str[3]))
        return REM;
      if(!strncmp(str, "FOR", 3) && !isalnum(str[3]))
        return FOR;
      if(!strncmp(str, "TO", 2) && !isalnum(str[2]))
        return TO;
      if(!strncmp(str, "NEXT", 4) && !isalnum(str[4]))
        return NEXT;
      if(!strncmp(str, "STEP", 4) && !isalnum(str[4]))
        return STEP;

      if(!strncmp(str, "MOD", 3) && !isalnum(str[3]))
        return MOD;
      if(!strncmp(str, "ABS", 3) && !isalnum(str[3]))
        return ABS;
      if(!strncmp(str, "LEN", 3) && !isalnum(str[3]))
        return LEN;
      if(!strncmp(str, "ASCII", 5) && !isalnum(str[5]))
        return ASCII;
      if(!strncmp(str, "ASIN", 4) && !isalnum(str[4]))
        return ASIN;
      if(!strncmp(str, "ACOS", 4) && !isalnum(str[4]))
        return ACOS;
      if(!strncmp(str, "ATAN", 4) && !isalnum(str[4]))
        return ATAN;
      if(!strncmp(str, "INT", 3) && !isalnum(str[3]))
        return INT;
      if(!strncmp(str, "RND", 3) && !isalnum(str[3]))
        return RND;
      if(!strncmp(str, "VAL", 3) && !isalnum(str[3]))
        return VAL;
      if(!strncmp(str, "VALLEN", 6) && !isalnum(str[6]))
        return VALLEN;
      if(!strncmp(str, "INSTR", 5) && !isalnum(str[5]))
        return INSTR;


      if(!strncmp(str, "CHR$", 4))
        return CHRSTRING;
      if(!strncmp(str, "STR$", 4))
        return STRSTRING;
      if(!strncmp(str, "LEFT$", 5))
        return LEFTSTRING;
      if(!strncmp(str, "RIGHT$", 6))
        return RIGHTSTRING;
      if(!strncmp(str, "MID$", 4))
        return MIDSTRING;
      if(!strncmp(str, "STRING$", 7))
        return STRINGSTRING;
    }
    /* end isupper() */

    if(isalpha(*str))
    {
      while(isalnum(*str))
        str++;
      switch(*str)
      {
      case '$':
        return str[1] == '(' ? DIMSTRID : STRID;
      case '(':
        return DIMFLTID;
      default:
        return FLTID;
      }
    }

    return ERROR;
  }
}

/*
  get the length of a token.
  Params: str - pointer to the string containing the token
          token - the type of the token read
  Returns: length of the token, or 0 for EOS to prevent
           it being read past.
*/
static int tokenlen(const char *str, int token)
{
  int len = 0;
  char buff[32];

  switch(token)
  {
  case EOS: return 0;


  case EOL: return 1;
  case VALUE:
    getvalue(str, &len);
    return len;
  case DIMSTRID:
  case DIMFLTID:
  case STRID:
    getid(str, buff, &len);
    return len;
  case FLTID:
    getid(str, buff, &len);
    return len;
  case PI: return 2;
  case E: return 1;
  case SIN: return 3;
  case COS: return 3;
  case TAN: return 3;
  case LN: return 2;
  case POW: return 3;
  case SQRT: return 4;
  case DIV: return 1;
  case MULT: return 1;
  case OPAREN: return 1;
  case CPAREN: return 1;
  case PLUS: return 1;
  case MINUS: return 1;
  case SHRIEK: return 1;
  case COMMA: return 1;
  case QUOTE: return 1;
  case EQUALS: return 1;
  case LESS: return 1;


  case GREATER: return 1;
  case SEMICOLON: return 1;
  case ERROR: return 0;
  case PRINT: return 5;
  case LET: return 3;
  case DIM: return 3;
  case IF: return 2;
  case THEN: return 4;
  case AND: return 3;
  case OR: return 2;
  case GOTO: return 4;
  case INPUT: return 5;
  case REM: return 3;
  case FOR: return 3;
  case TO: return 2;
  case NEXT: return 4;
  case STEP: return 4;
  case MOD: return 3;
  case ABS: return 3;
  case LEN: return 3;
  case ASCII: return 5;
  case ASIN: return 4;
  case ACOS: return 4;
  case ATAN: return 4;
  case INT: return 3;
  case RND:


    return 3;
  case VAL: return 3;
  case VALLEN: return 6;
  case INSTR: return 5;
  case CHRSTRING: return 4;
  case STRSTRING: return 4;
  case LEFTSTRING: return 5;
  case RIGHTSTRING: return 6;
  case MIDSTRING: return 4;
  case STRINGSTRING: return 7;
  default:
    assert(0);
    return 0;
  }
}

/*
  test if a token represents a string expression
  Params: token - token to test
  Returns: 1 if a string, else 0
*/
static int isstring(int token)
{
  if(token == STRID || token == QUOTE || token == DIMSTRID
     || token == CHRSTRING || token == STRSTRING
     || token == LEFTSTRING || token == RIGHTSTRING
     || token == MIDSTRING || token == STRINGSTRING)
    return 1;
  return 0;
}


/*
  get a numerical value from the parse string
  Params: str - the string to search
          len - return pointer for number of chars read
  Returns: the value of the string.
*/
static double getvalue(const char *str, int *len)
{
  double answer;
  char *end;

  answer = strtod(str, &end);
  assert(end != str);
  *len = end - str;
  return answer;
}

/*
  getid - get an id from the parse string:
  Params: str - string to search
          out - id output [32 chars max ]
          len - return pointer for id length
  Notes: triggers an error if id > 31 chars
         the id includes the $ and ( qualifiers.
*/
static void getid(const char *str, char *out, int *len)
{
  int nread = 0;

  while(isspace(*str))
    str++;
  assert(isalpha(*str));
  while(isalnum(*str))
  {
    if(nread < 31)
      out[nread++] = *str++;
    else
    {
      seterror(ERR_IDTOOLONG);
      break;
    }
  }
  if(*str == '$')
  {
    if(nread < 31)
      out[nread++] = *str++;
    else
      seterror(ERR_IDTOOLONG);
  }
  if(*str == '(')
  {


    if(nread < 31)
      out[nread++] = *str++;
    else
      seterror(ERR_IDTOOLONG);
  }
  out[nread] = 0;
  *len = nread;
}

/*
  grab a literal from the parse string.
  Params: dest - destination string
          src - source string
  Notes: strings are in quotes, a doubled quote is the escape
*/
static void mystrgrablit(char *dest, const char *src)
{
  assert(*src == '"');
  src++;

  while(*src)
  {
    if(*src == '"')
    {
      if(src[1] == '"')
      {
        *dest++ = *src;
        src++;
        src++;
      }
      else
        break;
    }
    else
      *dest++ = *src++;
  }

  *dest++ = 0;
}


/*
  find where a source string literal ends
  Params: str - string to check (must point to quote)
          quote - character to use for quotation
  Returns: pointer to quote which ends string
  Notes: quotes escape quotes
*/
static char *mystrend(const char *str, char quote)
{
  assert(*str == quote);
  str++;

  while(*str)
  {
    while(*str != quote)
    {
      if(*str == '\n' || *str == 0)
        return 0;
      str++;
    }
    if(str[1] == quote)
      str += 2;
    else
      break;
  }

  return (char *) (*str? str : 0);
}

/*
  Count the instances of ch in str
  Params: str - string to check
          ch - character to count
  Returns: number of times ch occurs in str.
*/
static int mystrcount(const char *str, char ch)
{
  int answer = 0;

  while(*str)
  {
    if(*str++ == ch)
      answer++;
  }

  return answer;
}


/*
  duplicate a string:
  Params: str - string to duplicate
  Returns: malloced duplicate.
*/
static char *mystrdup(const char *str)
{
  char *answer;

  answer = malloc(strlen(str) + 1);
  if(answer)
    strcpy(answer, str);

  return answer;
}

/*
  concatenate two strings
  Params: str - first string
          cat - second string
  Returns: malloced string.
*/
static char *mystrconcat(const char *str, const char *cat)
{
  int len;
  char *answer;

  len = strlen(str) + strlen(cat);
  answer = malloc(len + 1);
  if(answer)
  {
    strcpy(answer, str);
    strcat(answer, cat);
  }
  return answer;
}


/*
  compute x!
*/
static double factorial(double x)
{
  double answer = 1.0;
  double t;

  if( x > 1000.0)
    x = 1000.0;

  for(t=1;t<=x;t+=1.0)
    answer *= t;
  return answer;
}



Introduction

MiniBasic is designed as a simple programming language, based on BASIC. If you already know BASIC then you are well on your way to learning MiniBasic; if you don’t, then MiniBasic is one of the simplest programming languages to learn.

MiniBasic programs are written as ASCII scripts. They are then interpreted by the computer. This is in contrast to most “serious” languages, which are compiled, that is, translated into machine instructions and then run. Interpreted languages are slower than compiled languages, but they have several advantages. One major one is that they are portable: a MiniBasic script will run on any computer that has a MiniBasic interpreter installed. Another advantage, especially for beginners, is that errors are much easier to identify.

Finally, MiniBasic is not really intended as a standalone program, except for teaching purposes. It is meant for incorporation into other products, where the user is expected to provide functions in a general-purpose programming language. An example might be a desk calculator which can be extended to provide user-defined functions like the Fibonacci series, or an adventure game for which the user can design his own levels. For technical reasons, this is much easier to implement as an interpreted rather than a compiled language.

One design goal of MiniBasic was that it should be easy to learn. Millions of people already know some BASIC from school or through having a microcomputer in the 1980s. The second design goal was that it should be easy to implement. The interpreter is written in portable ANSI C, and is freely available. It is in a single, reasonable-length source, and is available for incorporation into user programs. The final goal is that the interpreter should be what is technically known as “Turing equivalent”. This means that it is possible to implement any algorithm in MiniBasic. This required one major extension to common BASIC: the ability to redimension arrays.
It is impossible to implement graphics commands in portable ANSI C, so sound, graphics, and mice are not supported in MiniBasic. Interaction with the user in the standalone model is via the console. However, where MiniBasic is incorporated into another program, generally there will not be direct interaction with the user. The caller will create temporary files for input and output.


The first program

You are now ready to write your first program in MiniBasic. Traditionally, this is “Hello World”. Firstly you need to install the MiniBasic interpreter. On a PC this is done by copying the executable MiniBasic.exe to your hard drive. Then you open a text editor and type

10 PRINT "Hello World"

Remember to terminate with a newline. Save as “Hello.mb” (the extension is optional). You then call the interpreter by typing “MiniBasic Hello.mb” in a command prompt. You should see the output

Hello World

All MiniBasic programs have line numbers. Execution begins with the first line and ends with the last line. Lines must be in order and every statement must have a number. The number must be the first character in the line. However, we can spread long strings (sequences of characters) over several lines. This second program

10 PRINT "In the beginning God created the heavens and the Earth"
 "and the Earth was without form and void "

will output a string too long to easily fit on one line. Note that the second line must begin with a space character, to indicate that it is a continuation of the first line.


The second program

There is very little point in a program that outputs something but has no input. So for our second program we will use the command INPUT.

10 PRINT "Input first number "
20 INPUT x
30 PRINT "Input second number"
40 INPUT y
50 PRINT "X + Y is", x + y

INPUT will get two numbers that you type in the command prompt. It ignores any non-numeric characters, and translates the first number that it sees. The comma separates items to print, and also tells the computer to insert a space.

It is also possible to input strings of characters. To do this, we use what is called a “string variable”. A string variable always ends with the dollar character ($), and contains text rather than numbers.

10 PRINT "What is your name?"
20 INPUT n$
30 PRINT "Hello", n$

When inputting a string, INPUT reads up to the newline (which it discards). We can use the ‘+’ operator, but not any others, on string variables.

10 PRINT "What is your first name?"
20 INPUT fname$
30 PRINT "What is your second name?"
40 INPUT sname$
50 PRINT "Hello", fname$ + sname$

Notice that this program has a bug. The ‘+’ operator doesn’t insert a space, so unless you inadvertently added a space the program prints “FREDBLOGGS”. Try modifying the program by inserting a space between the two names.
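The numeric behaviour of INPUT described above (skip anything non-numeric, then translate the first number found) can be sketched in C as follows. This is only an illustration, not the interpreter's own INPUT code: the helper names `starts_number` and `first_number` are hypothetical, and the treatment of a leading sign or decimal point is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: does a decimal number start at s?
   Accepts an optional sign and an optional leading '.', then a digit. */
static int starts_number(const char *s)
{
    if (*s == '+' || *s == '-')
        s++;
    if (*s == '.')
        s++;
    return isdigit((unsigned char) *s) != 0;
}

/* Hypothetical helper: find the first number in line, skipping any
   non-numeric characters in front of it.
   Returns 1 and stores the value in *out, or 0 if there is no number. */
static int first_number(const char *line, double *out)
{
    assert(line != NULL && out != NULL);

    while (*line) {
        if (starts_number(line)) {
            char *end;
            double value = strtod(line, &end);
            if (end != line) {      /* strtod consumed a number */
                *out = value;
                return 1;
            }
        }
        line++;                     /* not numeric here, skip a character */
    }
    return 0;
}
```

With this sketch, the text "abc 42 7" yields 42 and "temp=-3.5C" yields -3.5, while a line with no digits reports failure.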


The third program

Now we need to get to the core of MiniBasic, the “LET” statement. MiniBasic will evaluate arbitrarily complicated arithmetical expressions. The operators allowed are the familiar ‘+’, ‘-’, ‘*’ and ‘/’, and also MOD (modulus). Use parentheses ‘(’ ‘)’ to disambiguate the order of evaluation.

10 PRINT "Enter temperature in Fahrenheit"
20 INPUT f
30 LET c = (f - 32) / 1.8
40 PRINT "=", c, "Celsius"

As well as these, there are a large number of mathematical functions built into MiniBasic, for example POW(x,y), which does exponentiation, SQRT(x) (square root), and SIN(x), COS(x) and TAN(x) (sine, cosine and tangent). All the trigonometric functions take or return radians. The logarithm function, LN(x), takes a natural logarithm. There are also two mathematical constants, PI and e (Euler’s number). Be careful not to use these as variable names. To convert radians to degrees, divide by 2 * PI and multiply by 360. To convert a natural, base e log to log10, divide by LN(10).

LET also works on string variables. String variables always end with the character ‘$’, as do string functions.

10 PRINT "What is your name?"
20 INPUT name$
30 LET name$ = "SIR" + " " + name$
40 PRINT "Arise,", name$

Note that expressions such as

LET x = x + 1

are legal and are in fact very useful. Variable names may be at most 31 characters long (counting any trailing $), and mustn’t duplicate any MiniBasic keywords.
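The arithmetic in this section (the Fahrenheit program and the two conversion rules) can be checked with a short C sketch. The function names and the two constants below are illustrative choices, not part of MiniBasic; `ln_to_log10` takes a value that is already a natural logarithm, such as one returned by LN(x).

```c
#include <assert.h>

/* Constants written out to double precision (illustrative names). */
static const double PI_VALUE = 3.14159265358979323846;
static const double LN_10    = 2.30258509299404568402;

/* Radians to degrees: divide by 2 * PI and multiply by 360. */
static double radians_to_degrees(double radians)
{
    return radians / (2.0 * PI_VALUE) * 360.0;
}

/* Natural log to base-10 log: divide by LN(10). */
static double ln_to_log10(double ln_x)
{
    return ln_x / LN_10;
}

/* Same formula as line 30 of the temperature program. */
static double fahrenheit_to_celsius(double f)
{
    assert(f >= -459.67);   /* nothing is colder than absolute zero */
    return (f - 32.0) / 1.8;
}
```

For example, PI radians comes out as 180 degrees, and 212 Fahrenheit as 100 Celsius, up to floating-point rounding.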


The fourth program

Programs often need to make branching decisions. In MiniBasic this is provided by the IF ... THEN statement.

10 REM Square root program.
20 PRINT "Enter a number"
30 INPUT x
40 REM Square root of negative is not allowed
50 IF x < 0 THEN 20
60 PRINT "Square root of", x, "is", SQRT(x)

We have also introduced the REM statement. This simply adds comments to make the program easier for a human to understand. REM statements are ignored by the interpreter.

This program is actually not very effective. Really we should tell the user what is wrong. For this, we need the GOTO statement. A GOTO simply executes an unconditional jump.

10 REM Improved square root program
20 PRINT "Enter a number"
30 INPUT x
40 REM Prompt user if negative
50 IF x >= 0 THEN 80
60 PRINT "Number may not be negative"
70 GOTO 30
80 PRINT "Square root of ", x, "is", SQRT(x)

An IF ... THEN statement always takes a line number as an argument. This can be of the form IF x < y THEN b, as long as b holds a valid line number. The operators recognised by the IF ... THEN statement are ‘=’, ‘<>’ (not equals), ‘>’, ‘<’, ‘>=’, ‘<=’. We can also use AND and OR to build up complex tests.

IF age >= 18 AND age < 65 THEN x

Use parentheses to disambiguate lengthy tests.

IF age >= 18 AND (age < 65 OR job$ = "Caretaker") THEN x

The IF ... THEN operators can also be applied to string variables. In this case the strings are ordered alphabetically.
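As a rough illustration of how the six relational operators might be applied to strings, here is a C sketch built on `strcmp`. It assumes, without confirmation from the interpreter source shown in this section, that "alphabetical" means character-code order; under that assumption every upper-case letter sorts before every lower-case one. The `enum relop` type and `string_relop` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operator codes for =, <>, >, <, >= and <=. */
enum relop { REL_EQ, REL_NE, REL_GT, REL_LT, REL_GE, REL_LE };

/* Apply a relational operator to two strings.
   Returns 1 if the relation holds, 0 otherwise. */
static int string_relop(const char *left, enum relop op, const char *right)
{
    int cmp;

    assert(left != NULL && right != NULL);
    cmp = strcmp(left, right);   /* <0, 0 or >0 by character code */

    switch (op) {
    case REL_EQ: return cmp == 0;
    case REL_NE: return cmp != 0;
    case REL_GT: return cmp > 0;
    case REL_LT: return cmp < 0;
    case REL_GE: return cmp >= 0;
    case REL_LE: return cmp <= 0;
    }
    return 0;
}
```

Under this ordering "Apple" is less than "Banana", but "Zebra" is also less than "apple", which is worth remembering when testing ranges such as "A" to "z".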


The fifth program

To enable MiniBasic to compute complicated functions we need access to arbitrary amounts of memory. For this we have the DIM statement. It creates a special type of variable known as an array.

10 REM Calendar program.
20 DIM months$(12) = "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
30 PRINT "Day of birth?"
40 INPUT day
50 PRINT "Month?"
60 INPUT month
70 REM Make sure day and month are legal
80 IF day >= 1 AND day <= 31 AND month >= 1 AND month <= 12 THEN 110
90 PRINT "That's impossible"
100 GOTO 30
110 IF day <> INT(day) OR month <> INT(month) THEN 90
120 PRINT "Your birthday is", day, months$(month)

You might want to modify this program to contain another array, this time a numerical one, containing the lengths of the months. For the ambitious, you could also input the year, and check for February 29th.

Arrays can have up to five dimensions. For instance you might want to hold a chessboard in a 2d array

DIM board(8,8)

It is possible to redimension arrays. For a 2d or higher array this effectively scrambles the contents, but one dimensional arrays are preserved. For instance this program will enter an arbitrary number of values into an array

10 REM Median program.
20 LET N = 0
30 DIM array(N+1)
40 PRINT "Enter a number, q to quit"
50 INPUT line$
60 IF line$ = "q" THEN 100
70 LET N = N + 1
80 LET array(N) = VAL(line$)
90 GOTO 30
100 PRINT N, "numbers entered"
105 IF N = 0 THEN 1000
106 IF N = 1 THEN 210
110 REM Bubble sort the numbers
120 LET flag = 0
130 LET i = 1
140 IF array(i) <= array(i+1) THEN 190
150 LET flag = 1
160 LET temp = array(i)
170 LET array(i) = array(i+1)
180 LET array(i+1) = temp
190 LET i = i + 1
195 IF i < N THEN 140
200 IF flag = 1 THEN 120
210 REM print out the middle
220 IF N MOD 2 = 0 THEN 250
230 LET mid = array( (N + 1) / 2)
240 GOTO 270
250 LET mid = array(N/2) + array(N/2+1)
260 LET mid = mid/2
270 PRINT "Median", mid
1000 REM end
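Why a one-dimensional array survives redimensioning can be seen from a small C sketch that grows a block of doubles with `realloc`, which keeps the existing elements in place at the front of the block. The `NumArray` type and `redim` function are hypothetical names for illustration; zeroing the new elements is a choice made in this sketch and is not necessarily what the interpreter does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array of numbers. */
typedef struct {
    double *values;   /* element storage, 0 when empty */
    int     size;     /* number of elements currently held */
} NumArray;

/* Resize arr to newsize elements. Existing elements keep their values;
   any new elements are set to 0 (an assumption of this sketch).
   Returns 1 on success, 0 if out of memory (arr is left unchanged). */
static int redim(NumArray *arr, int newsize)
{
    double *temp;
    int i;

    assert(arr != NULL && newsize > 0);

    temp = realloc(arr->values, (size_t) newsize * sizeof(double));
    if (!temp)
        return 0;

    for (i = arr->size; i < newsize; i++)
        temp[i] = 0.0;

    arr->values = temp;
    arr->size = newsize;
    return 1;
}
```

Growing the array by one element per input value, as line 30 of the median program does, then amounts to calling `redim(&a, n + 1)` before each new value is stored.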


The sixth program

It is possible to manipulate arrays of values just using IF ... THEN and GOTO, but it soon becomes very clumsy. For this reason MiniBasic includes FOR ... NEXT loops. Say we want to print out an array

10 REM Prints the months of the year
20 DIM months$(12) = "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
30 FOR I = 1 TO 12
40 PRINT "Month", I, months$(I)
50 NEXT I
60 PRINT "Done"

FOR loops can be nested, as long as the inner loop is closed before the outer one.

10 FOR I = 1 TO 8
20 FOR J = 1 TO 8
30 PRINT I, J
40 NEXT J
50 NEXT I

It is also possible to provide a STEP value other than one.

FOR I = 100 TO 0 STEP -2

If you specify a null loop, such as FOR I = 10 TO 0, control will pass over the loop body and go to the next matching NEXT statement. The control variable must always be in the NEXT statement. FOR ... TO loops can take complex expressions, such as FOR I = x TO x * x. In these cases the initial value, the end value, and the step value are calculated once and then never modified. FOR ... NEXT loops can be nested up to 32 deep. If you attempt to jump out of a loop then you are likely to trigger errors.


10 REM Bad use of a for loop
20 LET X = 0
30 FOR I = 1 TO 10
40 INPUT Y
50 IF Y < 0 THEN 10
60 NEXT I

However you may alter the counting variable within the loop. This can be used to force premature loop termination.
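The loop rules in this section can be modelled with a few lines of C. The sketch below counts how many times a loop body would run when the start, end and step values are fixed once at the top. It assumes the loop finishes as soon as the control variable passes the end value in the direction of the step; the interpreter's own FOR and NEXT handling is not shown in this section, so treat this as a model rather than the actual implementation. The function name `for_iterations` is hypothetical.

```c
#include <assert.h>

/* Count the iterations of FOR v = from TO to STEP step, with all three
   values evaluated once before the loop starts.
   A null loop (for example 10 TO 0 with a positive step) runs 0 times. */
static int for_iterations(double from, double to, double step)
{
    int count = 0;
    double v;

    assert(step != 0.0);    /* a zero step would never terminate */

    for (v = from; step > 0.0 ? v <= to : v >= to; v += step)
        count++;

    return count;
}
```

Under this model FOR I = 1 TO 12 runs 12 times, FOR I = 100 TO 0 STEP -2 runs 51 times, and FOR I = 10 TO 0 does not run at all.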


The seventh program

We are now ready to put everything together. MiniBasic has facilities for input, output, mathematical and lexical calculation, flow control, and multi-dimensional arrays.

String handling may be different to what you are used to in other versions of BASIC. In MiniBasic all functions that return a string end with the character ‘$’. These include CHR$(), STR$(), LEFT$(), RIGHT$(), MID$() and STRING$(). There is no limit to string length other than the computer’s memory.

Some functions take a string argument, but return a numerical value. These include LEN(), the length of the string, ASCII(), the ASCII code of the first character of the string, and VAL(), the numerical value of a string of digits.

Internally, the NUL character, ASCII value 0, is used to terminate strings. The empty string “” consists of a single NUL.

Here is a program which inputs a full name, checks for validity, and stores it in an array of variables.

10 REM String-handling program
20 REM Inputs a name, tests for validity
30 REM and breaks up into parts.
40 PRINT "Enter your full name"
50 INPUT name$
60 REM First check for non-English characters
70 LET flag = 0
80 FOR I = 1 TO LEN(name$)
90 LET ch$ = MID$(name$, I,1)
100 IF (ch$ >= "A" AND ch$ <= "z") OR ch$ = " " THEN 140
110 LET flag = 1
120 REM This forces the loop to stop
130 LET I = LEN(name$)
140 NEXT I
150 IF flag = 0 THEN 180
160 PRINT "Non-English letter,", ch$
170 GOTO 40
180 REM Jump to subroutine
190 LET return = 210
200 GOTO 1000
210 IF name$ = "" THEN 280
220 LET return = 240
230 GOTO 2000
240 LET N = N + 1
250 DIM out$(N)
260 LET out$(N) = word$
270 GOTO 180
280 REM Print out the name
285 PRINT "Name accepted"
290 FOR I = 1 TO N
300 PRINT out$(I) + " ";
310 NEXT I
320 PRINT ""
330 GOTO 3000
1000 REM strips the leading space
1010 IF LEFT$(name$, 1) <> " " THEN return
1020 LET name$ = MID$(name$, 2, -1)
1030 GOTO 1010
2000 REM get the leading word and put it in word$
2010 LET word$ = ""
2020 LET ch$ = LEFT$(name$, 1)
2030 IF ch$ < "A" OR ch$ > "z" THEN return
2040 LET word$ = word$ + ch$
2050 LET name$ = MID$(name$, 2, -1)
2060 GOTO 2020
3000 REM END
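The program above leans on MID$(name$, 2, -1) to drop the first character. The following standalone C sketch models that call, loosely following the midstring routine in the interpreter listing: the start offset is 1-based and a length of -1 means "to the end of the string". The function name `mid` is hypothetical, and clamping an over-long length to the end of the string is a simplification made here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of MID$(str, x, len).
   x is a 1-based start offset; len == -1 means "up to the end".
   Returns a malloced string the caller must free, or 0 if x < 1
   or memory runs out. */
static char *mid(const char *str, int x, int len)
{
    int total;
    char *answer;

    assert(str != NULL);
    total = (int) strlen(str);

    if (len == -1)
        len = total - x + 1;

    if (x > total || len < 1) {      /* nothing to copy: empty string */
        answer = malloc(1);
        if (answer)
            answer[0] = 0;
        return answer;
    }

    if (x < 1)                       /* illegal offset */
        return 0;

    if (len > total - x + 1)         /* do not read past the end */
        len = total - x + 1;

    answer = malloc((size_t) len + 1);
    if (!answer)
        return 0;
    memcpy(answer, str + x - 1, (size_t) len);
    answer[len] = 0;
    return answer;
}
```

For instance, `mid("Fred Bloggs", 6, -1)` gives "Bloggs" and `mid("Fred Bloggs", 1, 4)` gives "Fred".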


Keywords by type

Arithmetical operators
+ - / * () ! MOD

Mathematical constants
PI, e

Mathematical functions
SIN, COS, TAN, ASIN, ACOS, ATAN, LN, POW, SQRT, INT, RND, ABS

String functions that return a numerical value
LEN, VAL, ASCII, INSTR, VALLEN

Statements
PRINT, LET, DIM, IF, GOTO, INPUT, REM, FOR, NEXT

Auxiliary keywords
THEN, AND, OR, TO, STEP

Functions that return a string
CHR$, STR$, LEFT$, RIGHT$, MID$, STRING$


Keywords alphabetically

e ABS ACOS AND ASCII ASIN ATAN CHR$ COS DIM FOR GOTO IF INPUT INSTR INT LEFT$ LEN LET LN MID$ MOD NEXT OR PI POW PRINT REM RIGHT$ RND SIN SQRT STEP STR$ STRING$ TAN THEN TO VAL VALLEN


1) Expressions.

All expressions are evaluated using floating-point arithmetic. The + and - operators have lower precedence than *, / and MOD (modulus), which have equal precedence and are evaluated left to right. ! (factorial) has the highest precedence. There is no exponentiation operator (use the POW() function instead). There are two mathematical constants, e, Euler’s number, 2.71828183, and PI, 3.14159265. It is possible to use variables or dimensioned variables in expressions. Typical expressions are

10                                  - the constant value 10
x                                   - value of variable x
(x + y) * 2                         - add x to y and multiply by two
array(1,2)                          - value of array element 1, 2
POW( x + y, 2) + array(1, LEN(A$))  - raise x + y to the power 2 and add an element of the array “array” given by 1 and the length of A$

MOD calculates the floating point modulus of a number. Both sides of the expression should be of the same sign. x MOD 0 is an error. Division by zero is also an error.

Floating point arithmetic is not exact, so expressions such as SQRT(2.0) * SQRT(2.0) may not be exactly whole numbers. Using the function INT(x + 0.5) you can force an expression to be the nearest exact integer.

Arrays are stored with the x dimension in the first column. This matters when initialising a 2d array with a list of values.

10 DIM a(4,4) = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16


will create an ascending list in a(x,y) order. Array indices start from 1 and finish at the dimension size. Thus a(1,1) is the first element of the array, and a(4,4) the highest. Note that there must be no space between an array name and the first parenthesis.


2) String expressions. All strings are stored internally in ASCII format, as NUL-terminated arrays. Use of extremely large strings is likely to slow down the program, since most operations involve internal copying of strings. A string literal consists of one or more concatenated quotes. A string can be spread over several lines, but the newline character is not allowed inside quotes. To enclose a quotation mark in a string, use double quotes.

10 LET A$ = “And God said “”Let there be light”” “
 “and there was light.”
 “And God saw the light, that it was good.”

is an example of a legal string. Note that the start of the second line contains white space at the beginning to tell the interpreter it is a continuation of the previous line. To add a newline or other control character, use the CHR$() function. Note that CHR$(0) will prematurely terminate the string. Use the ASCII() function to perform numerical manipulation on characters, e.g.

LET B$ = CHR$( ASCII(B$) + 1)

will set B$ to the next letter of the alphabet. The ‘+’ operator will concatenate strings.

10 PRINT “Fred” + “Bloggs” + CHR$(42) + x$

will print FredBloggs* followed by the contents of x$. Functions with names ending in ‘$’ always return strings. Parentheses are not optional.


3) Relational expressions. Relational expressions are used only in IF ... THEN statements to make conditional jumps. A relational expression evaluates to either true or false. The allowed operators are =, <> (doesn’t equal), >, >=, <, <=. With numerical expressions the comparison is numerical, and with strings it is alphabetical. Both sides of a relational operator must be of the same type. Relational expressions can contain the keywords AND and OR. Order of evaluation is left to right, but parentheses should always be used to disambiguate mixed expressions. Examples of use

10 IF (x <= 5 AND x > 0) OR x = 10 THEN 100


Alphabetical list of keywords

Each keyword in MiniBasic is listed alphabetically, with a brief description of its function and how to use it.

e – Euler’s number. The mathematical constant e, 2.71828182... The base of natural logarithms, and used in many formulae. Usage e 10 LET y = POW(e, -x)

ACOS – arc-cosine. Calculates the inverse cosine of a number. The result is in radians. Input must be between –1.0 and 1.0. Usage num = ACOS(numeric) 10 LET rad = ACOS(x)

AND – logical operator. Used in IF ... THEN statements to perform two tests. If both are true then the test succeeds. Note it cannot be used as a bitwise AND operator, as in some programming languages. Usage IF relational AND relational THEN numeric 10 IF name$ = “Fred” AND age > 18 AND age <= 65 THEN 100


ASCII – get the numerical code for a character. Calculates the computer’s internal code for the first character in a string, or 0 if the empty string is passed. It is useful for performing direct manipulations on the representation, for instance testing for newlines (code 13). Usage num = ASCII( string ) LET x = ASCII(“*abc”)

x now contains the code for an asterisk, or 42.

ASIN – arc-sine. Calculates the inverse sine of a number. The output is in radians. The input must be between –1.0 and 1.0. Usage num = ASIN(numeric) 10 LET rad = ASIN(x)

ATAN – arc-tangent. Calculates the inverse tangent of a number. The output is in radians. Note that for very extreme values accuracy may be lost. Usage num = ATAN(numeric) 10 LET rad = ATAN(x)


CHR$ - convert ASCII value to a string. Converts the computer’s internal numerical character code to a MiniBasic string of one letter. It is useful for performing numerical manipulations with the code. For instance, to insert a newline call CHR$(13). Usage str = CHR$(numeric) 10 LET X$ = “Line 1” + CHR$(13) + “Line 2”

X$ now contains “Line 1” and “Line 2” separated by a newline character.

COS – cosine Calculates the cosine of an angle. The input must be in radians. Usage num = COS(numeric) 10 LET x = COS(degrees/180 * PI)

DIM – dimension an array. Use DIM to create a named list of numbers or strings. This is extremely useful when dealing with large amounts of data. For instance, if you are writing a program for a company with several employees, you can DIM an array to hold all their names. Arrays can have up to five dimensions. In practice, even on modern computers, memory fills up very fast with big arrays, and three dimensions is the maximum recommended. There must be no space between the name of the dimensioned variable and the opening parenthesis. 10 DIM name$(100)

Creates an array of 100 names.


10 DIM map(width, height)

Creates a 2d array of width * height entries, maybe representing grid squares on a map. Array elements are in the range 1 – maximum, so 20 LET map(1,1) = 2.0

sets the top left element to 2.0. map(width, height) is the bottom right element. If you try to access out-of-range elements the computer will throw an error. MiniBasic allows you to resize an array at any point by calling DIM on it again. If the array is one-dimensional, elements will be preserved. If the array has higher dimensions then the elements will be scrambled. Resizing an array is useful if, say, you are inputting a list of employee names and don’t know how many there will be. Arrays of zero dimensions may not be declared. MiniBasic also allows you to initialise arrays when you dimension them. This 10 DIM days$(7) = “Mon”, “Tue”, “Wed”, “Thur”, “Fri”, “Sat”, “Sun”

will declare an array of days of the week. This method is useful for defining data. For 2d arrays, the first dimension is the lowest (x) dimension, so

DIM name$(2, 4) = “Fred”, “Bloggs”,
 “Joe”, “Sixpack”,
 “Homer”, “Simpson”,
 “John”, “Doe”

is the correct order. Dimensioned variables are intimately connected with FOR ... NEXT loops. Use the loop counter to index into your array.


Usage DIM id(numeric, numeric) 10 DIM array(10)

Creates a single-dimensioned array of 10 numerical elements 10 DIM dictionary$(2, N)

Creates a 2-dimensional array of N * 2 strings 10 DIM factorial(10) = 1!, 2!, 3!, 4!, 5!, 6!, 7!, 8!, 9!, 10!

Creates a list of the first ten factorials
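The rule that the first (x) dimension varies fastest can be made concrete with a little C. This is only a sketch of the ordering described above, not the interpreter's own code; the function name array2d_offset is invented for the illustration.

```c
/* Hypothetical helper: map the 1-based MiniBasic indices (x, y) of an
   array dimensioned DIM a(width, height) to a 0-based position in the
   initialiser list, with x varying fastest.
   Returns -1 when the subscript is out of range ("Bad subscript"). */
static long array2d_offset(int x, int y, int width, int height)
{
    if (x < 1 || x > width || y < 1 || y > height)
        return -1;
    return (long)(y - 1) * width + (x - 1);
}
```

Under this reading, after DIM name$(2, 4) the element name$(2,1) is the second initialiser and name$(1,2) is the third.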

FOR – start a for loop

FOR ... NEXT loops are extremely useful in programming. The FOR statement consists of three parts, the initial set-up value, a TO value, and an optional STEP value.

10 DIM array(100)
20 FOR I = 1 TO 100
30 INPUT array(I)
40 NEXT I

will input a hundred values into the array. The variable in the NEXT statement must be the same as that in the matching FOR. FOR loops may be nested to a maximum depth of 32.

10 DIM chess(8,8)
20 FOR I = 1 TO 8
30 FOR J = 1 TO 8
40 LET chess(J,I) = 1.0
50 NEXT J
60 NEXT I


The step value does not need to be 1, and may be negative. For instance

10 FOR I = 1 TO 10 STEP 2
20 PRINT I
30 NEXT I
40 FOR I = 10 TO 1 STEP -0.3
50 PRINT I
60 NEXT I

The initial, TO, and STEP values are calculated once on entering the FOR loop; they are then constant.

10 LET x = 10
20 FOR I = 1 TO x STEP x/5
30 PRINT I
35 REM Next line has no effect
40 LET x = x + 1
50 NEXT I

In MiniBasic, if the TO value is lower than the initial value (or higher if the STEP value is negative) then the loop does not execute. Control passes to the first matching NEXT. It is important not to jump out of FOR ... NEXT loops, or get the nesting order wrong, otherwise MiniBasic’s control flow will become confused. The “Too many FORs” error is likely to be caused by jumping out of a loop. To terminate a loop prematurely, set the counter to the TO value, and jump to the matching NEXT.

Usage FOR id = numeric TO numeric STEP numeric ... NEXT id
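The loop rules above can be modelled in a few lines of C. This is a rough sketch under my own assumptions (a non-zero step, and a hypothetical function name for_iterations); it only counts how often the body would run and is not taken from the interpreter.

```c
/* Hypothetical model of a MiniBasic FOR loop: the TO and STEP values are
   captured once on entry, and the body is skipped entirely when the
   initial value is already past the TO value.
   Returns the number of times the body would run. Assumes step != 0. */
static int for_iterations(double initial, double to, double step)
{
    int count = 0;
    double i;

    for (i = initial; step > 0 ? i <= to : i >= to; i += step)
        count++;
    return count;
}
```

For example, FOR I = 1 TO 10 STEP 2 corresponds to for_iterations(1, 10, 2), which visits 1, 3, 5, 7 and 9, while for_iterations(10, 1, 1) never enters the body.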

GOTO – unconditional jump. GOTO executes a jump to another line. The line number is usually a constant, but GOTO x is supported. GOTO is not considered good programming practice, but is essential in MiniBasic because flow control is so simple.


Usage GOTO numeric 10 GOTO 100 10 GOTO x

IF – conditional jump. The IF ... THEN construct allows a MiniBasic program to make decisions. It can emulate any other control structure. If the test condition is true, then control jumps to the line indicated after the THEN keyword. If false, control passes to the next line. No statements other than a line number may appear after the THEN keyword, though the form IF y < 10 THEN x

is supported. The relational operators are =, <> (not equal), >, >=, < and <=. They can be applied to strings or to numerical expressions. The AND and OR logical operators can also be used. MiniBasic does not perform lazy evaluation: all expressions will be evaluated, so all array indices etc. must be legal. Usage IF relational THEN numeric 10 IF x < 10 THEN 100 10 IF a$ <> “OK” AND a$ <> “YES” THEN x


INPUT – input a number or a string. To input data, use the INPUT statement. In a test environment this will usually be typed by the user; if MiniBasic is a component of another program, input will be provided by the caller. INPUT x

inputs a variable. Any non-numerical characters are skipped over until a number appears. INPUT n$

inputs a string. Characters are read up to the first newline, which is discarded. Most consoles provide data a line at a time, so input you type will not be available until you press ENTER. If the input stream comes to an end, the program will fail with an error message. Usage INPUT id 10 INPUT x 20 PRINT x

INSTR – in string. Looks for occurrences of a substring within a string. The first argument is the string to search, the second argument the string to search for, and the third argument the position at which to start. The return value is 0 if the string is not found, or else the offset of the first occurrence. For instance x = INSTR(“zigzag”, “zag”, 1)

will return 4.

It is very useful for string manipulation. For instance, to test if the one-character string ch$ is a digit we could write 10 IF INSTR(“0123456789”, ch$, 1) <> 0 THEN 100

Usage num = INSTR(string, string, numeric) 10 LET x = INSTR(sentence$, “and”, 1)
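For readers who think in C, the behaviour described for INSTR maps closely onto strstr(). The following is a sketch only: the function name instr is made up, and treating an out-of-range start position as "not found" is my assumption rather than documented behaviour.

```c
#include <string.h>

/* Hypothetical C equivalent of INSTR(str, substr, start): returns the
   1-based position of the first occurrence of substr at or after the
   1-based position start, or 0 if there is none. Treating an
   out-of-range start as "not found" is an assumption of this sketch. */
static int instr(const char *str, const char *substr, int start)
{
    const char *hit;

    if (start < 1 || (size_t)start > strlen(str) + 1)
        return 0;
    hit = strstr(str + start - 1, substr);
    return hit ? (int)(hit - str) + 1 : 0;
}
```

With this sketch, instr("zigzag", "zag", 1) gives 4, matching the example in the text.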

INT – convert real to integer. All MiniBasic numerical variables are stored as floating point. INT() returns the lower integer portion of the number. Thus INT(1.9) = 1. To round, call INT(x + 0.5). INT() is also useful for getting rid of small errors caused by floating-point calculation. Usage num = INT(numeric) 10 LET x = INT(x/2)

LEFT$ - returns the left portion of a string. To take the leftmost characters of a string, call LEFT$. If the string is too short to contain that number of characters, it returns the whole string. Usage str = LEFT$(string, numeric) 10 LET hello$ = LEFT$(“Hello World”, 5)


LEN – returns the length of a string. To find the length of a string in characters, call LEN(). The empty string “” returns 0. It is often necessary to examine each character of a string for processing.

5 REM PRINT A$, omitting the letter “x”
6 INPUT A$
10 FOR I = 1 TO LEN(A$)
20 LET ch$ = MID$(A$, I, 1)
30 IF ch$ = “x” THEN 50
40 PRINT ch$;
50 NEXT I
60 PRINT “”

Usage num = LEN(string) 10 LET x = LEN(“This is a string”)

LET – assignment The LET statement assigns a variable a value. If the variable does not exist it is created. 10 LET x = 10 10 LET name$ = fname$ + “ “ + sname$

Plain variables like x or length are always numerical, string variables like name$ always end with a dollar sign. It is illegal to try to assign a variable of the wrong type. LET will not create or increase the size of a dimensioned variable.

10 DIM array(2,2)
15 REM Legal
20 LET array(1,2) = 10
25 REM Illegal out of bounds
30 LET array(1,3) = 0

The form 10 LET x = x + 1

is legal and is often very useful. It is even legal if x has not been created (it is initialised to zero). Usage LET id = numeric LET id$ = string 10 LET x = 10 10 LET x = x + 1 10 LET a$ = CHR$(13)

LN – natural logarithm Computes the natural logarithm of a number, which must be greater than zero. Natural logarithms are to the base e. To convert to a base 10 logarithm, LET log10 = LN(x)/LN(10)

To convert to a base 2 logarithm LET log2 = LN(x)/LN(2)

Usage num = LN(numeric) 10 LET log = LN(x)

MID$ - middle string function. Use this function to obtain a string from the middle of another string. The first argument is the target string, the second argument the offset (1 – based) and the third argument the length of the substring to extract. LET x$ = MID$(“Distraction”, 4, 5)

sets x$ to “tract”.

If the length is too long for the target string, the answer is truncated. By passing –1 for the length, we tell the function to extract the remainder of the string. LET x$ = MID$(“Distraction”, 4, -1)

sets x$ to “traction”. Usage str = MID$(string, numeric, numeric) 10 LET x$ = MID$(y$, 3, 4) 10 LET x$ = MID$(y$, 3, -1)
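The three rules for MID$ (1-based offset, truncation, and –1 meaning "the rest") translate into C roughly as follows. This is a sketch, not the interpreter's routine: the name mid and the choice to return NULL for an illegal offset are assumptions made for the illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical C equivalent of MID$(str, offset, len). offset is 1-based;
   len == -1 means "to the end of the string". The result is truncated if
   it would run past the end. Returns a malloc()ed string the caller must
   free, or NULL on an illegal offset or allocation failure. */
static char *mid(const char *str, int offset, int len)
{
    size_t total = strlen(str);
    size_t avail;
    char *out;

    if (offset < 1 || len < -1)
        return NULL;
    if ((size_t)offset > total)
        avail = 0;                           /* nothing left to copy */
    else
        avail = total - (size_t)offset + 1;  /* characters from offset on */
    if (len != -1 && (size_t)len < avail)
        avail = (size_t)len;
    out = malloc(avail + 1);
    if (!out)
        return NULL;
    if (avail > 0)
        memcpy(out, str + offset - 1, avail);
    out[avail] = '\0';
    return out;
}
```

Calling mid("Distraction", 4, 5) yields "tract" and mid("Distraction", 4, -1) yields "traction", as in the examples above.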

MOD – modulus. Modulus is not a function but an arithmetical operator. It calculates the remainder after division. LET x = 12 MOD 5

sets x to 2. Both sides of the MOD operator should be of the same sign. MOD 0 is an error. MOD also works for fractional values. 0.75 MOD 0.5 equals 0.25. Usage num = numeric MOD numeric 10 LET X = Y MOD 10 10 LET X = Y MOD -0.1


NEXT – terminates a FOR ... NEXT loop. For description see FOR. Usage NEXT id 10 FOR I = 1 TO 10 20 PRINT I 30 NEXT I

OR – logical operator. Used in IF ... THEN statements to perform both tests. If either one is true, then the test succeeds and the jump is taken. IF job$ = “caretaker” OR age < 65 THEN 100

Note that lazy evaluation is not performed. Both sides of the expression will always be evaluated. When used in conjunction with AND use parentheses to disambiguate. Usage IF relational OR relational THEN numeric 10 IF x = y OR x = z OR x < 0 THEN 100


PI – mathematical constant. The mathematical constant PI, or 3.14159265... This is the ratio of a circle’s circumference to its diameter and is used in many mathematical formulae Usage PI 10 LET area = radius * radius * PI

POW – exponentiation function. POW() raises x to the power y. Fractional and negative powers are supported. LET x = POW(10, 2)

Will set x to 100. By passing 1/y as the exponent, we can obtain the yth root of x. For example, to obtain the cube root of two pass LET x = POW(2, 1/3)

Passing a negative power calculates the reciprocal. For example LET y = POW(x, -3)

sets y to 1/(x^3). Some values are illegal. For instance, POW(-1,1/2) will produce an indefinite result. Usage num = POW(numeric, numeric) 10 LET a = POW(x,y)


PRINT – output statement All of MiniBasic’s output is via the PRINT statement. It is used to print both numbers and strings. PRINT “Hello World”

Will output the string “Hello World”, followed by a newline. PRINT x

Will print the value of x in a human-readable format. It is possible to print many values in one line by separating them with commas. PRINT “Your salary is”, x, “Mr”, name$

The comma will automatically insert a space. To suppress the newline, terminate the PRINT statement with a semicolon.

10 LET x = 10
20 LET y = 25
30 PRINT x;
40 PRINT y

will output the string “1025”. To print a bare newline, use the empty string 10 PRINT “”

Usage PRINT numeric or string, numeric or string ; (optional)

10 PRINT “Hello World”
10 PRINT x
10 PRINT “Hello”, name$
10 PRINT “Enter your telephone number”;
10 PRINT “”

REM – remarks. This statement is purely for adding comments to programs so that a human reader can understand them. It is also frequently used for “commenting out” code – prefixing with a REM so it is not executed. MiniBasic allows for multi-line comments, as long as the first character of every continued line is a space. Usage REM any comments

10 REM Demonstration program by Malcolm McLean
10 REM This is an extremely long comment,
 which is spread over two lines.
10 REM PRINT “This PRINT statement is commented out”

RIGHT$ - get rightmost characters of a string. This is the twin to LEFT$. It takes the rightmost characters of a string. For instance LET A$ = RIGHT$(“Beholden”, 3)

Would set A$ to “den”. If the target string isn’t long enough, all of the string is copied. Usage str = RIGHT$(string, numeric) 10 LET A$ = RIGHT$(B$, 10)


RND – Random number generator. Many applications need random numbers. RND() provides a pseudorandom number generator. The argument, which should be an integer, tells RND() to generate a random integer in the range 0 to N - 1.

10 FOR I = 1 TO 100
20 PRINT RND(10)
30 NEXT I

will output a stream of random digits in the range 0 – 9. If we pass RND() the value 1 the number generated is a floating point value in the range 0 – (slightly below) 1. The random number generator is deterministic. To force a certain behaviour, call RND() with a negative argument. This will “seed” the random number generator. LET dummy = RND(-10)

will give us numbers based on the seed 10. Calling RND(0) will always return 0. Usage num = RND(numeric) 10 LET die = RND(6) + 1 10 LET dummy = RND(-10) 10 LET p = RND(1)

SIN – sine. Returns the sine of a number. The argument must be in radians. Usage num = SIN(numeric) 10 LET s = SIN(theta) 10 LET s = SIN(degrees/180 * PI)


SQRT – square root. Calculates the square root of its argument, which must not be negative. Usage num = SQRT(numeric) 10 LET root2 = SQRT(2) 10 LET dist = SQRT( (x1-x2) * (x1-x2) + (y1 - y2) * (y1 - y2))

STEP – increment for a FOR loop. STEP is by default 1, but can be any value, positive or negative. It is evaluated once when the FOR ... NEXT loop is entered. For further details see FOR Usage FOR id = numeric TO numeric STEP numeric 10 FOR i = 1 TO 100 STEP 10 10 FOR i = 100 TO 1 STEP –1 10 FOR i = min TO max STEP delta

STR$ - convert numerical value to string. The numerical value x = 10 and the string value x$ = “10” are two different things. To convert a number into a human-readable string use STR$ Usage str = STR$(numeric) 10 LET reg$ = “ABC” + STR$(num)


STRING$ - tandem string. If we need to create a string that consists of a shorter string duplicated many times, use STRING$. LET stars$ = STRING$(“*”, 20)

will create a string of twenty asterisks. A common use is creating variable numbers of spaces for output formatting. Usage str = STRING$(string, numeric) 10 PRINT STRING$(“ “, 10), out$

TAN - tangent Calculates the tangent of an angle, which must be in radians. Usage num = TAN(numeric) 10 LET x = TAN(theta) 10 LET x = TAN(degrees/180 * PI)

THEN – component of IF statement. THEN introduces the jump destination which is taken if the expression in the IF statement is true. The expression is always evaluated, even if the branch is not taken. For further details see IF Usage IF relational THEN numeric


TO – component for FOR ... NEXT loop. In a FOR ... NEXT loop, TO introduces the terminal value. When it is exceeded, the loop terminates at the next NEXT statement. For further details see FOR. Usage FOR id = numeric TO numeric

VAL – calculate the numerical value of a string. This function converts a human-readable string containing numbers to a numerical variable. The string may contain numbers in scientific notation e.g. 1.5e20. The string is read up until the first non-numerical character is encountered. If the string does not start with a number, 0 is returned. Usage num = VAL(string) 10 LET x = VAL(“1024”) 10 LET x = VAL(“1.5e20”)

VALLEN – length of value. This function is designed for use with VAL() to tell the caller how many numerical characters were translated. This is useful if stepping through a string containing many numbers. It also tells the caller whether a string is numerical or not – if non-numerical it returns 0.

10 INPUT a$
20 IF VALLEN(a$) <> 0 THEN 50
30 PRINT “You must enter a number”
40 GOTO 10
50 LET x = VAL(a$)
60 LET a$ = MID$(a$, VALLEN(a$) + 1, -1)

This code will read a number from the input, and prompt again if valid input is not entered. Usage: num = VALLEN(string) 10 LET slen = VALLEN(“121 dalmations”)
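In C, the pairing of VAL() and VALLEN() corresponds naturally to strtod(), which returns both the value and a pointer to where it stopped reading. The sketch below shows that correspondence; the helper name val_with_len is invented here, and I am not claiming this is exactly how the interpreter does it.

```c
#include <stdlib.h>

/* Hypothetical helper combining VAL() and VALLEN(): converts the leading
   number in str with strtod() and stores the count of characters consumed
   in *len. If str does not start with a number, returns 0 and *len is 0.
   Note: strtod() also skips leading white space and accepts forms such as
   "inf" or hex floats, which MiniBasic may not. */
static double val_with_len(const char *str, int *len)
{
    char *end;
    double value = strtod(str, &end);

    *len = (int)(end - str);
    return value;
}
```

For the string "121 dalmations" this gives the value 121 and a length of 3, so the caller knows where the next item in the string begins.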


Errors

Sometimes MiniBasic will terminate with an error message. Usually these are due to typing mistakes or logic errors in the BASIC program. Occasionally they may be caused by the computer running out of resources, by illegal input, or by internal errors in the MiniBasic interpreter.

Can’t read program
You have called MiniBasic with something it cannot recognise as a MiniBasic program at all, for instance with a text file containing a nursery rhyme.

Program lines not in order
Lines have to be in numerical order. If lines are out of order, you will receive this error.

Line not found
You have tried to jump to a non-existent line.

Syntax error line
This means that the computer has encountered a line it cannot understand. It is a catch-all error, incorporating things such as identifiers starting with digits, or lines not terminated with a newline.

Out of memory
The computer has run out of memory. This may occur when you try to dimension a huge array, or it may occur at any time if the computer is low on resources, since MiniBasic uses memory internally. Be particularly careful when dimensioning arrays with variables.

Identifier too long
An identifier (variable name) is allowed to be only 31 characters long, including the $ for a string identifier. For dimensioned variables the number is one less.

No such variable
You have attempted to use a variable that has not been initialised.

Bad subscript
You have tried to access a dimensioned array beyond its dimensioned size.

Too many dimensions
You have tried to dimension an array with more than five dimensions.

Too many initialisers
In initialising a dimensioned array, you have tried to list more values than you have space for.

Illegal type
You have tried to use a string variable as the counter for a FOR loop.

Too many nested fors line
Maximum depth of FOR ... NEXT loops is 32. Exceeding this limit is probably due to problems with jumping out of FOR ... NEXT loops.

For without matching next
You have declared a FOR statement but not a matching NEXT.

Next without matching for
You have declared a NEXT statement without a matching FOR.

Divide by zero
You have attempted to divide by zero. This is a mathematical error.

Negative logarithm
You have attempted to take the logarithm of zero or a negative number. This is a mathematical error.

Negative square root
You have tried to take the square root of a negative number. This is a mathematical error.

Sine or Cosine out of range
You have attempted to pass a value not in the range –1.0 to 1.0 to the ASIN() or ACOS() functions.

End of input file
An INPUT statement has encountered an end of file condition. This could be due to some problem with the computer’s system.

Illegal offset
A string function has received an illegal value for a string offset, such as a negative second argument to LEFT$().

Type mismatch
You have entered a string expression where MiniBasic was expecting a numeric expression, or a numeric expression where it was expecting a string.

Input too long
Input lines can be a maximum of 1023 characters long. Lines longer than this are almost certainly either errors or malicious attempts to exploit the system, so they are rejected.

Bad value
There has been an internal overflow. Usually this is caused by trying to calculate with ridiculously large value like 10.

Not an integer
A non-integer was used as an array index or passed to a function (like RND()) which naturally expects an integer. Note that floating point arithmetic is not exact so expressions like SQRT(3.0) * SQRT(3.0) may not return exactly 3.0. Use the INT() function to force a number to an exact integer.

ERROR
Unspecified error has occurred. This probably represents some internal problem.


How to write a BASIC interpreter

Take out your pocket calculator and type in

1+2x3

Unless you have a really good one, the output will probably be 9. The calculator moves from left to right, evaluating the expression. A mathematician, on the other hand, uses the rule BODMAS (brackets, of, divide, multiply, add, subtract). So the result of the expression should be 7, or 1 + (2 * 3).

This is the basic problem in writing an expression parser to interpret human-meaningful programming languages, like BASIC. The interpreter cannot simply bolt through the input. The secret is to store the state of the expression on a stack.

Consider this simple problem. We have a programming language that uses the symbols ‘(’, ‘)’, ‘[’, ‘]’, ‘{’, ‘}’. The brackets have to match each other. So

{ fred[15] * (x+1) }

would be legal.

{ fred[15 *)x + 1(}

would not be, because the square bracket is unclosed and the round brackets are the wrong way round.

{ fred[15 * (x+1])}

would also be illegal, because the square bracket has closed whilst the round brackets are still open. Every opening bracket has to be matched with a corresponding closing bracket, in order.

The solution is to use a stack. When you hit an opening bracket, push it onto the stack. When you hit a closing bracket, pop the stack. If the symbol on the top doesn’t match, you know you have an error. Finally, if the stack is not empty at the end of the expression, you have an unclosed bracket.
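The bracket-matching procedure just described is short enough to write out in full. The following C function is a sketch of it; the name brackets_match and the fixed stack depth are choices made for this illustration and do not come from the MiniBasic source.

```c
#define MAX_DEPTH 64  /* arbitrary nesting limit chosen for this sketch */

/* Returns 1 if every bracket in str is closed by the matching bracket in
   the right order, 0 otherwise. Non-bracket characters are ignored. */
static int brackets_match(const char *str)
{
    char stack[MAX_DEPTH];
    int depth = 0;

    for (; *str; str++) {
        switch (*str) {
        case '(': case '[': case '{':
            if (depth == MAX_DEPTH)
                return 0;           /* nested too deeply for this sketch */
            stack[depth++] = *str;  /* push the opening bracket */
            break;
        case ')': case ']': case '}':
            if (depth == 0)
                return 0;           /* closing bracket with nothing open */
            depth--;                /* pop, then compare with the closer */
            if ((*str == ')' && stack[depth] != '(') ||
                (*str == ']' && stack[depth] != '[') ||
                (*str == '}' && stack[depth] != '{'))
                return 0;
            break;
        default:
            break;
        }
    }
    return depth == 0;              /* anything left open is an error */
}
```

Fed the three examples from the text, it accepts the first and rejects the other two.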

Now one of the nice things about C is that it allows recursive functions. Mathematical expressions are naturally recursive. If we have the expression

y = x + (…)

anything can go inside the parentheses, as long as it is a legal expression. So we could have

y = x + (2 * 3)
y = x + (x * 3)
y = x + ( x + (2 * 3) )

We can nest as deeply as we like. So if we are writing a parser, the algorithm is to parse the expression from left to right, until we hit an opening parenthesis. Then we parse another expression. Then we check for a closing parenthesis. If we find it, we calculate the result and pass it to the enclosing expression. If we don’t, there must be an error.

y = x + (3

would be an error. So would

y = x + (3 +)

because the contents of the bracket are not a full expression.

There are three basic levels of arithmetical expression: the factor, the term, and the expression. An expression consists of several terms – a term being numbers that are added or subtracted. A term consists of several factors – a factor being a number that is multiplied or divided.

A factor consists of either a number, or an opening bracket, an expression, and a closing bracket. The subroutine that evaluates the factor therefore has to call the subroutine that evaluates the expression, if it hits an opening bracket. Hence expression interpreters are naturally mutually recursive. Generally it is a bad thing to have mutually recursive functions in your programs, but here is an exception.

Another thing that is normally a bad idea is a global variable. However in MiniBasic you will notice several global variables. This is for several reasons, but mainly because each function needs to keep track of the state of the input. We read the input from left to right, one token at a time. The token that we have just read is stored, and is available for reference. Once we process it, it is done – we never backtrack in our reading. This method is called recursive descent parsing with one token of lookahead (in formal terms, an LL(1) parser). The vast majority of programming languages can be parsed this way. There are just a few human-meaningful expressions which don’t lend themselves to this treatment – this becomes a serious problem if you are writing a natural language parser, which is a much more difficult proposition than a BASIC interpreter.

So we want to write a master function

double expr()

which parses the input, and returns a number as an answer. If it hits an error it sets some flag somewhere. It also updates the input globally. Intuitively, it might seem nice to take the input as a parameter

double expression(char *str)

This is OK for the highest level function, but not for the functions that are called recursively. The reason is that, if expression() is called recursively, the caller needs to know how many tokens it has consumed. So we keep the input, and the number of tokens read, global.

This leads us to the question of what is a token. It would be perfectly workable to say that each ASCII character is a token. In fact this leads to a bit of a nuisance, because it is easier to treat numbers separately rather than writing a grammar to build them up from their digits. When we come to add keywords, it will also be a lot easier to say that each keyword is a separate token. It is also convenient to just ignore spaces.


So for now, let’s define our tokens as the arithmetical symbols ‘+’, ‘-’, ‘*’, ‘/’, the parentheses ‘(’ and ‘)’, and VALUE, meaning any sequence of digits. Finally we also need an EOS token, to tell us when we have reached the end of the input. The tokenizer is called the lexical analyser. The important function is called

int gettoken(const char *str)

str is a const, because gettoken() never actually consumes any input. It simply tells us which token is waiting for us in the input stream. At this simple level, gettoken() is very easy to write. Look at the input, and skip leading spaces. If the character waiting is an operator, return that. If it is a digit, return VALUE, if it is the terminating NUL, return EOS, and otherwise flag an error.

So we call gettoken() to set up the first token, and then call expr(). The key is that we suppress error parsing in this high-level function. An expression consists of one or more terms, held together by pluses or minuses. When we run out of plus or minus tokens, we stop.

    /* parses an expression */
    static double expr(void)
    {
        double left;
        double right;

        left = term();
        while(1)
        {
            switch(token)
            {
            case PLUS:
                match(PLUS);
                right = term();
                left += right;
                break;
            case MINUS:
                match(MINUS);
                right = term();
                left -= right;
                break;
            default:
                return left;
            }
        }
    }

The match() function is the other part of the lexical analyser. It flags an error if the variable token doesn’t match whatever is passed to it, then it discards that token and calls gettoken() to read another one. In this function, the check is redundant because we are already checking token. Note that we do not discard any symbols the expression parser doesn’t understand.

The term() function is almost the same as the expr() function. An expression is a series of terms, whilst a term is a series of factors. I have removed the modulus keyword for simplicity.

    /* parses a term */
    static double term(void)
    {
        double left;
        double right;

        left = factor();
        while(1)
        {
            switch(token)
            {
            case MULT:
                match(MULT);
                right = factor();
                left *= right;
                break;
            case DIV:
                match(DIV);
                right = factor();
                if(right != 0.0)
                    left /= right;
                else
                    seterror(ERR_DIVIDEBYZERO);
                break;
            default:
                /* no more * or / tokens: the term is complete */
                return left;
            }
        }
    }
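The lexical analyser that expr() and term() lean on can itself be sketched in a few dozen lines. What follows is a simplified illustration rather than the MiniBasic source: the token names mirror those used in the text, but the helpers tokenlen() and setup(), the ERROR_TOK value and the plain errorflag variable are my own inventions for the example.

```c
#include <ctype.h>
#include <stdlib.h>

enum { EOS, VALUE, PLUS, MINUS, MULT, DIV, OPAREN, CPAREN, ERROR_TOK };

static const char *string;  /* unread input, kept global as in the text */
static int token;           /* the single token of lookahead */
static int errorflag;       /* set when match() sees the wrong token */

/* Report which token is waiting at str, without consuming it. */
static int gettoken(const char *str)
{
    while (isspace((unsigned char)*str))
        str++;
    if (isdigit((unsigned char)*str))
        return VALUE;
    switch (*str) {
    case '\0': return EOS;
    case '+':  return PLUS;
    case '-':  return MINUS;
    case '*':  return MULT;
    case '/':  return DIV;
    case '(':  return OPAREN;
    case ')':  return CPAREN;
    default:   return ERROR_TOK;
    }
}

/* Hypothetical helper: number of characters the token tok occupies at str. */
static size_t tokenlen(const char *str, int tok)
{
    char *end;

    if (tok == VALUE) {
        strtod(str, &end);       /* let strtod() find the end of the number */
        return (size_t)(end - str);
    }
    return tok == EOS ? 0 : 1;
}

/* Check the current token, discard it and look at the next one. */
static void match(int expected)
{
    if (token != expected)
        errorflag = 1;
    while (isspace((unsigned char)*string))
        string++;
    string += tokenlen(string, token);
    token = gettoken(string);
}

/* Hypothetical helper: point the analyser at new input, read first token. */
static void setup(const char *str)
{
    string = str;
    errorflag = 0;
    token = gettoken(string);
}
```

After setup("1 + 2*(3)"), repeated calls to match() step token through VALUE, PLUS, VALUE, MULT, OPAREN, VALUE, CPAREN and finally EOS.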


Note that terms can contain errors – it is illegal to divide by zero. When we hit an error, we want to flag it and terminate. This could be achieved by setjmp() and longjmp(), but that gets ugly. It is better to use a sticky error. seterror() keeps a note of the first error it is informed of. We then allow the parser to trigger other calls to seterror(), which are ignored, until it returns control to the top level function.

    /* parses a factor */
    static double factor(void)
    {
        double answer = 0;
        char *str;
        char *end;
        int len;

        switch(token)
        {
        case OPAREN:
            /* bracketed sub-expression */
            match(OPAREN);
            answer = expr();
            match(CPAREN);
            break;
        case VALUE:
            /* plain number */
            answer = getvalue(string, &len);
            match(VALUE);
            break;
        case MINUS:
            /* unary minus */
            match(MINUS);
            answer = -factor();
            break;
        case SQRT:
            /* function call */
            match(SQRT);
            match(OPAREN);
            answer = expr();
            match(CPAREN);
            if(answer >= 0.0)
                answer = sqrt(answer);
            else
                seterror(ERR_NEGSQRT);
            break;
        default:
            seterror(ERR_SYNTAX);
            break;
        }
        return answer;
    }

This function, which is much simplified from the actual code, is a bit different. A factor consists of a number, most simply. So if we have the token VALUE, we examine the digits to obtain the number, match it to move the lexical analyser on, and return the result. However a factor can also be an opening bracket, an expression, followed by a closing bracket. Therefore we have to call expr() recursively. Another complication is unary minus. A factor can be a minus sign, followed by another factor. We are disallowing unary plus, but it could be added in the same way. Finally, I have allowed for another complication, a function call to SQRT(). All the other mathematical functions can be added to the factor() function in a similar way.

With the expr() function, we have our basic logic for an expression parser. The high-level function,

double expression(const char *str)

would clear the error state, and set up the lexical analyser with the first token. It then calls expr() to get the result, and match(EOS). It then checks the error state, and if everything is correct, returns the result. Otherwise, it reports the error.

The expression parser is the skeleton round which MiniBasic is built. It is trivial enough to add the factorial and MOD operators, more functions, and a few bells and whistles like e and PI. The next major complication comes when we allow the user to add variables. We need to allow programs of the form

LET x = 1 + 2
LET y = x * x

To implement this, we need the concept of the lvalue. An lvalue is something which can be assigned to. Since BASIC, unlike C, does not require declaration of variables, an lvalue can be either a pre-existing variable, or one we have not encountered before.
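The find-or-create behaviour of an lvalue can be sketched in a few lines. This is only an illustration under simplifying assumptions: the names Var, find_var() and lvalue_for() are hypothetical and are not taken from the MiniBasic source, only numeric variables are handled, and the list grows by one entry at a time.

```c
#include <stdlib.h>
#include <string.h>

#define MAXID 32             /* hypothetical limit on identifier length */

typedef struct
{
  char id[MAXID];            /* variable name */
  double dval;               /* its numeric value */
} Var;

Var *vars = NULL;            /* growable list of scalar variables */
int nvars = 0;

/* linear search; returns NULL if the name has not been seen before */
Var *find_var(const char *id)
{
  int i;

  for(i = 0; i < nvars; i++)
    if(strcmp(vars[i].id, id) == 0)
      return &vars[i];
  return NULL;
}

/* return the storage an assignment should write to, creating the
   variable on first use; NULL if the name is too long or memory runs out */
double *lvalue_for(const char *id)
{
  Var *v = find_var(id);
  Var *grown;

  if(v)
    return &v->dval;
  if(strlen(id) >= MAXID)
    return NULL;
  grown = realloc(vars, (nvars + 1) * sizeof(Var));
  if(!grown)
    return NULL;
  vars = grown;
  strcpy(vars[nvars].id, id);
  vars[nvars].dval = 0.0;
  return &vars[nvars++].dval;
}
```

With this in place, LET x = 1 + 2 amounts to writing 3 through the pointer returned by lvalue_for("x"). One design point to watch: because the list is reallocated when it grows, a pointer obtained before a later call to lvalue_for() may no longer be valid, so it should be fetched immediately before the assignment.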


If the lexical analyser hits an alphanumerical string that is not a keyword, it reports it as a FLTID (floating point identifier). We maintain a list of all scalar variables in the system, in the array variables. For convenience, string variables share the same space. When control reaches a LET statement, we check the identifier to see if it is already in use. If not, we add it. Then we assign it the value on the right hand side of the equals sign. The structure of a LET statement is therefore

LET lvalue = expression

To allow for expansion into dimensioned variables, the LVALUE structure contains a pointer to the data item to assign. Once we have assigned the variable, it becomes available as a part of an expression. Therefore the factor() routine has to be expanded to accommodate a FLTID. The function variable() matches a float identifier. Failure to find the variable in the context of an expression is an error. It will be noted that the interpreter searches the variable list in a linear fashion. This could easily be the focus for algorithmic improvements.

String expressions are basically simpler than arithmetical expressions. They do not introduce any new concepts, except that the parser has to know whether it is parsing a string or an arithmetical expression. This is where the LALR model could break down: PRINT x could require us to parse x as a numerical expression, or as a string expression. MiniBasic, like normal BASIC, gets round this by requiring all string variables to end with the character ‘$’. We therefore know whether we are dealing with a string or a numerical expression.

Strings are allocated using malloc(). This is slow, but it allows for arbitrary length strings without gobbling too much memory.

The next problem is flow control. Flow control is what distinguishes a programming language from a calculator. The BASIC method is to use line numbers. When the function is called, we do an initial pass through the script, to index all the line numbers.
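The indexing pass and the later lookup of a jump destination might be sketched as follows. This is a hedged illustration rather than the interpreter's own code: LineRef, index_lines() and find_line() are hypothetical names, lines that do not start with a digit are simply skipped, and a script whose line numbers are not strictly ascending is rejected.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
  int no;             /* BASIC line number */
  const char *str;    /* start of that line within the script */
} LineRef;

/* Build an index of the numbered lines in script. Returns a malloc'ed
   array and stores its length in *count. Returns NULL if memory runs
   out, no numbered line is found, or the numbers are not ascending. */
LineRef *index_lines(const char *script, int *count)
{
  const char *p;
  LineRef *idx;
  int n = 0;
  int max = 1;

  for(p = script; *p; p++)      /* upper bound on the number of lines */
    if(*p == '\n')
      max++;
  idx = malloc(max * sizeof(LineRef));
  if(!idx)
    return NULL;

  p = script;
  while(p && *p)
  {
    if(isdigit((unsigned char) *p))
    {
      int no = (int) strtol(p, NULL, 10);
      if(n > 0 && no <= idx[n-1].no)
      {
        free(idx);              /* out of order: reject the program */
        return NULL;
      }
      idx[n].no = no;
      idx[n].str = p;
      n++;
    }
    p = strchr(p, '\n');        /* move to the start of the next line */
    if(p)
      p++;
  }
  if(n == 0)
  {
    free(idx);
    return NULL;
  }
  *count = n;
  return idx;
}

/* binary search; returns the index of line number no, or -1 */
int find_line(const LineRef *idx, int count, int no)
{
  int low = 0;
  int high = count - 1;

  while(low <= high)
  {
    int mid = low + (high - low) / 2;
    if(idx[mid].no == no)
      return mid;
    if(idx[mid].no < no)
      low = mid + 1;
    else
      high = mid - 1;
  }
  return -1;
}
```

Because the index is sorted by construction, a jump costs a handful of comparisons instead of a scan through the whole script.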
This is easy because every line must begin with a line number and end with a newline character, except that

MiniBasic allows for continuation lines, which are blank. If the lines are not in ascending order, we reject the program. Indexing lines this way allows for reasonably efficient jumps; otherwise we would have to read through the whole script in order to find the destination.

Execution starts at the first line. We store the line number, in internal consecutive numbering, in curline. We parse one line at a time, and return the destination line number, or zero in the normal case of control simply incrementing. An expression can be converted to an internal line number by doing a binary search on the line list. Therefore the GOTO statement consists of

GOTO expression

and the expression is simply evaluated and returned. It is actually simpler to allow for arbitrary numerical expressions in jump destinations, including ones computed at run time, though it wouldn’t be if we were writing a compiler rather than an interpreter.

IF is the slightly more complicated form of the GOTO statement. I chose to use the canonical BASIC form of IF … THEN linenumber, because it is familiar to microcomputer programmers, though in fact it is a pain to use and the more modern IF … ENDIF is a lot more intuitive.

The IF statement requires the introduction of relational expressions. Similarly to numerical expressions, these have precedence of ANDs and ORs, together with relational operators like ‘>’ and ‘=’. They also have to allow nested parentheses. A relational expression parser can be built in exactly the same way as an expression parser. In fact it needs to call the numerical expression parser:

IF expression > expression

is a perfectly legitimate and unexceptional form of use.

The other essential for a programming language is the use of vector variables. A huge number of operations, such as calculations of the mean, or sorting, rely on lists. It is of course easy to emulate any multi-dimensional array with a one-dimensional array.
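As a minimal sketch of that emulation, the function below maps a set of subscripts onto an offset in a flat array by looping over the dimensions. The name flat_index() is hypothetical, subscripts are 0-based here, and letting the first subscript vary fastest is an assumption made for the example, not a statement about how MiniBasic lays out its arrays.

```c
/* Map ndims subscripts onto an offset into a one-dimensional array.
   dim[i] is the size of dimension i and sub[i] the 0-based subscript.
   The first subscript varies fastest. Returns -1 if any subscript is
   out of range. */
long flat_index(const int *dim, const int *sub, int ndims)
{
  long offset = 0;
  long stride = 1;   /* elements spanned by one step in dimension i */
  int i;

  for(i = 0; i < ndims; i++)
  {
    if(sub[i] < 0 || sub[i] >= dim[i])
      return -1;
    offset += sub[i] * stride;
    stride *= dim[i];
  }
  return offset;
}
```

For a 4 by 3 array, element (1, 2) lands at offset 1 + 2 * 4 = 9, and the last element (3, 2) at offset 11, so the flat array needs 4 * 3 = 12 slots.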
In fact in C most data which is inherently two dimensional, such as image rasters, has to be treated as one-dimensional

because of limitations in the language. However BASIC programmers expect to be able to use multi-dimensional arrays. The DIM statement, of course, just calls malloc() internally. To simplify coding I restrict arrays to at most five dimensions, which is about the maximum that can be written out by hand. Even a big computer will quickly run out of memory if arrays get much bigger than this anyway. Allowing arbitrary dimensions forces you to roll the indexing into a loop over the dimensions, which gets headachy.

This does complicate the variable system, because we have both scalar and dimensioned variables. However they can be distinguished by requiring all dimensioned variables to end with an opening parenthesis. This is done at the level of the lexical analyser. The use of the LVALUE structure helps to keep things under control, since it simply points into the array.

The whole point of dimensioning arrays is to iterate over them. This can be done using IF statements and keeping a counter, but it is clumsy. Unfortunately FOR … NEXT loops introduce other problems into the interpreter. They can be nested, so a stack of control structures has to be maintained. Then the loop is exited at the terminating NEXT statement, but the control is in the FOR, which leads to other problems, particularly because the user may not enter a script with nicely nested loops. Finally, there is the issue of what to do if a user jumps out of a loop.

The solution, which is simple but not the most elegant, is to allow the user to mess around with flow control, but keep the FOR stack relatively small, and insist on the matching NEXT being labelled with the control variable. So bad control will rapidly either overflow the stack or cause a mismatch error. The problem is that there is then no way to break out of the FOR loop legitimately.
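The small stack with labelled NEXTs can be illustrated roughly as below. This is a sketch under assumptions, not the interpreter's code: ForFrame, push_for() and next_for() and the FOR_ result codes are hypothetical names, and only the bookkeeping is shown, not the jump back to the line after the FOR.

```c
#include <string.h>

#define MAX_FORS 32          /* deliberately small stack */

typedef struct
{
  char id[32];               /* name of the control variable */
  double toval;              /* terminal value */
  double step;               /* step size */
} ForFrame;

ForFrame for_stack[MAX_FORS];
int n_fors = 0;

enum { FOR_OK = 0, FOR_DONE = 1, FOR_OVERFLOW = -1, FOR_MISMATCH = -2 };

/* FOR: push a frame; fails once the stack is full */
int push_for(const char *id, double toval, double step)
{
  ForFrame *f;

  if(n_fors >= MAX_FORS)
    return FOR_OVERFLOW;
  f = &for_stack[n_fors++];
  strncpy(f->id, id, sizeof(f->id) - 1);
  f->id[sizeof(f->id) - 1] = 0;
  f->toval = toval;
  f->step = step;
  return FOR_OK;
}

/* NEXT id: the label must name the innermost loop. Adds the step to
   *var; FOR_OK means go round again, FOR_DONE means the loop has
   finished and its frame has been popped. */
int next_for(const char *id, double *var)
{
  ForFrame *top;

  if(n_fors == 0)
    return FOR_MISMATCH;
  top = &for_stack[n_fors - 1];
  if(strcmp(top->id, id) != 0)
    return FOR_MISMATCH;
  *var += top->step;
  if((top->step > 0 && *var > top->toval) ||
     (top->step < 0 && *var < top->toval))
  {
    n_fors--;
    return FOR_DONE;
  }
  return FOR_OK;
}
```

A script that repeatedly jumps out of a loop and re-enters its FOR leaves frames behind, so it reaches FOR_OVERFLOW after at most MAX_FORS entries, while a NEXT that names the wrong variable is caught immediately as a mismatch.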
The FOR loop searches for a matching NEXT if the loop is null, that is, if the start value is already past the terminal value. The advantage is that it allows for cleaner user scripts, though the fiddliness is probably more effort than it is worth. The step size and terminating expression are evaluated once only, on loop entry. The cost of changing this and providing C-style fors is that you then need two FOR-evaluation routines, one for loop entry and one for each update.

Finally, every program needs IO. Microcomputers didn’t have any sort of worthwhile backing store, so the canonical file-handling functions were not very good. In the UNIX world, it is quite common to take everything from standard input and direct

everything to standard output, redirecting by means of pipes. In the PC world, users expect graphical interfaces to their file system, which of course is way beyond our capacity to provide.

The PRINT statement is built on top of ANSI fprintf(). The form is that the user can specify either a string or a numerical variable, so we need a function, isstring(), to distinguish between the two.

The INPUT statement suffers from the problem of what to do if input doesn’t match. The solution for inputting numbers is to ignore non-numerical input until a number is found. It is implemented in terms of fscanf(). For string input, the string is defined as the line. This is then limited to 1024 characters to allow for the call to fgets().

In practice I suspect that most users of MiniBasic would want to provide their own IO extensions. For instance, if you want to control a plotter, you would provide instructions like PENUP, PENDOWN, PENMOVE and functions to query the pen position. However you could use the program as it is: statements printed to stdout would be interpreted by a calling function and translated to pen commands, whilst changes to the pen state would go on standard input.

MiniBasic is yours to use as you want. I would like to be acknowledged if you find the code useful, either as is or as the basis for redevelopment, but I don’t insist on this. If it makes your boss happy to think that the code was developed all by yourself in the five minutes it took to download this book, then that is fine by me. You can incorporate it in free or commercial products without charge. The only thing I insist on is that you do not try to restrict my rights in the code in any way, which means that I can make use of any enhancements, bug fixes, or derivative products as I see fit.


The design of MiniBasic

MiniBasic is designed to allow non-programmers to add arbitrary functions to programs. Imagine we are attempting to SPAM-shield an email program. Different users have different ideas of what constitutes SPAM. By providing checkboxes we can go so far, but if a user wants, say, to regard as SPAM everything with an attachment, unless it is under a certain size, unless it has come from a trusted list of addresses, then we are stuck. However by providing a MiniBasic interface, the user can input the relevant values and provide the logic.

The calling program would do this by setting up the input stream with, say, the email address of the sender, the title of the email, and the length of any attachment. The calling program then presents half of a MiniBasic program to the user, with the input set up, e.g.

10 REM SPAM filter
20 REM sender's email
30 INPUT address$
40 REM title of email
50 INPUT title$
60 REM length of attachment
70 INPUT attachlen
80 REM PRINT "Accept" to accept the email or "Reject" to reject

The user then provides the logic for his choice.

MiniBasic can of course also be used as a stand-alone console programming language. This is useful for teaching purposes, for testing MiniBasic programs, or if you simply want to write a “filter” program that accepts from standard input and writes to standard output. It was important that MiniBasic be simple to learn, and simple to implement. For this reason the syntax of the language has been kept as


close as possible to the type of BASIC used on microcomputers in the 1980s. Millions of people know a BASIC of this kind.

Because of advances in computer power since the 1980s array initialisation was allowed. This allows us to eliminate the difficult to use READ and DATA statements. Re-dimensioning of arrays was also allowed, largely for theoretical reasons (it makes MiniBasic Turing equivalent).

GOSUB was not included. It is not of much practical use without local variables and parameters, and a functional language isn’t very useful without some mechanism for passing and returning vectors. Adding these would have complicated the design of the interpreter considerably, and moved the language away from original BASIC. PEEK and POKE are obviously hardware-dependent, add potential security risks, and were also not included.

The C-language source to MiniBasic is included. The interface is

int basic(const char *script, FILE *in, FILE *out, FILE *err)

In the standalone program this is called with stdin, stdout and stderr. In a component environment, these will usually be temporary memory files, and the input will be set up with the parameters to the function the user is to write. The function returns 0 on success or non-zero on failure.

The source code is portable ANSI C, with the exception of the CHR$() and ASCII() functions, which rely on the execution character set being ASCII. The relational operators for strings also call the function strcmp() internally, which may have implications on non-ASCII systems.

An interpreted language is obviously not a particularly efficient way of running functions. Variables are stored in linear lists, with an O(N) access time, so big programs are O(N*N). However because of the lack of support for subroutines MiniBasic is not very suitable for complex programs anyway.
If you were to extend the scope of the program to run very large scripts, it would be necessary to replace the variable list with a hash table, binary tree, or other structure that supports fast searching. All MiniBasic keywords, with the exception of e, start with uppercase letters. This fact is exploited to allow faster recognition of identifiers starting

with lower case. Users can use this feature to gain some performance advantage.

On a fast computer the efficiency of MiniBasic shouldn’t be a major problem unless users run very processor-intensive scripts, or if the function is in a time-critical portion of code. In these cases the answer would be to move to a pseudo-compiler system, where the MiniBasic scripts are translated into an intermediate bytecode that is similar to machine language. This is a project for a later date.

Since MiniBasic is available as C source, it is possible to extend the language. Where possible extensions should be in the form of functions rather than new statements, to avoid changing the grammar of the language. To add a numerical function, foo, which takes a numerical and a string argument, write the function like this (isalpha() requires <ctype.h>):

/*
  foo - check the first character of a string
  FOO(85, "Useless function")
*/
double foo(void)
{
  double answer;
  double x;
  char *str;
  int ch;

  match(FOO);              /* match the token for the function */
  match(OPAREN);           /* opening parenthesis */
  x = expr();              /* read the numerical argument */
  match(COMMA);            /* comma separates arguments */
  str = stringexpr();      /* read the string argument */
  match(CPAREN);           /* match close */
  if(str == NULL)          /* computer can run out of memory */
    return 0;              /* stringexpr() will have signalled the error
                              so no point in generating another */

  ch = integer(x);         /* signal error if x isn't an integer */
  if( !isalpha(ch) )
    seterror(ERR_BADVALUE); /* signal an error of your own if ch isn't valid */

  if(str[0] == ch)         /* function logic */
    answer = 1.0;
  else
    answer = 2.0;

  free(str);               /* str is allocated with malloc(), so free */

  return answer;
}

Once you have your function, add an identifier for it, FOO, to the token list. Then to the functions gettoken() and tokenlen() add the appropriate lines. Finally to the function factor() add the code for calling your function.

For string functions, the procedure is similar, except that they must return an allocated string. The convention is that they end with a dollar sign, and the token id ends with the sequence “STRING”. Add the call to the function stringexpr(), and add your symbol to the function isstring() so that statements like PRINT know that it generates a string expression.

To change the input and output model, you need only change the functions doprint() and doinput(). If you wish to change the error system then you need to look at the functions setup(), reporterror() and the top-level function basic(). Currently the program takes FILE pointers, which should be flexible enough for most uses, but not if, say, you want to provide for interactive scripts.


Hello World

10 REM Hello World program
20 PRINT "Hello World"


Name Handling

10 REM String-handling program
20 REM Inputs a name, tests for validity
30 REM and breaks up into parts.
40 PRINT "Enter your full name"
50 INPUT name$
60 REM First check for non-English characters
70 LET flag = 0
80 FOR I = 1 TO LEN(name$)
90 LET ch$ = MID$(name$, I,1)
100 IF (ch$ >= "A" AND ch$ <= "z") OR ch$ = " " THEN 140
110 LET flag = 1
120 REM This forces the loop to stop
130 LET I = LEN(name$)
140 NEXT I
150 IF flag = 0 THEN 180
160 PRINT "Non-English letter,", ch$
170 GOTO 40
180 REM Jump to subroutine
190 LET return = 210
200 GOTO 1000
210 IF name$ = "" THEN 280
220 LET return = 240
230 GOTO 2000
240 LET N = N + 1
250 DIM out$(N)
260 LET out$(N) = word$
270 GOTO 180
280 REM Print out the name
285 PRINT "Name accepted"
290 FOR I = 1 TO N
300 PRINT out$(I) + " ";
310 NEXT I
320 PRINT ""
330 GOTO 3000
1000 REM strips the leading space
1010 IF LEFT$(name$, 1) <> " " THEN return
1020 LET name$ = MID$(name$, 2, -1)
1030 GOTO 1010
2000 REM get the leading word and put it in word$
2010 LET word$ = ""
2020 LET ch$ = LEFT$(name$, 1)
2030 IF ch$ < "A" OR ch$ > "z" THEN return
2040 LET word$ = word$ + ch$
2050 LET name$ = MID$(name$, 2, -1)
2060 GOTO 2020
3000 REM END


ROT13

10 REM ROT13 CODE
15 LET CODE$ = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz"
20 INPUT A$
30 FOR I = 1 TO LEN(A$)
40 LET B$ = MID$(A$,I, 1)
50 LET TAR = INSTR(CODE$, B$, 1)
60 IF TAR = 0 THEN 90
70 LET TAR = (TAR + 26) MOD 52
80 LET B$ = MID$(CODE$, TAR, 1)
90 PRINT B$;
100 NEXT I
110 PRINT ""
120 GOTO 20


Median

10 REM Median program.
20 LET N = 0
30 DIM array(N+1)
40 PRINT "Enter a number, q to quit"
50 INPUT line$
60 IF line$ = "q" THEN 100
70 LET N = N + 1
80 LET array(N) = VAL(line$)
90 GOTO 30
100 PRINT N, "numbers entered"
105 IF N = 0 THEN 1000
106 IF N = 1 THEN 210
110 REM Bubble sort the numbers
120 LET flag = 0
130 LET i = 1
140 IF array(i) <= array(i+1) THEN 190
150 LET flag = 1
160 LET temp = array(i)
170 LET array(i) = array(i+1)
180 LET array(i+1) = temp
190 LET i = i + 1
195 IF i < N THEN 140
200 IF flag = 1 THEN 120
210 REM print out the middle
220 IF N MOD 2 = 0 THEN 250
230 LET mid = array( (N + 1) / 2)
240 GOTO 270
250 LET mid = array(N/2) + array(N/2+1)
260 LET mid = mid/2
270 PRINT "Median", mid
1000 REM end


Lander

10 REM Lunar lander program.
20 LET dist = 100
30 LET v = 1
40 LET fuel = 1000
50 LET mass = 1000
60 PRINT "You are in control of a lunar lander."
70 PRINT "You are drifting towards the surface of the moon."
80 PRINT "Each turn you must decide how much fuel to burn."
90 PRINT "To accelerate enter a positive number, to decelerate a negative"
100 PRINT "Distance", dist, "km", "velocity", v, "km/s", "Fuel", fuel
110 INPUT burn
115 IF ABS(burn) <= fuel THEN 120
116 PRINT "You don't have that much fuel"
117 GOTO 100
120 LET v = v + burn * 10 / (fuel + mass)
130 LET fuel = fuel - ABS(burn)
140 LET dist = dist - v
150 IF dist > 0 THEN 100
160 PRINT "You have hit the surface"
170 IF v < 3 THEN 210
180 PRINT "Hit surface too fast (", v,")km/s"
190 PRINT "You Crash"
200 GOTO 220
210 PRINT "Well done"
220 REM END


/*
  driver file for MiniBasic
  by Malcolm Mclean
  Leeds University
*/
#include <stdio.h>
#include <stdlib.h>

#include "basic.h"

char *loadfile(char *path);

/* here is a simple script to play with */
char *script =
"10 REM Test Script\n"
"20 REM Tests the Interpreter\n"
"30 REM By Malcolm Mclean\n"
"35 PRINT \"HERE\" \n"
"40 PRINT INSTR(\"FRED\", \"ED\", 4)\n"
"50 PRINT VALLEN(\"12a\"), VALLEN(\"xyz\")\n"
"60 LET x = SQRT(3.0) * SQRT(3.0)\n"
"65 LET x = INT(x + 0.5)\n"
"70 PRINT MID$(\"1234567890\", x, -1)\n"
;

void usage(void)
{
  printf("MiniBasic: a BASIC interpreter\n");
  printf("usage:\n");
  printf("Basic <script>\n");
  printf("See documentation for BASIC syntax.\n");
  exit(EXIT_FAILURE);
}

/*
  call with the name of the Minibasic script file
*/
int main(int argc, char **argv)
{
  char *scr;

  if(argc == 1)
  {
    /* comment out usage call to run test script */
    usage();
    basic(script, stdin, stdout, stderr);
  }
  else
  {
    scr = loadfile(argv[1]);
    if(scr)
    {
      basic(scr, stdin, stdout, stderr);
      free(scr);
    }
  }

  return 0;
}

/*
  function to slurp in an ASCII file
  Params: path - path to file
  Returns: malloced string containing whole file
*/
char *loadfile(char *path)
{
  FILE *fp;
  int ch;
  long i = 0;
  long size = 0;
  char *answer;

  fp = fopen(path, "r");
  if(!fp)
  {
    printf("Can't open %s\n", path);
    return 0;
  }

  fseek(fp, 0, SEEK_END);
  size = ftell(fp);
  fseek(fp, 0, SEEK_SET);

  answer = malloc(size + 100);
  if(!answer)
  {
    printf("Out of memory\n");
    fclose(fp);
    return 0;
  }

  while( (ch = fgetc(fp)) != EOF)
    answer[i++] = ch;

  answer[i++] = 0;

  fclose(fp);

  return answer;
}


#ifndef basic_h
#define basic_h

/*
  Minibasic header file
  By Malcolm Mclean
*/

int basic(const char *script, FILE *in, FILE *out, FILE *err);

#endif


/********************************************************
*                      Mini BASIC                       *
*                  by Malcolm McLean                    *
*                     version 1.0                       *
********************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>
#include <math.h>
#include <ctype.h>   /* isspace() is used by line() */

/* tokens defined */
#define EOS 0
#define VALUE 1
#define PI 2
#define E 3

#define DIV 10
#define MULT 11
#define OPAREN 12
#define CPAREN 13
#define PLUS 14
#define MINUS 15
#define SHRIEK 16
#define COMMA 17
#define MOD 200

#define ERROR 20
#define EOL 21
#define EQUALS 22
#define STRID 23
#define FLTID 24
#define DIMFLTID 25
#define DIMSTRID 26
#define QUOTE 27
#define GREATER 28
#define LESS 29
#define SEMICOLON 30

#define PRINT 100
#define LET 101
#define DIM 102
#define IF 103
#define THEN 104
#define AND 105
#define OR 106
#define GOTO 107
#define INPUT 108
#define REM 109
#define FOR 110
#define TO 111
#define NEXT 112
#define STEP 113

#define SIN 5
#define COS 6
#define TAN 7
#define LN 8
#define POW 9
#define SQRT 18
#define ABS 201
#define LEN 202
#define ASCII 203
#define ASIN 204
#define ACOS 205
#define ATAN 206
#define INT 207
#define RND 208
#define VAL 209
#define VALLEN 210
#define INSTR 211

#define CHRSTRING 300
#define STRSTRING 301
#define LEFTSTRING 302
#define RIGHTSTRING 303
#define MIDSTRING 304
#define STRINGSTRING 305

/* relational operators defined */
#define ROP_EQ 1     /* equals */
#define ROP_NEQ 2    /* doesn't equal */
#define ROP_LT 3     /* less than */
#define ROP_LTE 4    /* less than or equals */
#define ROP_GT 5     /* greater than */
#define ROP_GTE 6    /* greater than or equals */

/* error codes (in BASIC script) defined */
#define ERR_CLEAR 0
#define ERR_SYNTAX 1
#define ERR_OUTOFMEMORY 2
#define ERR_IDTOOLONG 3
#define ERR_NOSUCHVARIABLE 4
#define ERR_BADSUBSCRIPT 5
#define ERR_TOOMANYDIMS 6
#define ERR_TOOMANYINITS 7
#define ERR_BADTYPE 8
#define ERR_TOOMANYFORS 9
#define ERR_NONEXT 10
#define ERR_NOFOR 11
#define ERR_DIVIDEBYZERO 12
#define ERR_NEGLOG 13
#define ERR_NEGSQRT 14
#define ERR_BADSINCOS 15
#define ERR_EOF 16
#define ERR_ILLEGALOFFSET 17
#define ERR_TYPEMISMATCH 18
#define ERR_INPUTTOOLONG 19
#define ERR_BADVALUE 20
#define ERR_NOTINT 21

#define MAXFORS 32    /* maximum number of nested fors */

typedef struct
{
  int no;             /* line number */
  const char *str;    /* points to start of line */
}LINE;

typedef struct
{
  char id[32];        /* id of variable */
  double dval;        /* its value if a real */
  char *sval;         /* its value if a string (malloced) */
} VARIABLE;

typedef struct
{
  char id[32];        /* id of dimensioned variable */
  int type;           /* its type, STRID or FLTID */
  int ndims;          /* number of dimensions */
  int dim[5];         /* dimensions in x y order */
  char **str;         /* pointer to string data */
  double *dval;       /* pointer to real data */
} DIMVAR;

typedef struct
{
  int type;           /* type of variable (STRID or FLTID or ERROR) */
  char **sval;        /* pointer to string data */
  double *dval;       /* pointer to real data */
} LVALUE;

typedef struct
{
  char id[32];        /* id of control variable */
  int nextline;       /* line below FOR to which control passes */
  double toval;       /* terminal value */
  double step;        /* step size */
} FORLOOP;

static FORLOOP forstack[MAXFORS];   /* stack for for loop control */
static int nfors;                   /* number of fors on stack */

static VARIABLE *variables;         /* the script's variables */
static int nvariables;              /* number of variables */

static DIMVAR *dimvariables;        /* dimensioned arrays */
static int ndimvariables;           /* number of dimensioned arrays */

static LINE *lines;                 /* list of line starts */
static int nlines;                  /* number of BASIC lines in program */

static FILE *fpin;                  /* input stream */
static FILE *fpout;                 /* output stream */
static FILE *fperr;                 /* error stream */

static const char *string;          /* string we are parsing */
static int token;                   /* current token (lookahead) */
static int errorflag;               /* set when error in input encountered */

static int setup(const char *script);
static void cleanup(void);

static void reporterror(int lineno);
static int findline(int no);

static int line(void);
static void doprint(void);
static void dolet(void);
static void dodim(void);
static int doif(void);
static int dogoto(void);
static void doinput(void);
static void dorem(void);
static int dofor(void);
static int donext(void);

static void lvalue(LVALUE *lv);

static int boolexpr(void);
static int boolfactor(void);
static int relop(void);

static double expr(void);
static double term(void);
static double factor(void);
static double instr(void);
static double variable(void);
static double dimvariable(void);

static VARIABLE *findvariable(const char *id);
static DIMVAR *finddimvar(const char *id);
static DIMVAR *dimension(const char *id, int ndims, ...);
static void *getdimvar(DIMVAR *dv, ...);
static VARIABLE *addfloat(const char *id);
static VARIABLE *addstring(const char *id);
static DIMVAR *adddimvar(const char *id);

static char *stringexpr(void);
static char *chrstring(void);
static char *strstring(void);
static char *leftstring(void);
static char *rightstring(void);
static char *midstring(void);
static char *stringstring(void);
static char *stringdimvar(void);
static char *stringvar(void);
static char *stringliteral(void);

static int integer(double x);

static void match(int tok);
static void seterror(int errorcode);
static int getnextline(const char *str);
static int gettoken(const char *str);
static int tokenlen(const char *str, int token);

static int isstring(int token);
static double getvalue(const char *str, int *len);
static void getid(const char *str, char *out, int *len);

static void mystrgrablit(char *dest, const char *src);
static char *mystrend(const char *str, char quote);
static int mystrcount(const char *str, char ch);
static char *mystrdup(const char *str);
static char *mystrconcat(const char *str, const char *cat);
static double factorial(double x);

/*
  Interpret a BASIC script
  Params: script - the script to run
          in - input stream
          out - output stream
          err - error stream
  Returns: 0 on success, 1 on error condition.
*/
int basic(const char *script, FILE *in, FILE *out, FILE *err)
{
  int curline = 0;
  int nextline;
  int answer = 0;

  fpin = in;
  fpout = out;
  fperr = err;

  if( setup(script) == -1 )
    return 1;

  while(curline != -1)
  {
    string = lines[curline].str;
    token = gettoken(string);
    errorflag = 0;
    nextline = line();
    if(errorflag)
    {
      reporterror(lines[curline].no);
      answer = 1;
      break;
    }

    if(nextline == -1)
      break;

    if(nextline == 0)
    {
      curline++;
      if(curline == nlines)
        break;
    }
    else
    {
      curline = findline(nextline);
      if(curline == -1)
      {
        if(fperr)
          fprintf(fperr, "line %d not found\n", nextline);
        answer = 1;
        break;
      }
    }
  }

  cleanup();

  return answer;
}

/*
  Sets up all our globals, including the list of lines.
  Params: script - the script passed by the user
  Returns: 0 on success, -1 on failure
*/
static int setup(const char *script)
{
  int i;

  nlines = mystrcount(script, '\n');
  lines = malloc(nlines * sizeof(LINE));
  if(!lines)
  {
    if(fperr)
      fprintf(fperr, "Out of memory\n");
    return -1;
  }
  for(i=0;i


nlines--; } script = strchr(script, '\n'); script++; } if(!nlines) { if(fperr) fprintf(fperr, "Can't read program\n"); free(lines); return -1; } for(i=1;i


{ if(dimvariables[i].type == STRID) { if(dimvariables[i].str) { size = 1; for(ii=0;ii


      fprintf(fperr, "Syntax error line %d\n", lineno);
      break;
    case ERR_OUTOFMEMORY:
      fprintf(fperr, "Out of memory line %d\n", lineno);
      break;
    case ERR_IDTOOLONG:
      fprintf(fperr, "Identifier too long line %d\n", lineno);
      break;
    case ERR_NOSUCHVARIABLE:
      fprintf(fperr, "No such variable line %d\n", lineno);
      break;
    case ERR_BADSUBSCRIPT:
      fprintf(fperr, "Bad subscript line %d\n", lineno);
      break;
    case ERR_TOOMANYDIMS:
      fprintf(fperr, "Too many dimensions line %d\n", lineno);
      break;
    case ERR_TOOMANYINITS:
      fprintf(fperr, "Too many initialisers line %d\n", lineno);
      break;
    case ERR_BADTYPE:
      fprintf(fperr, "Illegal type line %d\n", lineno);
      break;
    case ERR_TOOMANYFORS:
      fprintf(fperr, "Too many nested fors line %d\n", lineno);
      break;
    case ERR_NONEXT:
      fprintf(fperr, "For without matching next line %d\n", lineno);
      break;
    case ERR_NOFOR:
      fprintf(fperr, "Next without matching for line %d\n", lineno);
      break;
    case ERR_DIVIDEBYZERO:
      fprintf(fperr, "Divide by zero line %d\n", lineno);
      break;
    case ERR_NEGLOG:
      fprintf(fperr, "Negative logarithm line %d\n", lineno);
      break;
    case ERR_NEGSQRT:
      fprintf(fperr, "Negative square root line %d\n", lineno);
      break;
    case ERR_BADSINCOS:
      fprintf(fperr, "Sine or cosine out of range line %d\n", lineno);
      break;
    case ERR_EOF:
      fprintf(fperr, "End of input file %d\n", lineno);
      break;
    case ERR_ILLEGALOFFSET:
      fprintf(fperr, "Illegal offset line %d\n", lineno);
      break;
    case ERR_TYPEMISMATCH:
      fprintf(fperr, "Type mismatch line %d\n", lineno);
      break;
    case ERR_INPUTTOOLONG:
      fprintf(fperr, "Input too long line %d\n", lineno);
      break;
    case ERR_BADVALUE:
      fprintf(fperr, "Bad value at line %d\n", lineno);
      break;
    case ERR_NOTINT:
      fprintf(fperr, "Not an integer at line %d\n", lineno);
      break;
    default:
      fprintf(fperr, "ERROR line %d\n", lineno);
      break;
  }
}

/*
  binary search for a line
  Params: no - line number to find
  Returns: index of the line, or -1 on fail.
*/
static int findline(int no)
{
  int high;
  int low;
  int mid;

  low = 0;
  high = nlines-1;
  while(high > low + 1)
  {
    mid = (high + low)/2;
    if(lines[mid].no == no)
      return mid;
    if(lines[mid].no > no)
      high = mid;
    else
      low = mid;
  }

  if(lines[low].no == no)
    mid = low;
  else if(lines[high].no == no)
    mid = high;
  else
    mid = -1;

  return mid;
}

/*
  Parse a line. High level parse function
*/
static int line(void)
{
  int answer = 0;
  const char *str;

  match(VALUE);

  switch(token)
  {
    case PRINT:
      doprint();
      break;
    case LET:
      dolet();
      break;
    case DIM:
      dodim();
      break;
    case IF:
      answer = doif();
      break;
    case GOTO:
      answer = dogoto();
      break;
    case INPUT:
      doinput();
      break;
    case REM:
      dorem();
      return 0;
      break;
    case FOR:
      answer = dofor();
      break;
    case NEXT:
      answer = donext();
      break;
    default:
      seterror(ERR_SYNTAX);
      break;
  }

  if(token != EOS)
  {
    /*match(VALUE);*/
    /* check for a newline */
    str = string;
    while(isspace(*str))
    {
      if(*str == '\n')
        break;
      str++;
    }

    if(*str != '\n')
      seterror(ERR_SYNTAX);
  }

  return answer;
}

/*
  the PRINT statement
*/
static void doprint(void)
{
  char *str;
  double x;

  match(PRINT);

  while(1)
  {
    if(isstring(token))
    {
      str = stringexpr();
      if(str)
      {
        fprintf(fpout, "%s", str);
        free(str);
      }
    }
    else
    {
      x = expr();
      fprintf(fpout, "%g", x);
    }
    if(token == COMMA)
    {
      fprintf(fpout, " ");
      match(COMMA);
    }
    else
      break;
  }
  if(token == SEMICOLON)
  {
    match(SEMICOLON);
    fflush(fpout);
  }
  else
    fprintf(fpout, "\n");
}

/*
  the LET statement
*/
static void dolet(void)
{
  LVALUE lv;
  char *temp;

  match(LET);
  lvalue(&lv);
  match(EQUALS);
  switch(lv.type)
  {
    case FLTID:
      *lv.dval = expr();
      break;
    case STRID:
      temp = *lv.sval;
      *lv.sval = stringexpr();
      if(temp)
        free(temp);
      break;
    default:
      break;
  }
}


/* the DIM statement */ static void dodim(void) { int ndims = 0; double dims[6]; char name[32]; int len; DIMVAR *dimvar; int i; int size = 1; match(DIM); switch(token) { case DIMFLTID: case DIMSTRID: getid(string, name, &len); match(token); dims[ndims++] = expr(); while(token == COMMA) { match(COMMA); dims[ndims++] = expr(); if(ndims > 5) { seterror(ERR_TOOMANYDIMS); return; } } match(CPAREN); for(i=0;i


dimvar = dimension(name, 2, (int) dims[0], (int) dims[1]); break; case 3: dimvar = dimension(name, 3, (int) dims[0], (int) dims[1], (int) dims[2]); break; case 4: dimvar = dimension(name, 4, (int) dims[0], (int) dims[1], (int) dims[2], (int) dims[3]); break; case 5: dimvar = dimension(name, 5, (int) dims[0], (int) dims[1], (int) dims[2], (int) dims[3], (int) dims[4]); break; } break; default: seterror(ERR_SYNTAX); return; } if(dimvar == 0) { /* out of memory */ seterror(ERR_OUTOFMEMORY); return; }

if(token == EQUALS) { match(EQUALS); for(i=0;i


i = 0; if(dimvar->str[i]) free(dimvar->str[i]); dimvar->str[i++] = stringexpr(); while(token == COMMA && i < size) { match(COMMA); if(dimvar->str[i]) free(dimvar->str[i]); dimvar->str[i++] = stringexpr(); if(errorflag) break; } break; } if(token == COMMA) seterror(ERR_TOOMANYINITS); } } /* the IF statement. if jump taken, returns new line no, else returns 0 */ static int doif(void) { int condition; int jump; match(IF); condition = boolexpr(); match(THEN); jump = integer( expr() ); if(condition) return jump; else return 0; } /* the GOTO statement returns new line number */ static int dogoto(void) { match(GOTO); return integer( expr() ); }


/* The FOR statement. Pushes the for stack. Returns line to jump to, or -1 to end program */ static int dofor(void) { LVALUE lv; char id[32]; char nextid[32]; int len; double initval; double toval; double stepval; const char *savestring; int answer; match(FOR); getid(string, id, &len); lvalue(&lv); if(lv.type != FLTID) { seterror(ERR_BADTYPE); return -1; } match(EQUALS); initval = expr(); match(TO); toval = expr(); if(token == STEP) { match(STEP); stepval = expr(); } else stepval = 1.0; *lv.dval = initval; if(nfors > MAXFORS - 1) { seterror(ERR_TOOMANYFORS); return -1; } if(stepval < 0 && initval < toval || stepval > 0 && initval > toval)


{ savestring = string; while(string = strchr(string, '\n')) { errorflag = 0; token = gettoken(string); match(VALUE); if(token == NEXT) { match(NEXT); if(token == FLTID || token == DIMFLTID) { getid(string, nextid, &len); if(!strcmp(id, nextid)) { answer = getnextline(string); string = savestring; token = gettoken(string); return answer ? answer : -1; } } } } seterror(ERR_NONEXT); return -1; } else { strcpy(forstack[nfors].id, id); forstack[nfors].nextline = getnextline(string); forstack[nfors].step = stepval; forstack[nfors].toval = toval; nfors++; return 0; } } /* the NEXT statement updates the counting index, and returns line to jump to */ static int donext(void) { char id[32]; int len; LVALUE lv; match(NEXT);


if(nfors) { getid(string, id, &len); lvalue(&lv); if(lv.type != FLTID) { seterror(ERR_BADTYPE); return -1; } *lv.dval += forstack[nfors-1].step; if( (forstack[nfors-1].step < 0 && *lv.dval < forstack[nfors-1].toval) || (forstack[nfors-1].step > 0 && *lv.dval > forstack[nfors-1].toval) ) { nfors--; return 0; } else { return forstack[nfors-1].nextline; } } else { seterror(ERR_NOFOR); return -1; } }
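The FOR/NEXT bookkeeping above comes down to one termination test, applied once before the loop starts and again after every increment. The following is a minimal standalone sketch of that rule rather than the interpreter's own code: the names `loop_finished` and `count_iterations` are hypothetical, and the for-stack is reduced to plain arguments.

```c
#include <assert.h>

/* Returns 1 when a loop counter has run past its limit. This mirrors
   the test donext() applies after adding the step. (Hypothetical
   helper, not part of the interpreter.) */
static int loop_finished(double value, double toval, double step)
{
    return (step < 0 && value < toval) || (step > 0 && value > toval);
}

/* Counts how many times the body of FOR v = init TO toval STEP step
   would run. As in dofor(), a loop whose start value is already past
   the limit runs zero times. (Hypothetical helper.) */
static int count_iterations(double init, double toval, double step)
{
    int n = 0;
    double v = init;

    assert(step != 0.0);   /* with a zero step this loop never ends */
    while (!loop_finished(v, toval, step)) {
        n++;
        v += step;
    }
    return n;
}
```

For example, `count_iterations(10.0, 1.0, -3.0)` visits 10, 7, 4 and 1. Note that the limit is inclusive, so FOR I = 1 TO 1 runs its body once.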

/* the INPUT statement */ static void doinput(void) { LVALUE lv; char buff[1024]; char *end; match(INPUT); lvalue(&lv); switch(lv.type) { case FLTID: while(fscanf(fpin, "%lf", lv.dval) != 1) { fgetc(fpin); if(feof(fpin)) {


seterror(ERR_EOF); return; } } break; case STRID: if(*lv.sval) { free(*lv.sval); *lv.sval = 0; } if( fgets(buff, sizeof(buff), fpin) == 0) { seterror(ERR_EOF); return; } end = strchr(buff, '\n'); if(!end) { seterror(ERR_INPUTTOOLONG); return; } *end = 0; *lv.sval = mystrdup(buff); if(!*lv.sval) { seterror(ERR_OUTOFMEMORY); return; } break; default: return; } } /* the REM statement. Note is unique as the rest of the line is not parsed */ static void dorem(void) { match(REM); return; }


/* Get an lvalue from the environment Params: lv - structure to fill. Notes: missing variables (but not out of range subscripts) are added to the variable list. */ static void lvalue(LVALUE *lv) { char name[32]; int len; VARIABLE *var; DIMVAR *dimvar; int index[5]; void *valptr = 0; int type; lv->type = ERROR; lv->dval = 0; lv->sval = 0; switch(token) { case FLTID: getid(string, name, &len); match(FLTID); var = findvariable(name); if(!var) var = addfloat(name); if(!var) { seterror(ERR_OUTOFMEMORY); return; } lv->type = FLTID; lv->dval = &var->dval; lv->sval = 0; break; case STRID: getid(string, name, &len); match(STRID); var = findvariable(name); if(!var) var = addstring(name); if(!var) { seterror(ERR_OUTOFMEMORY); return; }


lv->type = STRID; lv->sval = &var->sval; lv->dval = 0; break; case DIMFLTID: case DIMSTRID: type = (token == DIMFLTID) ? FLTID : STRID; getid(string, name, &len); match(token); dimvar = finddimvar(name); if(dimvar) { switch(dimvar->ndims) { case 1: index[0] = integer( expr() ); if(errorflag == 0) valptr = getdimvar(dimvar, index[0]); break; case 2: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); if(errorflag == 0) valptr = getdimvar(dimvar, index[0], index[1]); break; case 3: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); match(COMMA); index[2] = integer( expr() ); if(errorflag == 0) valptr = getdimvar(dimvar, index[0], index[1], index[2]); break; case 4: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); match(COMMA); index[2] = integer( expr() ); match(COMMA); index[3] = integer( expr() ); if(errorflag == 0) valptr = getdimvar(dimvar, index[0], index[1], index[2], index[3]); break; case 5: index[0] = integer( expr() );


match(COMMA); index[1] = integer( expr() ); match(COMMA); index[2] = integer( expr() ); match(COMMA); index[3] = integer( expr() ); match(COMMA); index[4] = integer( expr() ); if(errorflag == 0) /* pass all five subscripts */ valptr = getdimvar(dimvar, index[0], index[1], index[2], index[3], index[4]); break; } match(CPAREN); } else { seterror(ERR_NOSUCHVARIABLE); return; } if(valptr) { lv->type = type; if(type == FLTID) lv->dval = valptr; else if(type == STRID) lv->sval = valptr; else assert(0); } break; default: seterror(ERR_SYNTAX); } } /* parse a boolean expression consists of expressions or strings and relational operators, and parentheses */ static int boolexpr(void) { int left; int right; left = boolfactor(); while(1) {


switch(token) { case AND: match(AND); right = boolexpr(); return (left && right) ? 1 : 0; case OR: match(OR); right = boolexpr(); return (left || right) ? 1 : 0; default: return left; } } } /* boolean factor, consists of expression relop expression or string relop string, or ( boolexpr() ) */ static int boolfactor(void) { int answer; double left; double right; int op; char *strleft; char *strright; int cmp; switch(token) { case OPAREN: match(OPAREN); answer = boolexpr(); match(CPAREN); break; default: if(isstring(token)) { strleft = stringexpr(); op = relop(); strright = stringexpr(); if(!strleft || !strright) { if(strleft) free(strleft); if(strright) free(strright); return 0; }


cmp = strcmp(strleft, strright); switch(op) { case ROP_EQ: answer = cmp == 0 ? 1 : 0; break; case ROP_NEQ: answer = cmp == 0 ? 0 : 1; break; case ROP_LT: answer = cmp < 0 ? 1 : 0; break; case ROP_LTE: answer = cmp <= 0 ? 1 : 0; break; case ROP_GT: answer = cmp > 0 ? 1 : 0; break; case ROP_GTE: answer = cmp >= 0 ? 1 : 0; break; default: answer = 0; } free(strleft); free(strright); } else { left = expr(); op = relop(); right = expr(); switch(op) { case ROP_EQ: answer = (left == right) ? 1 : 0; break; case ROP_NEQ: answer = (left != right) ? 1 : 0; break; case ROP_LT: answer = (left < right) ? 1 : 0; break; case ROP_LTE: answer = (left <= right) ? 1 : 0; break; case ROP_GT: answer = (left > right) ? 1 : 0; break; case ROP_GTE: answer = (left >= right) ? 1 : 0;


break; default: errorflag = 1; return 0; } } } return answer; } /* get a relational operator returns operator parsed or ERROR */ static int relop(void) { switch(token) { case EQUALS: match(EQUALS); return ROP_EQ; case GREATER: match(GREATER); if(token == EQUALS) { match(EQUALS); return ROP_GTE; } return ROP_GT; case LESS: match(LESS); if(token == EQUALS) { match(EQUALS); return ROP_LTE; } else if(token == GREATER) { match(GREATER); return ROP_NEQ; } return ROP_LT; default: seterror(ERR_SYNTAX); return ERROR; } }
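relop() builds the two-character operators `>=`, `<=` and `<>` out of single-character tokens. As an illustration only, the same classification can be sketched directly on a character string. The `OP_*` codes and the name `scan_relop` are assumptions made for this sketch, and unlike the token-based version it does not skip white space between the two characters.

```c
#include <assert.h>

/* Hypothetical operator codes, local to this sketch. */
enum { OP_NONE, OP_EQ, OP_NEQ, OP_LT, OP_LTE, OP_GT, OP_GTE };

/* Classify the relational operator at the start of s and store the
   number of characters it occupies in *len (0 if there is none). */
static int scan_relop(const char *s, int *len)
{
    assert(s != 0 && len != 0);
    *len = 0;
    switch (s[0]) {
    case '=':
        *len = 1;
        return OP_EQ;
    case '>':
        if (s[1] == '=') { *len = 2; return OP_GTE; }
        *len = 1;
        return OP_GT;
    case '<':
        if (s[1] == '=') { *len = 2; return OP_LTE; }
        if (s[1] == '>') { *len = 2; return OP_NEQ; }
        *len = 1;
        return OP_LT;
    default:
        return OP_NONE;
    }
}
```

The one-character lookahead after `<` and `>` is the whole trick: the longer operator is tried first, and the single-character meaning is the fallback.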


/* parses an expression */ static double expr(void) { double left; double right; left = term(); while(1) { switch(token) { case PLUS: match(PLUS); right = term(); left += right; break; case MINUS: match(MINUS); right = term(); left -= right; break; default: return left; } } } /* parses a term */ static double term(void) { double left; double right; left = factor(); while(1) { switch(token) { case MULT: match(MULT); right = factor(); left *= right; break; case DIV: match(DIV);


right = factor(); if(right != 0.0) left /= right; else seterror(ERR_DIVIDEBYZERO); break; case MOD: match(MOD); right = factor(); left = fmod(left, right); break; default: return left; } } } /* parses a factor */ static double factor(void) { double answer = 0; char *str; char *end; int len; switch(token) { case OPAREN: match(OPAREN); answer = expr(); match(CPAREN); break; case VALUE: answer = getvalue(string, &len); match(VALUE); break; case MINUS: match(MINUS); answer = -factor(); break; case FLTID: answer = variable(); break; case DIMFLTID: answer = dimvariable(); break; case E: answer = exp(1.0);


match(E); break; case PI: answer = acos(0.0) * 2.0; match(PI); break; case SIN: match(SIN); match(OPAREN); answer = expr(); match(CPAREN); answer = sin(answer); break; case COS: match(COS); match(OPAREN); answer = expr(); match(CPAREN); answer = cos(answer); break; case TAN: match(TAN); match(OPAREN); answer = expr(); match(CPAREN); answer = tan(answer); break; case LN: match(LN); match(OPAREN); answer = expr(); match(CPAREN); if(answer > 0) answer = log(answer); else seterror(ERR_NEGLOG); break; case POW: match(POW); match(OPAREN); answer = expr(); match(COMMA); answer = pow(answer, expr()); match(CPAREN); break; case SQRT: match(SQRT); match(OPAREN); answer = expr(); match(CPAREN); if(answer >= 0.0)


answer = sqrt(answer); else seterror(ERR_NEGSQRT); break; case ABS: match(ABS); match(OPAREN); answer = expr(); match(CPAREN); answer = fabs(answer); break; case LEN: match(LEN); match(OPAREN); str = stringexpr(); match(CPAREN); if(str) { answer = strlen(str); free(str); } else answer = 0; break; case ASCII: match(ASCII); match(OPAREN); str = stringexpr(); match(CPAREN); if(str) { answer = *str; free(str); } else answer = 0; break; case ASIN: match(ASIN); match(OPAREN); answer = expr(); match(CPAREN); if(answer >= -1 && answer <= 1) answer = asin(answer); else seterror(ERR_BADSINCOS); break; case ACOS: match(ACOS); match(OPAREN); answer = expr();


match(CPAREN); if(answer >= -1 && answer <= 1) answer = acos(answer); else seterror(ERR_BADSINCOS); break; case ATAN: match(ATAN); match(OPAREN); answer = expr(); match(CPAREN); answer = atan(answer); break; case INT: match(INT); match(OPAREN); answer = expr(); match(CPAREN); answer = floor(answer); break; case RND: match(RND); match(OPAREN); answer = expr(); match(CPAREN); answer = integer(answer); if(answer > 1) answer = floor(rand()/(RAND_MAX + 1.0) * answer); else if(answer == 1) answer = rand()/(RAND_MAX + 1.0); else { if(answer < 0) srand( (unsigned) -answer); answer = 0; } break; case VAL: match(VAL); match(OPAREN); str = stringexpr(); match(CPAREN); if(str) { answer = strtod(str, 0); free(str); } else answer = 0; break;


case VALLEN: match(VALLEN); match(OPAREN); str = stringexpr(); match(CPAREN); if(str) { strtod(str, &end); answer = end - str; free(str); } else answer = 0.0; break; case INSTR: answer = instr(); break; default: if(isstring(token)) seterror(ERR_TYPEMISMATCH); else seterror(ERR_SYNTAX); break; } while(token == SHRIEK) { match(SHRIEK); answer = factorial(answer); } return answer; } /* calculate the INSTR() function. */ static double instr(void) { char *str; char *substr; char *end; double answer = 0; int offset; match(INSTR); match(OPAREN); str = stringexpr(); match(COMMA); substr = stringexpr(); match(COMMA);


offset = integer( expr() ); offset--; match(CPAREN); if(!str || ! substr) { if(str) free(str); if(substr) free(substr); return 0; } if(offset >= 0 && offset < (int) strlen(str)) { end = strstr(str + offset, substr); if(end) answer = end - str + 1.0; } free(str); free(substr); return answer; } /* get the value of a scalar variable from string matches FLTID */ static double variable(void) { VARIABLE *var; char id[32]; int len; getid(string, id, &len); match(FLTID); var = findvariable(id); if(var) return var->dval; else { seterror(ERR_NOSUCHVARIABLE); return 0.0; } }
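The offset handling in instr(), just before variable(), converts between BASIC's 1-based string positions and C's 0-based pointers. A minimal sketch of that conversion on plain C strings follows. The name `instr_pos` is hypothetical, and the sketch leaves out the parsing and memory management that the interpreter's function performs.

```c
#include <assert.h>
#include <string.h>

/* 1-based substring search in the spirit of INSTR(): returns the
   position of the first match at or after `start`, or 0 if there is
   none or if `start` lies outside the string. */
static int instr_pos(const char *str, const char *sub, int start)
{
    const char *hit;
    int offset = start - 1;          /* convert to a 0-based offset */

    assert(str != 0 && sub != 0);
    if (offset < 0 || offset >= (int) strlen(str))
        return 0;
    hit = strstr(str + offset, sub);
    return hit ? (int) (hit - str) + 1 : 0;
}
```

Because 0 is never a valid 1-based position, it can double as the "not found" result, which is why the search start is decremented on the way in and the match position incremented on the way out.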


/* get value of a dimensioned variable from string. matches DIMFLTID */ static double dimvariable(void) { DIMVAR *dimvar; char id[32]; int len; int index[5]; double *answer; getid(string, id, &len); match(DIMFLTID); dimvar = finddimvar(id); if(!dimvar) { seterror(ERR_NOSUCHVARIABLE); return 0.0; } if(dimvar) { switch(dimvar->ndims) { case 1: index[0] = integer( expr() ); answer = getdimvar(dimvar, index[0]); break; case 2: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); answer = getdimvar(dimvar, index[0], index[1]); break; case 3: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); match(COMMA); index[2] = integer( expr() ); answer = getdimvar(dimvar, index[0], index[1], index[2]); break; case 4: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); match(COMMA);


index[2] = integer( expr() ); match(COMMA); index[3] = integer( expr() ); answer = getdimvar(dimvar, index[0], index[1], index[2], index[3]); break; case 5: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); match(COMMA); index[2] = integer( expr() ); match(COMMA); index[3] = integer( expr() ); match(COMMA); index[4] = integer( expr() ); answer = getdimvar(dimvar, index[0], index[1], index[2], index[3], index[4]); break;

} match(CPAREN); } if(answer) return *answer; return 0.0; } /* find a scalar variable in variables list Params: id - id to get Returns: pointer to that entry, 0 on fail */ static VARIABLE *findvariable(const char *id) { int i; for(i=0;i


/* get a dimensioned array by name Params: id (includes opening parenthesis) Returns: pointer to array entry or 0 on fail */ static DIMVAR *finddimvar(const char *id) { int i; for(i=0;i


} else oldsize = 0; va_start(vargs, ndims); for(i=0;i


} break; default: assert(0); } for(i=0;i<5;i++) dv->dim[i] = dimensions[i]; dv->ndims = ndims; return dv; } /* get the address of a dimensioned array element. works for both string and real arrays. Params: dv - the array's entry in variable list ... - integers telling which array element to get Returns: the address of that element, 0 on fail */ static void *getdimvar(DIMVAR *dv, ...) { va_list vargs; int index[5]; int i; void *answer = 0; va_start(vargs, dv); for(i=0;i


/* each subscript is scaled by the product of all lower dimensions */ answer = &dv->dval[ index[1] * dv->dim[0] + index[0] ]; break; case 3: answer = &dv->dval[ index[2] * (dv->dim[0] * dv->dim[1]) + index[1] * dv->dim[0] + index[0] ]; break; case 4: answer = &dv->dval[ index[3] * (dv->dim[0] * dv->dim[1] * dv->dim[2]) + index[2] * (dv->dim[0] * dv->dim[1]) + index[1] * dv->dim[0] + index[0] ]; break; case 5: answer = &dv->dval[ index[4] * (dv->dim[0] * dv->dim[1] * dv->dim[2] * dv->dim[3]) + index[3] * (dv->dim[0] * dv->dim[1] * dv->dim[2]) + index[2] * (dv->dim[0] * dv->dim[1]) + index[1] * dv->dim[0] + index[0] ]; break; } } else if(dv->type == STRID) { switch(dv->ndims) { case 1: answer = &dv->str[ index[0] ]; break; case 2: answer = &dv->str[ index[1] * dv->dim[0] + index[0] ]; break; case 3: answer = &dv->str[ index[2] * (dv->dim[0] * dv->dim[1]) + index[1] * dv->dim[0] + index[0] ]; break; case 4: answer = &dv->str[ index[3] * (dv->dim[0] * dv->dim[1] * dv->dim[2]) + index[2] * (dv->dim[0] * dv->dim[1]) + index[1] * dv->dim[0] + index[0] ]; break; case 5: answer = &dv->str[ index[4] * (dv->dim[0] * dv->dim[1] * dv->dim[2] * dv->dim[3]) + index[3] * (dv->dim[0] * dv->dim[1] * dv->dim[2]) + index[2] * (dv->dim[0] * dv->dim[1]) + index[1] * dv->dim[0] + index[0] ]; break; } } return answer; }

/* add a real variable to our variable list Params: id - id of variable to add. Returns: pointer to new entry in table */ static VARIABLE *addfloat(const char *id) { VARIABLE *vars; vars = realloc(variables, (nvariables + 1) * sizeof(VARIABLE)); if(vars) { variables = vars; strcpy(variables[nvariables].id, id); variables[nvariables].dval = 0; variables[nvariables].sval = 0; nvariables++; return &variables[nvariables-1]; } else seterror(ERR_OUTOFMEMORY); return 0; }
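In getdimvar() above, the two- and three-dimensional cases scale each subscript by the product of all lower dimensions, so the first subscript varies fastest. The same rule extends to any number of dimensions and can be written as a loop instead of one case per dimension count. The sketch below is an illustration under stated assumptions: the name `flat_index` is hypothetical, subscripts are taken to be zero-based already, and an out-of-range subscript is reported by returning -1 rather than through seterror().

```c
#include <assert.h>

/* Flatten up to five subscripts into a single offset, with the first
   subscript varying fastest. Returns -1 if any subscript is out of
   range for its dimension. (Hypothetical helper.) */
static int flat_index(const int *dim, const int *index, int ndims)
{
    int offset = 0;
    int stride = 1;    /* product of the dimensions already consumed */
    int i;

    assert(dim != 0 && index != 0);
    assert(ndims >= 1 && ndims <= 5);
    for (i = 0; i < ndims; i++) {
        if (index[i] < 0 || index[i] >= dim[i])
            return -1;
        offset += index[i] * stride;
        stride *= dim[i];
    }
    return offset;
}
```

With dimensions 2 x 3 x 4, the subscripts (1, 2, 3) give 1 + 2*2 + 3*6 = 23, the last of the 24 elements. If the strides were sums of dimensions rather than products, distinct subscripts could map to the same element.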


/* add a string variable to table. Params: id - id of variable to get (including trailing $) Returns: pointer to new entry in table, 0 on fail. */ static VARIABLE *addstring(const char *id) { VARIABLE *vars; vars = realloc(variables, (nvariables + 1) * sizeof(VARIABLE)); if(vars) { variables = vars; strcpy(variables[nvariables].id, id); variables[nvariables].sval = 0; variables[nvariables].dval = 0; nvariables++; return &variables[nvariables-1]; } else seterror(ERR_OUTOFMEMORY); return 0; } /* add a new array to our symbol table. Params: id - id of array (includes opening parenthesis) Returns: pointer to new entry, 0 on fail. */ static DIMVAR *adddimvar(const char *id) { DIMVAR *vars; vars = realloc(dimvariables, (ndimvariables + 1) * sizeof(DIMVAR)); if(vars) { dimvariables = vars; strcpy(dimvariables[ndimvariables].id, id); dimvariables[ndimvariables].dval = 0; dimvariables[ndimvariables].str = 0; dimvariables[ndimvariables].ndims = 0; dimvariables[ndimvariables].type = strchr(id, '$') ? STRID : FLTID; ndimvariables++; return &dimvariables[ndimvariables-1];


} else seterror(ERR_OUTOFMEMORY); return 0; } /* high level string parsing function. Returns: a malloced pointer, or 0 on error condition. caller must free! */ static char *stringexpr(void) { char *left; char *right; char *temp;

switch(token) { case DIMSTRID: left = mystrdup(stringdimvar()); break; case STRID: left = mystrdup(stringvar()); break; case QUOTE: left = stringliteral(); break; case CHRSTRING: left = chrstring(); break; case STRSTRING: left = strstring(); break; case LEFTSTRING: left = leftstring(); break; case RIGHTSTRING: left = rightstring(); break; case MIDSTRING: left = midstring(); break; case STRINGSTRING: left = stringstring(); break; default: if(!isstring(token)) seterror(ERR_TYPEMISMATCH); else


seterror(ERR_SYNTAX); return mystrdup(""); } if(!left) { seterror(ERR_OUTOFMEMORY); return 0; } switch(token) { case PLUS: match(PLUS); right = stringexpr(); if(right) { temp = mystrconcat(left, right); free(right); if(temp) { free(left); left = temp; } else seterror(ERR_OUTOFMEMORY); } else seterror(ERR_OUTOFMEMORY); break; default: return left; } return left; } /* parse the CHR$ token */ static char *chrstring(void) { double x; char buff[6]; char *answer; match(CHRSTRING); match(OPAREN); x = integer( expr() ); match(CPAREN);


buff[0] = (char) x; buff[1] = 0; answer = mystrdup(buff); if(!answer) seterror(ERR_OUTOFMEMORY); return answer; } /* parse the STR$ token */ static char *strstring(void) { double x; char buff[64]; char *answer; match(STRSTRING); match(OPAREN); x = expr(); match(CPAREN); sprintf(buff, "%g", x); answer = mystrdup(buff); if(!answer) seterror(ERR_OUTOFMEMORY); return answer; } /* parse the LEFT$ token */ static char *leftstring(void) { char *str; int x; char *answer; match(LEFTSTRING); match(OPAREN); str = stringexpr(); if(!str) return 0; match(COMMA); x = integer( expr() ); match(CPAREN); if(x > (int) strlen(str)) return str;


if(x < 0) { seterror(ERR_ILLEGALOFFSET); return str; } str[x] = 0; answer = mystrdup(str); free(str); if(!answer) seterror(ERR_OUTOFMEMORY); return answer; } /* parse the RIGHT$ token */ static char *rightstring(void) { int x; char *str; char *answer; match(RIGHTSTRING); match(OPAREN); str = stringexpr(); if(!str) return 0; match(COMMA); x = integer( expr() ); match(CPAREN); if( x > (int) strlen(str)) return str; if(x < 0) { seterror(ERR_ILLEGALOFFSET); return str; } answer = mystrdup( &str[strlen(str) - x] ); free(str); if(!answer) seterror(ERR_OUTOFMEMORY); return answer; }


/* parse the MID$ token */ static char *midstring(void) { char *str; int x; int len; char *answer; char *temp; match(MIDSTRING); match(OPAREN); str = stringexpr(); match(COMMA); x = integer( expr() ); match(COMMA); len = integer( expr() ); match(CPAREN); if(!str) return 0; if(len == -1) len = strlen(str) - x + 1; if( x > (int) strlen(str) || len < 1) { free(str); answer = mystrdup(""); if(!answer) seterror(ERR_OUTOFMEMORY); return answer; } if(x < 1.0) { seterror(ERR_ILLEGALOFFSET); return str; } temp = &str[x-1]; answer = malloc(len + 1); if(!answer) { seterror(ERR_OUTOFMEMORY); return str; } strncpy(answer, temp, len);


answer[len] = 0; free(str); return answer; } /* parse the string$ token */ static char *stringstring(void) { int x; char *str; char *answer; int len; int N; int i; match(STRINGSTRING); match(OPAREN); x = integer( expr() ); match(COMMA); str = stringexpr(); match(CPAREN); if(!str) return 0; N = x; if(N < 1) { free(str); answer = mystrdup(""); if(!answer) seterror(ERR_OUTOFMEMORY); return answer; } len = strlen(str); answer = malloc( N * len + 1 ); if(!answer) { free(str); seterror(ERR_OUTOFMEMORY); return 0; } for(i=0; i < N; i++) { strcpy(answer + len * i, str); }


free(str); return answer; } /* read a dimensioned string variable from input. Returns: pointer to string (not malloced) */ static char *stringdimvar(void) { char id[32]; int len; DIMVAR *dimvar; char **answer; int index[5]; getid(string, id, &len); match(DIMSTRID); dimvar = finddimvar(id); if(dimvar) { switch(dimvar->ndims) { case 1: index[0] = integer( expr() ); answer = getdimvar(dimvar, index[0]); break; case 2: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); answer = getdimvar(dimvar, index[0], index[1]); break; case 3: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); match(COMMA); index[2] = integer( expr() ); answer = getdimvar(dimvar, index[0], index[1], index[2]); break; case 4: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); match(COMMA); index[2] = integer( expr() ); match(COMMA);


index[3] = integer( expr() ); answer = getdimvar(dimvar, index[0], index[1], index[2], index[3]); break; case 5: index[0] = integer( expr() ); match(COMMA); index[1] = integer( expr() ); match(COMMA); index[2] = integer( expr() ); match(COMMA); index[3] = integer( expr() ); match(COMMA); index[4] = integer( expr() ); answer = getdimvar(dimvar, index[0], index[1], index[2], index[3], index[4]); break;

} match(CPAREN); } else seterror(ERR_NOSUCHVARIABLE); if(!errorflag) if(*answer) return *answer; return ""; } /* parse a string variable. Returns: pointer to string (not malloced) */ static char *stringvar(void) { char id[32]; int len; VARIABLE *var; getid(string, id, &len); match(STRID); var = findvariable(id); if(var) { if(var->sval) return var->sval; return ""; }


seterror(ERR_NOSUCHVARIABLE); return ""; } /* parse a string literal Returns: malloced string literal Notes: newlines aren't allowed in literals, but blind concatenation across newlines is. */ static char *stringliteral(void) { int len = 1; char *answer = 0; char *temp; char *substr; char *end; while(token == QUOTE) { while(isspace(*string)) string++; end = mystrend(string, '"'); if(end) { len = end - string; substr = malloc(len); if(!substr) { seterror(ERR_OUTOFMEMORY); return answer; } mystrgrablit(substr, string); if(answer) { temp = mystrconcat(answer, substr); free(substr); free(answer); answer = temp; if(!answer) { seterror(ERR_OUTOFMEMORY); return answer; } } else answer = substr; string = end; } else


{ seterror(ERR_SYNTAX); return answer; } match(QUOTE); } return answer; } /* cast a double to an integer, triggering errors if out of range */ static int integer(double x) { if( x < INT_MIN || x > INT_MAX ) seterror( ERR_BADVALUE ); if( x != floor(x) ) seterror( ERR_NOTINT ); return (int) x; } /* check that we have a token of the passed type (if not set the errorflag) Move parser on to next token. Sets token and string. */ static void match(int tok) { if(token != tok) { seterror(ERR_SYNTAX); return; } while(isspace(*string)) string++; string += tokenlen(string, token); token = gettoken(string); if(token == ERROR) seterror(ERR_SYNTAX); }
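integer() above flags an out-of-range or fractional value but still performs the cast. In C, converting a double whose integral part does not fit in an int is undefined behaviour, so one possible variation is to refuse the conversion instead. The sketch below shows that idea. The name `to_int_checked` and its success/failure return convention are assumptions of the sketch, not the interpreter's interface, and it further assumes that every int value is exactly representable as a double (true for 32-bit int).

```c
#include <assert.h>
#include <limits.h>

/* Convert x to int only if it is a whole number within the int range.
   Returns 1 and stores the result in *out on success; returns 0 and
   leaves *out untouched otherwise. (Hypothetical helper.) */
static int to_int_checked(double x, int *out)
{
    int i;

    assert(out != 0);
    /* written so that NaN, which fails every comparison, is rejected */
    if (!(x >= INT_MIN && x <= INT_MAX))
        return 0;
    i = (int) x;               /* safe: x is known to be in range */
    if ((double) i != x)
        return 0;              /* x had a fractional part */
    *out = i;
    return 1;
}
```

A caller in the interpreter's style would turn a 0 return into seterror(ERR_BADVALUE) or seterror(ERR_NOTINT) as appropriate.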


/* set the errorflag. Params: errorcode - the error. Notes: ignores error cascades */ static void seterror(int errorcode) { if(errorflag == 0 || errorcode == 0) errorflag = errorcode; } /* get the next line number Params: str - pointer to parse string Returns: line no of next line, 0 if end Notes: goes to newline, then finds first line starting with a digit. */ static int getnextline(const char *str) { while(*str) { while(*str && *str != '\n') str++; if(*str == 0) return 0; str++; if(isdigit(*str)) return atoi(str); } return 0; } /* get a token from the string Params: str - string to read token from Notes: ignores white space between tokens */ static int gettoken(const char *str) { while(isspace(*str)) str++; if(isdigit(*str)) return VALUE; switch(*str) { case 0:


return EOS; case '\n': return EOL; case '/': return DIV; case '*': return MULT; case '(': return OPAREN; case ')': return CPAREN; case '+': return PLUS; case '-': return MINUS; case '!': return SHRIEK; case ',': return COMMA; case ';': return SEMICOLON; case '"': return QUOTE; case '=': return EQUALS; case '<': return LESS; case '>': return GREATER; default: if(!strncmp(str, "e", 1) && !isalnum(str[1])) return E; if(isupper(*str)) { if(!strncmp(str, "SIN", 3) && !isalnum(str[3])) return SIN; if(!strncmp(str, "COS", 3) && !isalnum(str[3])) return COS; if(!strncmp(str, "TAN", 3) && !isalnum(str[3])) return TAN; if(!strncmp(str, "LN", 2) && !isalnum(str[2])) return LN; if(!strncmp(str, "POW", 3) && !isalnum(str[3])) return POW; if(!strncmp(str, "PI", 2) && !isalnum(str[2])) return PI; if(!strncmp(str, "SQRT", 4) && !isalnum(str[4])) return SQRT; if(!strncmp(str, "PRINT", 5) && !isalnum(str[5])) return PRINT; if(!strncmp(str, "LET", 3) && !isalnum(str[3]))


return LET; if(!strncmp(str, "DIM", 3) && !isalnum(str[3])) return DIM; if(!strncmp(str, "IF", 2) && !isalnum(str[2])) return IF; if(!strncmp(str, "THEN", 4) && !isalnum(str[4])) return THEN; if(!strncmp(str, "AND", 3) && !isalnum(str[3])) return AND; if(!strncmp(str, "OR", 2) && !isalnum(str[2])) return OR; if(!strncmp(str, "GOTO", 4) && !isalnum(str[4])) return GOTO; if(!strncmp(str, "INPUT", 5) && !isalnum(str[5])) return INPUT; if(!strncmp(str, "REM", 3) && !isalnum(str[3])) return REM; if(!strncmp(str, "FOR", 3) && !isalnum(str[3])) return FOR; if(!strncmp(str, "TO", 2) && !isalnum(str[2])) return TO; if(!strncmp(str, "NEXT", 4) && !isalnum(str[4])) return NEXT; if(!strncmp(str, "STEP", 4) && !isalnum(str[4])) return STEP;

if(!strncmp(str, "MOD", 3) && !isalnum(str[3])) return MOD; if(!strncmp(str, "ABS", 3) && !isalnum(str[3])) return ABS; if(!strncmp(str, "LEN", 3) && !isalnum(str[3])) return LEN; if(!strncmp(str, "ASCII", 5) && !isalnum(str[5])) return ASCII; if(!strncmp(str, "ASIN", 4) && !isalnum(str[4])) return ASIN; if(!strncmp(str, "ACOS", 4) && !isalnum(str[4])) return ACOS; if(!strncmp(str, "ATAN", 4) && !isalnum(str[4])) return ATAN; if(!strncmp(str, "INT", 3) && !isalnum(str[3])) return INT; if(!strncmp(str, "RND", 3) && !isalnum(str[3])) return RND; if(!strncmp(str, "VAL", 3) && !isalnum(str[3])) return VAL; if(!strncmp(str, "VALLEN", 6) && !isalnum(str[6])) return VALLEN; if(!strncmp(str, "INSTR", 5) && !isalnum(str[5])) return INSTR;


if(!strncmp(str, "CHR$", 4)) return CHRSTRING; if(!strncmp(str, "STR$", 4)) return STRSTRING; if(!strncmp(str, "LEFT$", 5)) return LEFTSTRING; if(!strncmp(str, "RIGHT$", 6)) return RIGHTSTRING; if(!strncmp(str, "MID$", 4)) return MIDSTRING; if(!strncmp(str, "STRING$", 7)) return STRINGSTRING; } /* end isupper() */ if(isalpha(*str)) { while(isalnum(*str)) str++; switch(*str) { case '$': return str[1] == '(' ? DIMSTRID : STRID; case '(': return DIMFLTID; default: return FLTID; } } return ERROR; } } /* get the length of a token. Params: str - pointer to the string containing the token token - the type of the token read Returns: length of the token, or 0 for EOL to prevent it being read past. */ static int tokenlen(const char *str, int token) { int len = 0; char buff[32]; switch(token) { case EOS: return 0;


case EOL: return 1; case VALUE: getvalue(str, &len); return len; case DIMSTRID: case DIMFLTID: case STRID: getid(str, buff, &len); return len; case FLTID: getid(str, buff, &len); return len; case PI: return 2; case E: return 1; case SIN: return 3; case COS: return 3; case TAN: return 3; case LN: return 2; case POW: return 3; case SQRT: return 4; case DIV: return 1; case MULT: return 1; case OPAREN: return 1; case CPAREN: return 1; case PLUS: return 1; case MINUS: return 1; case SHRIEK: return 1; case COMMA: return 1; case QUOTE: return 1; case EQUALS: return 1; case LESS: return 1;


case GREATER: return 1; case SEMICOLON: return 1; case ERROR: return 0; case PRINT: return 5; case LET: return 3; case DIM: return 3; case IF: return 2; case THEN: return 4; case AND: return 3; case OR: return 2; case GOTO: return 4; case INPUT: return 5; case REM: return 3; case FOR: return 3; case TO: return 2; case NEXT: return 4; case STEP: return 4; case MOD: return 3; case ABS: return 3; case LEN: return 3; case ASCII: return 5; case ASIN: return 4; case ACOS: return 4; case ATAN: return 4; case INT: return 3; case RND:


return 3; case VAL: return 3; case VALLEN: return 6; case INSTR: return 5; case CHRSTRING: return 4; case STRSTRING: return 4; case LEFTSTRING: return 5; case RIGHTSTRING: return 6; case MIDSTRING: return 4; case STRINGSTRING: return 7; default: assert(0); return 0; } } /* test if a token represents a string expression Params: token - token to test Returns: 1 if a string, else 0 */ static int isstring(int token) { if(token == STRID || token == QUOTE || token == DIMSTRID || token == CHRSTRING || token == STRSTRING || token == LEFTSTRING || token == RIGHTSTRING || token == MIDSTRING || token == STRINGSTRING) return 1; return 0; }
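gettoken() and tokenlen() above record each keyword twice: once as a spelling to compare against and once as a length. A possible alternative, shown purely as a sketch and not as the author's design, is a single table from which both are derived. The `TK_*` codes, the `keywords` table and the name `find_keyword` are all hypothetical, and only four keywords are listed to keep the example short.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token codes, local to this sketch. */
enum { TK_NONE, TK_PRINT, TK_LET, TK_VAL, TK_VALLEN };

struct keyword { const char *name; int tok; };

static const struct keyword keywords[] = {
    { "PRINT",  TK_PRINT  },
    { "LET",    TK_LET    },
    { "VAL",    TK_VAL    },
    { "VALLEN", TK_VALLEN }
};

/* Look up the keyword at the start of str. On a match, stores the
   keyword's length in *len and returns its token code; otherwise
   returns TK_NONE with *len set to 0. */
static int find_keyword(const char *str, int *len)
{
    size_t i;

    assert(str != 0 && len != 0);
    *len = 0;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        size_t n = strlen(keywords[i].name);
        /* the keyword must not run on into a longer identifier */
        if (strncmp(str, keywords[i].name, n) == 0 &&
            !isalnum((unsigned char) str[n])) {
            *len = (int) n;
            return keywords[i].tok;
        }
    }
    return TK_NONE;
}
```

The check on the character after the keyword is what lets VAL and VALLEN coexist in either order, exactly as the `!isalnum(str[n])` tests do in gettoken(). The trade-off is a strlen() call per table entry, which the hard-coded lengths avoid.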


/* get a numerical value from the parse string Params: str - the string to search len - return pointer for no chars read Returns: the value of the string. */ static double getvalue(const char *str, int *len) { double answer; char *end; answer = strtod(str, &end); assert(end != str); *len = end - str; return answer; } /* getid - get an id from the parse string: Params: str - string to search out - id output [32 chars max ] len - return pointer for id length Notes: triggers an error if id > 31 chars the id includes the $ and ( qualifiers. */ static void getid(const char *str, char *out, int *len) { int nread = 0; while(isspace(*str)) str++; assert(isalpha(*str)); while(isalnum(*str)) { if(nread < 31) out[nread++] = *str++; else { seterror(ERR_IDTOOLONG); break; } } if(*str == '$') { if(nread < 31) out[nread++] = *str++; else seterror(ERR_IDTOOLONG); } if(*str == '(') {


if(nread < 31) out[nread++] = *str++; else seterror(ERR_IDTOOLONG); } out[nread] = 0; *len = nread; }

/* grab a literal from the parse string. Params: dest - destination string src - source string Notes: strings are in quotes; a doubled quote is the escape for a quote character */ static void mystrgrablit(char *dest, const char *src) { assert(*src == '"'); src++; while(*src) { if(*src == '"') { if(src[1] == '"') { *dest++ = *src; src++; src++; } else break; } else *dest++ = *src++; } *dest++ = 0; }
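The escape rule used by mystrgrablit() is that two consecutive quotes inside a literal stand for one quote character, and a lone quote ends the literal. The sketch below restates the same loop with an index and a returned length so that the rule can be exercised in isolation. The name `grab_literal` and the length return value are assumptions of the sketch.

```c
#include <assert.h>
#include <string.h>

/* Copy the body of the quoted literal starting at src into dest,
   turning each doubled quote into a single one. Returns the number
   of characters written, not counting the terminator. dest must be
   large enough to hold the whole literal. (Hypothetical helper.) */
static size_t grab_literal(const char *src, char *dest)
{
    size_t n = 0;

    assert(src != 0 && dest != 0 && *src == '"');
    src++;                          /* skip the opening quote */
    while (*src) {
        if (*src == '"') {
            if (src[1] != '"')
                break;              /* a lone quote closes the literal */
            src++;                  /* skip the first quote of a pair */
        }
        dest[n++] = *src++;
    }
    dest[n] = '\0';
    return n;
}
```

Given the source text `"say ""hi"""`, the function writes `say "hi"` and returns 8. Since escapes only ever shrink the text, a buffer as long as the source literal is always sufficient, which is the sizing that stringliteral() relies on.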


/* find where a source string literal ends Params: src - string to check (must point to quote) quote - character to use for quotation Returns: pointer to quote which ends string Notes: quotes escape quotes */ static char *mystrend(const char *str, char quote) { assert(*str == quote); str++; while(*str) { while(*str != quote) { if(*str == '\n' || *str == 0) return 0; str++; } if(str[1] == quote) str += 2; else break; } return (char *) (*str? str : 0); } /* Count the instances of ch in str Params: str - string to check ch - character to count Returns: number of times ch occurs in str. */ static int mystrcount(const char *str, char ch) { int answer = 0; while(*str) { if(*str++ == ch) answer++; } return answer; }


/* duplicate a string: Params: str - string to duplicate Returns: malloced duplicate. */ static char *mystrdup(const char *str) { char *answer; answer = malloc(strlen(str) + 1); if(answer) strcpy(answer, str); return answer; } /* concatenate two strings Params: str - first string cat - second string Returns: malloced string. */ static char *mystrconcat(const char *str, const char *cat) { int len; char *answer; len = strlen(str) + strlen(cat); answer = malloc(len + 1); if(answer) { strcpy(answer, str); strcat(answer, cat); } return answer; }


/* compute x! */ static double factorial(double x) { double answer = 1.0; double t; if( x > 1000.0) x = 1000.0; for(t=1;t<=x;t+=1.0) answer *= t; return answer; }
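factorial() multiplies upwards from 1 in steps of 1.0, so a non-integer argument is effectively rounded down and anything below 1 yields 1. The result also stops being a finite number long before the cap of 1000 is reached. The sketch below reproduces the loop without the cap to make both points testable. The names `fact` and `fact_is_finite` are hypothetical, and the overflow point quoted afterwards assumes IEEE 754 double precision.

```c
#include <assert.h>
#include <float.h>

/* Same loop shape as factorial(), without the cap at 1000.
   (Hypothetical helper.) */
static double fact(double x)
{
    double answer = 1.0;
    double t;

    for (t = 1.0; t <= x; t += 1.0)
        answer *= t;
    return answer;
}

/* Returns 1 if x! is still representable as a finite double. An
   overflowed result (infinity) compares greater than DBL_MAX. */
static int fact_is_finite(double x)
{
    return fact(x) <= DBL_MAX;
}
```

Under that assumption 170! is the largest factorial that fits; from 171! onwards the product overflows to infinity. The cap in factorial() therefore bounds the running time of the loop rather than the size of the result.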

