the nimf guide : strings

 ____ ____ ____ ____ 
||n |||i |||m |||f ||
||__|||__|||__|||__||
|/__\|/__\|/__\|/__\|

Strings are difficult in nimf. They work, but require some management and some thinking ahead.

String Literals

A string literal is a string that is written directly in code. Examples of string literals have already appeared in other parts of this guide.

A string literal starts with ". Then the text can be written. Then " is written again to finish the string. This is pretty similar to other programming languages, but different from most forth languages. String literals can include escapes, but they can also include newlines, tabs, or anything else that you can get into your text file. The key part here is that strings can span multiple lines.

The Temporary String Buffer

Any time a string literal is encountered, the text of that string will be stored into the temporary string buffer. This buffer resides at memory address 50. If you inline the text module, you can use str.buf-addr to add the buffer address to the stack, or str.print-buf to print the current buffer data.

The name of this buffer having the word temporary in it is no accident. There is no guarantee that a string will remain in the buffer for long. Any time a word uses a string literal, that new string gets moved into the buffer and replaces whatever was there. As such, you will need to move temporary strings from the buffer into long term memory if you want to keep them around without worrying about them getting lost. It is never an issue to input a string literal and then immediately call a word to output the temp buffer. The string's presence can be relied upon for that time scale, and the buffer is often used in that way.
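
The idea of rescuing a string before the next literal overwrites it can be sketched in a small Python model. This is not nimf code. The flat memory list, the helper names, and the long term address 30000 are all assumptions made for illustration; only the buffer address 50 and the length-first layout come from this guide.

```python
# Python model of nimf memory, not nimf code.
BUF_ADDR = 50                      # temporary string buffer (from the guide)
SAVED = 30000                      # hypothetical long term location
memory = [0] * 40000               # hypothetical flat memory of integer cells

def store_literal(text):
    """Model a string literal: length at BUF_ADDR, characters after it."""
    memory[BUF_ADDR] = len(text)
    for i, ch in enumerate(text):
        memory[BUF_ADDR + 1 + i] = ord(ch)

def copy_string(src, dst):
    """Copy a length-prefixed string (length cell plus its characters)."""
    length = memory[src]
    memory[dst:dst + length + 1] = memory[src:src + length + 1]

def read_string(addr):
    """Read back the string stored at addr."""
    length = memory[addr]
    return "".join(chr(c) for c in memory[addr + 1:addr + 1 + length])

store_literal("hello")
copy_string(BUF_ADDR, SAVED)       # keep a copy before the next literal
store_literal("goodbye")           # overwrites the temporary buffer

print(read_string(BUF_ADDR))       # goodbye
print(read_string(SAVED))          # hello
```

The copy includes the length cell, so the saved string can be read back the same way as the buffer.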

It is important to know that the temporary string buffer extends from memory address 50 to memory address 29999. If you need more memory than that, you will need to process your incoming string data in batches. Builtins that "read all" from a source (file or tcp) will truncate the data to fit if need be. If you are expecting a large amount of data, doing incremental reads is the better call. For a more detailed look at the nimf memory map please see the memory section of the nimf guide.
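
To make the difference between a truncating "read all" and incremental reads concrete, here is a rough Python model. It is not nimf code, and the function names are hypothetical. The capacity figure assumes that the cell at address 50 holds the length and every remaining cell up to 29999 holds one character.

```python
# Python model, not nimf code. Names and the capacity formula are assumptions.
BUF_START = 50
BUF_END = 29999
CAPACITY = BUF_END - BUF_START     # one cell for the length, the rest for text

def read_all(data):
    """Model a 'read all' builtin: keep only what fits in the buffer."""
    return data[:CAPACITY]

def read_in_batches(data, batch_size=CAPACITY):
    """Yield successive pieces that each fit in the buffer."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

incoming = "x" * 70000             # more data than the buffer can hold
truncated = read_all(incoming)
batches = list(read_in_batches(incoming))

print(len(truncated))              # 29949
print([len(b) for b in batches])   # [29949, 29949, 10102]
```

With batches nothing is lost, as long as each piece is processed or moved out of the buffer before the next read replaces it.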

How Strings Are Stored In Memory

Many programmers familiar with C will be used to "null terminated strings". nimf does not use null terminated strings. Like C, it does think of strings as character arrays, but it stores the length up front: the cell at a string's address contains the length of the string, and the characters follow in the cells after it. You can then use that length to determine offsets for utilizing the string data.

    Address: [  50  ,  51 ,  52 ,  53 ,  54 ,  55 ]
      Value: [   5  , 104 , 101 , 108 , 108 , 111 ]
Description: [length,  `h ,  `e ,  `l ,  `l ,  `o ]

This allows for quick access to the length of the string without having to read ahead. It also allows for using offsets easily. For example:

In the above example we first inline the text module, so that we have access to some convenient words. We then enter the string literal hello, which gets stored in the temporary string buffer. We print it by calling str.print-buf, which outputs hello. We then add the temporary string buffer's address to the stack twice (well, once... then we duplicate it). We consume the top copy with @, which fetches the length stored at that address and puts it on top of the stack. We then add the top two stack values, which adds the length to the temporary string buffer address (50 + 5) and leaves 55 as TOS. We then call emit to output the value at that address, which is the final character of the string (o).
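
The stack steps described above can be traced in a short Python model. This is a sketch of the arithmetic only, not nimf code: the memory list and the Python stack are stand-ins, and the comments show the assumed stack contents after each step.

```python
# Python model of the walkthrough, not nimf code.
BUF_ADDR = 50
memory = [0] * 100                 # hypothetical memory, large enough for the demo

# Model the literal "hello" landing in the temporary string buffer.
text = "hello"
memory[BUF_ADDR] = len(text)
for i, ch in enumerate(text):
    memory[BUF_ADDR + 1 + i] = ord(ch)

stack = []
stack.append(BUF_ADDR)             # push the buffer address   [50]
stack.append(stack[-1])            # duplicate it              [50, 50]
stack.append(memory[stack.pop()])  # fetch the length          [50, 5]
b, a = stack.pop(), stack.pop()
stack.append(a + b)                # add the top two values    [55]

last_char_addr = stack.pop()
last_char = chr(memory[last_char_addr])
print(last_char_addr, last_char)   # 55 o
```

Because the length sits in the first cell, address plus length always lands on the last character of the string.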

That seems like a lot just to get the value of a character in a string, right? Just remember that it would be easy to define a word that takes a string address and an offset and either outputs the character at that address/offset or that adds its value to TOS. We are working intentionally low level in the guide to get the basics down.
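
A helper of the kind just described might look like the following, written as a Python model rather than as a nimf word. The name char_at and the bounds check are assumptions. Offsets run from 1 to the length, since the cell at the string's own address holds the length.

```python
# Python model of an address/offset helper, not nimf code.
def char_at(memory, addr, offset):
    """Return the character `offset` cells past the string address `addr`."""
    length = memory[addr]
    if not 1 <= offset <= length:
        raise IndexError(f"offset {offset} is outside 1..{length}")
    return chr(memory[addr + offset])

# Hypothetical memory holding "hello" at address 50, as in the diagram above.
memory = [0] * 100
memory[50:56] = [5, 104, 101, 108, 108, 111]

print(char_at(memory, 50, 1))      # h
print(char_at(memory, 50, 5))      # o
```

Checking the offset against the stored length is what keeps the helper from reading stale data left behind by an earlier, longer string.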

Something else to note about the above example is that when we inlined the text module we did so with a string literal. That means that prior to us adding the string literal hello, the temporary string buffer had the word text in it. You could absolutely interact with that string before inlining. After inlining it is highly probable that the temporary string buffer had a different value in it. Since text also inlines other modules, the strings that were used to inline them would also go into the temporary string buffer.

Be sure to read the next section of this guide, which will cover variables, memory, and string variables. There are a number of techniques to work with strings in nimf, and most involve reserving extra memory and overwriting variables with new string values.

Strings and Builtins

A number of builtins (file.read, file.open, get-env, and tcp.read all come to mind) put data into the temporary string buffer. For example, if you use file.read to read a line from a file it will read the line into the temporary string buffer. From there you can move it into long term memory, analyze it, print/output it, modify it... anything you like. Then, when you call file.read again the next available line will replace the prior one in the temporary string buffer.

It is very possible to define your own words that write to the buffer as well. Just remember to update the length stored at the buffer address as needed, so that words like str.print-buf do not try to read more or less than you expect them to.
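
Why the length update matters can be shown with a small Python model. This is not nimf code, and the helper names are hypothetical. It writes a character past the end of the string twice: once without touching the length, and once with the length updated.

```python
# Python model of writing to the buffer, not nimf code.
BUF_ADDR = 50
memory = [0] * 100                 # hypothetical memory for the demo

def write_string(addr, text):
    """Store text as a length cell followed by character cells."""
    memory[addr] = len(text)
    for i, ch in enumerate(text):
        memory[addr + 1 + i] = ord(ch)

def read_string(addr):
    """Read exactly as many characters as the length cell says."""
    length = memory[addr]
    return "".join(chr(c) for c in memory[addr + 1:addr + 1 + length])

def append_char(addr, ch):
    """Write one character after the string and update the length."""
    length = memory[addr]
    memory[addr + 1 + length] = ord(ch)
    memory[addr] = length + 1

write_string(BUF_ADDR, "hello")
memory[BUF_ADDR + 6] = ord("!")    # character written, length left at 5
stale = read_string(BUF_ADDR)      # the new character is not seen

append_char(BUF_ADDR, "!")         # character written and length updated
fresh = read_string(BUF_ADDR)

print(stale)                       # hello
print(fresh)                       # hello!
```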