Character string manipulation library

August, 1998. Beta version; documentation updated August, 1998

Introduction

This is a string library that is intended to be compatible with the class string library in the December 1996 draft of the C++ standard. My version is for strings of characters of type char only.

It is intended for people who do not have access to an official version of the string library or wish to use a version without templates.

It follows the standard class string as I understand it, with three exceptions: a few functions that are relevant only to the template version are omitted, all the functions involving iterators are omitted, and the input/output routines are not standard.

I use the name String rather than string to prevent conflicts with other string libraries (as in BC 5.0).

I claim copyright for this program. The initial version was taken from Tony Hansen's book The C++ answer book, but very little of Tony's code remains.

Permission is granted to use this, but not to sell. I take no responsibility for errors, omissions etc, but please tell me about them.

This library links into my exception package. You need to edit the file include.h to determine whether to use simulated exceptions or compiler supported exceptions or simply to disable exceptions. More information on the exception package is given in the documentation for my matrix library, newmat09.

The package uses a limited form of copy-on-write (see Tony Hansen's book for more details) and also attempts to avoid repeated reallocation of the string storage during a multiple sum. This results in some saving in space and time for some operations at the expense of an increase in the complexity of the program and an increase in the time used by a few operations. As with newmat09, it is still an open question whether, and under what circumstances, the extra complexity is really warranted.

 

Files in this package

The following files are included in this package:

   str.h         header file for the string library
   str.cpp       function bodies
   boolean.h     simulation of the standard boolean type
   myexcept.h    header for the exceptions simulator
   myexcept.cpp  bodies for the exceptions simulator
   include.h     options header file (see documentation in newmat09)
   strtst.cpp    test program
   strtst.txt    output from the test program
   test_exs.cpp  test exceptions
   test_exs.txt  output from test_exs
   readme.txt    readme file
   string.htm    this file
   st_gnu.mak    make file for gnu c++
   st_cc.mak     make file for CC

 

Testing and getting started

I have tested this program with the Borland 5.0, 4.53 (32 bit only; the test program won't run under 16 bit) and 3.1 compilers, MS VC++ 5, Watcom 10a, Gnu 2.7.2 and 2.8.0, and the Sun CC compilers.

For Borland 5.0, MS VC++ 5 and Gnu you need to edit include.h to disable my simulated Booleans.

CC compilers generate 14 error messages when running the strtst test program. I suspect these are due to a slightly different convention in deleting temporaries and don't matter.

For the indexes, lengths etc I use unsigned integer (typedefed to uint). This is instead of size_type in the official package. Using size_type (or size_t) as a type of variable seems too bizarre for me to use (as yet).

You will need to #include files include.h and str.h in your programs that use this package. Don't forget to edit include.h to determine whether exceptions are to be used, simulated or disabled. If you use the simulated exceptions you should turn off the exception capability of a compiler that does support exceptions. If your compiler supports bool variables edit the option in include.h to disable my simulated bool variables.

 

The public member functions

Static variable

static uint npos String::npos is the largest possible value of uint and is used to indicate that a find function has failed to find its target. All Strings must have length strictly less than String::npos
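
A minimal sketch of the npos convention, not taken from the package itself. It assumes uint is a typedef for unsigned int; the free-standing constant npos and the helper found() are hypothetical names standing in for String::npos and a caller's own check.

```cpp
#include <cassert>
#include <climits>

typedef unsigned int uint;      // assumption: uint is unsigned int

// Hypothetical stand-in for String::npos: the largest value a uint can hold.
const uint npos = UINT_MAX;

// Hypothetical helper: true when a find-style result is a real position.
inline bool found(uint result)
{
    return result != npos;
}

// Hypothetical helper: a length is legal only if it is strictly below npos.
inline bool legal_length(uint n)
{
    return n < npos;
}
```

Because every legal length is strictly less than npos, no valid position can ever be mistaken for the "not found" marker.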

Constructors, destruction, operator=

String() construct a String of zero length
String(const String&str) copy constructor (not explicitly in standard)
String(const String&str, uint pos, uint n = npos) construct a String from str starting at location pos (first location = 0) and continuing for the length of the String or for n characters, whichever occurs first
String(const char* s, uint n) construct a String from s, taking at most n characters or the length of the c-string s, whichever is smaller
String(const char* s) construct a String from s
String(uint n, char c) construct a String consisting of n copies of the character c
~String() the destructor
String& operator=(const String& str) copy a String (except that it may be able to avoid copying)
String& operator=(const char* s) set a String equal to a c-style character string pointed to by s
String& operator=(const char c) set a String equal to a character
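
The substring constructor is perhaps the least obvious of these. The sketch below uses std::string as a stand-in for String (an assumption, since the package itself may not be to hand); tail_from is a hypothetical helper name. It shows the count n being clamped to the characters that are actually available.

```cpp
#include <cassert>
#include <string>

// Stand-in: std::string plays the role of String (an assumption of this sketch).
// Hypothetical helper wrapping the (str, pos, n) constructor.
std::string tail_from(const std::string& str,
                      std::string::size_type pos,
                      std::string::size_type n = std::string::npos)
{
    assert(pos <= str.size());          // std::string throws out_of_range otherwise
    return std::string(str, pos, n);    // n is clamped to what remains after pos
}
```

For example, tail_from("0123456789", 3, 4) gives "3456", while asking for 100 characters from position 7 simply gives "789".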

Storage control

uint size() const the length of the String (does not include a trailing zero - in most cases there isn't one)
uint length() const same as size
uint max_size() const the maximum size of a String - not sure what the standard wants, I have set it to npos-1
void resize(uint n, char c = 0) change the size of a String, either by truncating or filling out with copies of character c (std does default separately)
uint capacity() const the total space allocated for a String (always >= size())
void reserve(uint res_arg = 0) change the capacity of a String to the maximum of res_arg and size(). This may be an increase or a decrease in the capacity.
void clear() erase the contents of the string
bool empty() const true if the String is empty; false otherwise
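
As an illustration of resize, here is a short sketch with std::string standing in for String (an assumption); resized is a hypothetical helper name. The call either truncates the string or pads it with copies of c.

```cpp
#include <cassert>
#include <string>

// Stand-in: std::string plays the role of String (an assumption of this sketch).
// Hypothetical helper: return a copy of s resized to n characters.
std::string resized(std::string s, std::string::size_type n, char c = 0)
{
    s.resize(n, c);                     // truncate, or pad with copies of c
    assert(s.size() == n);
    assert(s.capacity() >= s.size());   // capacity never falls below the length
    return s;
}
```

So resized("abc", 5, '-') yields "abc--" and resized("abcdef", 2, '-') yields "ab".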

Character access

char operator[](uint pos) const return the pos-th character; return 0 if pos = size()
char& operator[](uint pos) return a reference to the pos-th character; undefined if pos>=size() - I throw an exception. This reference may become invalid after almost any manipulation of the String
char at(uint n) const same as operator[] const
char& at(uint n) same as operator[]. Throws an exception if n >= size()

The editing functions

String& operator+=(const String& rhs) append rhs to a String (I don't invalidate pointers and references to the stored c-string if the new extended String will fit into the capacity of the old String - see policy on reallocation)
String& operator+=(const char* s) append the c-string defined by s to a String - see note above
String& operator+=(char c) append the character c to a String - see note above
String& append(const String& str) append str to a String - see note above
String& append(const String& str, uint pos, uint n = npos) append String(str,pos,n) - see note above
String& append(const char* s, uint n) append String(s,n) - see note above
String& append(const char* s) append String(s) - see note above
String& append(uint n, char c = 0) append String(n,c), ie n copies of the character c - see note above
String& assign(const String& str) replace the String by str - this and the following manipulation functions may invalidate pointers and references to the String storage - see the section policy on reallocation. (this function is not explicitly in the standard)
String& assign(const String& str, uint pos, uint n = npos) replace the String by String(str,pos,n)
String& assign(const char* s, uint n) replace the String by String(s, n)
String& assign(const char* s) replace the String by String(s)
String& assign(uint n, char c = 0) replace the String by String(n,c)
String& insert(uint pos1, const String& str) insert str before character pos1 (not explicitly in standard)
String& insert(uint pos1, const String& str, uint pos2, uint n = npos) insert String(str,pos2,n) before character pos1
String& insert(uint pos, const char* s, uint n = npos) insert String(s,n) before character pos (std does default separately)
String& insert(uint pos, uint n, char c = 0) insert String(n,c) before character pos
String& erase(uint pos = 0, uint n = npos) erase characters starting at pos and continuing for n characters or till the end of the String. This was originally called remove
String& replace(uint pos1, uint n1, const String& str) erase(pos1,n1); insert(pos1,str)
String& replace(uint pos1, uint n1, const String& str, uint pos2, uint n2 = npos) erase(pos1,n1); insert(pos1,str,pos2,n2)
String& replace(uint pos, uint n1, const char* s, uint n2 = npos) erase(pos,n1); insert(pos,s,n2); (std does default separately)
String& replace(uint pos, uint n1, uint n2, char c = 0) erase(pos,n1); insert(pos,n2,c)
uint copy(char* s, uint n, uint pos = 0) const copy a maximum of n characters from a string starting at position pos to memory starting at location given by s. Return the number of characters copied. I assume that the program has already allocated space for the characters
void swap(String&) a.swap(b) swaps the contents of Strings a and b. The standard also provides for a function swap(a,b) - see binary operators
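
The replace functions are described above as an erase followed by an insert. The sketch below spells that out, using std::string as a stand-in for String (an assumption); replace_via_erase_insert is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Stand-in: std::string plays the role of String (an assumption of this sketch).
// Hypothetical helper: replace(pos1, n1, str) written as erase + insert.
std::string replace_via_erase_insert(std::string s,
                                     std::string::size_type pos1,
                                     std::string::size_type n1,
                                     const std::string& str)
{
    assert(pos1 <= s.size());   // both steps need a valid starting position
    s.erase(pos1, n1);          // n1 is clamped to the end of the string
    s.insert(pos1, str);        // str goes in where the erased text was
    return s;
}
```

With s = "0123456789", replacing 3 characters at position 2 by "ab" gives "01ab56789", the same result as a direct call to replace.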

Pointer to data

const char* c_str() const return a pointer to the contents of a String after appending (char)0 to the String. This pointer will be invalidated by almost any operation on the String
const char* data() const return a pointer to the contents of a String. This pointer will be invalidated by almost any operation on the String

The find functions

uint find(const String& str, uint pos = 0) const find the first location of str in a String starting at position pos. The location is relative to the beginning of the parent String. Return String::npos if not found
uint find(const char* s, uint pos, uint n) const find(String(s,n),pos)
uint find(const char* s, uint pos = 0) const find(String(s),pos)
uint find(const char c, uint pos = 0) const find(String(1,c),pos)
uint rfind(const String& str, uint pos = npos) const find the last location of str in a String, searching backwards from position pos; i.e. begin the search with the first character of str at position pos of the target String. The location is relative to the beginning of the parent String. Return String::npos if not found
uint rfind(const char* s, uint pos, uint n) const rfind(String(s,n),pos)
uint rfind(const char* s, uint pos = npos) const rfind(String(s),pos)
uint rfind(const char c, uint pos = npos) const rfind(String(1,c),pos)
uint find_first_of(const String& str, uint pos = 0) const find first of any element in str starting at pos. Return String::npos if not found
uint find_first_of(const char* s, uint pos, uint n) const find_first_of(String(s,n),pos)
uint find_first_of(const char* s, uint pos = 0) const find_first_of(String(s),pos)
uint find_first_of(const char c, uint pos = 0) const find_first_of(String(1,c),pos)
uint find_last_of(const String& str, uint pos = npos) const find last of any element in str starting at pos. Return String::npos if not found
uint find_last_of(const char* s, uint pos, uint n) const find_last_of(String(s,n),pos)
uint find_last_of(const char* s, uint pos = npos) const find_last_of(String(s),pos)
uint find_last_of(const char c, uint pos = npos) const find_last_of(String(1,c),pos)
uint find_first_not_of(const String& str, uint pos = 0) const find the first character that is not in str, starting at pos. Return String::npos if not found
uint find_first_not_of(const char* s, uint pos, uint n) const find_first_not_of(String(s,n),pos)
uint find_first_not_of(const char* s, uint pos = 0) const find_first_not_of(String(s),pos)
uint find_first_not_of(const char c, uint pos = 0) const find_first_not_of(String(1,c),pos)
uint find_last_not_of(const String& str, uint pos = npos) const find the last character that is not in str, starting at pos. Return String::npos if not found
uint find_last_not_of(const char* s, uint pos, uint n) const find_last_not_of(String(s,n),pos)
uint find_last_not_of(const char* s, uint pos = npos) const find_last_not_of(String(s),pos)
uint find_last_not_of(const char c, uint pos = npos) const find_last_not_of(String(1,c),pos)
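
Two typical uses of the find family are sketched below, with std::string standing in for String (an assumption). The helper names count_occurrences and trim are hypothetical. The first loops on find until npos is returned; the second combines find_first_not_of and find_last_not_of.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in: std::string plays the role of String (an assumption of this sketch).

// Count the (possibly overlapping) occurrences of target in text.
std::size_t count_occurrences(const std::string& text, const std::string& target)
{
    if (target.empty()) return 0;                    // avoid a match at every position
    std::size_t count = 0;
    std::string::size_type pos = text.find(target);  // search from position 0
    while (pos != std::string::npos) {               // npos signals "not found"
        ++count;
        pos = text.find(target, pos + 1);            // resume just after the last hit
    }
    return count;
}

// Strip leading and trailing characters that appear in blanks.
std::string trim(const std::string& s, const std::string& blanks = " \t")
{
    std::string::size_type first = s.find_first_not_of(blanks);
    if (first == std::string::npos) return std::string();  // nothing but blanks
    std::string::size_type last = s.find_last_not_of(blanks);
    assert(first <= last);
    return s.substr(first, last - first + 1);
}
```

Note that every position returned is relative to the beginning of the whole string, even when the search starts part-way through.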

The substring function

String substr(uint pos = 0, uint n = npos) const return String(*this, pos, n)

The compare functions

int compare(const String& str) const a.compare(b) compares a and b in normal sort order. Return -1, 0 or 1
int compare(uint pos, uint n, const String& str) const a.compare(pos,n,b) compares String(a,pos,n) and b in normal sort order. Return -1, 0 or 1
int compare(uint pos1, uint n1, const String& str, uint pos2, uint n2) const a.compare(pos1,n1,b,pos2,n2) compares String(a,pos1,n1) and String(b,pos2,n2) in normal sort order. Return -1, 0 or 1
int compare(const char* s) const return compare(String(s))
int compare(uint pos, uint n, const char* s) const return compare(pos, n, String(s))
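
The compare functions above return exactly -1, 0 or 1. Other string packages may return any negative or positive value instead, so the sketch below normalises the result. It uses std::string as a stand-in for String (an assumption); compare3 is a hypothetical helper name.

```cpp
#include <cassert>
#include <string>

// Stand-in: std::string plays the role of String (an assumption of this sketch).
// Hypothetical helper: compare a and b in normal sort order, returning -1, 0 or 1.
int compare3(const std::string& a, const std::string& b)
{
    int r = a.compare(b);           // the sign carries the ordering
    return (r > 0) - (r < 0);       // collapse to -1, 0 or 1
}
```

The positional forms reduce to the same call on substrings; for instance a.compare(pos,n,b) corresponds to compare3(std::string(a,pos,n), b) in this sketch.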

The binary String functions

+ means concatenate; otherwise the meanings are obvious.

String operator+(const String& lhs, const String& rhs)
String operator+(const char* lhs, const String& rhs)
String operator+(char lhs, const String& rhs)
String operator+(const String& lhs, const char* rhs)
String operator+(const String& lhs, char rhs)
bool operator==(const String& lhs, const String& rhs)
bool operator==(const char* lhs, const String& rhs)
bool operator==(const String& lhs, const char* rhs)
bool operator!=(const String& lhs, const String& rhs)
bool operator!=(const char* lhs, const String& rhs)
bool operator!=(const String& lhs, const char* rhs)
bool operator<(const String& lhs, const String& rhs)
bool operator<(const char* lhs, const String& rhs)
bool operator<(const String& lhs, const char* rhs)
bool operator>(const String& lhs, const String& rhs)
bool operator>(const char* lhs, const String& rhs)
bool operator>(const String& lhs, const char* rhs)
bool operator<=(const String& lhs, const String& rhs)
bool operator<=(const char* lhs, const String& rhs)
bool operator<=(const String& lhs, const char* rhs)
bool operator>=(const String& lhs, const String& rhs)
bool operator>=(const char* lhs, const String& rhs)
bool operator>=(const String& lhs, const char* rhs)
void swap(const String& A, const String& B)

The stream functions - not properly implemented as yet:

istream& operator>>(istream& is, String& str)

   ... read token from istream - mine reads a whole line

ostream& operator<<(ostream& os, const String& str)

   ... output a String - mine ignores width setting

istream& getline(istream& is, String& str, char delim = '\n')

   ... read a line - I haven't implemented this yet.

The policies

Reallocation policy

This section discusses under what circumstances the String data in a String object will be moved. It is unclear to me what the standard allows. Moving the String data invalidates the const char* returned by .data() and .c_str() and any reference returned by the non-const versions of .at() or operator[] (and any iterators referring to the string).

I describe here what my program does. Another standard String package may (and probably does) follow different rules.

The value returned by .c_str will most likely become invalid under almost any operation of the String which changes the value of the String. Also a call to .c_str will invalidate a const char* returned by .data() and any reference returned by .at() or operator[].

If A is a String that has been assigned a capacity with the reserve function then the following functions will not cause a reallocation (so the value returned by .data() etc. will remain valid)

   A += ...
   A.assign(...)
   A.append(...)
   A.insert(...)
   A.erase(...)
   A.replace(...)

where ... denotes a legitimate argument, providing the resulting String will fit in the assigned capacity (as set by a call to reserve).

If the resulting String will not fit into the assigned capacity the String data will be moved (so the value returned by .data() etc. will not remain valid). Also the String will no longer be regarded as having an assigned capacity.

The concept of having an assigned capacity is important in considering the behaviour of assign, erase and replace when the parameters are such that the length of the String is reduced. For example

   String A = "0123456789";
   A.reserve(1); // will set capacity to A.size() = 10
   const char* d = A.data();
   A.erase(1,9);

will leave a valid value in d whereas

   String A = "0123456789";
   const char* d = A.data();
   A.erase(1,9);

will not leave a valid value in d since the storage of the String data will have been moved.

The operator= does not conform to these rules. A = something will always remove any assigned capacity for A (and will not pick up any capacity from the something).

In this package A.reserve() or A.reserve(0) will remove any assigned capacity, i.e. it will be as though no capacity had ever been assigned. So an erase or a replace that changes a length will cause a reallocation.

But don't expect anyone else's package to follow these rules.
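
The rules above can be mimicked with a toy class. This is a hypothetical sketch only, not the package's code: the class ToyString, its moves() counter and its always-reallocating reserve are all assumptions made for illustration. It models just two of the rules: erase leaves the data in place when a capacity has been assigned, and moves it otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy class illustrating the "assigned capacity" rule.
class ToyString {
public:
    explicit ToyString(const char* s) : assigned_(false), moves_(0) {
        size_ = cap_ = std::strlen(s);
        data_ = new char[cap_ + 1];
        std::memcpy(data_, s, size_);
    }
    ~ToyString() { delete[] data_; }

    // Set the capacity to max(n, size()); reserve(0) removes the assignment.
    // (This toy always reallocates here, which the real package need not do.)
    void reserve(std::size_t n = 0) {
        move_to(n > size_ ? n : size_);
        assigned_ = (n != 0);
    }

    // Erase n characters from pos (n is clamped to the end of the string).
    void erase(std::size_t pos, std::size_t n) {
        assert(pos <= size_);
        if (n > size_ - pos) n = size_ - pos;
        std::memmove(data_ + pos, data_ + pos + n, size_ - pos - n);
        size_ -= n;
        if (!assigned_ && n > 0) move_to(size_);  // no assigned capacity: data moves
    }

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }
    int moves() const { return moves_; }          // how often the data has moved

private:
    ToyString(const ToyString&);                  // copying disabled in this sketch
    ToyString& operator=(const ToyString&);

    void move_to(std::size_t new_cap) {
        char* fresh = new char[new_cap + 1];
        std::memcpy(fresh, data_, size_);
        delete[] data_;
        data_ = fresh;
        cap_ = new_cap;
        ++moves_;
    }

    char* data_;
    std::size_t size_;
    std::size_t cap_;
    bool assigned_;
    int moves_;
};
```

With this toy, calling reserve(1) on "0123456789" and then erase(1,9) leaves data() unchanged, while the same erase without the reserve moves the data once.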

Policy on operator+, operator+= and append

The evaluation of the concatenation expression A+B is delayed until the expression is used or until the value is referred to twice. This means that expressions such as A+B+C are evaluated in one sweep, rather than having A+B formed as a temporary before A+B+C is evaluated.

Unfortunately, this means that in expressions such as A + c_string the c-string c_string will be converted to a String object, before the overall String is formed. Since c-strings will usually be small I don't see this as a serious problem.

Likewise A+=X or A.append(X) will not be evaluated until the result is used (unless A has been assigned a capacity that is large enough to accommodate X). This means that sequences like

   A += X1;
   A += X2;
   ...

will not cause repeated reallocations of the space used by the String data.
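
One way to picture delayed evaluation is sketched below. It is a hypothetical illustration, not the mechanism this package actually uses: the class LazySum and its members are invented names, and std::string stands in for String. The pieces are only recorded as they arrive; the final string is built in one sweep with a single reservation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration only; not the library's actual implementation.
class LazySum {
public:
    // Record a piece; nothing is concatenated yet.
    LazySum& add(const std::string& piece) {
        pieces_.push_back(piece);
        return *this;
    }

    // Build the result in one sweep: one reservation, then the copies.
    std::string str() const {
        std::string::size_type total = 0;
        for (std::size_t i = 0; i < pieces_.size(); ++i)
            total += pieces_[i].size();

        std::string result;
        result.reserve(total);              // space for the whole sum, set aside once
        for (std::size_t i = 0; i < pieces_.size(); ++i)
            result += pieces_[i];           // fits within the reserved space
        assert(result.size() == total);
        return result;
    }

private:
    std::vector<std::string> pieces_;       // the operands, in order
};
```

In this sketch LazySum().add("ab").add("cd").add("ef").str() produces "abcdef" without ever forming "abcd" as a separate temporary.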

 

To do list

 

History

 

August, 1998 changes

 
