As we read words from a file, we are going to place the chars in a big array, wordsbuffer[]. Each individual word will be entered into the big buffer and will be followed by a null character, '\0' (backslash 0).
We'll skip over non-alphabetic characters and convert all upper-case letters to lower case. Here's some helpful code:
    int alpha(char ch) {
        return ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z'));
    }

    char lower(char c) {
        return (c >= 'A' && c <= 'Z') ? c + 'a' - 'A' : c;
    }

These are used for checking if the current char ch is alphabetic, and enabling us to convert it to lowercase.
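To see the two helpers working together, here is a minimal sketch that keeps only the letters of a string and lower-cases them. The helper letters_only() is hypothetical (it is not part of the program being built here), and the caller is assumed to supply a destination that is big enough.

```c
#include <assert.h>

int alpha(char ch) { return ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')); }
char lower(char c) { return (c >= 'A' && c <= 'Z') ? c + 'a' - 'A' : c; }

/* Hypothetical helper: copy only the letters of src into dst, lower-cased.
   dst must have room for every char of src plus the terminator.
   Returns the number of letters copied. */
int letters_only(const char *src, char *dst) {
    int n = 0;
    assert(src && dst);
    for (; *src != '\0'; src++)
        if (alpha(*src))              /* skip anything that is not a letter */
            dst[n++] = lower(*src);   /* store the lower-case form */
    dst[n] = '\0';
    return n;
}
```

Given the input "'Tis Nobler!", this leaves "tisnobler" in dst and returns 9.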
The starting locations (within the buffer) of each individual word will be used to reference the word. As we process the file, we'll keep track of three locations within the buffer:
    /* For reading words from a file and storing them in wordsbuffer we need */
    FILE *inFile;
    char wordsbuffer[MAX];
    char *previousword = wordsbuffer - 1;   // doesn't exist yet
    char *thisword = wordsbuffer - 1;       // doesn't exist yet
    char *nextword = wordsbuffer;           // next word will go here
    int done = 0;                           // not done yet
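The layout we end up with is just the words laid end to end, each closed off by its '\0'. Since every word is a normal C string, we can hop from one starting location to the next with strlen(). Here is a rough sketch of that walk; list_words() is a hypothetical helper, and it assumes the last stored word has already been terminated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: walk the words stored back to back in [buf, end),
   printing each one with its starting offset. Returns the number of words. */
int list_words(const char *buf, const char *end) {
    int n = 0;
    assert(end > buf && end[-1] == '\0');   /* last word must be terminated */
    for (const char *w = buf; w < end; w += strlen(w) + 1) {
        printf("%ld: %s\n", (long)(w - buf), w);
        n++;
    }
    return n;
}
```

For a buffer holding "hamlet", "to" and "be", the starting offsets are 0, 7 and 10, and the next free location (what nextword would point at) is offset 13.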
The code for obtaining a word is a little tricky because we want to enable an ungetword facility for removing a word that need not have been entered into the buffer.
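    char *getword() {
        int c;                                // int, not char, so EOF can be told apart from real chars
        if (done) return (char*) EOF;
        previousword = thisword;
        thisword = nextword;
        // skip over non-alphabetic characters
        while ((c = fgetc(inFile)) != EOF && !alpha(c))
            ;
        if (c == EOF) { done = 1; return (char*) EOF; }
        *nextword++ = lower(c);               // first letter of the word
        // copy the remaining letters, lower-cased
        while ((c = fgetc(inFile)) != EOF && alpha(c))
            *nextword++ = lower(c);
        if (c == EOF) done = 1;
        *nextword++ = '\0';                   // terminate the word
        return thisword;
    }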
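Here's one way a caller might drive getword(), offered as a sketch under a few assumptions. MAX is set to 1000 just for the example. NULL is used as the "doesn't exist yet" marker in place of wordsbuffer - 1, since forming a pointer before the start of an array is not strictly defined in standard C. The copy of getword() below reads each char into an int before storing it, so that EOF is tested reliably. And load_words() is a hypothetical driver, not part of the original program.

```c
#include <assert.h>
#include <stdio.h>

#define MAX 1000                    /* assumed capacity for this sketch */

FILE *inFile;
char wordsbuffer[MAX];
char *previousword = NULL;          /* NULL: doesn't exist yet */
char *thisword = NULL;              /* NULL: doesn't exist yet */
char *nextword = wordsbuffer;       /* next word will go here */
int done = 0;                       /* not done yet */

int alpha(char ch) { return ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')); }
char lower(char c) { return (c >= 'A' && c <= 'Z') ? c + 'a' - 'A' : c; }

char *getword(void) {
    int c;                          /* int, so EOF is distinct from every char */
    if (done) return (char *) EOF;
    previousword = thisword;
    thisword = nextword;
    while ((c = fgetc(inFile)) != EOF && !alpha(c))
        ;                           /* skip non-alphabetic characters */
    if (c == EOF) { done = 1; return (char *) EOF; }
    *nextword++ = lower(c);
    while ((c = fgetc(inFile)) != EOF && alpha(c))
        *nextword++ = lower(c);
    if (c == EOF) done = 1;
    *nextword++ = '\0';
    return thisword;
}

/* Hypothetical driver: read the words of f, recording where each one starts.
   Stops at end of file or after maxwords words. Returns the number recorded.
   (No check against MAX is made in this sketch.) */
int load_words(FILE *f, char **starts, int maxwords) {
    int n = 0;
    char *w;
    assert(f != NULL);
    inFile = f;
    while (n < maxwords && (w = getword()) != (char *) EOF)
        starts[n++] = w;
    return n;
}
```

Run on a file holding just the first line of the soliloquy, load_words() records ten starting locations, the first pointing at "to" and the last at "question".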
Suppose we are some way into reading Hamlet Act 3 Scene 1. We're reading the beginning of Hamlet's famous soliloquy:
    Exeunt KING CLAUDIUS and POLONIUS
    Enter HAMLET
    HAMLET
    To be, or not to be: that is the question:
    Whether 'tis nobler in the mind to suffer
    The slings and arrows of outrageous fortune,
    Or to take arms against a sea of troubles,
    And by opposing end them? To die: to sleep;
    No more; and by a sleep to say we end
    The heart-ache and the thousand natural shocks

We've finished reading "HAMLET" and are beginning on "To be, or not".
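The program proceeds to read the word "be". After that call, previousword points to "to", thisword points to "be", and nextword points to the first free location just past the terminator of "be".
Continuing, we read "or" and "not", and then the next word: "to" again! Now previousword points to "not", thisword points to the second "to", and nextword points just past it.
At this stage, we know we don't want the second "to". So we do an ungetword() and rearrange our pointers: nextword moves back to the start of the second "to", thisword moves back to "not", and previousword is marked as not existing.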
The program will now proceed to re-use the locations currently occupied by that second "to". As a matter of fact, the next word stored there will be a second "be", which will be ungot also. Eventually those locations will be filled with "that".
Ungetting a word is not too hard since we maintained those three pointers into the buffer:
    void ungetword() {   // ungets at most one word
        if (previousword < wordsbuffer)
            printf("Attempt to unget what has not been got\n");
        else {
            nextword = thisword;
            thisword = previousword;
            previousword = wordsbuffer - 1;
        }
    }
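To watch the get/unget dance without needing a file, here is a small sketch that stores each word only once. Several things in it are assumptions for illustration: putword() is a hypothetical stand-in for getword() that takes its word from a string but shifts the three pointers the same way; seen_before() and add_unique() are hypothetical helpers; MAX is set to 1000; and NULL replaces wordsbuffer - 1 as the "doesn't exist" marker.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX 1000                    /* assumed capacity for this sketch */

char wordsbuffer[MAX];
char *previousword = NULL;          /* NULL: doesn't exist yet */
char *thisword = NULL;              /* NULL: doesn't exist yet */
char *nextword = wordsbuffer;       /* next word will go here */

/* Hypothetical stand-in for getword(): store w at nextword and shift
   the three pointers. Returns NULL if w would not fit in the buffer. */
char *putword(const char *w) {
    assert(w != NULL);
    if (strlen(w) + 1 > (size_t)(wordsbuffer + MAX - nextword)) return NULL;
    previousword = thisword;
    thisword = nextword;
    strcpy(nextword, w);
    nextword += strlen(w) + 1;      /* step past the '\0' terminator */
    return thisword;
}

void ungetword(void) {              /* ungets at most one word */
    if (previousword == NULL)
        printf("Attempt to unget what has not been got\n");
    else {
        nextword = thisword;        /* the ungot word's space is free again */
        thisword = previousword;
        previousword = NULL;        /* so a second unget in a row is refused */
    }
}

/* Hypothetical helper: was w stored somewhere before thisword? */
int seen_before(const char *w) {
    for (const char *p = wordsbuffer; p < thisword; p += strlen(p) + 1)
        if (strcmp(p, w) == 0) return 1;
    return 0;
}

/* Hypothetical helper: store w, then unget it if it is a repeat.
   Returns 1 if the word was kept, 0 otherwise. */
int add_unique(const char *w) {
    if (putword(w) == NULL) return 0;
    if (seen_before(thisword)) { ungetword(); return 0; }
    return 1;
}
```

Feeding in to, be, or, not, to, be, that keeps five words. The second "to" and the second "be" each land at offset 13 and are ungot, and "that" finally settles into those same locations, leaving nextword at offset 18.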
How do you compare two words?
    int strcomp(char *s, char *t) {
        for ( ; *s == *t; s++, t++)
            if (*s == '\0') return 0;
        return (*s - *t);
    }
strcomp() compares each corresponding char until either both are string terminators or until one differs from the other. strcomp() returns 0 if the strings pointed to by s and t are exactly the same, otherwise it returns a positive or a negative int according as the string s is later or earlier alphabetically than string t. Figure it out.
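If you'd like to check your figuring, here is a small sketch that boils the result of strcomp() down to -1, 0 or +1. The helper order() is hypothetical and is added only for illustration; strcomp() is the function from the text.

```c
#include <assert.h>

int strcomp(char *s, char *t) {
    for ( ; *s == *t; s++, t++)
        if (*s == '\0') return 0;   /* both strings ended together: equal */
    return (*s - *t);               /* difference at the first mismatch */
}

/* Hypothetical helper: reduce strcomp()'s result to -1, 0 or +1. */
int order(char *s, char *t) {
    int d;
    assert(s && t);
    d = strcomp(s, t);
    return (d > 0) - (d < 0);
}
```

So order("be", "to") is -1 because 'b' comes before 't', order("to", "to") is 0, and order("to", "too") is -1 because the terminator of the shorter word is compared against 'o' and '\0' is smaller than any letter.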