Read from a file into a large char array

As we read words from a file, we are going to place their chars in one big array, wordsbuffer[]. Each individual word will be entered into the big buffer and will be followed by a null character ('\0', that is, backslash 0), which marks where the word ends.

We'll skip over non-alphabetic characters and convert all upper-case letters to lower case. Here's some helpful code:


int alpha(char ch) {
  return  ((ch>='a'&&ch<='z')||(ch>='A'&&ch<='Z'));
}

char lower(char c) {
  return (c >= 'A' && c <= 'Z') ? c + 'a' - 'A' : c;
}

alpha() checks whether the current char ch is alphabetic, and lower() converts a char to lowercase (leaving everything except upper-case letters unchanged).

Exercise 6

Write a main() to go with these functions and test them.

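The starting location (within the buffer) of each individual word will be used to reference the word. As we process the file, we'll keep track of three locations within the buffer: the start of the previous word, the start of the current word, and the place where the next word will go.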

Here are declarations:

/* For reading words from a file and storing them in wordsbuffer we need */
FILE *inFile;
char wordsbuffer[MAX];
char *previousword = wordsbuffer - 1;      // doesn't exist yet
char *thisword = wordsbuffer - 1;          // doesn't exist yet
char *nextword = wordsbuffer;              // next word will go here
int done = 0;                              // not done yet

The code for obtaining a word is a little tricky because we want to enable an ungetword facility for removing a word that need not have been entered into the buffer.

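char *getword() {
  int c;                                   // int, not char: EOF must stay distinct from every char
  if (done) return (char*) EOF;
  previousword = thisword;
  thisword = nextword;
  while ((c = fgetc(inFile)) != EOF && !alpha(c));   // skip non-alphabetic chars
  if (c == EOF) { done = 1; return (char*) EOF; }
  *nextword++ = lower(c);                  // first letter of the word
  while ((c = fgetc(inFile)) != EOF && alpha(c)) *nextword++ = lower(c);
  if (c == EOF) done = 1;                  // input ended right after this word
  *nextword++ = '\0';                      // terminate the word
  return thisword;
}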

Suppose we are some way into reading Hamlet Act 3 Scene 1. We're reading the beginning of Hamlet's famous soliloquy:

Exeunt KING CLAUDIUS and POLONIUS

Enter HAMLET

HAMLET
To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks

We've finished reading "HAMLET" and are beginning on "To be, or not".

The program proceeds to read the word "be". The sequence is shown:

Continuing, we read the next word: "to" again!

At this stage, we know we don't want the second "to". So we do an ungetword() and rearrange our pointers to get:

The program will now proceed to re-use the locations currently occupied by that second "to". As a matter of fact, the next word to land there will be a second "be", which will be ungot also. Eventually those locations will be filled with "that".

Ungetting a word is not too hard since we maintained those three pointers into the buffer:


void ungetword() {  // ungets at most one word
  if (previousword < wordsbuffer) 
    printf("Attempt to unget what has not been got\n");
  else {
    nextword = thisword;
    thisword = previousword;
    previousword = wordsbuffer - 1;
  }
}

Exercise 7

Write code now to test these functions and make sure you understand their workings.

Q. 1
What if you want to unget the most recent two words?


How do you compare two words?


int strcomp(char *s, char *t) {
  for ( ; *s == *t; s++, t++ ) if (*s == '\0') return 0;
  return (*s - *t);
}

strcomp() compares corresponding chars until either both are string terminators or one differs from the other. strcomp() returns 0 if the strings pointed to by s and t are exactly the same; otherwise it returns a positive or a negative int according to whether the string s comes later or earlier alphabetically than the string t. Figure it out.


rhyspj@gwu.edu