
Saturday, November 13, 2010

Lesson 33 - C - Handling Files in C

This section describes the use of C's input / output facilities for reading and writing files. There is also a brief description of string handling functions here.

The functions are all variants on the forms of input / output which were introduced in the previous section.

UNIX File Redirection

UNIX has a facility called redirection which allows a program to access a single input file and a single output file very easily. The program is written to read from the keyboard and write to the terminal screen as normal.

To run prog1 but read data from file infile instead of the keyboard, you would type

prog1 < infile

To run prog1 and write data to outfile instead of the screen, you would type

prog1 > outfile

Both can also be combined, as in

prog1 < infile > outfile
Redirection is simple, and allows a single program to read or write data to or from files or the screen and keyboard.

Some programs need to access several files for input or output, which redirection cannot do. In such cases you will have to use C's file handling facilities.

C File Handling - File Pointers

C communicates with files using a new datatype called a file pointer. This type is defined within stdio.h, and written as FILE *. A file pointer called output_file is declared in a statement like


FILE *output_file;

Opening a file pointer using fopen

Your program must open a file before it can access it. This is done using the fopen function, which returns the required file pointer. If the file cannot be opened for any reason then the value NULL will be returned. You will usually use fopen as follows


if ((output_file = fopen("output_file", "w")) == NULL)
fprintf(stderr, "Cannot open %s\n", "output_file");
fopen takes two arguments, both strings. The first is the name of the file to be opened; the second is the access mode, which is usually one of:

"r" Open file for reading
"w" Create file for writing (an existing file of the same name is emptied first)
"a" Open file for appending (output goes to the end; the file is created if it does not exist)

As usual, use the man command for further details by typing man fopen.

Standard file pointers in UNIX

UNIX systems provide three file pointers (standard streams) which are automatically open to all C programs. These are

stdin The standard input. The keyboard or a redirected input file.
stdout The standard output. The screen or a redirected output file.
stderr The standard error. This is the screen, even when output is redirected. This is the conventional place to put any error messages.

Since these files are already open, there is no need to use fopen on them.

Closing a file using fclose

The fclose function can be used to disconnect a file pointer from a file. This is usually done so that the pointer can be used to access a different file. Systems have a limit on the number of files which can be open simultaneously, so it is a good idea to close a file when you have finished using it.

This would be done using a statement like


fclose(output_file);
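If files are still open when a program exits, the system will close them for you. However, it is usually better to close them yourself, since fclose returns EOF if the last of the buffered output could not be written, and you can check for this.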

Input and Output using file pointers

Having opened a file pointer, you will wish to use it for either input or output. C supplies a set of functions to allow you to do this. All are very similar to input and output functions that you have already met.

Character Input and Output with Files

This is done using equivalents of getchar and putchar which are called getc and putc. Each takes an extra argument, which identifies the file pointer to be used for input or output.

putchar(c) is equivalent to putc(c, stdout)
getchar() is equivalent to getc(stdin)

Formatted Input Output with File Pointers

Similarly there are equivalents to the functions printf and scanf which read or write data to files. These are called fprintf and fscanf. You have already seen fprintf being used to write data to stderr.

The functions are used in the same way, except that fprintf and fscanf take the file pointer as an additional first argument.

Formatted Input Output with Strings

These are the third set of the printf and scanf families. They are called sprintf and sscanf.

sprintf
puts formatted data into a string which must have sufficient space allocated to hold it. This can be done by declaring it as an array of char. The data is formatted according to a control string of the same form as that for printf.
sscanf
takes data from a string and stores it in other variables as specified by the control string. This is done in the same way that scanf reads input data into variables. sscanf is very useful for converting strings into numeric values.

Whole Line Input and Output using File Pointers

Predictably, equivalents to gets and puts exist, called fgets and fputs. The programmer should be careful in using them, since they are not interchangeable with gets and puts. fgets requires the programmer to specify the maximum number of characters to be read. fgets keeps the trailing newline character on the line it reads, and fputs writes its string exactly as given without adding a newline, whereas gets discards the newline and puts appends one.

When transferring data from files to standard input / output channels, the simplest way to avoid incompatibility with the newline is to use fgets and fputs for files and standard channels too.

For example, read a line from the keyboard using


fgets(data_string, 80, stdin);
and write a line to the screen using

fputs(data_string, stdout);

Special Characters

C makes use of some 'invisible' characters which have already been mentioned. However, a fuller description seems appropriate here.

NULL, The Null Pointer or Character

NULL is a pointer value. A pointer variable holding NULL does not reference any object (i.e. it is a pointer to nothing). Functions which return pointers usually return NULL if they fail in some way, so the return value can be tested. See the section on fopen for an example of this.

NULL is returned by the read functions of the gets family (gets and fgets) when they try to read beyond the end of an input file.

The null character, written '\0', is a character rather than a pointer, although it is often loosely called NULL. It is the string termination character which is automatically appended to every string constant in your C program. You usually need not bother about this final '\0', since it is handled automatically. However, it sometimes makes a useful target to terminate a string search. There is an example of this in the string_length function example in the section on Functions in C.

EOF, The End of File Marker

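EOF is a value which indicates the end of a file. It is not a character stored in the file but a negative integer constant defined in stdio.h, which is why getc returns an int rather than a char. It is returned by read functions of the getc and scanf families when they try to read beyond the end of a file.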

Other String Handling Functions

As well as sprintf and sscanf, the UNIX system has a number of other string handling functions within its libraries. A number of the most useful ones are declared in the header file string.h, and are made available by putting the line

#include <string.h>

near the top of your program file.
A couple of the functions are described below.

strcpy(str1, str2) Copies str2 into str1
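strcmp(str1, str2) Compares the contents of str1 and str2. Returns 0 if both are equal, a negative value if str1 sorts before str2, and a positive value otherwise. Since 0 means false in C, test for equality with strcmp(str1, str2) == 0.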

A full list of these functions can be seen using the man command by typing


man 3 strings

Conclusion

The variety of different types of input and output, using standard input or output, files or character strings, makes C a very powerful language. The addition of character input and output makes it highly suitable for applications where the format of data must be controlled very precisely.
