K&R - Chapter 6.1 - Structures

Chapter 6 - Structures

A structure is a collection of one or more variables, possibly of different types, grouped together under a single name for convenient handling.
Structures help to organize complicated data, allowing a group of related variables to be treated as a unit instead of separate entities.
Employee has many attributes, a point is a pair of co ordinates etc.

6.1 Basics of Structures

The keyword struct introduces a structure declaration, which is a list of declarations enclosed in braces.
An optional name called structure tag may follow the struct keyword. (here point) which can be used as shorthand.
The variables names in it are called members.

Basic object is a point which has x and y co-ordinate, both integers.

struct point {
	int x;
	int y;
};

A structure member or tag and an ordinary variable can have the same name without conflict, since they can always be distinguished by context.
The same member name may occur in closely related objects.

A struct declaration defines a type. The right brace that terminates the list of members may be followed by a list of variables, just as any basic types.

struct { ... } x, y, z; is similar to int x, y, z;
in a sense, it declares x, y, z to be variables of the named type and sets aside space for them.

A structure declaration that is not followed by a list of variables reserves no storage; it merely describes a template or shape of a structure.

struct point pt;
defines a variable pt which is a structure of type struct point.

struct date {
	int day;
	int month;
	int year;
	int yearday;
	char mon_name[4];
};

struct date d = { 14, 7, 1776, 186, "JUl" }; 
// initialized with list of initializers.

struct date d; template without without list of variables.

Operator . connects the structure name and member name.
A member of a structure is accessed / referred to in an expression by a construction of form, structure-name.member
printf("%d, %d", pt.x, pt.y) to print co-ordinates of the point pt

To set leap from the date in structure d

leap = d.year%4 == 0 && d.year%100 != 0 || d.year%400 == 0;

to check month name,

if (strcmp(d.mon_name, "Aug") == 0 ) ...

to convert first character of month name to lower case,

d.mon_name[0] = lower(d.mon_name[0]);

Structures can be nested (a rectangle is a pair of points that denotes diagonally opposite corners)

struct rect {
	struct point pt1;
	struct point pt2;
};

A payroll record

struct person {
	char name[NAMESIZE];
	char address[ADRSIZE];
	double salary;
	struct date birthday;
	struct date hiredate;
};

The person structure contains two date structures.
struct person emp; declaring emp;
emp.birthday.month refers to month of birth.
. associates from left to right.

If declared screen as struct rect screen;, then
screen.pt1.x refers to x co ordinate of pt1 member of screen

6.2 Structures and Functions

There are number of restrictions on C structures.
The only legal operations on a structure are taking its address with & and accessing its members, copying it, or assigning to it as a unit (as arguments),
(copying structures make a shallow copy, pointers are copied but does not make copy of the data to which the pointers point to. Structures in structures are also shallow copied)

Structures may not be compared.
A structure may be initialized by a list of constant member values;
an automatic structure may be initialized by an assignment.

makepoint will take two integers and return a point structure:

struct point makepoint(int x, int y)
{
	struct point temp;
	
	temp.x = x;
	temp.y = y;
	return temp;
}

Argument name and member names are same but there is no conflict.
makepoint can be used to make any structure dynamically or provide structure arguments to a function.

struct rect screen;
struct point middle;
struct point makepoint(int, int);

screen.pt1 = makepoint(0,0);
screen.pt2 = makepoint(XMAX, YMAX);
middle = makepoint( (screen.pt1.x) + (screen.pt2.x)/2,
					  (screen.pt1.y) + (screen.pt2.y)/2);

Functions for doing arithmetic on points.

struct addpoints (struct point p1, struct point p2)
{
	p1.x += p2.x;
	p1.y += p2.y;
	return p1;
}

Here both arguments and the return value are structures. . . .

Passing structures to a function as a pointer is more efficient than to copy the whole structure.
struct point *pp; says,
pp is a pointer to a structure of type strcut point.
If pp points to a point structure, *pp is the structure, and (*pp).x and (*pp).y are the members.

struct point origin, *pp;
pp = &origin;
printf("origin is (%d, %d)\n", (*pp).x, (*pp).y);

A shorthand to represent a pointer p to a structure. p->member-of-structure

printf("origin is (%d, %d)\n", pp->x, pp->y);

both . and -> associate from left to right.

struct rect r, *rp = &r;

r.pt1.x
(r.pt1).x
rp->pt1.x
(rp->pt1).x   // all four are equivalent

++p->len increments len, not p because it means++(p->len).
(++p)->len increments p before accessing len
(p++)->len increments p afterwards.

Similarly *p->str fetches whatever str points to;
*p->str++ increments str after accessing whatever it points to;
(*p->str)++ increments whatever str points to;
*p++->str increments p after accessing whatever str points to.

6.3 Arrays of Structures

To count the occurrences of each keyword in C.
Each keyword is a pair of word and its count:

char *word;
int count;

A structure with an array:

struct key {
	char *word;
	int count;
} keytab[NKEYS];

The structure declaration declares a structure of type key, defies an array keytab of structures in this type and sets aside storage for them.
Each element of the array is a structure. also written as.

struct key {
	char *word;
	int count;
};

struct key keytab[NKEYS];

Since keytab contains constant set of names, it is easier to make it an external variable and initialize it when it is defined.

struct key {
	char *word;
	int count;
} keytab [] = {
	{"auto", 0},
	{"break", 0},
	...
}

The inner braces are not necessary when initializers are simple variables or character strings but not in pairs corresponding to the structure members.


The keyword counting program……..

#include <stdio.h>
#include <ctype.h>
#include <string.h>

#define MAXWORD 100

int getword(char *, int);
int binsearch(char *, struct key, int);

// count C keywords
main() {
	int n;
	char word[MAXWORD];
	
	while (getword(word, MAXWORD) != EOF)
		if ( isalpha(word[0]) )
			if ( (n = binsearch(word, keytab, NKEYS)) >= 0)
				keytab[n].count++;
	
	for (n = 0; n < NKEYS; n++)
		if (keytab[n].count > 0)
			printf("%4d %s\n", keytab[n].count, keytab[n].word);
	
	return 0;
}

// find words in tab[0]...tab[n-1]
int binsearch(char *word, struct key tab[], int n)
{
	int cond;
	int low, mid, high;

	low = 0;
	high = n-1;
	
	while (low <= high) {
		mid = (low+high)/2
		if ( (cond = strcomp(word, tab[mid].word)) < 0)
			high = mid -1;
		else if (cond > 0)
			low = mid +1;
		else
			return mid;
		}
	return -1;
}

getword finds a word, which is copied into the array named as its first argument.

// get next word or character from input
int getword(char *word, int lim)
{
	int c, getch(void);
	void ungetch(int);
	char *w = word;
	
	while ( isspace(c = getch()) )
		;
	
	if ( c != EOF )
		*w++ = c;
	if (!isalpha(c)) {
		*w = '\0';
		return c;
	}
	for ( ; --lim > 0; w++)
		if (!isalnum(*w = getch()) ) {
			ungetch(*w);
			break;
		}
	*w = '\0';
	return word[0];
}

getword uses getch and ungetch from chapter 4.
isspace to skip space and isalpha to identify letters, isalnum to identify letters and digits; all are from <ctype.h>

6.4 Pointers to Structures

Rewriting the keyword counting program again using pointers instead of array indices. main and binsearch need modification.

#include <stdio.h>
#include <ctype.h>
#include <string.h>

#define MAXWORD 100

int getword(char *, int);
struct key *binsearch(char *, struct key, int);   // changed

// count C keywords
main() {
	char word[MAXWORD];
	struct key *p;
	
	while (getword(word, MAXWORD) != EOF)
		if ( isalpha(word[0]) )
			if ( (p = binsearch(word, keytab, NKEYS)) != NULL )
				p->count++;
	
	for (p = keytab; p < keytab + NKEYS; p++)
		if (p->count > 0)
			printf("%4d %s\n", p->count, p->word);
	return 0;
}

// find words in tab[0]...tab[n-1]
struct key *binsearch(char *word, struct key tab[], int n)
{
	int cond;
	struct key *low = &tab[0];
	struct key *high = &tab[n];
	struct key *mid;
		
	while (low < high) {
		mid = low + (high-low) /2;
		if ( (cond = strcomp(word, mid->word)) < 0)
			high = mid;
		else if (cond > 0)
			low = mid + 1;
		else
			return mid;
		}
	return NULL;
}

The declaration of binsearch indicates that it return a pointer to struct key instead of an integer. this is declared both in function prototype and in binsearch.
If it finds a word, it return a pointer to it otherwise NULL.

The elements of keytab are now accessed by pointers. which changes binsearch. high low are pointers.
computation of mid has to change as it is illegal to to add pointers. but subtraction is legal. high-low is number of elements so mid = low + (high-low) / 2 sets mid to element halfway between high and low.

// get next word or character from input
int getword(char *word, int lim)
{
	int c, getch(void);
	void ungetch(int);
	char *w = word;
	
	while ( isspace(c = getch()) )
		;
	
	if ( c != EOF )
		*w++ = c;
	if (!isalpha(c)) {
		*w = '\0';
		return c;
	}
	for ( ; --lim > 0; w++)
		if (!isalnum(*w = getch()) ) {
			ungetch(*w);
			break;
		}
	*w = '\0';
	return word[0];
}

6.5 Self-referential Structures

When wanting to handle data where the words are not known beforehand and searching through the seen words everytime in a list is not time efficient.
Solution is to keep a set of seen words in a sorted order all the time and placing the words in proper position as they arrive. This can be done by binary tree.

The binary tree node, with four components.

struct tnode {
	char *word;           // points to the text of the word
	int count;            // number of occurences
	struct tnode *left;   // points to left child
	struct tnode *right;  // points to right child
};

It is illegal for a structure to contain an instance of itself.
struct tnode *left; declares left to be a pointer to a tnode, not tnode itself.

No node may contain more than two children.
It is maintained in such a way that the left node always contains only the words which are lexicographically less than the word at that node, and right node contains the words that are greater.

#include <stdio.h>
#include <ctype.h>
#include <string.h>

#define MAXWORD 100
struct tnode *addtree(struct tnode *, char *);
void treeprint(struct tnode *);
int getword(char *, int);

// word frewuency count
main()
{
	struct tnode *root;
	char word[MAXWORD];
	
	root = NULL;
	while (getword(word, MAXWORD) != EOF)
		if (isalpha(word[0]))
			root = address(root, word);
	treeprint(root);
	return 0;
}

The function addtree is recursive.
A word is presented by main to top level of the tree. At each stage the word is comapred to word in the node; and percolated down by recursive call to addtree.
Eventually, the word either match something or null pointer is encountered, indicating that a node must be created and added to the tree.
addtree returns a pointer to new node.

. . . .

If tree becomes unbalanced because the words do not come in random order(words are already in order) then the running time of the program can grow too much.

6.6 Table Lookup

6.7 Typedef

6.8 Unions

6.9 Bit-fields