Lab 1 - Inverted Index

Due Wednesday 9/2 - 3:30pm

In this assignment, you will brush up on your Java skills and review Java's file I/O libraries and Collections framework.  You will write a program that processes several text files and builds an inverted index.  Your inverted index will be a data structure that stores a mapping from words to the documents in which those words were found. 
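
To make the idea of a word-to-documents mapping concrete, here is a minimal sketch. It assumes a plain HashMap from each word to the set of document names that contain it, and it splits on whitespace only. The class name TinyIndexDemo and the two sample documents are hypothetical. This is one possible illustration, not the required design, and it does not yet record word positions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names are part of the assignment.
class TinyIndexDemo {
    // Builds a map: word -> names of the documents that contain it.
    static Map<String, Set<String>> build(Map<String, String> docs) {
        Map<String, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            // Assumption: words are separated by whitespace.
            for (String word : doc.getValue().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // leading whitespace yields one empty token
                }
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        docs.put("a.txt", "the quick fox");
        docs.put("b.txt", "the lazy dog");
        Map<String, Set<String>> index = build(docs);
        System.out.println(index.get("the")); // prints [a.txt, b.txt]
        System.out.println(index.get("fox")); // prints [a.txt]
    }
}
```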

  1. You will design the inverted index data structure using any data structures available in the Java Collections framework.  Think about efficiency!  Insertion of new records should be fast.  Also, given a word, finding its record should be fast.
  2. Your program will take as input a String denoting a directory on the user's computer.  It will traverse the directory and all its subdirectories.  For each text file found (you need only process files with the extension .txt), your program will process the file and add the appropriate data to the inverted index.
    1. For each word in the file, your program will store a record in the inverted index indicating the document in which the word appears and the position at which the word was found in the document.
    2. Your program will ignore all characters except letters and digits.
  3. The output of your program will be a text file named output.txt that contains the information in the inverted index. 
  4. You will submit all of your code and class files in a jar called invertedindex.jar.  I will run your program with the command below.  If your program does not run with this command, one letter grade will be deducted from your score.
                java -cp invertedindex.jar Driver -d /My/Directory
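
The steps above can be sketched roughly as follows. This is a sketch under stated assumptions, not a reference solution. It assumes Java 8 or later, UTF-8 files, whitespace-separated words with every non-alphanumeric character dropped, and a position defined as the word's 0-based offset in its document. The assignment fixes none of these details, so your own choices may differ. The class name IndexSketch and all of its methods are hypothetical, and writing output.txt is left out because the output format is not specified here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; class and method names are not prescribed by the assignment.
class IndexSketch {
    // word -> (document path -> positions of the word in that document)
    private final Map<String, Map<String, List<Integer>>> index = new HashMap<>();

    // Assumption: whitespace separates words, and every character that is
    // not a letter or digit is dropped from each word.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            StringBuilder sb = new StringBuilder();
            for (char c : raw.toCharArray()) {
                if (Character.isLetterOrDigit(c)) {
                    sb.append(c);
                }
            }
            if (sb.length() > 0) {
                words.add(sb.toString());
            }
        }
        return words;
    }

    // Assumption: a position is the word's 0-based offset within the document.
    void addDocument(String name, String text) {
        List<String> words = tokenize(text);
        for (int pos = 0; pos < words.size(); pos++) {
            index.computeIfAbsent(words.get(pos), k -> new HashMap<>())
                 .computeIfAbsent(name, k -> new ArrayList<>())
                 .add(pos);
        }
    }

    // Walks root and all of its subdirectories, indexing every regular .txt file.
    void addDirectory(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile)
                         .filter(p -> p.getFileName().toString().endsWith(".txt"))
                         .collect(Collectors.toList());
        }
        for (Path file : files) {
            // Assumption: files are encoded as UTF-8.
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            addDocument(file.toString(), text);
        }
    }

    // Returns the recorded positions, or an empty list if there are none.
    List<Integer> positions(String word, String name) {
        return index.getOrDefault(word, Collections.emptyMap())
                    .getOrDefault(name, Collections.emptyList());
    }

    public static void main(String[] args) throws IOException {
        // Assumption: arguments arrive as "-d <directory>", as in the run command.
        if (args.length != 2 || !args[0].equals("-d")) {
            System.err.println("usage: -d <directory>");
            return;
        }
        IndexSketch sketch = new IndexSketch();
        sketch.addDirectory(Paths.get(args[1]));
        System.out.println(sketch.index.size() + " distinct words indexed");
    }
}
```

A HashMap is used at both levels because insertion and lookup by key take expected constant time, which addresses the efficiency concern in item 1. A sorted structure such as TreeMap would trade some of that speed for ordered iteration.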

Submission Instructions