Zachary Johnson | Web Crawler and Search Algorithm

Web Crawler and Search Algorithm

A web crawler application created for COM S 311 (Analysis and Design of Algorithms). It uses breadth-first search to build a graph of web pages: starting from a seed URL, it downloads each page with jsoup and adds an edge to every page linked from it. The graph-building method takes two parameters, "maxPages" and "maxDepth", that bound the size of the graph. The web graph is then used to build an inverted index containing the URLs, their content, and their indegrees, which supports time-efficient search queries over the collected web pages.
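
The approach described above can be illustrated with a minimal Java sketch. It is not the project's actual code: the class name CrawlSketch, the method names, and the exact meaning given to "maxPages" and "maxDepth" are assumptions made for illustration. To keep the example free of network access and external libraries, the jsoup download step is replaced by a hypothetical fetchLinks function that returns the outgoing links of a URL. Ranking search hits by indegree is likewise an assumption about how the indegrees might be used.

```java
import java.util.*;
import java.util.function.Function;

class CrawlSketch {
    // Breadth-first crawl from seed. fetchLinks is a hypothetical stand-in for
    // downloading a page (for example with jsoup) and extracting its links.
    // Assumed semantics: at most maxPages vertices (the seed is always kept),
    // and pages at depth maxDepth are kept but not expanded.
    static Map<String, Set<String>> buildGraph(String seed, int maxPages, int maxDepth,
                                               Function<String, List<String>> fetchLinks) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        graph.put(seed, new LinkedHashSet<>());
        depth.put(seed, 0);
        queue.add(seed);
        while (!queue.isEmpty()) {
            String page = queue.remove();
            int d = depth.get(page);
            if (d >= maxDepth) continue;
            for (String link : fetchLinks.apply(page)) {
                if (!graph.containsKey(link)) {
                    if (graph.size() >= maxPages) continue; // page limit reached: skip new pages
                    graph.put(link, new LinkedHashSet<>());
                    depth.put(link, d + 1);
                    queue.add(link);
                }
                graph.get(page).add(link); // edge page -> link, both inside the graph
            }
        }
        return graph;
    }

    // Indegree of every vertex: the number of distinct pages linking to it.
    static Map<String, Integer> indegrees(Map<String, Set<String>> graph) {
        Map<String, Integer> in = new HashMap<>();
        for (String page : graph.keySet()) in.put(page, 0);
        for (Set<String> targets : graph.values())
            for (String t : targets) in.merge(t, 1, Integer::sum);
        return in;
    }

    // Inverted index: lower-cased term -> set of URLs whose text contains it.
    static Map<String, Set<String>> invertedIndex(Map<String, String> contentByUrl) {
        Map<String, Set<String>> index = new HashMap<>();
        for (Map.Entry<String, String> e : contentByUrl.entrySet())
            for (String term : e.getValue().toLowerCase().split("\\W+"))
                if (!term.isEmpty())
                    index.computeIfAbsent(term, k -> new TreeSet<>()).add(e.getKey());
        return index;
    }

    // Single-term query: matching URLs, highest indegree first, ties broken by URL.
    static List<String> search(String term, Map<String, Set<String>> index,
                               Map<String, Integer> indegree) {
        List<String> hits = new ArrayList<>(
                index.getOrDefault(term.toLowerCase(), Collections.emptySet()));
        hits.sort(Comparator.comparingInt((String u) -> -indegree.getOrDefault(u, 0))
                .thenComparing(Comparator.<String>naturalOrder()));
        return hits;
    }
}
```

A query then costs one hash lookup plus a sort of the matching URLs, rather than a scan of every page's content, which is the kind of time saving the inverted index is meant to provide. In a real crawler, fetchLinks would wrap the page download and link extraction, and the page text collected there would feed invertedIndex.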

Project link: https://git.ece.iastate.edu/ztj1/google-2

Breadth First Search
Java
Algorithms
Data Structures
Inverted Index
Web Scraping
Graphs