**Published:** 2025-12-02
**Room Link:** https://tryhackme.com/room/googledorking

## Table of Contents

- [[#Notes|Notes]]
- [[#Tasks|Tasks]]
	- [[#Tasks#Task 1: Ye Ol' Search Engine|Task 1: Ye Ol' Search Engine]]
	- [[#Tasks#Task 2: Let's Learn About Crawlers|Task 2: Let's Learn About Crawlers]]
	- [[#Tasks#Task 3: Enter: Search Engine Optimisation|Task 3: Enter: Search Engine Optimisation]]
	- [[#Tasks#Task 4: Beepboop - Robots.txt|Task 4: Beepboop - Robots.txt]]
	- [[#Tasks#Task 5: Sitemaps|Task 5: Sitemaps]]
	- [[#Tasks#Task 6: What is Google Dorking?|Task 6: What is Google Dorking?]]

---

## Notes

`Crawlers`

- Search engines use crawlers to discover and read websites.
- A crawler grabs a page's contents (for example, its keywords) in a dictionary-like format.
- The search engine stores and indexes this information.
- If the page links to other sites, the crawler follows those links and indexes those sites as well.

`Search Engine Optimisation`

Factors that affect a site's ranking include:

- How responsive the website is.
- How easy the website is to crawl.
- Which keywords the website contains.

`Robots.txt`

- The file `robots.txt` is the first thing a crawler reads when it visits a site.
- It must be served from the root directory of the site.
- It determines which parts of the site the crawler is permitted to index.
- You can use pattern matching (wildcards such as `*`) to allow or disallow many paths at once.

| Keyword    | Function                                                                                                                     |
| ---------- | ---------------------------------------------------------------------------------------------------------------------------- |
| User-agent | Specify the type of "Crawler" that can index your site (the asterisk is a wildcard that allows **all "User-agents"**)         |
| Allow      | Specify the directories or file(s) that the "Crawler" **can** index                                                          |
| Disallow   | Specify the directories or file(s) that the "Crawler" **cannot** index                                                       |
| Sitemap    | Provide a reference to where the sitemap is located (this improves SEO; sitemaps are covered in Task 5)                      |

---

## Tasks

### Task 1: Ye Ol' Search Engine

No answer needed.

### Task 2: Let's Learn About Crawlers

**Name the key term of what a "Crawler" is used to do. This is known as a collection of resources and their locations**
- Index

**What is the name of the technique that "Search Engines" use to retrieve this information about websites?**
- Crawling

**What is an example of the type of contents that could be gathered from a website?**
- Keywords

### Task 3: Enter: Search Engine Optimisation

**Use the same [SEO checkup tool](https://web.dev/measure/) and other online alternatives to see how their results compare for [https://tryhackme.com](https://tryhackme.com/) and [http://googledorking.cmnatic.co.uk](http://googledorking.cmnatic.co.uk/)**
- No answer needed

### Task 4: Beepboop - Robots.txt

**Where would "robots.txt" be located on the domain "ablog.com"**
- `ablog.com/robots.txt`

**If a website was to have a sitemap, where would that be located?**
- `/sitemap.xml`

**How would we only allow "Bingbot" to index the website?**
- `User-agent: Bingbot`

**How would we prevent a "Crawler" from indexing the directory "/dont-index-me/"?**
- `Disallow: /dont-index-me/`

**What is the extension of a Unix/Linux system configuration file that we might want to hide from "Crawlers"?**
- `.conf`

### Task 5: Sitemaps

**What is the typical file structure of a "Sitemap"?**
- `XML`

**What real life example can "Sitemaps" be compared to?**
- `Map`

**Name the keyword for the path taken for content on a website**
- `Route`

### Task 6: What is Google Dorking?

**What would be the format used to query the site bbc.co.uk about flood defences**
- `site:bbc.co.uk flood defences`

**What term would you use to search by file type?**
- `filetype:`

**What term can we use to look for login pages?**
- `intitle:login`
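
---

The `robots.txt` keywords from the Notes and the Task 4 answers can be tried out locally. The following is a minimal sketch using Python's standard `urllib.robotparser` module (`site_maps()` needs Python 3.8 or newer). The file contents and the page URLs under `ablog.com` are hypothetical, assembled from the Task 4 answers for illustration only; they are not taken from a real site or from the room itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the example domain "ablog.com" (assumption:
# built from the Task 4 answers, not copied from any real website).
ROBOTS_TXT = """\
User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /dont-index-me/
Sitemap: https://ablog.com/sitemap.xml
"""


def load_rules(text):
    """Parse robots.txt text held in memory, without any network request."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Hypothetical page paths used only to exercise the rules.
hidden_page = "https://ablog.com/dont-index-me/secret.html"
public_page = "https://ablog.com/index.html"

# Bingbot has its own group that allows everything.
print(rules.can_fetch("Bingbot", hidden_page))    # True
# Every other crawler falls back to the "*" group.
print(rules.can_fetch("Googlebot", hidden_page))  # False
print(rules.can_fetch("Googlebot", public_page))  # True
# The Sitemap keyword is exposed as a list of URLs.
print(rules.site_maps())  # ['https://ablog.com/sitemap.xml']
```

Keep in mind that `robots.txt` is advisory: well-behaved crawlers honour it, but it does not stop anyone from requesting a disallowed path directly.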