**Published:** 2025-12-02
Room Link: https://tryhackme.com/room/googledorking
## Table of Contents
- [[#Notes|Notes]]
- [[#Tasks|Tasks]]
- [[#Tasks#Task 1: Ye Ol' Search Engine|Task 1: Ye Ol' Search Engine]]
- [[#Tasks#Task 2: Let's Learn About Crawlers|Task 2: Let's Learn About Crawlers]]
- [[#Tasks#Task 3: Enter: Search Engine Optimisation|Task 3: Enter: Search Engine Optimisation]]
- [[#Tasks#Task 4: Beepboop - Robots.txt|Task 4: Beepboop - Robots.txt]]
- [[#Tasks#Task 5: Sitemaps|Task 5: Sitemaps]]
- [[#Tasks#Task 6: What is Google Dorking?|Task 6: What is Google Dorking?]]
---
## Notes
`Crawlers`
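- Search engines use crawlers to discover content on websites
- A crawler collects a page's contents (e.g. keywords) in a dictionary-like format
- The search engine stores and indexes this information
- If a page links to other websites, the crawler follows those links and indexes them as well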
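
A minimal sketch of the crawl-and-index loop described above, in Python. It assumes a hypothetical in-memory `PAGES` dict standing in for real HTTP fetches, and models the "dictionary" as a per-URL keyword count. Real search-engine crawlers are far more involved; this only shows the follow-links-and-index idea.

```python
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" (URL -> HTML); a real crawler would fetch these over HTTP.
PAGES = {
    "http://example.test/": '<h1>Flood defences</h1><a href="/about">About</a>',
    "http://example.test/about": "<p>About flood defences</p>",
}


class PageParser(HTMLParser):
    """Collects outgoing link targets and lower-cased words from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.words.extend(word.lower() for word in data.split())


def crawl(start_url, pages):
    """Breadth-first crawl that returns an index of {url: Counter(keywords)}."""
    index = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in index or url not in pages:
            continue  # already indexed, or outside our toy web
        parser = PageParser()
        parser.feed(pages[url])
        parser.close()
        index[url] = Counter(parser.words)
        # Follow every link found on the page, resolving relative hrefs.
        queue.extend(urljoin(url, href) for href in parser.links)
    return index


index = crawl("http://example.test/", PAGES)
print(index["http://example.test/about"]["flood"])  # 1
```
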
`Search Engine Optimisation`
- How responsive the website is
- How easy the website is to crawl
- Keywords used on the website
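`robots.txt`
- `robots.txt` is the first file crawlers request when they visit a website.
- Must be served from the root directory of the site.
- Tells crawlers which parts of the site they may or may not crawl (it is advisory, not access control).
- Supports simple pattern matching (the `*` and `$` wildcards, not full regular expressions) to allow/disallow many paths at once.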
| Keyword | Function |
| ---------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| User-agent | Specify which "Crawler" the following rules apply to (the asterisk is a wildcard matching **all "User-agents"**) |
| Allow | Specify the directories or file(s) that the "Crawler" **can** index |
| Disallow | Specify the directories or file(s) that the "Crawler" **cannot** index |
| Sitemap    | Provide a reference to where the sitemap is located (improves SEO; sitemaps are covered in Task 5) |
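
A quick way to see how a crawler interprets these keywords is Python's standard-library `urllib.robotparser` (`site_maps()` needs Python 3.8+). The robots.txt content and `example.test` URLs below are made up for illustration. This parser matches `Allow`/`Disallow` paths as plain prefixes and does not implement the `*`/`$` wildcards, so it only approximates how Google or Bing read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using the four keywords from the table.
ROBOTS_TXT = """\
User-agent: *
Allow: /public/
Disallow: /private/
Sitemap: http://example.test/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "AnyBot" falls under the "*" group, so the Allow/Disallow rules apply to it.
print(parser.can_fetch("AnyBot", "http://example.test/public/page.html"))   # True
print(parser.can_fetch("AnyBot", "http://example.test/private/page.html"))  # False
print(parser.site_maps())  # ['http://example.test/sitemap.xml']
```
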
---
## Tasks
### Task 1: Ye Ol' Search Engine
- No answer needed
### Task 2: Let's Learn About Crawlers
**Name the key term of what a "Crawler" is used to do. This is known as a collection of resources and their locations**
- Index
**What is the name of the technique that "Search Engines" use to retrieve this information about websites?**
- Crawling
**What is an example of the type of contents that could be gathered from a website?**
- Keywords
### Task 3: Enter: Search Engine Optimisation
**Use the same [SEO checkup tool](https://web.dev/measure/) and other online alternatives to see how their results compare for [https://tryhackme.com](https://tryhackme.com/) and [http://googledorking.cmnatic.co.uk](http://googledorking.cmnatic.co.uk/)**
- No answer needed
### Task 4: Beepboop - Robots.txt
**Where would "robots.txt" be located on the domain "ablog.com"?**
- `ablog.com/robots.txt`
**If a website was to have a sitemap, where would that be located?**
- `/sitemap.xml`
**How would we only allow "Bingbot" to index the website?**
- `User-agent: Bingbot`
**How would we prevent a "Crawler" from indexing the directory "/dont-index-me/"?**
- `Disallow: /dont-index-me/`
**What is the extension of a Unix/Linux system configuration file that we might want to hide from "Crawlers"?**
- `.conf`
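
The Task 4 answers can be combined into one file and checked with the standard-library parser. This is a sketch under assumptions: the `ablog.com` paths and the `/backup/site.conf` file are hypothetical. A bare `User-agent: Bingbot` line only opens a group of rules for Bingbot, so a catch-all group that disallows `/` is needed for Bingbot to be the *only* crawler allowed. A real site would more likely hide `.conf` files with a wildcard rule such as `Disallow: /*.conf$`, but Python's parser does not understand wildcards, so a literal path is used here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for ablog.com built from the Task 4 answers.
ROBOTS_TXT = """\
User-agent: Bingbot
Disallow: /dont-index-me/
Disallow: /backup/site.conf

User-agent: *
Disallow: /
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

print(rules.can_fetch("Bingbot", "https://ablog.com/"))                  # True
print(rules.can_fetch("Bingbot", "https://ablog.com/dont-index-me/a"))   # False
print(rules.can_fetch("Bingbot", "https://ablog.com/backup/site.conf"))  # False
# Any other crawler falls through to the "*" group, which blocks everything.
print(rules.can_fetch("Googlebot", "https://ablog.com/"))                # False
```
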
### Task 5: Sitemaps
**What is the typical file structure of a "Sitemap"?**
- `XML`
**What real life example can "Sitemaps" be compared to?**
- `Map`
**Name the keyword for the path taken for content on a website**
- `Route`
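
A sitemap is an XML file listing a site's routes. Below is a minimal sketch that pulls every `<loc>` entry out of a sitemap using Python's `xml.etree.ElementTree`. The sample document and its URLs are hypothetical, and `sitemap_routes` is an illustrative helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Minimal sitemap in the sitemaps.org format; the URLs are hypothetical.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.test/</loc></url>
  <url><loc>http://example.test/blog/first-post</loc></url>
</urlset>
"""

# Sitemap elements live in this XML namespace, so lookups must be qualified.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_routes(xml_text):
    """Return every <loc> URL listed in a sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


print(sitemap_routes(SITEMAP_XML))
```
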
### Task 6: What is Google Dorking?
**What would be the format used to query the site bbc.co.uk about flood defences?**
- `site: bbc.co.uk flood defences`
**What term would you use to search by file type?**
- `filetype:`
**What term can we use to look for login pages?**
- `intitle: login`
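
The operators above can be combined into a single query string. The helpers `build_dork` and `google_search_url` below are hypothetical, written only for illustration. They write each operator with no space after the colon (e.g. `site:bbc.co.uk`), the form Google's own examples use, and URL-encode the result into a standard `https://www.google.com/search?q=...` link.

```python
from urllib.parse import urlencode


def build_dork(terms="", **operators):
    """Join operator:value pairs (no space after the colon) with free-text terms."""
    parts = [f"{op}:{value}" for op, value in operators.items()]
    if terms:
        parts.append(terms)
    return " ".join(parts)


def google_search_url(query):
    """URL-encode a query into a Google search link (':' -> %3A, ' ' -> '+')."""
    return "https://www.google.com/search?" + urlencode({"q": query})


q = build_dork("flood defences", site="bbc.co.uk")
print(q)  # site:bbc.co.uk flood defences
print(google_search_url(q))
print(build_dork(filetype="pdf", intitle="login"))  # filetype:pdf intitle:login
```
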