Here we'll continue our theme of "overcoming functional fixedness bias" by looking at Google "dorking" and robots.txt.
First I'd like to direct you to the robots.txt FAQ for a quick primer on what exactly this is; I'll also put a very brief explanation here for the lazier folks. Back in the day, a protocol called the Robots Exclusion Protocol was created to tell search engines which pages on a site should or shouldn't be indexed. It works like this: a web crawler will typically drop everything past the main domain and instead look for "/robots.txt", for example:
www.mywebsite.com/post1/day/post.html becomes www.mywebsite.com/robots.txt
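Python's standard library actually ships a parser for this protocol, which makes it easy to poke at a site's rules. A minimal sketch — the rules below are a toy example, not any real site's file; in practice you'd fetch the site's actual /robots.txt and feed it in the same way:

```python
from urllib.robotparser import RobotFileParser

# A toy robots.txt, parsed locally; for a live site you would fetch
# https://www.mywebsite.com/robots.txt and pass its lines in instead.
rules = """
User-agent: *
Disallow: /user/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Ask whether a given crawler is "allowed" to fetch a path
print(rp.can_fetch("Googlebot", "/user/profile"))        # disallowed path
print(rp.can_fetch("Googlebot", "/post1/day/post.html")) # allowed path
```

Of course, as the next paragraph explains, nothing actually forces a crawler to honor these answers.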
Now this sounds great except for two big problems: first, crawlers are NOT required to abide by the file's exclusions, and second, the file can leak information about your site/setup since it's publicly available. Even Google will still occasionally index pages it's not supposed to; it simply omits the description in these cases. Example:
WSJ disallows /user but searching for site:wsj.com/user/ still brings up results (just no description).
Looking for allowed user agents in robots.txt and then changing your own user agent can potentially give you access to new areas*. These problems are acknowledged by the robots site itself and are part of the OWASP web server review.
*Thanks to MentalRental on Reddit for suggesting adding WSJ/User agent parts.
Here is WSJ's robots.txt.
And here is a page that we aren't supposed to be allowed to access.
Now there are a few ways to switch user agents. The first is a user-agent switcher extension for Chrome: simply install it and select a new agent from the drop-down.
The second way is to manually switch agents using the developer tools. Keep in mind this will only work once, and you will need to re-apply it after loading a new page.
Press F12 to open the developer tools -> Network tab -> More tools -> Network conditions to unhide the feature.
Then uncheck "Select automatically" and select your new agent.
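A third option is to do it from a script. A sketch using Python's stdlib — the Googlebot string here is illustrative, and the WSJ URL is just the site discussed above:

```python
from urllib.request import Request, urlopen

# Build a request that claims to be Googlebot (header value is illustrative)
req = Request(
    "https://www.wsj.com/",
    headers={"User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                           "+http://www.google.com/bot.html)"},
)

# urllib normalizes header names to "User-agent" internally
print(req.get_header("User-agent"))

# html = urlopen(req).read()  # uncomment to actually fetch with the spoofed agent
```

Unlike the DevTools trick, this sticks for every request your script makes.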
That OWASP link has a pretty good run-through example, and abiding by the 7th law of power (get others to do the work for you), I think it's safe to move on to dorking.
Google "dorking" is just another way of saying "abusing search operators to look for information that shouldn't be indexed". The basic idea is to take commonly used wording/phrases from admin panels, default directory paths, admin URL locations, etc., and plug them into Google to see what you get. You can gather a multitude of information this way: it can surface specific vulnerabilities, reveal passwords and other sensitive information, show verbose error/log messages, and give a foothold or show you the "door" to more sensitive areas of a website. Let's look at a few examples from exploitdb!
Ex 1: Footholds
Think of footholds as one step (likely one of the first) in your exploit chain - "pwning" a target is typically a series of successful attacks/exploits and every step can help open new attack vectors.
Let's start off with a generic search to look for file upload pages.
"You have selected the following files for upload (0 Files)."
Simple enough and it gets hits but this slightly refined query specifically looks for Liferay file upload pages.
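If you end up testing a lot of these query strings, a tiny helper that URL-encodes a dork into a Google search URL saves some copy/paste. A sketch (`dork_url` is just a name I've picked for illustration; the query is the one from the example above):

```python
from urllib.parse import quote_plus

def dork_url(query: str) -> str:
    """Turn a raw dork string into a Google search URL."""
    return "https://www.google.com/search?q=" + quote_plus(query)

print(dork_url('"You have selected the following files for upload (0 Files)."'))
```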
You can use these dorks to look for things you're already familiar with or to discover completely new things to look for. Example: I found a website in our first query (no pics, not trying to name+shame) that uses a file upload service called "Xtra Upload v2" and decided to dork the corresponding URL pieces a few times to see what I could find.
This only returned 2 results: the original page I found from the first dork and a second page. Why not try a few mutations to see what will happen?
inurl:"xu/index.php/home" inurl:"xu1/index.php/home" inurl:"xu3/index.php/home"
These returned nothing of interest but were worth checking (a possible version naming scheme); now let's try going back a directory!
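Mutations like these are easy to generate in bulk rather than typing each by hand. A quick sketch that produces the numbered variants tried above (the `xu`/path pieces come from this particular example, not anything standard):

```python
# Generate numbered variants of a discovered path to dork one at a time
base = "xu{}/index.php/home"
mutations = ['inurl:"' + base.format(n if n else "") + '"' for n in range(4)]
for m in mutations:
    print(m)
```

This prints `inurl:"xu/index.php/home"` through `inurl:"xu3/index.php/home"`, covering the original `xu2` hit and its neighbors.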
This returns 3 pages' worth of results (some junk) and also mentions a "Cloud Share" service (different from the prior pages, which used that Xtra v2 service). Both descriptions mention being built on the CodeIgniter PHP framework.
Next I wanted to search the codeigniter forums for traces of xu2 just to be sure we're on the right track but only a few results showed up.
The main CodeIgniter site has documentation for the new 3.x version and even the deprecated 2.x version. A quick Google about Xtra Upload also revealed a few GitHub forks to rummage through if we're so inclined.
So here's what we've gotten from our <5m dorking session:
Not bad, and it's also a great, fairly benign example of just how far down the "rabbit hole" you can go.
Ex 2: Vulnerability Searching
Dorking can also help you find vulnerable files and servers. Let's say we want to search for a specific known PHP authentication bypass vulnerability as shown in exploitdb by "Mr.tro0oqy".
allintext:Copyright Smart PHP Poll. All Rights Reserved. -exploit
Plenty of other examples like this - for instance this dork searches for MongoDB setups where some versions allow execution of unix commands (exploit outlined here).
allinurl:moadmin.php -google -github
This next dork looks at Drupal instances but excludes patch version 7.32 (the latest patch at the time of writing). It likely needs a few tweaks, since the exclusion mechanism is simple: as written, this will show anything above or below 7.32.
inurl:CHANGELOG.txt intext:drupal intext:"SA-CORE" -intext:7.32 -site:github.com -site:drupal.org
If you're feeling frisky you could exclude more by adding in extra "-intext:X.xx" instances or even tell it to only match a certain one with intext:"X.xx".
inurl:CHANGELOG.txt intext:drupal intext:"SA-CORE" -intext:7.32 -intext:7.31 -site:github.com -site:drupal.org
inurl:CHANGELOG.txt intext:drupal intext:"SA-CORE" intext:"7.44" -site:github.com -site:drupal.org
This last example looks for implementations of OpenSSL that may be vulnerable to Heartbleed attacks :)
"OpenSSL" AND "1.0.1 Server at" OR "1.0.1a Server at" OR "1.0.1b Server at" OR "1.0.1c Server at" OR "1.0.1d Server at" OR "1.0.1e Server at" OR "1.0.1f Server at"
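That long OR chain doesn't have to be typed out by hand. A sketch that assembles it from the affected version range (Heartbleed covered OpenSSL 1.0.1 through 1.0.1f and was fixed in 1.0.1g):

```python
# Build the OR chain covering OpenSSL 1.0.1 through 1.0.1f (pre-Heartbleed-fix)
versions = ["1.0.1"] + ["1.0.1" + s for s in "abcdef"]
dork = '"OpenSSL" AND ' + " OR ".join('"{} Server at"'.format(v) for v in versions)
print(dork)
```

The same pattern works for any dork that enumerates a run of version strings.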
Ex 3: Passwords
Yes, that's right, ladies and gents: you can even find people's passwords with dorks. This beautiful little gem looks for database backup files that often contain username/password combos for WordPress.
You can even scrape uploaded text files to look for email username/password combinations.
site:static.ow.ly/docs/ intext:@gmail.com | Password
Scraping pastebin in a similar fashion will almost always get a good number of hits.*
site:pastebin.com intext:@gmail.com | @yahoo.com | @hotmail.com daterange:2457388-2457491
*Note that for daterange you need to use the Julian date (JD).
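If you don't want to look Julian dates up by hand, the conversion is a one-liner. A sketch using the common midnight-floored Julian day number (1721424 is the fixed offset between Python's proleptic-Gregorian ordinal and the Julian day count):

```python
from datetime import date

def julian_day(d: date) -> int:
    """Convert a calendar date to its (midnight-floored) Julian day number."""
    return d.toordinal() + 1721424

# Start of the range used in the pastebin dork above:
print(julian_day(date(2016, 1, 1)))  # -> 2457388
```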
This github scrape revealed quite a nice SCADA password list. Oh, the things you'll find!
site:github.com ext:csv userid | username | user -example password
You can even find Cisco VPN credential dumps! Quite the creative user/pass combo...
filetype:pcf "cisco" "GroupPwd"
There are plenty of other examples on the exploitdb page alone. So, how would dorking help in a targeted attack? These fun little dorks rely mostly on misconfiguration and human error, but there are a few ways they can help.
Now get to dorking, ya bunch of dorks!