How to Perform a Safe Password Analysis

It’s one of the most exciting moments in a security researcher’s work: while looking through an obscure log file, you see strings like “James1984” and “SecureMe!” scattered throughout the data. Upon closer inspection, you realize that you’ve uncovered hundreds if not thousands of cleartext username/password pairs!
Even as you celebrate your success, you are also tempted to use your victory to push for additional security reforms, such as a stronger password policy, or publish your results to educate other security professionals. But how, exactly, would you go about conducting and publishing a password analysis without exposing the company to harm, from insider threats or otherwise?

Step 1: Develop a Remediation Plan and Get It Approved

With a “minimize risk to company” hat affixed firmly to your head, the first thing you should be concerned about is removing passwords from the place they were detected, or at least restricting access to that area. Only after you have achieved this goal should you be concerned about creating a report regarding password quality.

Plan to Remove the Password Data

To remove passwords, you may end up altering files or database records, restricting access to them, or destroying them outright. Add whatever you think is the right way to mitigate this risk, and perhaps a second mitigation method, to your plan.

Plan to Stop the Application from Writing New Password Data

The application that caused the problem may still be writing usernames and passwords out to the location you found. Filing a high-priority security defect with the application developers to have this information redacted in logs should be another part of your plan.

Plan for the Time Gap Between Now and Deployment of Fix

You will also have to worry about the gap between the time you manually remove the current password data and the time your application developers deliver a fix. To address this gap, add a recommendation to your plan: turn off logging, retune logging, set up access controls, or automate the remediation steps you performed during the initial cleanup.

Plan to Analyze the Passwords—Safely and Somewhere Else

Finally, if you are going to perform a password analysis on your findings, you will need a secure copy of the original data and a secure workspace to develop your reports and intermediate artifacts. Add a section to your plan that states you will make an encrypted copy of the original file, perform analysis in a secure environment, publish a report that describes the quality of the passwords in use without revealing any information about any particular users, and destroy your original work. (We will flesh out each of these elements below.)
If you and your company are familiar with “chain of custody” (CoC) procedures regarding sensitive data or security findings, you may also want to build those steps into your plan. (CoC procedures go above and beyond the steps listed here; you would add your own CoC procedures to the steps listed in this article.)

Get Your Plan Approved

Now, with your four-part plan (remove passwords, submit a bug, remediate until fixed, and analyze securely) under your arm, approach the sponsor of your security work and get his or her approval to proceed with all parts.

If You Do Not Have Approval, Do Not Proceed

At this point it is quite likely that your sponsor will allow you to proceed with certain parts of your plan, but not the analysis. If this is the case, you may continue to lobby for inclusion, but please do not proceed down the path of using company data for an unapproved password analysis and absolutely never retain a “personal” copy of the original data for your own purposes.

Step 2: Preparing Your Password Analysis Lab

Make a Strongly Encrypted Copy of the Source Data

If you have permission to proceed, start by securing a copy of the original data in an encrypted file with an appropriate name. OpenPGP is a good choice for encrypting the data until you have your lab ready. Appropriate names are those that link the file to you, your project codename or the date, but don’t reveal the contents. For example, “jsmith_20131227.pgp” and “bluegoat_01.pgp” would be appropriate names, but “everyones_passwords.pgp” or “securityscan_findings.pgp” would not be appropriate.
Once you have a strongly encrypted copy of the original data, you may proceed with your mitigation plan to remove the original copies of the data. (Remember, mitigation should take a higher priority than password analysis.)

Set Up an Encrypted Folder for Your Workspace

Next, set up a folder that uses automatic encryption through the operating system or another piece of software. For example, on Microsoft operating systems, EFS is a good choice. This folder should be completely empty when you begin because you will delete it and all of its contents at the end of your analysis.

Unpack the Source Data into Your Encrypted Folder

Move (i.e., copy and then delete the original) your original encrypted file into your encrypted folder and then unpack its contents there.

Step 3: Strip the Data Down to Usernames and Passwords

Using scripts or an automated text processor, strip your original files down to just username/password combinations. Note that this step can be time-intensive, particularly if you need to obtain programming services from another part of the company.

Increase the Density of Your Password Data

It is easy to use Windows “findstr,” Unix “grep,” or command-line database commands to locate and filter interesting lines that may contain passwords from original data. Performing this initial filter yields “dense” files (i.e., more password data than before) that make the next step more accurate and efficient.
For example, to quickly locate lines in a file that might contain passwords by filtering for the word “pass” you could use the following grep or findstr commands.
grep 'pass' *.logs > possible-passwords.txt
findstr /C:"pass" *.logs > possible-passwords.txt
Or, you could use the following SQL Server command to pull possible passwords out of a table.
sqlcmd -Q "SELECT Notes FROM Users WHERE Notes LIKE '%pass%'"
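If you would rather keep the whole pipeline in one scripting language, the same density filter can be sketched in Python. This is only an illustration: the pattern (`pass` or `pwd`, ignoring case) and the function name `dense_lines` are assumptions of this sketch, not part of any tool mentioned above.

```python
import re

# Assumed pattern: key names that commonly carry credentials in logs.
CANDIDATE = re.compile(r"pass|pwd", re.IGNORECASE)

def dense_lines(lines):
    """Yield only the lines that might contain a password."""
    for line in lines:
        if CANDIDATE.search(line):
            yield line.rstrip("\n")

# Two made-up log lines; only the second mentions a password.
sample = [
    "2013-02-18 12:23:30 GET /index.php - 200\n",
    "2013-02-18 12:23:34 POST /ChangePassword.php User=Joe&OldPass=FFrr44 200\n",
]
filtered = list(dense_lines(sample))
```

In practice you would pass an open file object instead of the `sample` list, and write the result to a file inside your encrypted folder.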

Parse Your Password Data

Once you have some dense password files, variations on the “split()” function will help you parse the data. For example, imagine a web log with entries such as:
2013-02-18 12:23:34 POST /ChangePassword.php User=Joe&OldPass=FFrr44&NewPass=GGrrtt1&NewPass2=GGrrtt1 200
First, you would use a split() command to grab the fifth element (the query string). Then you would use a second split() command to grab key/value pairs such as “OldPass=FFrr44”, and a third split() command to break each key/value pair into a key (such as “OldPass”) and a value (such as “FFrr44”).
In this instance, your parsing code might look something like this (in C#):
// Reads in a web log file full of username/password combinations
// Writes unique username/password combinations into a hashtable
// Requires: using System.Collections; using System.Web;
Hashtable hashtable = new Hashtable();
string line;
while ((line = fileIn.ReadLine()) != null) {
  string Username = "";
  string[] ParsedLine = line.Split(' ');         // Split on each space
  if (ParsedLine.Length < 5) continue;           // Skip malformed lines
  string sQueryString = ParsedLine[4];           // Get the 5th element
  string[] ParsedArgs = sQueryString.Split('&'); // Split on each ampersand
  foreach (string OneArg in ParsedArgs) {        // For each element...
    string[] UsrPwd = OneArg.Split('=');         // Split on the equal sign
    if (UsrPwd.Length < 2) continue;             // Skip keys with no value
    // Expect username to come before password on a line
    // Usernames are case-insensitive, so we'll lower-case them all
    if (UsrPwd[0].ToLower().StartsWith("user")) {
      Username = HttpUtility.UrlDecode(UsrPwd[1]).ToLower();
    }
    // If we already found the username, we can look for passwords
    // Passwords ARE case-sensitive, so no upper- or lower-casing
    // Note that we cannot just use the username as the key if we are
    // expecting multiple passwords in a line (e.g., oldpass, newpass)
    else if ((Username.Length > 0) && UsrPwd[0].ToLower().Contains("pass")) {
      string Password = HttpUtility.UrlDecode(UsrPwd[1]);
      string HashKey = Username + "||" + Password;
      if (!hashtable.ContainsKey(HashKey)) {
        hashtable[HashKey] = Password;
      }
    }
  }
}
// Now write out the hashtable, one line per username/password combo
foreach (DictionaryEntry entry in hashtable) {
  string HashKey = entry.Key.ToString();
  string Password = entry.Value.ToString();
  // The key is Username + "||" + Password, so trim that suffix off
  string Username = HashKey.Substring(0, HashKey.Length - Password.Length - 2);
  fileOut.WriteLine(Username + "\t" + Password);
}
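For comparison, here is a minimal Python sketch of the same parsing idea, applied to the sample log line above. It assumes the query string is always the fifth space-separated field and that the username key starts with "user"; both are assumptions drawn from that one sample line, so adjust them for your own logs. The function name `extract_pairs` is made up for this example.

```python
from urllib.parse import parse_qsl

def extract_pairs(lines):
    """Return the set of unique (username, password) pairs in the log lines."""
    pairs = set()
    for line in lines:
        fields = line.split(" ")
        if len(fields) < 5:
            continue  # not the layout we expect; skip it
        username = ""
        # parse_qsl splits on '&' and '=' and URL-decodes each value
        for key, value in parse_qsl(fields[4], keep_blank_values=True):
            key = key.lower()
            if key.startswith("user"):
                username = value.lower()      # usernames are case-insensitive
            elif username and "pass" in key:
                pairs.add((username, value))  # passwords keep their case
    return pairs

log = ["2013-02-18 12:23:34 POST /ChangePassword.php "
       "User=Joe&OldPass=FFrr44&NewPass=GGrrtt1&NewPass2=GGrrtt1 200"]
pairs = extract_pairs(log)
```

Using a set of tuples removes the need for a combined "username||password" key, because a tuple keeps the two values separate while still collapsing duplicates such as NewPass and NewPass2.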

Step 4: Analyze Your Username and Password Data

Now that you have a file that contains nothing but usernames and passwords, you are ready to analyze your data. You can conduct any number of experiments on your data, but producing some basic length, complexity, and predictability (or “guessability”) statistics is a great place to begin.
I recommend using a two-step approach to your analysis. Step one is to go line-by-line through your password file and calculate statistics for each line, writing a new password statistics line to a new CSV file as you go. Step two is to pull your password statistics CSV file into your favorite spreadsheet and run your final analysis against the individual statistics.

Calculating Password Length

To calculate password length, simply read in each password and write out its length. Most programming languages include a “len” or “length” method or property on strings; use it.
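As a rough Python sketch of step one of the two-step approach, the function below writes one statistics row per password to a CSV stream. Leaving the password itself out of the row is a choice made for this example, not something the approach requires; the column name `length` and the function name are likewise made up.

```python
import csv
import io

def write_length_stats(pairs, out):
    """Write one CSV row per password: its length, never the password."""
    writer = csv.writer(out)
    writer.writerow(["length"])
    for _username, password in pairs:
        writer.writerow([len(password)])

# StringIO stands in for a file inside the encrypted folder.
buffer = io.StringIO()
write_length_stats([("joe", "FFrr44"), ("joe", "GGrrtt1")], buffer)
```

The later statistics (complexity counts, similarity flags) would simply become additional columns in the same row.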

Calculating Password Complexity

Regular expressions (“RegEx”) are the right tool to use when checking passwords for complexity. A simple test that counts upper-case letters, lower-case letters, numbers and special characters can be built from calls to a single RegEx-powered function.
For example (in C#):
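// Requires: using System.Text.RegularExpressions;
int iLength   = Password.Length;
int iUppers   = CountInstances(Password, "A-Z");
int iLowers   = CountInstances(Password, "a-z");
int iNumbers  = CountInstances(Password, "0-9");
// Anything that is not a letter or a digit counts as "special"
int iSpecials = iLength - iUppers - iLowers - iNumbers;
// Counts the characters in ToSearch that fall in the class ToFind
static int CountInstances(string ToSearch, string ToFind) {
  return Regex.Matches(ToSearch, "[" + ToFind + "]").Count;
}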
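The same counting idea, sketched in Python with one of the example passwords from the introduction. The dictionary keys are names chosen for this example; as in the C# version, anything that is not an ASCII letter or digit is counted as "special".

```python
import re

def count_instances(text, char_class):
    """Count the characters in text that fall inside the character class."""
    return len(re.findall("[" + char_class + "]", text))

def complexity(password):
    uppers = count_instances(password, "A-Z")
    lowers = count_instances(password, "a-z")
    digits = count_instances(password, "0-9")
    return {
        "length": len(password),
        "uppers": uppers,
        "lowers": lowers,
        "digits": digits,
        # whatever is left over is neither a letter nor a digit
        "specials": len(password) - uppers - lowers - digits,
    }

stats = complexity("SecureMe!")
```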

Calculating Similarity

To calculate if any two items are similar to each other, you will probably need to build a function. However, the time to build such a function is well worth it, since it will allow you to detect similarity in strings like “JohnSmith” and “j.smith45”.
The following function returns “true” if any run of iWindow consecutive characters appears in both phrases (ignoring case). It returns “false” if either phrase is shorter than iWindow or if no match is discovered.
IsPhraseSimilarToPhrase(sPhrase1, sPhrase2, iWindow) {
   if (sPhrase1.Length >= iWindow AND sPhrase2.Length >= iWindow) {
      // Use <= so the final window at the end of each phrase is checked
      for (i=0; i<=sPhrase1.Length - iWindow; i++) {
         sCheck1 = LowerCase(Substring(sPhrase1, i, iWindow))
         for (j=0; j<=sPhrase2.Length - iWindow; j++) {
            sCheck2 = LowerCase(Substring(sPhrase2, j, iWindow))
            if (sCheck1 == sCheck2) {
               return true
            }
         }
      }
   }
   return false
}
Use this function or one like it to conduct the rest of your statistical calculations.
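A runnable Python sketch of the same sliding-window comparison follows. It collects every window from the first phrase into a set and then looks up each window of the second phrase, which is one possible implementation and not the only one; the function name is made up for this example.

```python
def is_phrase_similar(phrase1, phrase2, window):
    """True if any run of `window` characters appears in both phrases."""
    a, b = phrase1.lower(), phrase2.lower()
    if window < 1 or len(a) < window or len(b) < window:
        return False
    # Every substring of length `window` in the first phrase
    chunks = {a[i:i + window] for i in range(len(a) - window + 1)}
    # Does any same-length substring of the second phrase match one?
    return any(b[j:j + window] in chunks
               for j in range(len(b) - window + 1))

similar = is_phrase_similar("JohnSmith", "j.smith45", 4)   # share "smit"
different = is_phrase_similar("James1984", "SecureMe!", 4)
```

The set lookup avoids the nested loop, which matters little for one pair but adds up when every password is compared against many phrases.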

Calculating Similarity Between Username and Password

To calculate whether a username is similar to a password, simply feed both into your similarity function. For example:
If IsPhraseSimilarToPhrase(Username, Password, 4)...

Calculating Similarity Between Password and Current Year

To calculate whether a password is similar to the current year, simply feed the year (or year part) into your similarity function. For example:
If IsPhraseSimilarToPhrase(Password, "2013", 4)...

Calculating Similarity Between Password and an Initial Password

To calculate whether a password is similar to an initial static password, simply feed the initial password (or common piece of initial password) into your similarity function. For example:
If IsPhraseSimilarToPhrase(Password, "Starter", 4)...

Calculating Similarity Between Password and the Word “Pass”

To calculate whether a password is similar to the word “password” (or just “pass”), simply feed the phrase “pass” into your similarity function. For example:
If IsPhraseSimilarToPhrase(Password, "pass", 4)...
You may want to run it again with the shorter phrase “pwd” as well.
(or) If IsPhraseSimilarToPhrase(Password, "pwd", 3)...
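The four checks above can be gathered into one function that returns a row of flags per username/password pair. This Python sketch is illustrative: the year "2013" and the initial password "Starter" are the example values used in this article, and the flag names and the helper `shares_window` are made up here.

```python
def shares_window(a, b, window):
    """True if a and b share any run of `window` characters (case ignored)."""
    a, b = a.lower(), b.lower()
    chunks = {a[i:i + window] for i in range(len(a) - window + 1)}
    return any(b[j:j + window] in chunks
               for j in range(len(b) - window + 1))

def similarity_flags(username, password, year="2013", initial="Starter"):
    return {
        "like_username": shares_window(username, password, 4),
        "has_year": shares_window(password, year, 4),
        "like_initial": shares_window(password, initial, 4),
        "has_pass": (shares_window(password, "pass", 4)
                     or shares_window(password, "pwd", 3)),
    }

flags = similarity_flags("joe", "MyPass2013")
```

Note that a username shorter than the window (such as "joe" with a window of 4) can never match, so you may want a smaller window for very short usernames.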

Calculating Similarity Between Password and Dictionary Words

Finally—a challenge! Before we can perform this analysis, we need a dictionary full of words to test. There are many dictionaries available for free from the Internet, but many need to be pre-processed to strip out comments and extra columns before we can use them.
If you are running your analysis on Windows, an incredibly useful set of tools to download now is the “GnuWin32” tools, especially “wget” to download pages from the Internet and “grep” to parse the downloaded pages. These tools can be combined in a short batch file to download and prepare a dictionary file for our use.
REM Pull a free dictionary file off the Internet
wget -nd
REM Now strip off everything but the first word in the file
grep -Eo "^[^ ]+" OWL.txt > dictionary-en.txt
Or, on Linux:
wget -nd
grep -Eo '^[^ ]+' OWL.txt > dictionary-en.txt
Now, we can open up the resulting dictionary file and check to see if each password contains a dictionary word. Remember to perform your “contains” comparison while ignoring case sensitivity. For better performance, you may also want to read the entire dictionary file into memory first (most computers can spare the room these days) and reuse it for each password entry.
You will probably also want to ignore any dictionary words shorter than three or four letters (I ignore anything shorter than four letters), which you can do in code (e.g., “if DictWord.Length > 3”) or by erasing the top entries from your dictionary if they are arranged from shortest to longest word.
For example (in C#):
// Read the dictionary into memory (hashtable called "hashWords")
while ((line = fileIn.ReadLine()) != null) {
  sOneWord = line.Trim().ToLower();
  if (sOneWord.Length >= 4) {
    hashWords[sOneWord] = sOneWord;
  }
}
// Call a function to see if a password contains a dictionary word
if (PhraseContainsWordFromList(Password, hashWords))...
// Use this function to check password against a list of words
bool PhraseContainsWordFromList(string sPhrase, Hashtable htPhrases) {
  // Dictionary words were lower-cased, so lower-case the phrase too
  string sLowerPhrase = sPhrase.ToLower();
  foreach (DictionaryEntry entry in htPhrases) {
    if (sLowerPhrase.Contains(entry.Key.ToString())) {
      return true;
    }
  }
  return false;
}
Note that simply flagging a password as bad because it contains a dictionary word is not a good idea in all cases. If the password is long enough to contain multiple (>2) dictionary words and a mix of upper-case and lower-case letters, it may still be a strong password. However, if a password only contains a single dictionary word and is as short as it could be, it probably is not a strong password.
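Because the number of dictionary words matters as much as their presence, it can help to record which words matched and not only a yes/no flag. The Python sketch below does that under the same four-letter minimum; the function names and the tiny word list are made up for illustration.

```python
def load_words(lines, min_length=4):
    """Lower-case each dictionary line and drop words that are too short."""
    stripped = (line.strip().lower() for line in lines)
    return {word for word in stripped if len(word) >= min_length}

def words_in_password(password, words):
    """Return the dictionary words found inside the password, sorted."""
    lowered = password.lower()
    return sorted(word for word in words if word in lowered)

# A stand-in for the lines of a real dictionary file.
words = load_words(["cat\n", "secure\n", "Horse\n", "battery\n"])
single = words_in_password("SecureMe!", words)
several = words_in_password("correcthorsebattery", words)
```

A password with one short match can then be scored differently from a long one built from several words.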

Calculating Similarity Between Password and Keyboard Phrases

Unfortunately there do not appear to be readily accessible lists of keyboard phrases on the Internet. (I hope I’m wrong – please let me know otherwise in the comments below.) With that in mind, you may need to write your own list of keyboard phrases for this test. A few examples of the types of sequences that should be in that file are listed below. (Take a look at your keyboard while you’re typing these if you don’t understand where they are coming from.)
Once your file is ready, use code similar to the dictionary check above to compare each password against every entry in the list. Do not worry about multiple uses of keyboard phrases; a single use of any of these common phrases should be enough to flag a password as weak.
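A minimal Python sketch of the keyboard-phrase check follows. The phrases in the list are illustrative guesses based on a standard US QWERTY layout (rows, columns and number runs); they are assumptions of this example, not a published list, so extend them to suit your users' keyboards.

```python
# Assumed examples of keyboard "walks" on a US QWERTY layout.
KEYBOARD_PHRASES = [
    "qwerty", "asdfgh", "zxcvbn",   # left-to-right along each letter row
    "123456",                       # along the number row
    "1qaz2wsx", "qazwsx",           # down adjacent columns
]

def contains_keyboard_phrase(password, phrases=KEYBOARD_PHRASES):
    """True if the password contains any listed keyboard phrase."""
    lowered = password.lower()
    return any(phrase in lowered for phrase in phrases)

flagged = contains_keyboard_phrase("Qwerty2013")
clean = contains_keyboard_phrase("SecureMe!")
```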

Performing Statistical Analysis

Using the spreadsheet of your choice, load up your line-by-line statistics files and use the spreadsheet’s “MIN”, “MAX”, “MODE”, “AVERAGE” and “COUNTIF” functions to calculate:
  • Minimum, maximum and average password length
  • Most common password length
  • % of passwords containing upper-case letters, lower-case letters, numbers and special characters
  • Average number of upper-case letters, lower-case letters, numbers and special characters in each password
  • % of passwords similar to their usernames
  • % of passwords containing the current year
  • % of passwords containing the phrase “pass”
  • % of passwords containing a dictionary word
  • % of passwords containing a keyboard phrase
  • Optional: % of passwords similar to the initial static password
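If you prefer to stay out of a spreadsheet, a few of the figures above can be computed with Python's standard `statistics` module. This sketch assumes each row is a dictionary with `length`, `uppers` and `like_username` fields; those field names and the sample rows are invented for the example.

```python
from statistics import mean, mode

def summarize(rows):
    """Compute a handful of summary figures from per-password rows."""
    lengths = [row["length"] for row in rows]
    total = len(rows)

    def pct(key):
        # Share of rows where the field is non-zero / true, as a percentage
        return 100.0 * sum(1 for row in rows if row[key]) / total

    return {
        "min_length": min(lengths),
        "max_length": max(lengths),
        "avg_length": mean(lengths),
        "mode_length": mode(lengths),
        "pct_with_upper": pct("uppers"),
        "pct_like_username": pct("like_username"),
    }

rows = [
    {"length": 6, "uppers": 2, "like_username": False},
    {"length": 7, "uppers": 2, "like_username": False},
    {"length": 9, "uppers": 0, "like_username": True},
    {"length": 7, "uppers": 1, "like_username": False},
]
summary = summarize(rows)
```

The remaining percentages follow the same `pct` pattern, one call per flag column.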

Step 5: Publish the Report and Destroy Your Lab

Please do not forget to perform this step, and perform it completely.
When you are ready to publish your results, perform a final check on your report to make sure it doesn’t contain any personally identifiable information or specific usernames. When you are ready, move the final copy of your report out of your encrypted folder into a permanent location in your company’s file store.
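Part of that final check can be automated. The Python sketch below scans the report text for any username from your working list, matching whole words only and ignoring case; it is a hypothetical helper and no substitute for reading the report yourself, since it will miss usernames embedded in longer strings.

```python
import re

def leaked_usernames(report_text, usernames):
    """Return the usernames that appear as whole words in the report."""
    lowered = report_text.lower()
    found = set()
    for name in usernames:
        # (?<!\w) and (?!\w) keep "joe" from matching inside "joey"
        pattern = r"(?<!\w)" + re.escape(name.lower()) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add(name.lower())
    return sorted(found)
```

An empty result means none of the listed usernames were found as whole words; anything else should be removed from the report before it leaves the encrypted folder.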
Then, delete the entire lab folder. Make sure that it has really been deleted. For example, on a Windows operating system, make sure that it is not just sitting in the Recycle Bin.
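If you script the cleanup, a sketch like the following removes the folder and confirms it is gone. `shutil.rmtree` deletes files directly and does not route them through the Recycle Bin, but it also does not overwrite the underlying disk blocks, so treat it as a plain delete. The function name is made up for this example.

```python
import shutil
from pathlib import Path

def destroy_lab(folder):
    """Delete the lab folder and everything in it; report whether it is gone."""
    path = Path(folder)
    shutil.rmtree(path)        # removes the folder tree without a recycle step
    return not path.exists()   # True once the folder no longer exists
```

Call it only after the final report has been moved out, since nothing inside the folder is recoverable through normal means afterwards.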

Step 6: Sharing Your Results with the Security Community

If you want to share your analysis with the greater security community, please be prepared to wait a while, and to release your results “in waves.”
First, you will absolutely want to wait until all remediation is complete. In some cases this will mean publishing the fix to the affected application. In other cases this will require you to wait until end users have been forced to change their passwords after the fix enters production.
Second, you will want to wait until the political ramifications of the breach have shaken out. (This can vary widely by company.)
When (or if) you can safely satisfy both criteria, only then should you approach your sponsor about sharing your analysis with other security experts. Ideally you would tie your release in with a local security event that will bring prestige to your company as an industry leader, rather than in an online forum (which could be seen as a knock on its operations). Make sure specific company and application information is scrubbed from your report, although you may want to retain information such as the size of the user base and the application’s regulatory exposure (e.g., subject to HIPAA, PCI-DSS, SOX, etc.).
Once you have permission to proceed, go ahead and release your findings. Expect that a copy of your findings will be published on the Internet, so plan to use the web site resources of the organization you released the results through to publish your findings. Among other things, publishing through a security web site also gives you cover (e.g., “see, other security experts thought it was okay, too”) in case your sponsor changes his or her mind about publishing your results later.
