If you're interviewing for a junior Java developer position, you might be asked:
Why should you never store secrets or passwords in a String in Java?
This can be answered from a few facts about Java. First of all, java.lang.String is immutable: once allocated, there is no way to change the value of a String. If you ever "update" the value, an entirely new String is created. The old String merely sits in memory, even when the JVM no longer holds a reference to it.
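A minimal sketch of that behaviour. The class name, the mask helper and the sample value are made up for illustration, and a second reference is kept only so the old value can be printed. Reassigning the variable points it at a new object; the characters of the original String are never modified.

```java
public class ImmutableStringDemo {

    // "Updates" a secret the only way a String allows: by building a new one.
    public static String mask(String secret) {
        return secret.replace('e', '*');
    }

    public static void main(String[] args) {
        String secret = "beagle3";
        String original = secret;   // second reference to the same object

        secret = mask(secret);      // rebinds the variable to a newly allocated String

        System.out.println(original);            // beagle3 (the old value is untouched)
        System.out.println(secret);              // b*agl*3
        System.out.println(secret == original);  // false (two distinct objects)
    }
}
```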
Let's start looking at this problem by investigating the class DumpMemory (no points for guessing what we'll do later). It stores a password that's meant to be entered and matched.
class DumpMemory {
    // The secret lives in a String field for as long as the instance does
    private String managerPassword = "hi there";

    public void setPassword(String value) {
        managerPassword = value;
    }

    public boolean matches(String password) {
        return managerPassword.equals(password);
    }
}
Enter our bad actor. Somehow, they got access to the machine that the program runs on. Let's just pretend that's not impressive already and instead highlight another security concern.
Our bad actor can now take a core dump of our running Java process:
# gcore <pid>
From here it's trivial to get the Strings referenced by each class.
First, let's create an hprof file. This transforms the dump file created earlier into a file type that Java-specific tools are happy to read.
> jmap -dump:format=b,file=dump.hprof $JAVA_HOME/bin/java <core file>
We can take a look at this using the jhat tool.
> jhat dump.hprof
Reading from dump.hprof...
Dump file created Wed Mar 29 11:32:54 AEDT 2017
Snapshot read, resolving...
Resolving 10814 objects...
Chasing references, expect 2 dots..
Eliminating duplicate references..
Snapshot resolved.
Started HTTP server on port 7000
Server is ready.
If we navigate to http://localhost:7000 we can see the package we're interested in.
After selecting DumpMemory, we select the list of references to our object. Then we can select the DumpMemory reference.
Suddenly, plain as day, we can see our String just sitting there. This is entirely expected and understandable. The String exists as a field on an instance of a class, and thus has to stay around while the instance is in scope. Given the JRE isn't going to encrypt Strings at rest, we have to expect it to be in our heap dump.
Though couldn't we just lower the scope of the String as much as possible and wait for GC to come along and clean up our secret?
Let's do so in the following example:
import java.util.Scanner;

public class DumpMemory {
    String line = "";

    public void run() {
        line = new Scanner(System.in).nextLine();
        System.out.println("You entered: " + line);
    }

    public static void doSomething() {
        // The instance (and its String) is unreachable once this method returns
        new DumpMemory().run();
    }

    public static void main(String[] args) {
        doSomething();
        // Block here so there is time to dump the process
        new Scanner(System.in).next();
    }
}
Again, a somewhat contrived example. We get a String from the user; let's pretend this is a secret String. We proceed to securely print it, and then doSomething() completes. This means that the instance that was created in that method is now eligible for garbage collection, as it is no longer in scope.
In the example, once doSomething() has completed, we wait for user input so that we get a chance to dump the process.
Once gcore has completed, we can't use the standard Java core analysers, because they ignore the parts of the dump that aren't part of a valid stack or heap. Instead, we can take a look at the core file in a hex viewer.
When running this, I entered the String beagle3 as the input. Analysing the core dump, you might expect that since the object has gone out of scope you wouldn't see it any more.
Hmm. Even though the String is firmly out of scope at the point that we took the dump, look what we actually see. The reasoning behind this is the fact that going out of scope is not enough for the JVM to get rid of the underlying data.
What about after a GC?
Sometimes you'll read an answer claiming that if you use a String rather than a byte[], you only have to wait for garbage collection to come along before that String no longer exists in memory.
Let's investigate that claim to find out whether it's always true.
This time we'll start Java with an extra flag: java -verbose:gc dva.DumpMemory. This lets us see when a GC has taken place, so that we know when to take our dump.
Here's the output of a run of the same program:
# java -verbose:gc dva.DumpMemory
beagle3
You entered: beagle3
[GC (JvmtiEnv ForceGarbageCollection) 13107K->1544K(251392K), 0.0033103 secs]
[Full GC (JvmtiEnv ForceGarbageCollection) 1544K->1399K(143360K), 0.0099899 secs]
What I did was first enter the hyper-secure password and then, using JProfiler, force a full GC of the program. You can see the effects in the JProfiler memory view.
Taking the dump once again, I opened it in a hex editor.
This dump was performed after the GC, but the String is still sitting in memory. This happens because, when garbage collecting, the JVM does not bother to zero out memory (for good reason). Instead, it simply discards GC'able objects from its internal memory model and marks the memory as available for future object storage.
Will it ever go away?
Yes!
Let's make an example that does just that. In the previous example we had a single GC; let's change it so that we have lots of them.
import java.util.Date;
import java.util.Scanner;

public class DumpMemory {
    String line = "";

    public void run() {
        line = new Scanner(System.in).nextLine();
        System.out.println("You entered: " + line);
    }

    public static void doSomething() {
        new DumpMemory().run();
    }

    public static void main(String[] args) {
        doSomething();
        // Allocate garbage forever so the collector keeps running
        while (true) {
            new Date().toString();
        }
    }
}
As we run this one, we're going to see GC occurring consistently without any help from JProfiler:
# java -verbose:gc dva.DumpMemory
beagle3
You entered: beagle3
[GC (Allocation Failure) 65536K->840K(251392K), 0.0010497 secs]
[GC (Allocation Failure) 66376K->848K(251392K), 0.0010727 secs]
[GC (Allocation Failure) 66384K->896K(251392K), 0.0006899 secs]
[GC (Allocation Failure) 66432K->872K(316928K), 0.0008372 secs]
[GC (Allocation Failure) 131944K->880K(316928K), 0.0010596 secs]
[GC (Allocation Failure) 131952K->856K(438272K), 0.0009543 secs]
[GC (Allocation Failure) 263000K->840K(438272K), 0.0014307 secs]
[GC (Allocation Failure) 262984K->816K(427520K), 0.0004112 secs]
[GC (Allocation Failure) 252720K->816K(418816K), 0.0003833 secs]
[GC (Allocation Failure) 242992K->816K(408576K), 0.0003985 secs]
[GC (Allocation Failure) 233776K->816K(400896K), 0.0004971 secs]
[GC (Allocation Failure) 225072K->816K(391680K), 0.0004864 secs]
[GC (Allocation Failure) 216880K->816K(384512K), 0.0004790 secs]
[GC (Allocation Failure) 208688K->816K(375808K), 0.0004398 secs]
[GC (Allocation Failure) 201008K->816K(369664K), 0.0004333 secs]
[GC (Allocation Failure) 193840K->816K(361472K), 0.0003493 secs]
[GC (Allocation Failure) 186672K->816K(355840K), 0.0004877 secs]
Now if we take a dump here, we will (almost definitely) be unable to find the string beagle3. This is because once the String containing beagle3 went out of scope, the memory it occupied became available for reuse. As we continually allocate objects, the place where beagle3 sat in memory is eventually overwritten by a new object that was allocated while creating a Date.
This means that our mistake of storing a secret in a String is eventually covered up, but if your application only performs GC every now and then, that secret is liable to stay in memory for much longer than its lexical scope would suggest.
What should I do instead?
Use byte[]. Strings are backed by an array internally anyway (a char[] up to Java 8, a byte[] since Java 9). You should be able to perform the same kind of operations on a byte array that you would otherwise perform on a String. You'd simply change
public boolean checkSecret(String otherText) {
    // assuming 'secret' is a String
    return secret.equals(otherText);
}
to
public boolean checkSecret(byte[] otherText) {
    // assuming 'secret' is now a byte[]
    return Arrays.equals(secret, otherText);
}
The great thing that a byte array gives you is the chance to zero out your array as soon as you've finished using the secret. This means you have explicit controls over the lifetime of your secret.
// Secret starts existing in memory
byte[] superSecret = passwordManager.loadPassword();
boolean eq = Arrays.equals(superSecret, userInput);
// Secret is purged from memory
Arrays.fill(superSecret, (byte) 0);
return eq;
Unlike altering a String, when you alter a byte array you are actively changing the underlying data in the heap. Therefore you don't run into the same issue where your data will stay long after the secret has gone out of scope. Even if the memory representing superSecret does stay around for a while, it is only filled with zeros in our example and thus has hidden our secret from prying eyes (unless your secret was actually all zeros).
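One gap in a straight-line load, compare, wipe sequence is that an exception between loading and wiping would skip the wipe. A minimal sketch of one way to close that gap with try/finally follows. The class name and the Supplier parameter are hypothetical stand-ins for whatever actually loads the secret (passwordManager.loadPassword() in the snippet above), and the loader is assumed never to return null.

```java
import java.util.Arrays;
import java.util.function.Supplier;

public class SecretCheck {

    /**
     * Loads the secret, compares it with the input, and always wipes it afterwards.
     * 'loader' is a hypothetical stand-in for passwordManager.loadPassword()
     * and is assumed to return a non-null array.
     */
    public static boolean matches(Supplier<byte[]> loader, byte[] userInput) {
        byte[] superSecret = loader.get();       // secret starts existing in memory
        try {
            return Arrays.equals(superSecret, userInput);
        } finally {
            Arrays.fill(superSecret, (byte) 0);  // runs on normal return and on exception
        }
    }

    public static void main(String[] args) {
        byte[] stored = {1, 2, 3, 4};
        // The loader hands back 'stored' itself so the wipe can be observed here.
        boolean ok = matches(() -> stored, new byte[] {1, 2, 3, 4});
        System.out.println(ok);                       // true
        System.out.println(Arrays.toString(stored));  // [0, 0, 0, 0]
    }
}
```

The comparison result is computed before the finally block runs, so wiping the array does not affect the returned value.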
So, if you are ever dealing with Strings that should be very private, you should follow these rules:
- Use byte[] in every place that you wanted to use a String
- Only load the byte[] data right before you need it
- As soon as the critical work is done, zero out your byte array
Using this will give you the smallest window of time that your secret is sitting in memory, waiting for someone to discover it.
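A practical question these rules raise is how to get typed input into a byte[] without passing through a String. java.io.Console.readPassword() returns a char[], so one option is to encode that array directly. The sketch below is an illustration under assumptions, not a vetted implementation: the class and method names are invented, UTF-8 is an arbitrary choice of encoding, and the hard-coded char[] stands in for real console input.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SecretBytes {

    /** Encodes chars as UTF-8 bytes without ever building a String. */
    public static byte[] toUtf8(char[] chars) {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        // The encoder's scratch buffer holds a copy of the secret too, so wipe it.
        if (encoded.hasArray()) {
            Arrays.fill(encoded.array(), (byte) 0);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for input, e.g. the char[] returned by Console.readPassword()
        char[] typed = {'b', 'e', 'a', 'g', 'l', 'e', '3'};
        byte[] secret = toUtf8(typed);
        Arrays.fill(typed, '\0');               // wipe the char[] once it is converted

        try {
            System.out.println(secret.length);  // 7
        } finally {
            Arrays.fill(secret, (byte) 0);      // wipe the byte[] when the work is done
        }
    }
}
```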
Let's prove this
Take the following program:
package dva;

import java.util.Arrays;
import java.util.Random;
import java.util.Scanner;

public class DumpByteMemory {
    // This is an unrealistic random, but it allows us to
    // look for what could be considered a random value
    Random rand = new Random(1);
    byte[] line = new byte[10];

    public void run() throws Exception {
        System.out.println("We're about to generate our *secure key*");
        rand.nextBytes(line);
        System.out.println("Secret is stored, look for it");
        new Scanner(System.in).next();
        // Wipe the secret in place before it goes out of scope
        Arrays.fill(line, (byte) 0);
    }

    public static void doSomething() throws Exception {
        DumpByteMemory f = new DumpByteMemory();
        f.run();
    }

    public static void main(String[] args) throws Exception {
        doSomething();
        System.out.println("Out of scope");
        new Scanner(System.in).next();
    }
}
Here we're pretending a password is being generated, though we're setting a seed so that we know what to look for in the dump. When setting the seed to 1, the example (on my JRE) will produce 73 D5 1A BB D8 9C B8 19 6F 0E for the 10-byte array.
So let's take two dumps:
- One after the message "Secret is stored, look for it" appears
- One once the message "Out of scope" appears
Here we see the results of that. At first we see this stream of data in our dump. Then at the second point we see that same memory has been completely zeroed out.
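The same check can be scripted instead of eyeballed in a hex viewer. The following is a rough sketch rather than the tooling used for the dumps above: the class name is invented, the core file path is taken from the first command-line argument, and the byte pattern is the one reported earlier for seed 1, which may differ on another JRE.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

public class DumpSearch {

    /** Returns the offset of the first occurrence of needle in the stream, or -1. */
    public static long indexOf(InputStream in, byte[] needle) throws IOException {
        if (needle.length == 0) {
            return 0;
        }
        byte[] window = new byte[needle.length];  // the last needle.length bytes read
        long read = 0;
        int b;
        while ((b = in.read()) != -1) {
            // Slide the window left by one byte and append the new byte
            System.arraycopy(window, 1, window, 0, window.length - 1);
            window[window.length - 1] = (byte) b;
            read++;
            if (read >= needle.length && Arrays.equals(window, needle)) {
                return read - needle.length;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Byte sequence reported in the article for seed 1 (assumption: same JRE behaviour)
        byte[] key = {0x73, (byte) 0xD5, 0x1A, (byte) 0xBB, (byte) 0xD8,
                      (byte) 0x9C, (byte) 0xB8, 0x19, 0x6F, 0x0E};
        // args[0] is the path to a core file, for example one written by gcore
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            long offset = indexOf(in, key);
            System.out.println(offset >= 0 ? "found at offset " + offset : "not found");
        }
    }
}
```

Run against the two dumps, one would expect a hit in the first and, provided the zeroing worked and no other copy of the bytes exists in the process, no hit in the second.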
This shows the security that you can get by simply changing the way that you handle your keys. Instead of hoping that GC comes along and eventually overwrites your object you can make sure that you keep your secret around for as short a time as possible.