Thursday, January 21, 2016

Truncate vs Delete in Oracle

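1. TRUNCATE is fast; DELETE is slow, because it removes rows one at a time.

2. TRUNCATE generates almost no undo or redo, because it deallocates the table's storage in one step; DELETE logs every row it removes.

3. A DELETE can be rolled back. In Oracle a TRUNCATE cannot, because it commits implicitly; rollback of a TRUNCATE is possible only where the vendor specifically supports it.

4. TRUNCATE doesn't fire DML triggers; DELETE does.

5. When you need to purge a whole table, truncate it instead of deleting from it.

6. In some databases (SQL Server, for example) TRUNCATE resets the table's identity column, if any; DELETE doesn't. In Oracle, identity columns are backed by a sequence, which TRUNCATE leaves untouched.

7. TRUNCATE is DDL while DELETE is DML (use this when you are writing an exam).

8. TRUNCATE doesn't support a WHERE clause; DELETE does.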

Happy Learning..

Difference between View vs Materialized View in database

Based on our understanding of views and materialized views, here are some short differences between them:

1) The first difference between a view and a materialized view is that a view's query result is not stored on disk or in the database, whereas a materialized view stores the query result on disk, in a table.

2) Another difference is that when we create a view on a table, the rowid of the view is the same as that of the original table, but in a materialized view the rowid is different.

3) One more difference is that a view always returns the latest data, whereas a materialized view must be refreshed to get the latest data.

4) Queries against a view are usually slower than queries against a materialized view, because the view's query runs again each time while the materialized view reads stored results.

5) Continuing from the first difference: a view is only a logical view of the table, with no separate copy of the data, whereas a materialized view is a physically separate copy.

6) The last difference is that a materialized view needs an extra trigger or some other automatic mechanism (a scheduled refresh, for example) to keep it refreshed; this is not required for views.

When to Use View vs Materialized View in SQL

Most applications use views because they are cheap: a view is only a logical representation of table data, so no extra space is needed. A view gives us a tailored presentation of the data that we can query without changing the structure of the underlying tables. When performance is crucial, as it is in large applications where query response time matters, materialized views are used instead. For that reason materialized views are found mostly in data warehousing and business intelligence applications.

That’s all on the difference between views and materialized views in a database or SQL. I suggest you always prepare this question in good detail, and if you can get some hands-on practice, like creating views and querying data from them, then try that as well.

Tuesday, January 19, 2016

Trigger

A trigger is a program in a database that gets called each time a row in a table is inserted, updated, or deleted. Triggers allow you to check that any changes are correct, or to fill in missing information before it is committed. Triggers are normally written in PL/SQL or Java.

Examples

Audit logging:

CREATE TABLE t1 (c1 NUMBER);
CREATE TABLE audit_log(stamp TIMESTAMP, usr VARCHAR2(30), new_val NUMBER);

CREATE TRIGGER t1_trig
  AFTER INSERT ON t1 FOR EACH ROW
BEGIN
  INSERT INTO audit_log VALUES (SYSTIMESTAMP, USER, :NEW.c1);
END;
/

Prevent certain DML operations:

CREATE OR REPLACE TRIGGER t1_trig
  BEFORE INSERT OR UPDATE OR DELETE
  ON t1
BEGIN
  -- Statement-level trigger: rejects every insert, update and delete on t1.
  raise_application_error(-20001,'Inserting, updating and deleting are not allowed!');
END;
/


Function







A function is a block of PL/SQL code named and stored within the database. A function always returns a single value to its caller.



Creating and dropping functions

Create a function:
CREATE OR REPLACE FUNCTION mult(n1 NUMBER, n2 NUMBER) RETURN NUMBER
AS
BEGIN
  RETURN n1 * n2;
END;
/


Remove the function from the database:
DROP FUNCTION mult;


Calling functions

Call the above function from SQL:
SQL>  SELECT mult(10, 2) FROM dual;
MULT(10,2)
----------
        20


Call the above function from SQL*Plus:
SQL> VARIABLE val NUMBER
SQL> EXEC :val := mult(10, 3);
PL/SQL procedure successfully completed.
SQL> PRINT :val
       VAL
----------
        30


Calling the function from PL/SQL:
DECLARE
  v_val NUMBER;
BEGIN
  v_val := mult(10, 4);
  Dbms_output.Put_Line('Value is: '|| v_val);
END;
/


Examples

Simple lookup function (lookup an employee's salary):
CREATE OR REPLACE FUNCTION get_salary (p_empno NUMBER)
   RETURN NUMBER
AS
  v_sal emp.sal%TYPE;
BEGIN
  SELECT sal INTO v_sal FROM emp WHERE empno = p_empno;
  RETURN v_sal;
END;
/


Package







A package is a collection of procedures and functions stored within the database.

A package usually has a specification and a body stored separately in the database. The specification is the interface to the application and declares types, variables, exceptions, cursors and subprograms. The body implements the specification.

When a procedure or function within the package is referenced, the whole package gets loaded into memory. So when you reference another procedure or function within the package, it is already in memory.

Example
CREATE OR REPLACE PACKAGE my_pack AS
  g_visible_variable VARCHAR2(20);
  FUNCTION calc(n1 NUMBER, n2 NUMBER) RETURN NUMBER;
END;
/

CREATE OR REPLACE PACKAGE BODY my_pack AS

  g_hidden_variable CONSTANT INTEGER := 2;

  FUNCTION calc(n1 NUMBER, n2 NUMBER) RETURN NUMBER AS
  BEGIN
    RETURN g_hidden_variable * n1 * n2;
  END;

END;
/

Monday, January 18, 2016

DBMS SCHEDULER

DBMS_SCHEDULER is a more sophisticated job scheduler introduced in Oracle 10g. The older job scheduler, DBMS_JOB, is still available; it is easier to use in simple cases and fits some needs that DBMS_SCHEDULER does not satisfy.

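Create a job

BEGIN
  -- For an EXECUTABLE job, job_action is the path of the program only;
  -- command-line arguments are passed separately, so the job is created
  -- disabled, given its argument, and then enabled.
  DBMS_SCHEDULER.CREATE_JOB (
     job_name            => 'my_java_job',
     job_type            => 'EXECUTABLE',
     job_action          => '/usr/bin/java',
     number_of_arguments => 1,
     repeat_interval     => 'FREQ=MINUTELY',
     enabled             => FALSE
  );
  DBMS_SCHEDULER.SET_JOB_ARGUMENT_VALUE('my_java_job', 1, 'myClass');
  DBMS_SCHEDULER.ENABLE('my_java_job');
END;
/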

Unlike DBMS_JOB you do not need to commit the job creation for it to be taken into account. As a corollary, if you want to cancel it, you have to remove or disable it (see below).

Remove a job

EXEC DBMS_SCHEDULER.DROP_JOB('my_java_job');

Run a job now

To force immediate job execution:

EXEC DBMS_SCHEDULER.RUN_JOB('my_java_job');

Change job attributes

Examples:

EXEC DBMS_SCHEDULER.SET_ATTRIBUTE('WEEKNIGHT_WINDOW', 'duration', '+000 06:00:00');

BEGIN
  DBMS_SCHEDULER.SET_ATTRIBUTE
     ('WEEKNIGHT_WINDOW', 'repeat_interval',
      'freq=daily;byday=MON,TUE,WED,THU,FRI;byhour=0;byminute=0;bysecond=0');
END;
/

Enable / Disable a job

BEGIN
  DBMS_SCHEDULER.ENABLE('my_java_job');
END;
/

BEGIN
  DBMS_SCHEDULER.DISABLE('my_java_job');
END;
/

Monitoring jobs

SELECT * FROM dba_scheduler_jobs WHERE job_name = 'MY_JAVA_JOB';
SELECT * FROM dba_scheduler_job_log WHERE job_name = 'MY_JAVA_JOB';

Or, when checking from the job owner's schema:

SELECT * FROM user_scheduler_jobs WHERE job_name = 'MY_JAVA_JOB';
SELECT * FROM user_scheduler_job_log WHERE job_name = 'MY_JAVA_JOB';

Use user_scheduler_jobs and user_scheduler_job_log to see only the jobs that belong to your user (current schema).

Database Transaction Isolation Levels

The ANSI/ISO SQL standard (SQL92) defines four levels of transaction isolation with differing degrees of impact on transaction processing throughput. These isolation levels are defined in terms of three phenomena that must be prevented between concurrently executing transactions.
The three preventable phenomena are:

Dirty reads: A transaction reads data that has been written by another transaction that has not been committed yet.
Nonrepeatable (fuzzy) reads: A transaction rereads data it has previously read and finds that another committed transaction has modified or deleted the data.
Phantom reads (or phantoms): A transaction re-runs a query returning a set of rows that satisfies a search condition and finds that another committed transaction has inserted additional rows that satisfy the condition.

SQL92 defines four levels of isolation in terms of the phenomena a transaction running at a particular isolation level is permitted to experience. They are shown in Table 13-1:

Table 13-1 Preventable Read Phenomena by Isolation Level

Isolation Level	Dirty Read	Nonrepeatable Read	Phantom Read
Read uncommitted	Possible	Possible	Possible
Read committed	Not possible	Possible	Possible
Repeatable read	Not possible	Not possible	Possible
Serializable	Not possible	Not possible	Not possible
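The table above can also be written down as data. The following is a minimal Java sketch, not part of any database API: the enum name IsolationLevel and the helper weakestPreventing are made up for illustration, while the java.sql.Connection constants are the standard JDBC names for the four levels.

```java
import java.sql.Connection;

/** Sketch of Table 13-1: which read phenomena each isolation level still permits. */
enum IsolationLevel {
    // Declared from weakest to strongest; flags are (dirty, nonrepeatable, phantom).
    READ_UNCOMMITTED(Connection.TRANSACTION_READ_UNCOMMITTED, true, true, true),
    READ_COMMITTED(Connection.TRANSACTION_READ_COMMITTED, false, true, true),
    REPEATABLE_READ(Connection.TRANSACTION_REPEATABLE_READ, false, false, true),
    SERIALIZABLE(Connection.TRANSACTION_SERIALIZABLE, false, false, false);

    final int jdbcConstant;          // value accepted by Connection.setTransactionIsolation
    final boolean dirtyRead;         // true means "Possible" in the table
    final boolean nonRepeatableRead;
    final boolean phantomRead;

    IsolationLevel(int jdbcConstant, boolean dirtyRead,
                   boolean nonRepeatableRead, boolean phantomRead) {
        this.jdbcConstant = jdbcConstant;
        this.dirtyRead = dirtyRead;
        this.nonRepeatableRead = nonRepeatableRead;
        this.phantomRead = phantomRead;
    }

    /** Hypothetical helper: the weakest level that prevents every phenomenon flagged true. */
    static IsolationLevel weakestPreventing(boolean dirty, boolean nonRepeatable, boolean phantom) {
        for (IsolationLevel level : values()) {
            boolean allowsUnwanted = (dirty && level.dirtyRead)
                    || (nonRepeatable && level.nonRepeatableRead)
                    || (phantom && level.phantomRead);
            if (!allowsUnwanted) {
                return level;
            }
        }
        throw new IllegalStateException("unreachable: SERIALIZABLE permits none of them");
    }
}
```

With a JDBC connection at hand, a level chosen this way could be applied with connection.setTransactionIsolation(level.jdbcConstant), provided the database supports that level.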

DBMS - Find nth salary from table

Suppose that you are given the following simple database table called Employee that has 2 columns named Employee ID and Salary:

Employee
Employee ID	Salary
3	200
4	800
7	450

Write a SQL query to get the second highest salary from the table above. Also write a query to find the nth highest salary in SQL, where n can be any number.

The easiest way to start with a problem like this is to ask yourself a simpler question first. So, let’s ask ourselves how can we find the highest salary in a table? Well, you probably know that is actually really easy – we can just use the MAX aggregate function:

select MAX(Salary) from Employee;

Remember that SQL is based on set theory

You should remember that SQL uses sets as the foundation for most of its queries. So, the question is how can we use set theory to find the 2nd highest salary in the table above? Think about it on your own for a bit – even if you do not remember much about sets, the answer is very easy to understand and something that you might be able to come up with on your own.

Figuring out the answer to find the 2nd highest salary

What if we try to exclude the highest salary value from the result set returned by the SQL that we run? If we remove the highest salary from a group of salary values, then we will have a new group of values whose highest salary is actually the 2nd highest in the original Employee table.

So, if we can somehow select the highest value from a result set that excludes the highest value, then we would actually be selecting the 2nd highest salary value. Think about that carefully and see if you can come up with the actual SQL yourself before you read the answer that we provide below. Here is a small hint to help you get started: you will have to use the “NOT IN” SQL operator.

Solution to finding the 2nd highest salary in SQL

Now, here is what the SQL will look like:

SELECT MAX(Salary) FROM Employee
WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee )

Running the SQL above would return us “450”, which is of course the 2nd highest salary in the Employee table.

An explanation of the solution

The SQL above first finds the highest salary value in the Employee table using “(select MAX(Salary) from Employee)”. Then, adding the “WHERE Salary NOT IN” in front basically creates a new set of Salary values that does not include the highest Salary value. For instance, if the highest salary in the Employee table is 200,000 then that value will be excluded from the results using the “NOT IN” operator, and all values except for 200,000 will be retained in the results.

This now means that the highest value in this new result set will actually be the 2nd highest value in the Employee table. So, we then select the max Salary from the new result set, and that gives us 2nd highest Salary in the Employee table. And that is how the query above works.

An alternative solution using the not equals SQL operator

We can actually use the not equals operator – the “<>” – instead of the NOT IN operator as an alternative solution to this problem. This is what the SQL would look like:

select MAX(Salary) from Employee
WHERE Salary <> (select MAX(Salary) from Employee )

How would you write a SQL query to find the Nth highest salary?

What we did above was write a query to find the 2nd highest Salary value in the Employee table. But, another commonly asked interview question is how can we use SQL to find the Nth highest salary, where N can be any number whether it’s the 3rd highest, 4th highest, 5th highest, 10th highest, etc? This is also an interesting question – try to come up with an answer yourself before reading the one below to see what you come up with.

The answer and explanation to finding the nth highest salary in SQL

Here we will present one possible answer to finding the nth highest salary first, and the explanation of that answer after, since it’s actually easier to understand that way. Note that the first answer we present is not optimal from a performance standpoint, since it uses a correlated subquery that is evaluated once for every row, but we think that it will be interesting for you because you might just learn something new about SQL. If you want to see the more optimal solutions first, you can skip down to the section titled “Find the nth highest salary in SQL without a subquery” instead.

The SQL below will give you the correct answer – but you will have to plug in an actual value for N of course. This SQL to find the Nth highest salary should work in SQL Server, MySQL, DB2, Oracle, Teradata, and almost any other RDBMS:

SELECT * /*This is the outer query part */
FROM Employee Emp1
WHERE (N-1) = ( /* Subquery starts here */
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary)

How does the query above work?

The query above can be quite confusing if you have not seen anything like it before – pay special attention to the fact that “Emp1” appears in both the subquery (also known as an inner query) and the “outer” query. The outer query is just the part of the query that is not the subquery/inner query – both parts of the query are clearly labeled in the comments.

The subquery is a correlated subquery

The subquery in the SQL above is a specific type of subquery known as a correlated subquery. It is called a correlated subquery because the subquery uses a value from the outer query in its WHERE clause. In this case that value comes from the Emp1 table alias, as we pointed out earlier. A normal subquery can be run independently of the outer query, but a correlated subquery can NOT be run independently of the outer query.

The most important thing to understand in the query above is that the subquery is evaluated each and every time a row is processed by the outer query. In other words, the inner query can not be processed independently of the outer query since the inner query uses the Emp1 value as well.

Finding nth highest salary example and explanation

Let’s step through an actual example to see how the query above will actually execute step by step. Suppose we are looking for the 2nd highest Salary value in our table above, so our N is 2. This means that the query will look like this:

SELECT *
FROM Employee Emp1
WHERE (1) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary)

You can probably see that Emp1 and Emp2 are just aliases for the same Employee table – it’s like we just created 2 separate clones of the Employee table and gave them different names.

Understanding and visualizing how the query above works

Let’s assume that we are using this data:

Employee
Employee ID	Salary
3	200
4	800
7	450

For the sake of our explanation, let’s assume that N is 2 – so the query is trying to find the 2nd highest salary in the Employee table. The first thing that the query above does is process the very first row of the Employee table, which has an alias of Emp1.

The salary in the first row of the Employee table is 200. Because the subquery is correlated to the outer query through the alias Emp1, it means that when the first row is processed, the query will essentially look like this – note that all we did is replace Emp1.Salary with the value of 200:

SELECT *
FROM Employee Emp1
WHERE (1) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > 200)

So, what exactly is happening when that first row is processed? If you pay special attention to the subquery, you will notice that it counts the distinct salary values in the Employee table that are greater than 200. The outer query then checks whether that count equals 1, and if so, everything from that particular row in Emp1 is returned.

Note that Emp1 and Emp2 are both aliases for the same table – Employee. Emp2 is only being used in the subquery to compare all the salary values to the current salary value chosen in Emp1. This allows us to find the number of salary entries (the count) that are greater than 200. And if this number is equal to N-1 (which is 1 in our case) then we know that we have a winner – and that we have found our answer.

But, it’s clear that the subquery will return a 2 when Emp1.Salary is 200, because there are clearly 2 salaries greater than 200 in the Employee table. And since 2 is not equal to 1, the salary of 200 will clearly not be returned.

So, what happens next? Well, the SQL processor will move on to the next row which is 800, and the resulting query looks like this:

SELECT *
FROM Employee Emp1
WHERE (1) = (
SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > 800)

Since there are no salaries greater than 800, the count is 0, which is not equal to 1, so this row is not returned either. The query then moves on to the last row, which is 450. Only one salary (800) is greater than 450, so the count is 1 and the row matches. The entire row with the desired salary is returned, and this is what it looks like:


EmployeeID Salary
7 450

It’s also worth pointing out that the reason DISTINCT is used in the query above is because there may be duplicate salary values in the table. In that scenario, we only want to count repeated salaries just once, which is exactly why we use the DISTINCT operator.

A high level summary of how the query works

Let’s go through a high level summary of how someone would have come up with the SQL in the first place – since we showed you the answer first without really going through the thought process one would use to arrive at that answer.

Think of it this way – we are looking for a pattern that will lead us to the answer. One way to look at it is that the 2nd highest salary would have just one salary that is greater than it. The 4th highest salary would have 3 salaries that are greater than it. In more general terms, in order to find the Nth highest salary, we just find the salary that has exactly N-1 salaries greater than itself. And that is exactly what the query above accomplishes – it simply finds the salary that has N-1 salaries greater than itself and returns that value as the answer.
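The counting idea can be mimicked outside the database. The following Java sketch is only an illustration of the logic, not of how a database executes the query; the class name NthSalaryByCounting and its method are made up, and the salaries are held in a plain list instead of a table.

```java
import java.util.ArrayList;
import java.util.List;

final class NthSalaryByCounting {

    /** Returns every salary that has exactly n-1 distinct salaries greater than it. */
    static List<Integer> nthHighest(List<Integer> salaries, int n) {
        List<Integer> matches = new ArrayList<>();
        for (int outer : salaries) {                  // one row of the outer query (Emp1)
            long greater = salaries.stream()          // the correlated subquery over Emp2
                    .filter(inner -> inner > outer)   // WHERE Emp2.Salary > Emp1.Salary
                    .distinct()                       // COUNT(DISTINCT ...)
                    .count();
            if (greater == n - 1) {                   // WHERE (N-1) = (subquery)
                matches.add(outer);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Same data as the Employee table above: prints [450]
        System.out.println(nthHighest(List.of(200, 800, 450), 2));
    }
}
```

Like the SQL, this returns one entry per matching row, so a salary that occurs twice in the input is returned twice.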

Find the nth highest salary using the TOP keyword in SQL Server

We can also use the TOP keyword (for databases that support the TOP keyword, like SQL Server) to find the nth highest salary. Here is some fairly simple SQL that would help us do that:

SELECT TOP 1 Salary
FROM (
      SELECT DISTINCT TOP N Salary
      FROM Employee
      ORDER BY Salary DESC
      ) AS Emp
ORDER BY Salary

To understand the query above, first look at the subquery, which simply finds the N highest salaries in the Employee table and arranges them in descending order. Then, the outer query will actually rearrange those values in ascending order, which is what the very last line “ORDER BY Salary” does, because of the fact that the ORDER BY Default is to sort values in ASCENDING order. Finally, that means the Nth highest salary will be at the top of the list of salaries, which means we just want the first row, which is exactly what “SELECT TOP 1 Salary” will do for us!

Find the nth highest salary without using the TOP keyword

There are many other solutions to finding the nth highest salary that do not need to use the TOP keyword, one of which we already went over. Keep reading for more solutions.

Find the nth highest salary in SQL without a subquery

The correlated-subquery solution we gave earlier does not do well from a performance standpoint, because the subquery is evaluated again for every row of the outer query. With that in mind, let’s go through some different solutions to this problem for different database vendors. Because each database vendor (whether it’s MySQL, Oracle, or SQL Server) has a different SQL syntax and functions, we will go through solutions for specific vendors. But keep in mind that the correlated-subquery solution should work across different database vendors.

Find the nth highest salary in MySQL

In MySQL, we can just use the LIMIT clause along with an offset to find the nth highest salary. If that doesn’t make sense take a look at the MySQL-specific SQL to see how we can do this:

SELECT Salary FROM Employee 
ORDER BY Salary DESC LIMIT n-1,1

Note that the DESC used in the query above simply arranges the salaries in descending order, from highest to lowest. The key part of the query is the “LIMIT N-1, 1”. The LIMIT clause takes two arguments here: the first specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. So the offset of the first row to return is N-1, and at most 1 row is returned. What exactly is the offset? It is the number of rows to skip, counted from the very first row. Since the rows are arranged in descending order, skipping N-1 rows means the row at offset N-1 contains the Nth highest salary. Two caveats: MySQL expects literal integers in the LIMIT clause, so compute N-1 yourself (for N = 3, write LIMIT 2, 1), and duplicate salaries each occupy their own row, so use SELECT DISTINCT Salary if you want the Nth highest distinct value.
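To make the offset arithmetic concrete, here is a small Java sketch of the same "sort descending, skip N-1 rows, take one" idea. It only imitates the behaviour on an in-memory list; the class name NthSalaryByOffset and both method names are made up for this illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class NthSalaryByOffset {

    /** ORDER BY Salary DESC, then skip n-1 rows and return the next one, if any. */
    static Optional<Integer> nthRow(List<Integer> salaries, int n) {
        return salaries.stream()
                .sorted(Comparator.reverseOrder())   // ORDER BY Salary DESC
                .skip(n - 1)                         // offset N-1
                .findFirst();                        // at most 1 row
    }

    /** Same idea, but duplicates are removed first (SELECT DISTINCT Salary). */
    static Optional<Integer> nthDistinct(List<Integer> salaries, int n) {
        return salaries.stream()
                .distinct()
                .sorted(Comparator.reverseOrder())
                .skip(n - 1)
                .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(nthRow(List.of(200, 800, 450), 2));        // Optional[450]
        System.out.println(nthRow(List.of(800, 800, 450), 2));        // Optional[800]
        System.out.println(nthDistinct(List.of(800, 800, 450), 2));   // Optional[450]
    }
}
```

The second and third calls show why duplicates matter: an offset counts rows, not distinct salary values.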

Find the nth highest salary in SQL Server

In SQL Server, there is no such thing as a LIMIT clause. But, we can still use the offset to find the nth highest salary without using a subquery – just like the solution we gave above in MySQL syntax. But, the SQL Server syntax will be a bit different. Here is what it would look like:

SELECT Salary FROM Employee
ORDER BY Salary DESC
OFFSET N-1 ROWS
FETCH NEXT 1 ROWS ONLY

Note that I haven’t personally tested the SQL above, and I believe that it will only work in SQL Server 2012 and up. Let me know in the comments if you notice anything else about the query.

Find the nth highest salary in Oracle using row_number

Oracle releases before 12c don’t support an offset clause like MySQL and SQL Server do, but we can use the row_number analytic function in Oracle to solve this problem. Here is what the Oracle-specific SQL would look like to find the nth highest salary:

select * from (
  select Emp.*,
         row_number() over (order by Salary DESC) rownumb
  from Employee Emp
)
where rownumb = n;  /* n selects the nth highest salary */

The first thing you should notice in the query above is that inside the subquery the salaries are arranged in descending order. Then, the row_number analytic function is applied against the list of descending salaries. Applying the row_number function against the list of descending salaries means that each row will be assigned a row number starting from 1. And since the rows are arranged in descending order the row with the highest salary will have a 1 for the row number. Note that the row number is given the alias rownumb in the SQL above.

This means that in order to find the 3rd or 4th highest salary we simply look for the 3rd or 4th row. The query above will then compare the rownumb to n, and if they are equal will return everything in that row. And that will be our answer!

Find the nth highest salary in Oracle using RANK

Oracle also provides a RANK function that just assigns a ranking numeric value (with 1 being the highest) for some sorted values. So, we can use this SQL in Oracle to find the nth highest salary using the RANK function:

select * FROM (
select EmployeeID, Salary
,rank() over (order by Salary DESC) ranking
from Employee
)
WHERE ranking = N;

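The rank function will assign a ranking to each row starting from 1. This query is quite similar to the one where we used the row_number() analytic function, and it returns the same result as long as all salaries are distinct. With duplicate salaries the two differ: row_number() gives tied rows consecutive numbers in an arbitrary order, while rank() gives tied rows the same number and then skips ahead, so some values of N may match no row at all. Oracle's dense_rank() function ranks without gaps, which makes it the closest match to “nth highest distinct salary”.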
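How the ranking functions treat ties is easiest to see on a small example. The Java sketch below imitates row_number(), rank() and dense_rank() over salaries sorted in descending order; it is an illustration of the numbering rules only, and the class name RankingSketch and its method are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RankingSketch {

    /**
     * Returns one row per salary, highest first:
     * {salary, row_number, rank, dense_rank}.
     */
    static int[][] rankDescending(List<Integer> salaries) {
        List<Integer> sorted = new ArrayList<>(salaries);
        sorted.sort(Comparator.reverseOrder());

        int[][] rows = new int[sorted.size()][];
        int rank = 0;
        int denseRank = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int salary = sorted.get(i);
            boolean tiedWithPrevious = i > 0 && sorted.get(i - 1) == salary;
            if (!tiedWithPrevious) {
                rank = i + 1;    // rank jumps to the row position, leaving gaps after ties
                denseRank++;     // dense_rank advances by one, leaving no gaps
            }
            rows[i] = new int[] {salary, i + 1, rank, denseRank};
        }
        return rows;
    }

    public static void main(String[] args) {
        for (int[] row : rankDescending(List.of(450, 800, 200, 800))) {
            System.out.println(row[0] + "  row_number=" + row[1]
                    + "  rank=" + row[2] + "  dense_rank=" + row[3]);
        }
    }
}
```

For the salaries 800, 800, 450, 200 no row has rank 2, the row with row_number 2 is the second 800, and the row with dense_rank 2 is 450.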

Direct Comparison - Findbugs vs PMD vs Checkstyle

	Findbugs ^[1]	PMD ^[3]	Checkstyle ^[2]
Version	3.0.0	5.2.2	6.1.1
License	Lesser GNU Public License	BSD-style license	Lesser General Public License
Purpose	Potential bugs: finds, as the name suggests, bugs in Java byte code	Bad practices: looks for potential problems, possible bugs, unused and sub-optimal code, and over-complicated expressions in the Java source code	Conventions: scans source code and looks for violations of coding standards, e.g. Sun Code Conventions, JavaDoc
Strengths	often finds real defects; low false-detection rate; fast because it works on byte code; less than 50% false positives	occasionally finds real defects; finds bad practices	finds violations of coding conventions
Weaknesses	is not aware of the sources; needs compiled code	slow duplicate code detector	can't find real bugs
Number of rules	408	234	132
Rule Categories	Correctness Bad practice Dodgy code Multithreaded Correctness Performance Malicious Code Vulnerability Security Experimental Internationalization	*JSP* - Basic JSF - Basic JSP XSL - XPath in XSL Java - Design - Coupling - Jakarta Commons Logging - Basic - Strict Exceptions - Security Code Guidelines - Java Logging - Android -Controversial - Comments - Type Resolution - Empty Code - String and StringBuffer - Code Size - Braces - Unused Code - Unnecessary - J2EE - JavaBeans - Migration - Import Statements - JUnit - Naming - Finalizer - Optimization - Clone Implementation Ecmascript - Basic Ecmascript - Unnecessary - Braces XML - Basic XML	Annotations Block Checks Class Design Coding Duplicate Code Headers Imports Javadoc Comments Metrics Miscellaneous Modifiers Naming Conventions Regexp Size Violations

Eclipse Memory Analyzer (MAT) - Tutorial

1. Memory handling in Java

Java handles its memory in two areas: the heap and the stack. We will start with a short overview of memory in general on a computer. Then the Java heap and stack are explained.

1.1. Native Memory

Native memory is the memory which is available to a process, e.g. the Java process. Native memory is controlled by the operating system (OS) and based on physical memory and other physical devices, e.g. disks, flash memory, etc.

The processor (CPU) of the computer executes instructions and stores the results of its computations in registers. These registers are fast memory elements inside the CPU. The processor accesses the normal memory over the memory bus. The amount of memory a CPU can access depends on the size of the physical address which the CPU uses to identify physical memory. A 16-bit address can access 2^16 (= 65,536) memory locations. A 32-bit address can access 2^32 (= 4,294,967,296) memory locations. If each memory location holds one byte (8 bits), then a 16-bit system can access 64 KB of memory and a 32-bit system can access 4 GB of memory.

An operating system (OS) normally uses virtual memory to map the physical memory to the memory which each process can see. The OS then assigns memory to each process in a virtual memory space for this process and maps access to this virtual memory to the real physical memory.

Current 32-bit systems use an extension called Physical Address Extension (PAE), which extends the physical address space available to the operating system to 36 bits. This allows the OS to access 64 GB. The OS then uses virtual memory to give each individual process 4 GB of memory. Even with PAE enabled, a process cannot access more than 4 GB of memory.
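The address-space figures above are plain powers of two. As a quick check, here is a minimal Java sketch that assumes one byte per memory location; the class name AddressSpace and its method are made up for this illustration.

```java
final class AddressSpace {

    /** Number of addressable bytes for an address of the given width, one byte per address. */
    static long addressableBytes(int addressBits) {
        return 1L << addressBits;   // 2^addressBits
    }

    public static void main(String[] args) {
        long kb = 1L << 10;
        long gb = 1L << 30;
        System.out.println(addressableBytes(16) / kb + " KB");  // 64 KB
        System.out.println(addressableBytes(32) / gb + " GB");  // 4 GB
        System.out.println(addressableBytes(36) / gb + " GB");  // 64 GB with PAE
    }
}
```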

Of course with a 64-bit OS this 4GB limitation does not exist anymore.

1.2. Memory in Java

Java manages the memory for you. New objects are created and placed in the heap. Once your application no longer holds a reference to an object, the Java garbage collector is allowed to delete this object and reclaim the memory so that your application can use it again.

1.3. Java Heap

In the heap the Java Virtual Machine (JVM) stores all objects created by the Java application, e.g. by using the "new" operator. The Java garbage collector (gc) can logically separate the heap into different areas, so that the gc can identify objects which can be removed more quickly.

The memory for new objects is allocated on the heap at run time. Instance variables live inside the object in which they are declared.

1.4. Java Stack

The stack is where the method invocations and the local variables are stored. If a method is called, its stack frame is put onto the top of the call stack. The stack frame holds the state of the method, including which line of code is executing and the values of all local variables. The method at the top of the stack is always the currently running method for that stack. Each thread has its own call stack.

1.5. Escape analysis

As stated earlier, Java objects are created and stored in the heap. The programming language does not let the programmer decide whether an object should be allocated on the stack. But in certain cases it would be desirable to allocate an object on the stack, as memory allocation on the stack is cheaper than memory allocation in the heap, deallocation on the stack is free, and the stack is efficiently managed by the runtime.

The JVM therefore uses escape analysis internally to check whether an object is used only within a single thread or method. If the JVM identifies this, it may decide to create the object on the stack, increasing the performance of the Java program.
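The difference between an object that stays inside a method and one that escapes can be shown in a few lines. The sketch below is illustrative only: the class and method names are made up, and whether the JVM really avoids the heap allocation depends on the JVM and its JIT compiler, so the code demonstrates the shape of the two cases rather than a guaranteed optimization.

```java
final class EscapeAnalysisSketch {

    /** Small value holder used only for this illustration. */
    private static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Written to by escaping(): anything stored here is visible outside the method. */
    private static Point last;

    /** Each Point is used only inside the loop body, so it never escapes this method. */
    static long sumOfCoordinates(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);   // candidate for the optimization
            sum += p.x + p.y;
        }
        return sum;
    }

    /** The Point is stored in a static field, so it escapes and must live on the heap. */
    static void escaping(int i) {
        last = new Point(i, i + 1);
    }

    public static void main(String[] args) {
        System.out.println(sumOfCoordinates(1000));   // 1000000
        escaping(1);
        System.out.println(last.x + last.y);          // 3
    }
}
```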

1.6. Memory leaks

The garbage collector of the JVM releases Java objects from memory once no other reachable object refers to them. If other objects still hold references to these objects, then the garbage collector of the JVM cannot release them. A memory leak occurs when such references are kept even though the objects are no longer needed.
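A typical way to leak memory in Java is a long-lived collection that keeps growing. The following sketch shows the pattern in its simplest form; the class name LeakSketch, the 1 KB buffer size and the method names are assumptions chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class LeakSketch {

    /** A static list lives as long as the class does: everything added stays reachable. */
    private static final List<byte[]> CACHE = new ArrayList<>();

    /** Simulates one unit of work that forgets to release its buffer. */
    static void handleRequest() {
        byte[] buffer = new byte[1024];
        CACHE.add(buffer);   // the reference outlives the request: this is the leak
    }

    /** Number of buffers the garbage collector is currently unable to free. */
    static int retained() {
        return CACHE.size();
    }

    /** Dropping the references makes the buffers eligible for garbage collection again. */
    static void clear() {
        CACHE.clear();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            handleRequest();
        }
        System.out.println(retained() + " buffers still referenced");
        clear();
        System.out.println(retained() + " buffers still referenced");
    }
}
```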

2. Analyzing memory leaks with Eclipse

2.1. Using heap dumps to get a snapshot of the memory of an application

A heap dump is a snapshot of the complete Java object graph of a Java application at a certain point in time. It is stored in a binary format called HPROF.

It includes all objects, fields, primitive types and object references.

2.2. The Eclipse Memory Analyzer (MAT) tooling

The Eclipse Memory Analyzer Tooling (MAT) is a set of plug-ins for the Eclipse IDE which provides tools to analyze heap dumps from Java applications and to identify memory problems in the application. This helps the developer to find memory leaks and high memory consumption issues.

It visualizes the references to objects based on Java heap dumps and provides tools to identify potential memory leaks.

3. Analyzing Android heap dumps with Eclipse

Android allows you to create heap dumps of an application's heap. This heap dump is stored in a binary format called HPROF. To create a heap dump, use the Dump HPROF file button in the DDMS Perspective.

The Android heap dump format is similar to the Java heap dump format but not exactly the same. Eclipse MAT can work directly with the Android heap dump format.

4. Installation

Install Eclipse MAT via the Help → Install New Software... menu entry. Select the update site of your release from the drop-down box and once its content is downloaded, select General Purpose Tools and its sub-entries Memory Analyzer and Memory Analyzer(Charts).

5. Creating heap dumps for Java programs

It is possible to instruct the JVM to automatically create a heap dump when it runs out of memory, i.e. in case of an OutOfMemoryError. To instruct the JVM to create a heap dump in such a situation, start your Java application with the -XX:+HeapDumpOnOutOfMemoryError option.

Alternatively, you can create a heap dump interactively via Eclipse. For this, open the Memory Analysis perspective via Open Perspective → Other....

Use the File → New → Other... → Other → Heap Dump menu entry to open a dialog in which you select the process for which you want to acquire a memory dump.

Select the process in the following dialog and press the Finish button.

If you trigger the creation of the heap dump manually, the JVM performs a garbage collector run before it writes the heap dump.
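A heap dump can also be requested from inside the running program. The sketch below is one possible way to do that and assumes a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean management interface is available; the class name HeapDumpSketch, the method name and the file name sketch.hprof are made up for this illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

final class HeapDumpSketch {

    /**
     * Writes a heap dump of the current JVM to the given file.
     * The file must not exist yet and should end in .hprof.
     * With liveObjectsOnly set to true, only reachable objects are dumped.
     */
    static Path dumpHeap(Path target, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), liveObjectsOnly);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("heapdump");
        Path file = dumpHeap(dir.resolve("sketch.hprof"), true);
        System.out.println("Wrote " + file + " (" + Files.size(file) + " bytes)");
    }
}
```

The resulting .hprof file can then be opened in MAT like any other heap dump.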

6. Use the Eclipse Memory Analyzer

After a new heap dump with the .hprof extension has been created, you can open it via a double-click in Eclipse. If you used MAT to create the heap dump, it should be opened automatically.

You may need to refresh your project (F5 on the project). Double-click the file and select the Leak Suspects Report.

The overview page allows you to start the analysis of the heap dump. The dominator tree quickly gives an overview of the used objects.

In the dominator tree you see the references which are held.

To find which element is holding the reference to this object, select the entry and select Find shortest path to GC root from the context menu.

7. Example

7.1. Create Project

Create the Java project called com.vogella.mat.first and the com.vogella.mat.first package. Create the following class.

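package com.vogella.mat.first;

import java.util.ArrayList;
import java.util.List;

public class Main {

  /**
   * Fills a list in an endless loop until the JVM runs out of heap space.
   *
   * @param args not used
   */
  public static void main(String[] args) {
    List<String> list = new ArrayList<>();
    // The list stays reachable and its backing array keeps growing,
    // so the JVM eventually throws an OutOfMemoryError.
    while (true) {
      list.add("OutOfMemoryError soon");
    }
  }

}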

7.2. Run project and create heap dump

In Eclipse, add the -XX:+HeapDumpOnOutOfMemoryError option to the VM arguments of the run configuration.

Run the project. It crashes and writes a heap dump.

7.3. Use MAT to analyze the heap dump

Open the heap dump in MAT and get familiar with using the MAT tooling.