Wednesday, April 25, 2012

Getting started with Neo4j - a Beginner's Tutorial

I've worked with databases for a long time. Recently, I came across Neo4j and cannot believe how awesome it is. I want to devote this blog post to helping people get started with it and to showing the fun you can have.

First, a little bit about graph databases. Graph databases are substantially different from relational database systems (RDBMS). As one person puts it: if you can write, you can write code; if you can draw, you can draw graphs. It is really that simple. A graph database starts with a root node. The database is composed of nodes, relationships, indexes and properties. The following chart is a simplification:


A simple graph database.

A graph database keeps track of nodes and relationships as well as indexes (we'll get into those later). For now, think of this in child's-play (Kinderspiel) terms. A graph is simply a drawing that shows how things are connected. Each connection might have a name. Each node might also have a name. Here is a simplistic view of a graph.
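To make nodes, relationships and properties a little more concrete, here is a minimal sketch in plain Python. It is not the Neo4j API; the `Node` and `Relationship` classes and the `has_relationship` helper are made up for illustration, and the names and the `binary` property simply mirror the Duane and Neo4j example discussed next.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with a name and free-form properties (illustrative only)."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A named, directed connection from one node to another."""
    start: Node
    rel_type: str
    end: Node


# Two nodes; Neo4j carries a "binary" property.
duane = Node("Duane")
neo4j = Node("Neo4j", {"binary": True})

# One directed relationship: Duane -[LOVES]-> Neo4j
relationships = [Relationship(duane, "LOVES", neo4j)]


def has_relationship(start, rel_type, end):
    """Return True if this exact directed relationship is in the graph."""
    return any(
        r.start is start and r.rel_type == rel_type and r.end is end
        for r in relationships
    )


print(has_relationship(duane, "LOVES", neo4j))  # True
print(has_relationship(neo4j, "LOVES", duane))  # False: the reverse edge was never defined
```

The point of the sketch is that a relationship has a direction and a name, and that anything not drawn in the graph is simply undefined.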


In this simple graph, Duane -[LOVES]-> Neo4j, and the latter has the property of being binary. Psychology aside (this would in fact be an unhealthy physical relationship), the graph captures several important concepts yet leaves several very relevant ontological questions unanswered. Relationships organize nodes into structures that allow a graph to resemble many natural structures, including a list, a tree, a map, or a compound entity, any of which can be combined into yet more complex, richly interconnected structures. It is obvious that Duane loves Neo4j, but here are some questions that are left unanswered.

1. Does Neo4j love Duane back?
2. Is Neo4j even aware that Duane loves it?
3. Is Duane able to see that Neo4j is in fact a binary node and probably not suited for a proper relationship?

All are possible but undefined in this scenario. That is why the next concept that must be introduced is a traversal mechanism. Traversals navigate a graph via statements that select exact routes between nodes. These can be written in several languages, such as Cypher, and allow a filter to be applied to find a path through the nodes and relationships that answers a particular question. Such a question in the real world may be "How many friends do I have who enjoy eating spumoni ice cream while reading up on graph databases on Technoracle?" In reality that subset of the population is likely very small, but when applied to something like Facebook or Google Plus, this kind of query becomes highly relevant.
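As a rough illustration of a filtered traversal, the "friends who enjoy spumoni" question can be sketched in plain Python. This is not Cypher and not the Neo4j API; the `edges` data, the relationship names `FRIEND` and `LIKES`, and the helper functions are all invented for the example.

```python
# Adjacency data: (start node, relationship type) -> list of end nodes.
# All names here are made up for illustration.
edges = {
    ("me", "FRIEND"): ["alice", "bob", "carol"],
    ("alice", "LIKES"): ["spumoni"],
    ("bob", "LIKES"): ["vanilla"],
    ("carol", "LIKES"): ["spumoni", "graph databases"],
}


def neighbours(node, rel_type):
    """Follow one relationship type outwards from a node."""
    return edges.get((node, rel_type), [])


def friends_who_like(person, thing):
    """Walk FRIEND edges from person, keeping friends with a LIKES edge to thing."""
    return [
        friend
        for friend in neighbours(person, "FRIEND")
        if thing in neighbours(friend, "LIKES")
    ]


matches = friends_who_like("me", "spumoni")
print(matches)       # ['alice', 'carol']
print(len(matches))  # 2
```

The traversal starts at one node, follows only the relationship types it cares about, and applies a filter along the way, which is the general shape of the question posed above.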

Here is a depiction of how traversals work. Again this is rudimentary.



Again, this is very simple, but you get the idea. The traversal mechanism takes a set of instructions, then uses it to find the data it requires very efficiently. Take Facebook as an example. When you log in, it starts with the node for "you". As the page loads, the JavaScript on the page creates a backend query that says: find all the nodes that are related to the user, down to X layers deep. Neo4j's Java API supports depth limits, making it ideal for this sort of operation. Unlike an RDBMS, where an entire table might have to be walked, Neo4j allows you to set limits and take actions based on the current state of the traversal. The traversal instructions are often written as Cypher statements.
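The "X layers deep" idea can be sketched as a depth-limited breadth-first walk. The code below is plain Python, not Neo4j's Java API; the `graph` data and the `nodes_within` function are hypothetical names used only to show the technique.

```python
from collections import deque

# A tiny made-up social graph: node -> nodes it is directly related to.
graph = {
    "you": ["ann", "ben"],
    "ann": ["cat"],
    "ben": ["cat", "dan"],
    "cat": ["eve"],
    "dan": [],
    "eve": [],
}


def nodes_within(graph, start, max_depth):
    """Breadth-first walk returning {node: depth} for nodes at most max_depth hops from start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # depth limit reached: do not expand this node
        for neighbour in graph.get(node, []):
            if neighbour not in depths:  # skip nodes already visited
                depths[neighbour] = depths[node] + 1
                queue.append(neighbour)
    return depths


print(nodes_within(graph, "you", 2))
# {'you': 0, 'ann': 1, 'ben': 1, 'cat': 2, 'dan': 2}
```

Note that "eve" is three hops away and is never touched. The walk only visits what lies within the limit, rather than scanning everything and filtering afterwards.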

Keeping with the Facebook example, an index is often a useful tool. When a certain node is required as a start point over and over again, you can index it and start from there. By contrast, RDBMS systems use table and row lookups to find the start point. The index is simply a context-based starting point. Indexes can map directly to a node or a relationship, or backwards from a property. Instead of saying:
SELECT * FROM <table> WHERE <column> = 'Duane Nickull' ... you can tell a graph database to "get Duane Nickull" and then traverse outwards from him. Simple and efficient.
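The difference between scanning for a start point and jumping to it through an index can be sketched as follows. This is a plain Python stand-in, not how Neo4j implements its indexes; the `people` and `knows` data and both lookup functions are assumptions made for the example.

```python
# Made-up data: a list of "rows" and a map of who knows whom (by id).
people = [
    {"id": 1, "name": "Ada Lovelace"},
    {"id": 2, "name": "Duane Nickull"},
    {"id": 3, "name": "Alan Turing"},
]
knows = {1: [2], 2: [1, 3], 3: []}


def find_by_scan(name):
    """Table-style lookup: walk every row until the name matches."""
    for person in people:
        if person["name"] == name:
            return person
    return None


# Index: maps a property value (the name) straight back to its node.
name_index = {person["name"]: person for person in people}


def find_by_index(name):
    """Index-style lookup: go directly to the start node."""
    return name_index.get(name)


# "Get Duane Nickull", then traverse outwards from him.
start = find_by_index("Duane Nickull")
neighbours = [p["name"] for p in people if p["id"] in knows[start["id"]]]
print(neighbours)  # ['Ada Lovelace', 'Alan Turing']
```

Both lookups return the same node; the index just avoids walking the whole collection each time the same start point is needed.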

Neo4j is a commercially supported, free and open source graph database that is going to rock the world. Trust me on this. The next post will cover getting started: all the sordid details (at least 3 easy steps) it takes to get up and running.





6 comments:

  1. Nice writeup - superstoked that you like it Duane!

  2. I'm thinking of starting a new project which will require a network-like data model, so your article is very timely for me.
    On the other hand, graph databases have seen their ups and downs. One of the reasons is their operational efficiency compared to RDBMS. How would you describe Neo4j in this context? Will it scale well in your opinion?
    Thank you again for the article!

  3. Absolutely! It scales very well. If you are interested in some additional facts, please ping me at duane at nickull dot net and I can get you all you need.

  4. I've just started using Neo4j for my thesis, so I'm very glad to see this kind of blog.

  5. Way to go, very interesting concepts and stuff, but can't wait to start coding
