Yesterday I wrote a brief intro to Neo4J and promised to write more. Yesterday's post focused on what a Graph Database is. Today we'll actually download and install the software itself. First, and I know you all hate this part, please make sure you have the prerequisites installed. Most people who write software today take the time to document these so that you get a better experience.
For reference, the machine I am on is a Mac Pro with 12 GB RAM running OS X 10.7.3. It has 2 x 3 GHz Quad-Core Intel Xeon processors, so it should make short work of anything I throw at it. My machine is Unix-based, so please modify the instructions for your operating system. Linux users should find the steps roughly the same.
1. Grab your browser (Chrome of course) and point it at http://neo4j.org/download/
2. There are a variety of options available. Unless you want to start with the über-enterprise version or build Neo4J Community from GitHub source, simply grab the latest stable version. In this case it is the Community (i.e. "free") version 1.7. If you have a slow connection, use the time wisely to view Emil's video while the download completes. You will have to choose a location for the download.
3. Unpack the download with whatever tools your operating system provides. Once unpacked, you will have the Neo4J home directory, which contains (among other things) the bin and conf directories used below.
4. Starting Neo4J is quite easy. Grab a terminal (shell), navigate to the <neo4J_Home_Directory>/bin directory and type in
sh ./neo4j
This will give you a list of the available options for starting the database.
5. These are very simple and self-explanatory. To start Neo4J, simply type in
sh ./neo4j start
6. There are two ways to verify that the Neo4J instance has started. The first is to type in the command
sh ./neo4j status
which gives you a simple acknowledgement that it has started, along with the process ID (on Unix-based systems). A second, more verbose set of details can be retrieved by typing in the command
sh ./neo4j info
This gives you a wealth of information, including every jar in the CLASSPATH, JAVA_OPTS (options), the JAVA_HOME environment variable, NEO4J_INSTANCE (the path to the current instance), the server PORT it is using over HTTP, the NEO4J_HOME environment variable and the current JDK value.
Validating the port is fairly easy. Just grab your handy browser (Chrome please) and go to http://localhost:7474 (unless you've already changed the configuration file to a different port). You should see a newly initialized database.
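If you would rather script this check than open a browser, here is a minimal Python sketch of my own (not something that ships with Neo4J). It only tests whether something is accepting TCP connections on localhost:7474; it assumes the default port and does not prove that the listener is actually Neo4J.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 7474 is the default HTTP port; change it if you edited the configuration.
    print("Neo4J web port open:", port_is_open("localhost", 7474))
```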
So where can we configure these? Let's start with JAVA_OPTS. You will see a line that looks like this:
-Dorg.neo4j.server.properties=conf/neo4j-server.properties
This option, starting with -Dorg.neo..., is specified in a file named neo4j-wrapper.conf under the /conf directory. The JVM parameters are all specified there, and several of them are actually pointers to other configuration files. If you open this file, you will see the same lines:
wrapper.java.additional.1=-Dorg.neo4j.server.properties=conf/neo4j-server.properties
wrapper.java.additional.2=-Djava.util.logging.config.file=conf/logging.properties
wrapper.java.additional.3=-Dfile.encoding=UTF-8
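To see how those wrapper lines map onto the -D system properties the JVM receives, here is a small Python sketch. The function name jvm_system_properties is hypothetical, and it assumes the simple key=value layout shown above: it skips commented-out lines and ignores any additional option that is not a -D property.

```python
def jvm_system_properties(conf_text: str) -> dict:
    """Collect -Dname=value options from wrapper.java.additional.N lines."""
    props = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out settings.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep or not key.startswith("wrapper.java.additional."):
            continue
        # The value is itself a JVM flag, e.g. -Dfile.encoding=UTF-8.
        if value.startswith("-D"):
            name, _, prop_value = value[2:].partition("=")
            props[name] = prop_value
    return props


if __name__ == "__main__":
    sample = (
        "wrapper.java.additional.1=-Dorg.neo4j.server.properties=conf/neo4j-server.properties\n"
        "wrapper.java.additional.2=-Djava.util.logging.config.file=conf/logging.properties\n"
        "wrapper.java.additional.3=-Dfile.encoding=UTF-8\n"
    )
    for name, value in jvm_system_properties(sample).items():
        print(f"{name} -> {value}")
```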
Note that on OS X, the Java version simply says "CurrentJDK", which is pretty bloody useless. If you really want to know the JDK version, open your command window and type in java -version. In my case I am running
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04-415-11M3635)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-415, mixed mode)
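If you want to pull that version string out from a script, here is a minimal Python sketch. The helper names are hypothetical; it assumes the banner contains version "..." as above, and it reads stderr because java -version prints its banner there rather than on stdout.

```python
import re
import subprocess
from typing import Optional


def parse_java_version(banner: str) -> Optional[str]:
    """Pull the quoted version (e.g. 1.6.0_31) out of `java -version` output."""
    match = re.search(r'version "([^"]+)"', banner)
    return match.group(1) if match else None


def installed_java_version() -> Optional[str]:
    """Run `java -version`; the JVM writes this banner to stderr, not stdout."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None  # no `java` executable on the PATH
    return parse_java_version(result.stderr)


if __name__ == "__main__":
    sample = 'java version "1.6.0_31"\nJava(TM) SE Runtime Environment (build 1.6.0_31-b04-415-11M3635)'
    print(parse_java_version(sample))  # prints 1.6.0_31
    print(installed_java_version())    # whatever your own JDK reports, or None
```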
Within neo4j-wrapper.conf there are several properties, such as the initial and maximum Java heap sizes, along with some warnings. It is probably a good idea to become familiar with the warnings before you go tinkering with these settings. The setting I found most useful was on line 10: uncommenting it enables garbage collection logging. This data can be valuable in determining what is going on under the hood, so I change it with every install.
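Since I make that GC-logging change on every install, it can be scripted. Below is a hypothetical Python helper of my own that strips the leading # from a given 1-based line number and keeps a .bak copy of the file. The sample text is made up for illustration and is not the real neo4j-wrapper.conf, so check which line holds the GC setting in your version before pointing it at the actual file.

```python
from pathlib import Path


def uncomment_line(text: str, line_number: int) -> str:
    """Drop the leading '#' from a 1-based line number; other lines are untouched."""
    lines = text.splitlines(keepends=True)
    target = lines[line_number - 1]
    stripped = target.lstrip(" \t")
    if stripped.startswith("#"):
        indent = target[: len(target) - len(stripped)]
        # Also drop spaces between '#' and the setting, but keep the line ending.
        lines[line_number - 1] = indent + stripped[1:].lstrip(" \t")
    return "".join(lines)


def uncomment_in_file(path: str, line_number: int) -> None:
    """Apply uncomment_line to a file in place, keeping a .bak copy of the original."""
    conf = Path(path)
    original = conf.read_text()
    conf.with_name(conf.name + ".bak").write_text(original)
    conf.write_text(uncomment_line(original, line_number))


if __name__ == "__main__":
    # Made-up sample text, not the real neo4j-wrapper.conf.
    sample = "# Uncomment to enable GC logging\n#wrapper.java.additional.4=-Xloggc:data/log/neo4j-gc.log\n"
    print(uncomment_line(sample, 2), end="")
    # Against a real install you might run: uncomment_in_file("conf/neo4j-wrapper.conf", 10)
```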
This information is important for the next steps, when we will embed Neo4J into an Eclipse project. For today, let's play with our new Neo4J instance. To get the neo4j shell, simply type in the following command (from the same <neo4J_Home_Directory>/bin directory):
sh ./neo4j-shell
This will let you look around at some of the available options. Type in
help
for a list of commands.
The shell commands are all well documented at http://docs.neo4j.org/chunked/stable/shell-starting.html. Note that the shell is configured and enabled through the configuration of the Neo4j kernel (again in the /conf folder). Okay - enough foreplay. Let's make some nodes! Build your first few nodes by using the mknode command. I can make two nodes, one with my name and one with my wife's name, like this:
neo4j-sh (0)$ mknode Duane
neo4j-sh (0)$ mknode Bettina
If I go back and check the browser at http://localhost:7474, it confirms that in addition to the root node there are now two more nodes, for a total of three. Congratulations. You have just made some nodes! So how do you get to those nodes? This is why I started the documentation on Technoracle. I found the neo4j-shell confusing, since almost every command returned an empty result until I read the documentation (an engineer's last resort). The graph database works almost exactly like a unix filesystem, which means your learning curve should be ten times faster (assuming you're familiar with unix commands). When you invoke the shell, you are basically in the "~" directory, or "me" as the Neo4J folks call it. To traverse somewhere (to Duane or Bettina, for example), you need to make a relationship using the mkrel command. It is very simple. Type in the following:
mkrel -ct KNOWS Duane
Now let's dissect this. mkrel is the command to make a relationship, and "-ct" combines two flags. -c should be supplied if you want a new node created at the other end of the relationship, and -t sets the relationship type, here KNOWS. (The wording in the "man pages" is a bit rough to read, and it took me a while to figure out that mkrel can create a node as well as a relationship.)
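To make the filesystem analogy concrete, here is a toy Python model of what the shell session is doing. It is purely illustrative: ToyGraph and its methods are hypothetical names, not Neo4J's API. mknode creates a node nothing points to, mkrel_c mimics mkrel -c by creating a node plus a relationship from the current node, ls lists relationships on the current node, and cd (in this toy) only moves along an existing relationship.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Relationship:
    start: int
    end: int
    rel_type: str


@dataclass
class ToyGraph:
    """Toy model of a shell session: nodes, typed relationships and a current node.

    Hypothetical illustration only; this is not how Neo4J stores anything.
    """
    current: int = 0  # start on node 0, playing the role of "me"
    nodes: Set[int] = field(default_factory=lambda: {0})
    rels: List[Relationship] = field(default_factory=list)
    next_id: int = 1

    def mknode(self) -> int:
        """Like mknode: the node exists, but nothing connects it to the current node."""
        node_id = self.next_id
        self.next_id += 1
        self.nodes.add(node_id)
        return node_id

    def mkrel_c(self, rel_type: str) -> int:
        """Like mkrel -c -t TYPE: new node plus a relationship from the current node to it."""
        node_id = self.mknode()
        self.rels.append(Relationship(self.current, node_id, rel_type))
        return node_id

    def ls(self) -> List[Relationship]:
        """Like ls: list the relationships attached to the current node."""
        return [r for r in self.rels if self.current in (r.start, r.end)]

    def cd(self, node_id: int) -> None:
        """Like cd: only move to a node joined to the current one by a relationship."""
        if not any(node_id in (r.start, r.end) for r in self.ls()):
            raise ValueError(f"node {node_id} is not connected to node {self.current}")
        self.current = node_id


if __name__ == "__main__":
    g = ToyGraph()
    duane = g.mknode()           # created, but unreachable from node 0
    print(g.ls())                # prints []
    friend = g.mkrel_c("KNOWS")  # created AND connected
    g.cd(friend)                 # works; g.cd(duane) from node 0 would raise ValueError
```

This is why the two mknode calls above left nodes you could see in the browser but could not reach from the shell.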
If you want to nuke the graph, you can do this using rmnode to delete nodes and rmrel to delete relationships. rmnode comes with a nasty little flag, -f, which has the same effect as the unix command "sudo rm -r *": it removes everything. As you probably guessed, using rmnode with the -f flag was irresistible, so I had to try it and typed in rmnode -f. Now beware: this removes everything, including the current node. Once you run this and try an ls, the prompt shows a question mark since there is no current node (or at least none seemed to be reachable). After running this you get:
neo4j-sh (?)$ ls
Node <ref> not found
To fix this, simply create a new node and make it the current node. You can even pass its properties as JSON to give it a more meaningful name.
neo4j-sh (?)$ mknode --cd --np "{'name':'me'}"
neo4j-sh (me,10)$
Don't ask me why I do stuff like this. I think I just like to explore what is possible and how to recover before doing any serious work.
Earlier we created nodes that had no relationships we could traverse. Now we will use a different syntax to create new nodes we can reach. The easier way to create nodes you can traverse to from the shell is the mkrel command, which can create the new node as well as the relationship between the current node and the newly created one. To do this, type in the following:
mkrel -t LOVES neo4j
Once that has finished, type ls at the prompt.
Ta-da! You can now traverse to the node using the trav command. The traverse command is very complex and can build very powerful statements and filters. For this lesson, all we want to do is go from the current node to the node we just created (node 11 in this case). To do this, use the syntax
trav -o depth -r LOVES:both,HAS_.*:incoming
trav = traverse
-o depth = the traversal order. The possible values are BREADTH_FIRST, DEPTH_FIRST, breadth or depth. Think of these as controls over a funnel: very wide and shallow, or very narrow and deep.
-r LOVES:both,HAS_.*:incoming = the -r flag sets the relationship types (and directions) to follow. In this case, apparently (me) - [:LOVES] -> (11), or I love node 11. To be honest, I might delete it right after this tutorial as I am already growing tired of this relationship ;-) I was not entirely clear on the ":both,HAS_.*:incoming" part at first. It appears to be a comma-separated list of TYPE:direction pairs, so LOVES relationships are followed regardless of whether they are incoming or outgoing, while relationship types matching the pattern HAS_.* are followed only when incoming.
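Here is a toy Python sketch of what those two flags control, based on my reading of them above. The graph, node names and helper functions are all made up for illustration; this is not Neo4J's traversal engine. The spec maps a relationship-type pattern to a direction, and the order switches between a queue (breadth first) and a stack (depth first).

```python
import re
from collections import deque

# Made-up graph as (start, type, end) triples.
EDGES = [
    ("me", "LOVES", "n11"),
    ("n11", "LOVES", "n12"),
    ("house", "HAS_OWNER", "me"),
    ("me", "HAS_PET", "dog"),
]


def neighbours(node, spec):
    """Yield one-hop neighbours allowed by spec, e.g. {"LOVES": "both", "HAS_.*": "incoming"}.

    Keys are regexes matched against the whole relationship type.
    """
    for start, rel_type, end in EDGES:
        for pattern, direction in spec.items():
            if not re.fullmatch(pattern, rel_type):
                continue
            if start == node and direction in ("outgoing", "both"):
                yield end
            elif end == node and direction in ("incoming", "both"):
                yield start


def traverse(root, spec, order="breadth"):
    """Visit each reachable node once: FIFO queue for breadth, LIFO stack for depth."""
    frontier = deque([root])
    seen = set()
    visited = []
    while frontier:
        node = frontier.popleft() if order == "breadth" else frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        for nxt in neighbours(node, spec):
            if nxt not in seen:
                frontier.append(nxt)
    return visited


if __name__ == "__main__":
    spec = {"LOVES": "both", "HAS_.*": "incoming"}
    print(traverse("me", spec, "breadth"))  # ['me', 'n11', 'house', 'n12']
    print(traverse("me", spec, "depth"))    # ['me', 'house', 'n11', 'n12']
```

Starting from me, the HAS_PET relationship is skipped because it points outward from me, and the two orders reach the same nodes in a different sequence.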
Okay - enough for today boys and girls. More soon. In the meantime, please do try this at home!