Moving pollution data from RDBMS to Neo4j

Moving pollution data from RDBMS to Neo4j

Neo4J graph db

Moving pollution data from RDBMS to Neo4j

In what case you choose graphdb, I choose it for traversal which is the one one the main feature of every graphdb. Ok let’s talk what is the graph database and how this helped in migrating the RDBMS database into Neo4j.

What is the graph DB?

In simpler words Graphdb works on graph theory. It is primarily a composition of Nodes and Edges so here are some points on what the graph database is - 

So a GraphDatabase - 

  • is a type of NoSQL database that uses graph theory to store, map and query relationships
  • a collection of nodes and edges
  • each node represents an entity (such as a person or business)
  • each edge represents a connection or relationship between two nodes
  • every node in a graph database is defined by a unique identifier
  • Neo4J uses cipher query language for Querying the Graph DB as SQL is available for RDBMS.

What is Cypher Query Language?

The cipher query language emphasizes on WHAT to find rather HOW to find. By nature it directly finds the relationship between two or more nodes. So Cypher is the declarative Query language to graphs as SQL to the relational databases. Its key principles and capabilities are:

  • Create, update, and remove nodes, relationships, labels, and properties.
  • Pattern matching for nodes and relationships in the graph, to extract information or modify the data.
  • Manages indexes and constraints.


Neo4 Relationship

Modeling Pollution RDBMS db in Neo4j!

First of all we need to understand entity relation in RDBMS. For that we have checked the country to states and states to the city to the station where the pollution has been getting measured. 

ER diagram RDBMS represents relationship
ER diagram RDBMS represents relationship


Pollution DB ER diagram

Each table has a relationship as a foreign key constraint. So here I ask which all cites have a certain pollution level, in RDMBS you need to have the relation with stations, pollution_data, pollutions, and city tables. But with graph DB its traversal so you just pick one node and can see all relationships together. Let’s see the graph model for the same.

Graph Data Model Neo4J

Graph Data Model Neo4J


Graph Model of the pollution DB mentioned above

In the graph model -

  • Nodes represent entities
  • Edges represent relationships
  • Connections between data are explored
  • Faster for associative data sets

After designing the model, we need to identify how to get the data imported in the neo4j, for that I have exported the data from SQL to CSV.

There are also other ways to do- 

  • Import from CSV
  • Import from API
  • Import from Code (Py2neo in python)

Import CSV in Neo4J

Here is the cipher query 

USING PERIODIC COMMIT LOAD CSV WITH HEADERS FROM "file:/pollution_data.csv" AS row CREATE (:pollution_data { name: row.pollution_type + ' = '+ row.avg_pollution , avg_pollution: row.avg_pollution, city_id: row.last_updated_on, pollution_id: row.pollution_id, station_id: row.station_id} );


This query loads the CSV file and creates all data as nodes in the GraphDB. 

Simple load of Data on neo4J
Simple load of Data on neo4J/Nodes without relationship



Create Relationship

You can create a relationship from the load query itself but for the demonstration purpose I have used it separately. 


MATCH(a:pollution_data), (b:pollution) where a.pollution_id=b.pollution_id CREATE (a)-[r:POLLUTION]->(b) return r


MATCH(a:pollution_data), (b:stations) where a.station_id=b.station_id CREATE (a)<-[r:POLLUTION_AT_STATION]-(b) return r


Load data from the relationship


MATCH p=(a:country)-[r]-(b:states)-[:CITY]->(c:city)-[:AREA]-(d:stations)-[:POLLUTION_AT_STATION]-(e:pollution_data) return p


Relationship in Neo4J

Data has been shown in on single node Delhi as state

Data relationship from country > State > City > Station > Pollution data or you can see in reverse way.

You can use Neo4J for various purposes for your client production application, visit Neo4j website for more details and an online sandbox account to try. 

Follow me on twitter:  ravindrasingh01

Disclaimer - Kindly note the data shown in the diagram was taken from and it is old data. Used to do my personal learning only.