Neo4j: Nearly two decades of graph DB leadership (first release 2007). Cypher, property graph, massive community, Aura managed option.
Amazon Neptune: AWS-only, fully managed. Supports both property graph (openCypher + Gremlin) and RDF (SPARQL).
ArangoDB: Multi-model. Graph, document, and key-value in one engine. AQL query language across all models.
Pick Neo4j when graph is the product: knowledge graphs, fraud detection, recommendations, identity graphs.
Cypher is the closest existing language to the GQL ISO standard (39075:2024), Aura runs on all three big clouds, and the community is unmatched.
Pick Amazon Neptune when you are AWS-native and want managed graph with IAM, VPC, and CloudWatch built in.
Supports both property graph (openCypher + Gremlin) and RDF (SPARQL); Neptune Analytics handles heavier graph-ML workloads.
Pick ArangoDB when you need graph alongside other data models without running multiple databases. Graph + document + key-value in one engine, one query language (AQL).
Best when the graph is one part of a bigger app, not the whole product.
Rare. Unlike some database categories, running multiple graph databases is almost always a mistake - the data models and query languages differ enough that cross-store consistency is hard. Pick one. The exception is keeping RDF (SPARQL) data in Neptune alongside a property-graph Neo4j, but most teams should consolidate.
What each database can natively store and query. Neo4j is graph-only (property graph). Neptune is graph-only but supports BOTH property graph and RDF. ArangoDB is multi-model - graph, document, and key-value in one engine. The shape of this coverage often decides "can this database replace more than one thing in my stack?"
Native support means first-class storage and query. Partial means workable through plugins or secondary APIs. No means not supported natively - you would run a separate database for that model. Coverage drives "can this single database replace what I run now?" decisions more often than pure query performance.
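The coverage question can be sketched as a simple lookup. This is a minimal illustration that encodes only the native support described above; the dictionary name, model labels, and function are hypothetical, and "partial" support is deliberately left out.

```python
# Native data-model coverage as described in the text above.
# Model labels ("property_graph", "rdf", ...) are illustrative names.
NATIVE_MODELS = {
    "Neo4j": {"property_graph"},
    "Neptune": {"property_graph", "rdf"},
    "ArangoDB": {"property_graph", "document", "key_value"},
}


def databases_covering(required):
    """Return the databases whose native models include every required model."""
    needed = set(required)
    return sorted(name for name, models in NATIVE_MODELS.items() if needed <= models)


print(databases_covering(["property_graph"]))              # all three qualify
print(databases_covering(["property_graph", "document"]))  # ['ArangoDB']
print(databases_covering(["rdf"]))                         # ['Neptune']
```

An empty result (for example RDF plus document) signals that no single database in this comparison covers the stack natively.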
A 6-step mental model for picking the right graph database based on your cloud stack, query language preference, and whether graph is the whole product or a feature.
If graph is central to the product (knowledge graph, fraud detection, recommendation engine), Neo4j is the safest default - biggest community, richest graph-specific tooling, best documentation. If graph is one data model among several, ArangoDB's multi-model story is worth considering.
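The rules of thumb above can be written down as a small function. This is a sketch under stated assumptions, not an official decision procedure: the function name, its parameters, and especially the order in which the rules are checked are illustrative choices.

```python
def suggest_graph_db(graph_is_core, aws_native=False, needs_rdf=False,
                     needs_other_models=False):
    """Hypothetical rule-of-thumb picker; the rule ordering is an assumption."""
    if needs_rdf:
        # Only Neptune in this comparison supports RDF/SPARQL.
        return "Amazon Neptune"
    if graph_is_core:
        # Graph is the product: default to the graph specialist.
        return "Neo4j"
    if needs_other_models:
        # Graph is one model among several: multi-model engine.
        return "ArangoDB"
    if aws_native:
        # Graph is a feature and the stack is already on AWS.
        return "Amazon Neptune"
    return "Neo4j"


print(suggest_graph_db(graph_is_core=True))                             # Neo4j
print(suggest_graph_db(graph_is_core=False, needs_other_models=True))   # ArangoDB
print(suggest_graph_db(graph_is_core=False, aws_native=True))           # Amazon Neptune
```

Real evaluations weigh more factors (licensing, team skills, scale), so treat the output as a starting shortlist.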
Why this is not a win: GQL (ISO/IEC 39075:2024) is the first new ISO database language since SQL, and the property-graph half of the industry is converging on it. Neo4j Cypher already supports most mandatory GQL features (per the official Cypher Manual GQL conformance pages); AWS has publicly committed to making Cypher an implementation of GQL "in our products and in openCypher" (AWS Database Blog), which covers Neptune as the AWS managed graph product. ArangoDB's AQL is outside that convergence but covers graph + document + key-value. Net effect: query language is becoming less of a differentiator over time, not more.
Why it matters: ArangoDB is the only multi-model option. Neptune covers both property graph and RDF which is unique. Neo4j is graph-only but that is its specialty.
Why this is not a win: Neo4j Aura runs on AWS, GCP, Azure. Neptune is AWS-only. ArangoGraph (formerly Oasis) runs on AWS, GCP, and Azure. Hosting flexibility varies.
Why it matters: Neo4j has decades of tutorials, books, conference talks, and job postings. Neptune benefits from AWS ecosystem. ArangoDB has a dedicated, active community but is smaller.
Why it matters: Neo4j Community is GPLv3 - real OSI-approved open source, copyleft. Neptune is proprietary AWS-only. ArangoDB shifted to BSL 1.1 + a custom Community License in 3.12 (Q1 2024) with a 100 GB dataset cap and commercial-use restrictions; the source converts to Apache 2.0 after a 4-year change date but is not OSI-approved today. If "actually open source" matters, Neo4j Community is the only option here.
Why this is not a win: All three scale to billions of edges in production with the right tuning. Hundreds of billions is where things get hard for any graph DB. Scaling strategies differ significantly.
Why it matters: Neo4j's storage engine is graph-native and pointer-chasing is efficient. Neptune matches on transactional workloads. ArangoDB trades some graph-specific performance for multi-model flexibility.
Why this is not a win: Neo4j GDS has a decade of graph algorithm implementations. Neptune Analytics (GA November 29, 2023) plus Neptune ML added strong in-place analytics. ArangoDB has Pregel-style algorithms but less specialized tooling.
Why it matters: Neo4j has invested heavily in "graph + LLM" - knowledge graph RAG, vector indexes inside Neo4j 5.x, tight LangChain/LlamaIndex integration. Neptune and ArangoDB are catching up.
Illustrative performance and cost shapes for a 50M-node, 200M-edge property graph. Exact numbers vary with traversal depth, query complexity, and hardware. Graph-DB benchmarks are notoriously workload-specific - use your own traces.
| Operation | Dataset | Neo4j | Neptune | ArangoDB |
|---|---|---|---|---|
| Shortest-path query (depth 4) | social graph, 50M nodes | ~20-50 ms | ~25-60 ms | ~30-80 ms |
| Bulk import (nodes + edges) | 200M edges | ~45 min (neo4j-admin import) | ~60 min (bulk loader) | ~50 min (arangoimport) |
| Concurrent traversals (1k users) | 3-hop queries | ~2-4k qps (tuned) | ~1.5-3k qps (managed) | ~1.5-3k qps |
| Cost at medium scale | 50M nodes, tuned cluster | ~$800-1500/mo Aura dedicated | ~$1000-2000/mo Neptune | ~$600-1200/mo ArangoGraph or self-host |
| Multi-model query (graph + document) | graph traversal + JSON filter | Two separate queries / stores | Two separate queries / stores | One AQL query |
Below is a "find friends of friends who like hiking" query in each database's native query language. Cypher and openCypher are closest to each other (openCypher is modeled on Cypher). AQL uses its own syntax that looks different but handles graph queries cleanly. The data model is identical across all three; the query syntax is the main code-level difference.
// Neo4j - Cypher
MATCH (me:Person {id: $user_id})
-[:FRIEND]->(friend:Person)
-[:FRIEND]->(fof:Person)
-[:LIKES]->(topic:Topic {name: 'hiking'})
WHERE me <> fof AND NOT (me)-[:FRIEND]->(fof)
RETURN DISTINCT fof.name AS suggestion,
count(*) AS shared_friends
ORDER BY shared_friends DESC
LIMIT 10;
// Cypher reads almost like ASCII-art graph patterns.
// Neo4j's storage engine walks these patterns natively.
// Amazon Neptune - openCypher (same syntax as Cypher)
MATCH (me:Person {id: $user_id})
-[:FRIEND]->(friend:Person)
-[:FRIEND]->(fof:Person)
-[:LIKES]->(topic:Topic {name: 'hiking'})
WHERE me <> fof AND NOT (me)-[:FRIEND]->(fof)
RETURN DISTINCT fof.name AS suggestion,
count(*) AS shared_friends
ORDER BY shared_friends DESC
LIMIT 10;
// Neptune also speaks Gremlin for property graph:
// g.V().has('Person', 'id', userId)
// .out('FRIEND').out('FRIEND').dedup()
// .where(out('LIKES').has('name', 'hiking'))
// ...and SPARQL for RDF if your data is triples.
// ArangoDB - AQL (unified multi-model language)
FOR me IN Person
FILTER me.id == @user_id
FOR friend, e IN 1..1 OUTBOUND me FRIEND
FOR fof IN 1..1 OUTBOUND friend FRIEND
FILTER fof._id != me._id
LET topic = (
FOR t IN 1..1 OUTBOUND fof LIKES
FILTER t.name == 'hiking'
RETURN t
)
FILTER LENGTH(topic) > 0
COLLECT suggestion = fof.name
WITH COUNT INTO shared_friends
SORT shared_friends DESC
LIMIT 10
RETURN { suggestion, shared_friends }
// AQL is more verbose than Cypher for pure graph queries.
// Its strength is that the SAME language handles documents and key-value.
Note: Cypher and openCypher (Neo4j / Neptune) share a pattern-matching syntax that reads close to ASCII graphs. AQL is more procedural and verbose for pure graph, but the same language extends to document and key-value queries - which is ArangoDB's core value proposition.
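To make the logic of these queries concrete, here is a minimal in-memory sketch of the same "friends of friends who like hiking" traversal. The toy data and the function name are hypothetical, and the exclusion of existing friends follows the Cypher version; it shows what the queries compute, not how any of the three engines executes them.

```python
from collections import Counter

# Hypothetical toy data: directed FRIEND edges and LIKES edges.
FRIEND = {
    "ada": {"bob", "cy"},
    "bob": {"dee", "eve"},
    "cy": {"dee", "ada"},
    "dee": set(),
    "eve": set(),
}
LIKES = {"ada": {"hiking"}, "dee": {"hiking"}, "eve": {"chess"}}


def suggest(me, topic, limit=10):
    """Rank friends-of-friends who like `topic` by the number of shared friends."""
    direct = FRIEND.get(me, set())
    shared = Counter()
    for friend in direct:                       # first hop
        for fof in FRIEND.get(friend, set()):   # second hop
            if fof == me or fof in direct:      # skip self and existing friends
                continue
            if topic in LIKES.get(fof, set()):
                shared[fof] += 1                # one more connecting friend
    return shared.most_common(limit)


print(suggest("ada", "hiking"))  # [('dee', 2)]
```

Each increment of the counter corresponds to one matched path in the Cypher pattern, which is why `count(*)` there reads as "shared friends".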
# Neo4j - neo4j-admin import (offline, fast)
# Prepare CSV files with specific headers:
# nodes.csv: :ID,name,:LABEL
# edges.csv: :START_ID,:END_ID,:TYPE,since
neo4j-admin database import full \
--nodes=Person=nodes.csv \
--relationships=FRIEND=edges.csv \
--overwrite-destination \
--high-parallel-io=on
# ~45 minutes for 200M edges on a beefy machine.
# Online import via LOAD CSV is slower but needs no downtime.
# Neptune - S3 bulk loader
# Upload CSVs (Gremlin or openCypher format) to S3, then:
curl -X POST \
https://your-cluster.region.neptune.amazonaws.com:8182/loader \
-H "Content-Type: application/json" \
-d '{
"source": "s3://your-bucket/edges/",
"format": "opencypher",
"iamRoleArn": "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
"region": "us-east-1",
"failOnError": "FALSE",
"parallelism": "HIGH"
}'
# ~60 minutes for 200M edges. Uses parallel S3 reads
# and bulk indexing. You must format CSVs per Neptune spec.
# ArangoDB - arangoimport (also supports streaming)
# Prepare JSON lines files:
# persons.jsonl: {"_key": "p1", "name": "Ada"}
# friends.jsonl: {"_from": "persons/p1", "_to": "persons/p2"}
arangoimport \
--file persons.jsonl \
--collection persons \
--type jsonl \
--server.endpoint tcp://localhost:8529
arangoimport \
--file friends.jsonl \
--collection friends \
--type jsonl \
--from-collection-prefix persons \
--to-collection-prefix persons
# ~50 minutes for 200M edges. Handles both documents
# and edge collections with the same tool.
Note: Bulk import performance is close across all three (~45-60 min for 200M edges on reasonable hardware). The operational differences are meaningful: Neo4j's import requires downtime; Neptune uses S3 + IAM; ArangoDB uses its unified CLI. Pick by what fits your ops.
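Because each importer expects its own file layout, a small export step usually sits in front of it. The sketch below writes the same records in the Neo4j CSV header format and the ArangoDB JSON Lines format shown in the comments above. The record shapes and function names are hypothetical, and the Neptune load format is left out.

```python
import csv
import io
import json

# Hypothetical source records.
people = [{"id": "p1", "name": "Ada"}, {"id": "p2", "name": "Grace"}]
friendships = [{"src": "p1", "dst": "p2", "since": 2020}]


def neo4j_csvs(people, friendships):
    """Return (nodes_csv, edges_csv) text using neo4j-admin import headers."""
    nodes = io.StringIO()
    writer = csv.writer(nodes, lineterminator="\n")
    writer.writerow([":ID", "name", ":LABEL"])
    for p in people:
        writer.writerow([p["id"], p["name"], "Person"])

    edges = io.StringIO()
    writer = csv.writer(edges, lineterminator="\n")
    writer.writerow([":START_ID", ":END_ID", ":TYPE", "since"])
    for f in friendships:
        writer.writerow([f["src"], f["dst"], "FRIEND", f["since"]])
    return nodes.getvalue(), edges.getvalue()


def arango_jsonl(people, friendships, collection="persons"):
    """Return (documents_jsonl, edges_jsonl) text for arangoimport."""
    docs = [json.dumps({"_key": p["id"], "name": p["name"]}) for p in people]
    edges = [
        json.dumps({
            "_from": f"{collection}/{f['src']}",
            "_to": f"{collection}/{f['dst']}",
            "since": f["since"],
        })
        for f in friendships
    ]
    return "\n".join(docs) + "\n", "\n".join(edges) + "\n"


nodes_csv, edges_csv = neo4j_csvs(people, friendships)
docs_jsonl, edges_jsonl = arango_jsonl(people, friendships)
print(nodes_csv)
print(edges_jsonl)
```

In practice the strings would be written to `nodes.csv`, `edges.csv`, `persons.jsonl`, and `friends.jsonl` before running the commands above.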
For graph-first products, Neo4j remains the safest default - largest community, richest graph-specific tooling (GDS, Bloom, Cypher), tightest LLM / RAG integration. Neptune wins on AWS-native ops simplicity. ArangoDB wins on multi-model needs. "Best" depends on your context; Neo4j wins on graph specialization.
Not entirely. Neptune supports openCypher which is close to Cypher, but not identical - some Neo4j-specific features (APOC library, Graph Data Science library, native vector indexes) do not exist on Neptune. Most straightforward Cypher queries port cleanly; advanced workloads require rewriting. Plan for a migration project, not a drop-in.
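A first pass at sizing such a migration is to scan existing queries for Neo4j-specific calls. The sketch below is a rough heuristic, not a parser: the function name is hypothetical, it only looks for the `apoc.` and `gds.` namespaces and `db.index.vector.` procedures, and it can misfire on variables or string literals that happen to use those prefixes.

```python
import re

# Heuristic pattern for Neo4j-only procedure and function namespaces.
NEO4J_ONLY = re.compile(
    r"\b(?:apoc|gds)\.[\w.]+|\bdb\.index\.vector\.[\w.]+",
    re.IGNORECASE,
)


def neo4j_specific_calls(query):
    """Return the sorted, de-duplicated Neo4j-specific calls found in a query."""
    return sorted({match.group(0) for match in NEO4J_ONLY.finditer(query)})


portable = "MATCH (a:Person)-[:FRIEND]->(b:Person) RETURN b.name LIMIT 10"
needs_rewrite = (
    "MATCH (n:Person {id: $id}) "
    "CALL apoc.path.expand(n, 'FRIEND>', null, 1, 3) YIELD path "
    "RETURN path"
)

print(neo4j_specific_calls(portable))       # []
print(neo4j_specific_calls(needs_rewrite))  # ['apoc.path.expand']
```

Queries that return an empty list are candidates for a clean port; anything flagged needs a manual look.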
Property graph (Neo4j, Neptune, ArangoDB) models nodes + edges with properties on both. RDF (Neptune) models data as subject-predicate-object triples, optimized for semantic web and linked data use cases. Property graph is more intuitive for most application workloads; RDF shines for data integration and semantic reasoning tasks. Neptune is unique in supporting both.
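The difference between the two models is easiest to see on a tiny example. The sketch below flattens a property graph into subject-predicate-object triples. The data, identifiers, and function name are hypothetical, and the statement-node treatment of edge properties is just one modeling option (RDF-star is another).

```python
# Property-graph view: properties live on both nodes and edges.
nodes = {
    "p1": {"label": "Person", "name": "Ada"},
    "p2": {"label": "Person", "name": "Grace"},
}
edges = [{"from": "p1", "to": "p2", "type": "FRIEND", "props": {"since": 2020}}]


def to_triples(nodes, edges):
    """Flatten a property graph into (subject, predicate, object) triples."""
    triples = []
    for node_id, props in nodes.items():
        for key, value in props.items():
            triples.append((node_id, key, value))
    for i, edge in enumerate(edges):
        triples.append((edge["from"], edge["type"], edge["to"]))
        if edge.get("props"):
            # A plain triple cannot carry properties, so describe the edge
            # through a separate statement node (hypothetical id "edgeN").
            stmt = f"edge{i}"
            triples.append((stmt, "subject", edge["from"]))
            triples.append((stmt, "predicate", edge["type"]))
            triples.append((stmt, "object", edge["to"]))
            for key, value in edge["props"].items():
                triples.append((stmt, key, value))
    return triples


for triple in to_triples(nodes, edges):
    print(triple)
```

One edge with one property becomes five triples here, which illustrates why property graphs tend to feel more direct for application data while triples suit uniform, mergeable facts.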
Sometimes. ArangoDB can handle graph + document + key-value workloads in one engine, which is genuinely useful for multi-model apps. But for the most demanding single-model workloads (high-scale pure graph, high-scale pure document), specialized databases typically outperform multi-model. ArangoDB is great when the models are moderate; less compelling when one model is extreme.
Postgres with recursive CTEs handles small-to-medium graphs (tens of thousands of nodes with shallow traversals) acceptably. Past that, graph databases are typically 10-100x faster on deep traversals because their storage engines are optimized for pointer-chasing. If your graph is small and you already run Postgres, try it first. Past ~1M nodes with deep traversals, move to a dedicated graph DB.
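The recursive-CTE approach looks roughly like the sketch below. It uses Python's built-in SQLite as a stand-in so the example is self-contained; the table and column names are hypothetical, and Postgres would need its own parameter style and typically a cast on the seed value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friend (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO friend VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "c")],
)

# Depth-limited reachability: each recursion step is one more join,
# which is where relational engines fall behind on deep traversals.
REACHABLE = """
WITH RECURSIVE reach(node, depth) AS (
    SELECT ?, 0
    UNION
    SELECT f.dst, r.depth + 1
    FROM reach AS r
    JOIN friend AS f ON f.src = r.node
    WHERE r.depth < ?
)
SELECT node, MIN(depth) AS hops
FROM reach
WHERE depth > 0
GROUP BY node
ORDER BY hops, node
"""


def reachable(start, max_depth):
    """Return (node, fewest hops) pairs reachable within max_depth hops."""
    return conn.execute(REACHABLE, (start, max_depth)).fetchall()


print(reachable("a", 2))  # [('b', 1), ('c', 1), ('d', 2)]
```

The depth limit also keeps cycles from recursing forever, since `UNION` only removes exact duplicate `(node, depth)` rows.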
Neo4j, by a clear margin in 2026. Neo4j 5.x has native vector indexes (for RAG), tight LangChain and LlamaIndex integration, and "GraphRAG" patterns are well-documented on Neo4j. Neptune and ArangoDB support vector search too, but the LLM ecosystem gravitates toward Neo4j for knowledge-graph-based RAG workflows.
Community Edition is GPL and fine for many workloads. Enterprise adds clustering, fine-grained access control, advanced monitoring, and commercial support. For production at any real scale, most teams end up on Enterprise or Aura (managed). For learning, prototyping, or small production workloads, Community is enough.
TigerGraph is a performant analytical graph DB with its own GSQL language - strong on pure graph analytics, smaller community. JanusGraph is an open-source distributed graph DB (Apache TinkerPop / Gremlin) that predates Neptune but is harder to operate. Dgraph is a GraphQL-native graph DB with its own niche, but factor in ownership churn: Dgraph Labs was acquired by Hypermode in 2023 and then by Istari Digital in October 2025, so anyone evaluating it in 2026 should account for two ownership changes in three years and check current roadmap signals before committing. All three are valid in specific scenarios but have smaller communities than Neo4j / Neptune / ArangoDB.
GQL (ISO/IEC 39075:2024) is the first new ISO database language standard since SQL, published by ISO on April 12, 2024. It defines a standard query language for property graphs - basically what SQL is for relational. Neo4j Cypher already supports most mandatory GQL features (Cypher Manual: GQL conformance, since Neo4j 5.23/5.25); AWS has publicly committed to making Cypher an implementation of GQL "in our products and in openCypher" (which covers Neptune); openCypher's stated mission is now to help engines converge to GQL conformance. The practical effect: query language is becoming less of a differentiator over time. ArangoDB's AQL sits outside the convergence (its value is multi-model, not graph-spec compliance), so picking ArangoDB means consciously stepping off the GQL path.