Memary: Open-Source Longterm Memory for Autonomous Agents (github.com/kingjulio8238)



This seems like it's overloading the term knowledge graph relative to its origins. Rather than having information and facts encoded into the graph, this appears to be a sort of similarity search over complete responses. It's blog-style "related content" links to documents rather than encoded facts.

Searching through their sources, it looks like the problem came from Neo4j's blog post misclassifying "knowledge augmentation" from a Microsoft research paper as "knowledge graph" (because of course they had to add "graph" to the title).

This approach is fine, and probably useful, but it's not a knowledge graph in the sense that its structure isn't encoding anything about why or how different entities are actually related. As a concrete example, in a knowledge graph you might have an entity "Joe" and a separate entity "Paris". Joe is currently located in Paris, so there would be a typed edge between the two entities of something like "LocatedAt".

I didn't dive into the code, but from what I inferred from the description and referenced literature, it is instead storing complete responses as "entities" and simply doing RAG-style similarity searches to other nodes. It's a graph-structured search index for sure, but not a knowledge graph by the standard definitions.


Exactly. Glad to see this. I do think knowledge graphs are important to AI assistants and agents, though, and someone needs to build a knowledge graph solution for that space.

The idea of actual entities and relationships defined as triples with some schema, appropriately resolved and linked, can be useful for querying and for building up the right context. It may even be time to bring back some ideas from the schema.org days to standardize across agents/assistants what entities and actions are represented in data fed to them.
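For illustration (my sketch, not any existing standard), a few schema.org-flavored triples plus a trivial resolution step might look like:

```python
# An illustrative sketch of schema.org-style typed triples.
# The types, predicates, and aliases are examples, not a real agent standard.
triples = [
    # (subject, predicate, object)
    ("Joe",   "rdf:type",            "schema:Person"),
    ("Paris", "rdf:type",            "schema:City"),
    ("Joe",   "schema:homeLocation", "Paris"),
]

# Resolution/linking step: map surface forms to canonical entities.
aliases = {"joe smith": "Joe", "paris, france": "Paris"}

def resolve(surface_form: str) -> str:
    """Map a mention to its canonical entity (trivial lookup here)."""
    return aliases.get(surface_form.lower(), surface_form)

print(resolve("Paris, France"))  # -> "Paris"
```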


Yeah, one of the specific things I'd love to do is collaboratively bulk up WikiData more. It's missing a ton of low-hanging fruit that people using an ML-augmented tool could really make some good progress on, similar to ML-assisted OpenStreetMap work.


This isn't going to work so long as WikiData is controlled by admins who misapply the notability criteria to delete information they subjectively consider trivial.

I think the next frontier is a wiki-style, collaborative site deliberately purposed for storing information for LLMs.


Is it bad that my first thought was "that's going to be a huge amount of work and end up with a lot of gaps and blind spots... we should have GPT-4 populate it!"

I think it could work if each entry has to be peer reviewed by 3 (more?) humans. Then again, AIs are better than humans at captchas now, so I'm not exactly sure how that would work either way...


Yeah, precisely. Knowledge graphs are simple to think about, but as soon as you look into them you realize all the complexity is in creating a meaningful ontology and in loading data into that ontology. I actually think LLMs can be massively useful for loading data into an ontology, but probably not for the creation of the ontology itself (far too ambiguous and large/conceptual a task for them right now).


How do we build ontologies using LLMs? Will the building blocks be like the different parts of a brain? P.S. I am assuming that by "creation of the ontology itself" you mean creation of AGI.


Ontologies just define what certain categories, words, and entity types mean. They're commonly used in NLP for data representation ("facts"/triples/etc.) in knowledge graphs and other places where the definition of an ontology helps provide structure.

This doesn’t have anything to do with AGI or brains. They are typically created or tuned by humans and then models fit/match/resolve entities to match the ontology.


@dbish nailed it, but I can give you a more concrete example, continuing off the light example I started with: an ontology for knowing what city a person is currently in. We have two classes of entities, a person and a city. There is a single relationship type, "LocatedAt"; you can add and remove edges to indicate where a person is, and you can construct verification rules such as "a person can only be in one city at a time".

Now, to have an LLM construct a knowledge graph of where someone is (I know this example is incredibly privacy invasive, but it's a simple concrete example, not representative): imagine giving an LLM access to all of your text messages. You can imagine giving it a prompt along the lines of "identify who is being discussed in this series of messages; if someone indicates where they are physically located, report that as well" (you'd want to try harder than that; I'm keeping it simple).

You could get an output that says something like `{"John Adams": "Philadelphia, PA, US"}`. If either the left or right side is missing, create it. Then remove any LocatedAt edges for the left side and add one between these two entities. You have a simple knowledge graph.
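A minimal sketch of that update step (my illustration, not any project's actual code), using networkx and the hypothetical output above:

```python
# Sketch: apply one LLM extraction to a toy knowledge graph.
import networkx as nx

G = nx.MultiDiGraph()

def apply_llm_extraction(extraction: dict[str, str]) -> None:
    for person, city in extraction.items():
        # Create either side if missing (add_node is a no-op if present).
        G.add_node(person, kind="Person")
        G.add_node(city, kind="City")
        # Enforce "a person can only be in one city at a time":
        # drop any existing LocatedAt edges from this person.
        stale = [
            (u, v, k)
            for u, v, k, d in G.out_edges(person, keys=True, data=True)
            if d.get("type") == "LocatedAt"
        ]
        G.remove_edges_from(stale)
        G.add_edge(person, city, type="LocatedAt")

apply_llm_extraction({"John Adams": "Philadelphia, PA, US"})
```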

Seems easy enough, but try to ask slightly harder questions... When did we learn John Adams was in Philadelphia? Have they been there before? Where were they before Philadelphia? The ontology I just developed isn't capable of representing that data. You can of course solve these problems, and there are common ontological patterns for representing it.

The point is, you need to know the kinds of questions you want to ask about your data when you're building your ontology, and you're always going to miss something. Usually you discover the questions you want to ask of the data only after you've already built your system and started asking it questions. It's the follow-ups that kill you.

There has been a lot of work on totally unstructured ontologies as well, but that just moves the hard problem elsewhere rather than solving it. Instead of having high-quality data you can't answer every question with, you have arbitrary associations that may mean the same thing, so any query you make is likely _missing_ relevant data and thus inaccurate.

It's a huge headache to go down, but honestly I think it is a worthwhile one. Previously, if you changed your ontology to answer a new question, a human would have to go through and manually, painstakingly migrate your data to the new system. This is boring, tedious, easy-to-get-wrong-due-to-inattention kind of work. It's not complex, it's not hard, and it's very easy to double-check, but it does require an understanding of language. LLMs are VERY capable of doing this kind of work, and likely more accurately.
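A hedged sketch of what that LLM-driven migration could look like; the model name, schema strings, and prompt are all illustrative assumptions:

```python
# Sketch: ask an LLM to re-express a fact under a revised ontology.
from openai import OpenAI

client = OpenAI()

OLD_FACT = '("John Adams", "LocatedAt", "Philadelphia, PA, US")'
NEW_SCHEMA = (
    "LocatedAt(person, city, since: ISO-8601 date) -- "
    "location facts now carry a start date"
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{
        "role": "user",
        "content": f"Rewrite this fact to fit the new schema.\n"
                   f"Fact: {OLD_FACT}\nNew schema: {NEW_SCHEMA}\n"
                   f"If a field is unknown, use null.",
    }],
)
print(resp.choices[0].message.content)
```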


Yes! My position on KGs largely flipped post-GPT-3. Before, KGs were mostly a niche thing given the cost vs. reward; now they're an everyone thing.

- The effort needed for KGs has higher general potential value because RAG is useful

- The KG ontology quality-at-scale problem is now solvable by LLMs automating index-time data extraction, ontology design, & integration

This is an area we're actively looking at: self-determining ontologies for optimizing RAG, such as over evolving events, news, logs, emails, and customer conversations. We already have active projects across areas here, like emergency response, cybersecurity, and news mining. If folks are interested, we're definitely looking for design partners with challenging problems. (And we're looking to hire a principal cybersecurity researcher/engineer on it, and later in our other areas too!)


Have any good research on this subject?


We've been working on gov/enterprise/etc. projects here as part of louie.ai that are coming to a head -- I'd expect it's not far from what xAI, Perplexity, and others are also doing. However, while those must focus on staying cheap at consumer scale, we focus on enterprise scale & ROI, so we get to make different trade-offs -- I'm guessing closer to how Google and other more mature teams do KG. We're not doing traditional KG, however -- it's a needlessly/harmfully lossy discretization -- but we are coming from lessons in that world, especially on the large-scale intel/OSINT side and in the graph neural net community.

A bit more concretely, for the LLM era we're especially oriented around the move from vanilla RAG chunking to graph RAG, hierarchical RAG, "auto-wiki" style projects, and continuous-learning LLMs. Separately, we've been working on neurosymbolic query synthesis for accessing this, and on mixing in (privacy-aware) continuous learning from teams using it. I think the first public details on this were in my keynote at the 2024 Infosec Jupyterthon, and we'll be sharing more at graphtheplanet.com next week as well. We haven't said as much, but we're also looking at the problem that the data itself isn't to be trusted, e.g., blind men and the elephant reporting different things over time in news/social media/IT incident tickets.

Right now we're just focusing on building and delivering great tech, customer problem by customer problem. There's a lot to do for a truly good end-to-end analyst/analytics experience!


If this paper overloaded the term, then I did the same in my recent EMNLP paper, where I used the term "Semantic Knowledge Graph" to refer to what I think you're talking about.

https://aclanthology.org/2023.newsum-1.10/

Automatically creating knowledge graphs using embeddings is a massively powerful technique, and it should start to be exploited.

You can blame peer review for letting the definition of knowledge graph get watered down.

Also note that my coauthor, David, is the author of the first and still the best open-source package for creating and working with semantic graphs.

https://github.com/neuml/txtai/blob/master/examples/38_Intro...


"Semantic Knowledge Graph" is even worse! The term is intended for the design of semantic networks with edges restricted to a limited set of relations. A knowledge graph is already about semantics!

Gotta say txtai seems like a useful tool to throw in my toolbox!


I apologize for having been part of the bastardization of a term. I think none of us know what the right term might be for the kinds of graphs we generate.


Are graph databases really relevant during retrieval? Does anyone use them to augment a vector store as a candidate source?


We're using one in a large engineering use case, 100s of millions of objects -- and it almost doubled the nDCG@10 score vs. pure vector search.


This is quite significant. I suppose you did not use the knowledge graph while training the embeddings?


While I'm 100% on board with RAG using associative memory, I'm not sure you need Neo4j. Associative recall is generally going to be one level deep, and you're doing a top-K cut, so even if it weren't, the second-order associations are probably not going to make the relevance cut. This could be done relationally, and if you're using pgvector you could retrieve all your RAG contents in one query.
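For example, a sketch under assumed table names (`memories`, `assoc`), not a drop-in: one-level-deep associative recall plus the top-K vector cut in a single pgvector query:

```python
# Sketch: one-hop associative recall in a single query with pgvector.
# The schema is hypothetical: memories(id, content, embedding vector)
# and assoc(src, dst) as the association table.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=agent")
register_vector(conn)

query_vec = np.random.rand(384).astype(np.float32)  # stand-in embedding

rows = conn.execute(
    """
    WITH hits AS (
        SELECT id FROM memories
        ORDER BY embedding <=> %s   -- cosine distance
        LIMIT 10                    -- the top-K cut
    )
    SELECT m.id, m.content
    FROM memories m
    WHERE m.id IN (SELECT id FROM hits)
       OR m.id IN (SELECT dst FROM assoc
                   WHERE src IN (SELECT id FROM hits))
    """,
    (query_vec,),
).fetchall()
```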


I think there are a lot of cases where you don't want to just RAG it. If you're going tool-assisted, it's pretty neat to have the agent write out queries for what it needs against the knowledge graph. There was an article recently about how LLMs are bad at inferring "B is A" from "A is B". You can also do more precise math against it, which is useful for questions even people need to reason out.
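A rough sketch of that tool-assisted pattern (the model, schema hint, and credentials are illustrative assumptions, and you'd want to validate/sanitize generated queries before running them for real):

```python
# Sketch: have the LLM write a Cypher query, then run it via the Neo4j driver.
from openai import OpenAI
from neo4j import GraphDatabase

llm = OpenAI()
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

question = "Which people are currently located in Philadelphia?"
schema_hint = "(:Person)-[:LocatedAt]->(:City {name})"

cypher = llm.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": f"Write a single Cypher query (no prose) for: {question}\n"
                   f"Graph schema: {schema_hint}",
    }],
).choices[0].message.content.strip()

with driver.session() as session:
    for record in session.run(cypher):  # NB: sanitize before trusting this
        print(record)
```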

I need to dig into what they're doing here more with their approach, but I think using an LLM for both producing and consuming a knowledge graph is pretty nifty, which I wrote up about a year ago here, https://friend.computer/jekyll/update/2023/04/30/wikidata-ll... .

I will say that figuring out how to properly add that conversation into a large knowledge graph is a bit tricky. ML does seem slightly better at producing an ontology than humans, though (look how many times we've had to revise scientific names for creatures, or book ordering).


Yes, but this doesn't seem to be an actual knowledge graph, which is part of the issue IMHO. If you look at the Microsoft knowledge graph paper linked in the repo, it looks like they build out a real entity-relationship-based knowledge graph rather than storing responses and surface-form text directly.


I think it's relatively unlikely that having an agent write graph queries will outperform vector search against graph information that has been output as text and then embedded.

The related issue that I think is being conflated in this thread is that even if your goal was to directly support graph queries, you could accomplish this with a vanilla database much easier than running a specialized graph db


Outperform in what way? There are some distinct things it already does better on than a similarity context-window dump, like multi-hop and aggregate reasoning. In general, tool-assisted (of which KG querying is one tool) does pretty well on the benchmarks, and many of the LLM chats are cutting over to it as the default.

> if your goal was to directly support graph queries, you could accomplish this with a vanilla database much easier than running a specialized graph db

Postgres and MySQL do have pretty reasonable graph query extensions/features. If by easier you mean the effort to get an MVP up, I'd agree, but I'm a bit more dubious on the scale-up, as you'd probably end up with something like Facebook and TAO.


My initial thought was "building the knowledge graph is what LLMs and the embedding process do implicitly" -- why the need for a graph DB like Neo4j?


So your solution would be to fine-tune the LLM with new knowledge? How do you make sure it preserves all facts and connections/relations, and how can you verify at runtime that it actually did, and didn't introduce false memories/connections in the process?


I think RAG has a lot to say here. New content / facts go through the embedding process and are then available for query.

I don't generally disagree that a more discrete (not continuous) knowledge base will be another component to augment AI systems. The harder part is how you build this (curate, clean, ETL, query). I'm not sure a graph DB is the best first choice. Relational DBs can take you pretty far, and it's unclear how many N+1 or multi-hop queries you'll need in a robust AI/agent system.


I think you are misunderstanding. An embedding places a piece of knowledge in N-dimensional space. By using vector distance search you are already getting conceptually similar results.
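For example, with toy 3-d vectors standing in for real embeddings:

```python
# Sketch: "conceptually similar results" via vector distance search.
# The embeddings are made up for illustration.
import numpy as np

embeddings = {
    "Paris is the capital of France": np.array([0.9, 0.1, 0.0]),
    "The Eiffel Tower is in Paris":   np.array([0.8, 0.2, 0.1]),
    "Quicksort is O(n log n)":        np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.85, 0.15, 0.05])  # stand-in for an embedded query
ranked = sorted(embeddings, key=lambda k: cosine(query, embeddings[k]),
                reverse=True)
print(ranked[0])  # the Paris facts outrank the sorting fact
```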


Semantic distance doesn't define relationships between semantically dissimilar entities the way a structured knowledge graph would let you add a newly learned relationship. Similarly, you can't necessarily do entity resolution with purely embeddings, since you're again just comparing similarity based on the embedding model you're using rather than the domain or task you're accomplishing, which could differ a lot depending on how generalized the embedding model is vs. what you're doing.


AFAICT most of the "graph" RAG implementations being discussed, instead of fancy graph queries or a structured knowledge graph, mean:

1. Primary: an inverted index on keywords (= entities). At ingest time, extract entities and index documents by them. At query time, extract entities and find the related documents, then include them next to the vector results as part of the reranking set -- or maybe something fancier, like a second search based on those (see the sketch after this list).

2. Secondary: bidirectionally linked summaries. At index time, recursively summarize large documents and embed+link the various nested results. At retrieval time, retrieve whatever directly matches, and maybe go up the hierarchy for more.

3. Secondary: throw everything into the DB -- queries, answers, text, chunks -- and link it all together. As with the others, the retrieval strategy for getting good results generally doesn't leverage this heterogeneous structure and instead ends up being pretty simple and direct, e.g., any KV store.
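A rough sketch of pattern 1 (entity extraction is stubbed out here; a real system would use an NER model, and the reranking is elided):

```python
# Sketch: inverted index from entities to document ids,
# merged with vector hits at query time.
from collections import defaultdict

inverted: dict[str, set[int]] = defaultdict(set)

def extract_entities(text: str) -> list[str]:
    # Stand-in for a real NER model: treat capitalized tokens as entities.
    return [t for t in text.split() if t[:1].isupper()]

def ingest(doc_id: int, text: str) -> None:
    for entity in extract_entities(text):
        inverted[entity].add(doc_id)

def retrieve(query: str, vector_hits: list[int]) -> list[int]:
    entity_hits: set[int] = set()
    for entity in extract_entities(query):
        entity_hits |= inverted[entity]
    # Union entity hits with vector hits; a reranker would order these.
    return sorted(entity_hits | set(vector_hits))

ingest(1, "John Adams moved to Philadelphia")
ingest(2, "Quicksort partitions the array")
print(retrieve("Where is John Adams?", vector_hits=[2]))  # -> [1, 2]
```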

AFAICT, KV stores are really what's being used here to augment the vector search. Scalable keyword reverse-indexing of text is historically done more on a KV document store like OpenSearch/Elasticsearch, as it doesn't really stress most of the power of a graph engine. Recursive summaries work fine that way too.

Multi-hop queries and large-graph reasoning are cool but aren't really what these are about. Typed knowledge graphs and even fancier reasoning engines (RDF, ...) even less so. These retrieval tasks are so simple that almost any DB can work on them in theory -- SQL, KV, graph, log, etc. However, as the size grows, their cost/maintenance/perf differences show. We do a lot of graph DB + AI work for our day job, so I'm more bullish here on graph long-term, but agreed with others: it's good to be intellectually honest to make real progress on these.


I will say that sometimes you want a very specific definition of a thing or process to get consistent output. Being able to associatively slurp those up as needed is handy.


Not only that, the embedding process can represent relationships and ontology, especially the subtle aspects that are hard to capture in edges or JSON.

Go back to the original word2vec example: how would you put this in Neo4j, in a way that generalizes?

king - man + woman = queen
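With toy 3-d vectors standing in for real word2vec embeddings, that relation is plain vector arithmetic rather than an edge:

```python
# Sketch: the classic analogy as raw vector arithmetic.
# The vectors are toy values chosen so the analogy holds exactly.
import numpy as np

vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.8, 0.1]),
    "woman": np.array([0.5, 0.1, 0.8]),
    "queen": np.array([0.9, 0.1, 0.8]),
}

target = vec["king"] - vec["man"] + vec["woman"]

def nearest(v: np.ndarray) -> str:
    return min(vec, key=lambda w: np.linalg.norm(vec[w] - v))

print(nearest(target))  # -> "queen"
```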


Is it not valuable though to differentiate statistically inferred relationships from those that are user-declared?

I would think that a) these are complementary, and b) the potential inaccuracy of the former is much larger than the latter.


Absolutely. However, this project says

> Llamaindex was used to add nodes into the graph store based on documents.

So it sounds like they are generating it based on LLM output rather than user definitions. I also wonder how often you need more than the single hop that graph DBs aim to speed up. In an agent system with self-checking and reranking, you're going to be performing multiple queries anyhow.

There is also interesting research around embedding graphs that overlaps with these ideas.


I like your answer, and it's a great example of the limitations of knowledge/semantic graphs. Personally, I'd put a knowledge graph on top of the responses and expose it to the LLM as an authority and frame of reference. I think it should be an effective form of protection against hallucination, preventing outright incorrect or harmful outputs that contradict the facts known by the graph. At least it has been in my experiments.


Topological relationships vs metric relationships. I suppose a great embedding could handle both, but a graph database might help in the tail, where the quality of the embeddings is weaker?


Doesn't a topological space require a metric?

I agree that something more like an SQL query, where you have definitive inclusion, will be useful. The harder question is how you build something like that. How much AI involvement is there in creating the more discrete relational knowledge base?


I don't know if you need a graph DB in particular, but there are likely explicit relationships, or entities to resolve to each other, that you'd want to add and that aren't known by a general model about your use case. For example, if you are personalizing an assistant, maybe you need to represent that "John" in the contacts app is the same as "Jdubs" on Instagram and is this person's husband.


LLMs have a limited context size, i.e. the chat bot can only recall so much of the conversation. This project is building a knowledge graph of the entire conversation(s), then using that knowledge graph as a RAG database.


Exactly! With memary, only relevant information is passed into the finite context window.


Very interesting, thank you for making this available!

At OpenAdapt (https://github.com/OpenAdaptAI/OpenAdapt) we are looking into using pm4py (https://github.com/pm4py) to extract a process graph from a recording of user actions.

I will look into this more closely. In the meantime, could the authors share their perspective on whether Memary could be useful here?


Very cool project! I think one of the main ways (a bit orthogonal to what you do now) to adapt to GUI/CLI would be to develop an open-source version of something like Aqua Voice: https://withaqua.com/

Perhaps it could make sense to add this to your effort?


Thanks! OpenAdapt already supports audio recording during demonstration (https://github.com/OpenAdaptAI/OpenAdapt/pull/346). Perhaps I misunderstood — can you please clarify your suggestion?


It's a kind of text input which mixes text and editing instructions; look at the demo.


What’s the goal of creating a graph from the actions? Do you have any related papers that talk about that? We also capture and learn from actions but haven’t found value in adding structure beyond representing them semantically in a list with the context around them of what happened.


The goal is to have a deterministic representation of a process that can be traversed in order to accomplish a task.

There's a lot of literature around process mining, e.g.:

- https://en.wikipedia.org/wiki/Process_mining

- https://www.sciencedirect.com/science/article/pii/S266596382...

- https://arxiv.org/abs/2404.06035


Awesome, thanks.

Yes, we are also in the process mining and RPA space and use image recognition + OCR + click tracking for part of it. My (probably poorly worded) question was why knowledge graphs matter for you, since traditional RPA doesn't use them for all the cases I've seen, at least -- unless I misunderstood what you're saying here about trying out graphs. I'll read the LLM RPA paper you've linked here too; maybe that explains the use of graphs. I haven't read this one, so thank you.

We basically used the same technique some of the UiPath folks have used, which is representing everything as a sequence of actions, with "branching" represented by different linear sequences that the model can ingest and decide which sequence to follow -- which is kind of a graph, I guess, but not how we represent it.


> traditional RPA doesn't use them for all the cases I've seen

That's because traditional RPA relies on humans to create the automations.

Our goal is to create the automation automatically by observing human demonstrations.


These new systems would do well to have a compelling "wow, this solves a hard problem that can't be solved in another straightforward way" demo.

The current YouTube video has a query about the Dallas Mavericks, and it's not clear how it's using any of its memory or special machinery to answer the query: https://www.youtube.com/watch?v=GnUU3_xK6bg


If you search about the Mavericks again (not included in the video) the agent will query the knowledge graph for results from prior executions.


I hate it when I find a cool AI project, open the GitHub to read the setup instructions, and see "insert OpenAI API key." Nothing will make me lose interest faster.


Unconstructive comment. OpenAI is the gold standard for an LLM; if you cared to dig deeper, you'd realize that you really could incorporate another LLM with little effort.


Most projects also give you the option of providing a base URL for the API so that people can use Azure's endpoints. You can use that config option with LiteLLM or a similar proxy tool to provide an OpenAI-compatible interface for other models, whether that's a competitor like Claude or a local model like Llama or Mistral.
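For example (the URL below assumes a local Ollama instance; a LiteLLM proxy works the same way):

```python
# Sketch: the official OpenAI client pointed at any OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # assumed local OpenAI-compatible endpoint
    api_key="not-used-locally",            # required by the client, ignored by the server
)

resp = client.chat.completions.create(
    model="llama3",  # whatever model the server has loaded
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```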


Is the expectation for the lib (or project) to work with various vendors, or do you expect to just pay for tokens?


You can easily incorporate Llama 3 or other open-source models.


Looks cool. This is similar to what I'm doing for long-term memory in AISH, but packaged up nicely. Others have pointed out that they're somewhat abusing the term KG. But ... you could imagine other processes poring over the "raw" text chunks and building up a true KG from that.


Sounds promising. Can this system be integrated with the Wikidata knowledge graph instead?


Yes! You can easily swap knowledge graphs under the same agent. Would love to see this happen!


Neo4j was so unbelievably slow to load data, bloated, and hard to get working on my corporate-managed box that I wasn't too sad when it turned out unable to handle my workload. Then the security team asked me why it was phoning home every 30 seconds. Ugh.

I have since found Kuzu DB, which looks foundationally miles ahead. Plus, no JVM. But I haven't yet given it a shot to see where the rough edges are. At the time, it was easier just to stay in plain application code.

Hopefully the workload intended by this tool won't notice the bloat. But it would be nice to be able to dump huge loads of data into this knowledge graph as well, and let the GPT generate queries against it.


How many times are people going to reinvent, rename and resell a database?


This is a really cool project, but is it just me that feels slightly uncomfortable with its name sounding so similar to "mammary"?


Discomfort noted, but I think it can work in either case. Pronounced your way, it’s the proverbial teat of knowledge for LLMs.


Argh, that only makes it worse.


Yeah, they could've gone with "Memury" which is pronounced much closer (at least for me) to the original "Memory".


The a is for agents :)


How does it compare with Zep AI? Anyone know?


It's open source :)



