What is the next stage in the evolution of data and information, and how prepared are we for it? Big Data and AI technologies have already established their presence in almost every industry; the circumstances have now matured to bring Data Integrity to the fore. But how will this be achieved?
Engine B’s solution is a universal open-source platform based on Knowledge Graphs that makes background knowledge available, supports flexible search, uncovers hidden data and serves risk management purposes. Let’s identify the necessity of Knowledge Graphs through use cases and their great potential in business applications, and let’s engage in an insightful discussion.
Innovation and Digitalisation are the successors of the Information/Data Era. Big Data and AI are a means to an end, not the end goal. But what is the end goal? Going beyond Big Data, we need to integrate quality, data-driven insights that will allow substantial improvement in the efficacy of service provision.
We need to uncover the knowledge hidden in our data. How can we achieve this? By creating standard, universally accepted Data Models with the aim of producing Knowledge Graphs: sets of interlinked descriptions of entities.
This is where Engine B’s project comes in to fill this rather unmapped field, aiming to bring the concept to life as an open-source, standardised platform. This will allow businesses and services of all sizes, across domains, to access the platform and integrate it into their systems based on their custom needs.
Engine B’s partners and backers include all four large accountancy firms, several of the next-tier and challenger firms, and the accountants’ UK trade body, the Institute of Chartered Accountants in England and Wales (ICAEW). Engine B is also working with a number of leading law firms.
Why Knowledge Graphs? The main contributions of Knowledge Graphs are:
Provide background knowledge: A benefit already in use by many pioneering tech companies, such as Google, that enables the user to retrieve summary information about the data entity being searched.
Flexible search: Relations between entities already exist; therefore, it is easier and quicker for the analyst/user to retrieve any requested information, considering the staggering amount of time previously spent retrieving client data.
Uncover hidden data: As above, relations between data entities pre-exist; since the query built by the analyst is not designed to connect specific data, unexpected correlations between data entities can be retrieved.
Risk Management Purposes: Drawing connections between unexpected events or information that would not be connected otherwise allows for the quantification of risk exposure within a complex network.
In this presentation we will show how Engine B plans to use Knowledge Graphs to establish an open data standard, why this is a necessity, and how Knowledge Graphs promote Data Quality, Data Integrity and information exchange. We will also engage in an open discussion to exchange concerns, feedback and ideas.
[00:00:00.450] - George Anadiotis Hello, everyone, and welcome to our third talk in today's Connected Data London Online Meetup, which is about enterprise Knowledge Graphs and how they can assist in the drive towards a knowledge-driven economy. Last talk for the day, last but not least, is Symeon Vasileiadis, who is a Knowledge Engineer with Engine B, and he's going to talk about how Knowledge Graphs can help with data integrity, innovation and digitalisation in professional services. So without further ado, Symeon, the floor is yours.
[00:00:36.720] - Symeon Vasileiadis Hi all, first of all, I want to congratulate both Tony and Jans for their presentations. They were really good. I mean, Tony, those drawings; well, they were great. So I hope you guys are enjoying the conference as much as I do. And thank you for joining me today. My name is Symeon Sotiris Vasileiadis, and I have an academic background in computer science and data analytics. I have also conducted research specifically on social media graphs.
[00:01:04.510] - Symeon Vasileiadis I have worked in positions which cover different phases of the beautiful journey of data transformation, such as management information analyst, business intelligence analyst and data scientist, and at the moment I work as a knowledge engineer at Engine B. The reason why this career path chose me is mainly because I strongly believe that everything around us is data that's converted to knowledge through our brain. My goal for this presentation is to share with you my insights about Knowledge Graphs, why Knowledge Graphs are the best way to achieve efficiency for better decision making, and why digitalisation in professional services is so essential.
[00:01:50.480] - Symeon Vasileiadis Today, we'll explore the following: why we use artificial intelligence instead of expert systems and computing with words; whether present machine learning models pass the Turing Test; why we need data integrity and how Knowledge Graphs will help us with this. We'll then talk about Engine B's approach and vision, and I will go through real-life Knowledge Graph case studies. Finally, we will conclude with an interactive Q&A. Very often companies ask why they should change from successful expert systems and computing with words to artificial intelligence. To understand why they should, we need to understand why they adopted such an approach in the first place.
[00:02:36.970] - Symeon Vasileiadis Artificial intelligence was always the ultimate target of computer science. Founding father Alan Turing described it as the ability to achieve human-level performance in a cognitive task. This vision has evolved in three phases, as you can briefly see on this slide and as we will explore further in what follows. In phase one, the 40s and 50s, artificial intelligence's theoretical background was established; the term "artificial intelligence" was coined at a summer workshop by IBM, and in the 60s artificial intelligence technologies and algorithms were invented.
[00:03:23.070] - Symeon Vasileiadis However, these extremely pioneering ideas could not be implemented due to limited technological capabilities. As a result, by the 70s, 80s and 90s, most artificial intelligence funding was cancelled. However, expert systems and computing with words, developed in the previous decades to implement artificial intelligence, were now adopted as the next-best solution. From this, I want you to keep in mind that artificial intelligence was always the target from the very beginning of computer science, and we only ended up using expert systems and computing with words because of limited technological capabilities.
[00:04:11.870] - Symeon Vasileiadis So now we're moving to phase two. With the arrival of the millennium, the technological limitations of the past were reduced by cheap memory storage and advanced CPUs. Start with the appearance of big data: data volumes are now growing at an average of 63 percent per month, with 12 percent of organizations reporting over 100 percent data growth every month. The term big data was launched in 2005 by O'Reilly. In terms of CPUs, Intel and AMD developed the first dual- and quad-core processors.
[00:04:49.240] - Symeon Vasileiadis These technological advancements allow us to bring back into force the abandoned artificial intelligence theories and projects of the past. All this brings us to the present day, an era focused on data quality rather than quantity. To a high degree this is a result of the large demand for data migration to cloud services, which provide faster, higher-quality results. However, as cloud computing is used as a service, on demand, users have to be selective and cautious with the data they load to avoid unnecessarily high charges.
[00:05:36.840] - Symeon Vasileiadis Thus the market's need for quality was created. All the above steps created a new business programming approach, in which programs and scripts retrieve information already existing on a server, rather than the traditional programming approach, in which data is fed to a program designed to create information. For example, instead of having an Excel sheet loaded with data on a regular basis and then creating dashboards from it, we now have query-language scripts, SQL or CQL, extracting information from a server, local or cloud, to retrieve information or even to explore new aspects of the business.
[00:06:21.950] - Symeon Vasileiadis This programming approach has been named data-centric programming, and it is already being used by businesses that lead the market. Now that we are in phase three, we should ask ourselves: has artificial intelligence been achieved? If so, how can we know? To validate this assumption we use the Turing Test. For example, when using a chatbot, natural language processing (NLP) and classification methods are used to achieve a high pass score on Turing's Imitation Game. However, what if a question to this chatbot has never come up before? In most cases it is marked as an outlier and a predefined answer is presented by the chatbot.
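The data-centric style described above can be sketched in a few lines; all table and column names here are invented for illustration, and SQLite stands in for the local or cloud server. The script asks the server for the aggregated information it needs instead of loading a raw extract and post-processing it:

```python
import sqlite3

# Stand-in for a local or cloud SQL server: an in-memory database
# holding a small, made-up ledger table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (account TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?)",
    [("fixed_assets", 1200.0), ("cash", 300.0), ("fixed_assets", 800.0)],
)

# Data-centric style: the query retrieves the information (per-account
# totals) directly from the server, rather than feeding raw rows into
# a program that builds a dashboard afterwards.
rows = conn.execute(
    "SELECT account, SUM(amount) FROM ledger GROUP BY account ORDER BY account"
).fetchall()
print(rows)  # [('cash', 300.0), ('fixed_assets', 2000.0)]
```

The same pattern applies unchanged whether the connection points at a local file or a cloud-hosted database.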
[00:07:11.420] - Symeon Vasileiadis From this limitation, we can still distinguish machines from human agents. So what if chatbots used Knowledge Graphs? Then that question, the outlier, has the potential to still be linked to the relevant answer, or at least to something the computer thinks is relevant. And I stress the word "think", because, like a human brain, at the end of the day thinking is essentially the connection of different concepts. So what do we need to make computers capable of thinking and connecting concepts?
[00:07:53.840] - Symeon Vasileiadis We need a single version of the truth across all industries. That is digitalisation. To simplify this, we could use the analogy of data as a common currency. Drawing on economics, countries that belong to a common currency union enjoy that union's benefits, such as shared trade and capital mobility within those countries. If another country decides it wants these benefits, it can only get them by joining the union and adopting the common currency.
[00:08:32.370] Similarly, the formation of a common data ground is essential to harness the benefits of artificial intelligence. Therefore, all companies should aim to be digitalised and to adopt the defined data standards. I know this statement may not sound unconditional, but it is the reality. Imagine, for example, a period when the currency is changing and someone ends up with a chest full of old coins. They are simply useless. So we now understand why we need a common data ground in professional services.
[00:09:14.230] But why do we need a Knowledge Graph? We can understand this through a quick example. Could anyone tell me how the number zero is represented in the Roman numeral system? Actually, I can't see the chat at the moment, but don't worry if you can't; it's okay, because the value zero does not exist in the Roman numeral system. And this is because of the idea of singularity. In other words, nothing does not exist. Everything is something, even nothing, and everything is connected.
[00:09:50.390] And that's exactly the idea behind Knowledge Graphs: every data entity is connected somehow with the other entities, creating a data continuum. A Knowledge Graph is the most capable tool to collect information about these connections, using a unique way to represent data relationships. For instance, if we represent Facebook's social network as a Knowledge Graph, ninety-nine point six percent of all pairs of users are connected by paths of six degrees. This is known as the six handshakes rule, and I find it fascinating.
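The "degrees" in the six handshakes rule are just shortest-path lengths, which a breadth-first search computes directly. Here is a minimal sketch over a toy friendship graph; the names and edges are invented for the example:

```python
from collections import deque

# Toy social graph as adjacency lists (illustrative names only).
friends = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol", "eve"],
    "eve": ["dave"],
}

def degrees_of_separation(graph, start, target):
    """Breadth-first search: number of 'handshakes' on the shortest path."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # no connecting path exists

print(degrees_of_separation(friends, "alice", "eve"))  # 4
```

The 99.6 percent figure for Facebook says that running this kind of search between almost any two users terminates within six hops.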
[00:10:26.310] So why is this helpful in a business context, though? Knowledge Graphs have many benefits. Among them, the most important are extracting background knowledge, conducting flexible search, uncovering hidden data patterns and running risk management projects. Engine B's role in all this is to provide a common data ground using universally accepted common data models and artificial intelligence, through machine learning and Knowledge Graphs. At Engine B, we also know well that 30 to 40 percent of the time in a request coming from a professional service is spent on data manipulation and data cleansing.
[00:11:12.180] This is why we envision one set of data models, open source and available to all, in which we capture client data into an intelligent, governed data access platform and build Knowledge Graphs that show how multiple data sources relate to each other and what they mean, rather than what they are. After audit experts from the biggest companies in the field sat together and talked about the necessity of building a common data model, Engine B managed to build one and went a step further by building a Knowledge Graph.
[00:11:56.690] Between the expert knowledge and the end product, state-of-the-art machine learning techniques are used by Engine B's data engineers and data scientists; in addition, our partner Microsoft is helping us use Azure cloud computing to its maximum. But now let's have a quick look at how this common data model and the Knowledge Graph look. So now I present to you: the Common Data Model. Fascinating, right? I know it looks like an Excel sheet, but it's much more than this.
[00:12:36.610] So when you see this Excel sheet, I want you to understand that behind it are the best experts of the biggest audit companies in the world, sitting together in a room, talking about a common data model. That's an achievement by itself. Another thing I want you to keep in mind from this file is the levels into which this Common Data Model is split: we have Engine B's standards, entities that belong to a standard, and attributes that belong to entities.

[00:13:17.070] From this Common Data Model we create an entity relationship diagram; in this high-level diagram we can see how the entities are connected to each other. From these two files, with some Cypher in between, we have our Knowledge Graph. Here we can see all the entities loaded into the knowledge database with regard to audit. For example, if we take this one, we can see that it's displayed in three categories, three tags, the same as in the CDM: the Standard, the Entity and the Records.
[00:14:14.480] Every record belongs to an entity and every entity belongs to a standard. We can see the relationships between entities and the records that belong to each entity. From here, we can do further research or create other queries to retrieve the information we need. For example, we can pick all the Standards, or just the Entities, or even just the Records. Or we can pick a particular entity, named Fixed Assets.
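The Standard → Entity → Record hierarchy just described can be sketched as a tiny in-memory triple store, with a two-hop lookup playing the role of the Cypher query. The labels below are made up for illustration and are not Engine B's actual CDM entries:

```python
# (subject, relation, object) triples modelling the hierarchy:
# every record is an INSTANCE_OF an entity, every entity BELONGS_TO a standard.
triples = [
    ("Fixed Assets", "BELONGS_TO", "EngineB-Audit-Standard"),
    ("Cash", "BELONGS_TO", "EngineB-Audit-Standard"),
    ("record:FA-001", "INSTANCE_OF", "Fixed Assets"),
    ("record:FA-002", "INSTANCE_OF", "Fixed Assets"),
    ("record:CA-001", "INSTANCE_OF", "Cash"),
]

def records_for_standard(standard):
    """Two-hop traversal: Standard -> its Entities -> their Records,
    analogous to a two-relationship MATCH pattern in Cypher."""
    entities = {s for s, r, o in triples if r == "BELONGS_TO" and o == standard}
    return sorted(s for s, r, o in triples
                  if r == "INSTANCE_OF" and o in entities)

print(records_for_standard("EngineB-Audit-Standard"))
# ['record:CA-001', 'record:FA-001', 'record:FA-002']
```

Picking "just the Entities" or "just the Records" amounts to filtering the same triples by relation type.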
[00:15:08.810] And again, from this point, you can see the components of this Entity, its Standard, and then the Records in the entity that belong to this Standard. Back to our presentation now. Building a Knowledge Graph is not an easy task. This is why at Engine B we plan to build a series of service-oriented and concept-oriented Knowledge Graphs focusing on auditing, property, legal, tax and fraud. At the moment we are working with property experts to build a CDM for real estate and, of course, a Knowledge Graph for this field too.
[00:16:13.800] Another business-focused example that shows the importance of Knowledge Graphs is humour detection in social media sentiment analysis. NLP machine learning algorithms break the content of a comment into a bag of words, characterize each word with a positive or negative sign based on the lexicon they load, count the frequency of the words, and create an overall positive or negative score. The problem with this approach is that some words can be used sarcastically, and we can easily end up with a positive overall score for a comment with a negative meaning, and the other way around.
[00:16:55.570] For example, take the comment: "this three hour long movie was so good I definitely recommend it to my neighbour with the barking dogs". For us, it's easy to understand that this comment is sarcastic. However, a machine learning algorithm will pick out the word "good" and will assign the comment an overall positive score. In some cases, Knowledge Graphs can be used to detect sarcasm by connecting such comments with the other replies under the comment, or even with ratings or reviews from the particular user about the movie.
[00:17:35.280] So, for example, if I have a positive comment but my rating is low and my review is negative, then obviously my comment is sarcastic. Another example is when word frequency is used for risk detection purposes. For example, it is fine when building information appears repeatedly in the building manager's email account, but it's suspicious when it appears even just twice in a customer service representative's email account. Artificial intelligence without a Knowledge Graph to make data connections is not artificial intelligence.
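The bag-of-words scoring and the rating cross-check described above can be sketched as follows. The lexicon, threshold and function names are invented for illustration; this is a toy, not a production sarcasm detector:

```python
# Toy polarity lexicon (real systems load a much larger one).
LEXICON = {"good": 1, "great": 1, "bad": -1, "boring": -1}

def lexicon_score(comment):
    """Plain bag-of-words scoring: sum the per-word polarities."""
    return sum(LEXICON.get(w.strip(".,!?").lower(), 0) for w in comment.split())

def classify(comment, rating):
    """Cross-check the text score against the user's star rating,
    the way a Knowledge Graph edge to the rating would let us do."""
    score = lexicon_score(comment)
    if score > 0 and rating <= 2:
        return "sarcastic"  # positive words but a low rating: contradiction
    return "positive" if score > 0 else "negative"

comment = "This three hour long movie was so good, I definitely recommend it!"
print(classify(comment, rating=1))  # 'sarcastic'
print(classify(comment, rating=5))  # 'positive'
```

The point is not the scoring itself but the extra edge: connecting the comment to the same user's rating turns an unavoidable bag-of-words error into a detectable contradiction.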
[00:18:18.760] In the book "The Man Who Mistook His Wife for a Hat", the author, Oliver Sacks, describes a patient with a rare neurological condition in which his brain could not connect the data collected from his senses to draw conclusions. On one occasion, the patient was holding a glove and the doctor asked him: what are you holding? The patient was able to describe the glove's characteristics, saying that what he was holding was a continuous surface infolded on itself with five protrusions, but was unable to come to the conclusion that he was holding a glove. This is a good example
[00:19:00.940] of how important data connections are for extracting knowledge. Thank you so much. Those are my references. OK, well, thanks. Thanks a lot. You're right on time, but you still managed to squeeze quite a few things into the time you had, which is good. So we have a few questions, and actually some of them I wanted to ask myself, to be honest.
[00:19:36.270] So, some of the things that you mentioned, specifically towards the end of your talk, had to do with natural language processing and how you can extract information through that to incorporate into your Knowledge Graph. I have a question from Jacomo, who says that results from natural language processing are often probabilistic, so you can't discard this kind of uncertainty about the truthfulness of associations, which is not always valid in many contexts. I think you referred to that when you mentioned the example of irony, and how that's something we can interpret as humans, but it's not always interpretable by natural language processing.
[00:20:23.070] So could you maybe expand a little bit on that? Is this something you have encountered, and have you found ways to counter it?
[00:20:31.650] So, yes, the idea of using Knowledge Graphs in sentiment analysis is to connect more sources for a safer conclusion. For example, if we count only the frequency of positive or negative words, it's quite easy to end up with a sarcastic comment characterized as positive even if it's negative. But if we add Knowledge Graphs to this procedure, we will have more resources to better understand whether the comment is sarcastic, negative or positive.
[00:21:09.360] Take the example of a movie and how we rate it. If we have data showing that a user rated a movie low, and then we have a comment from them which is characterized as positive, that rings a bell: something is happening, probably that's a sarcastic comment, because otherwise what's the point of someone giving a low rating to a movie they think is good? Mm hmm.
[00:21:40.780] OK, we have a somewhat related question. It has to do with Knowledge Graph reconciliation, possibly from different sources, and the knowledge that an expert body puts into a Knowledge Graph. Isn't that also, in effect, probabilistic?
[00:22:02.600] Sorry, this is the way the question has been formulated: when experts embody their knowledge in a formalism, isn't that, in effect, probabilistic? Well, that's one way to say it; I would say "formalize", maybe, but the question remains.
[00:22:28.610] So you may have one expert sit down, or you may ask a number of experts, and you may get contradicting views. And I think you also touched on that briefly when you showed that Excel sheet earlier.
[00:22:42.760] So the way that we approach that as a company, I think, is by pooling expert knowledge, communicating this knowledge, and then building the Knowledge Graph. OK, I'm wondering if Shamus, the CEO of our company, wants to add something, if that's not inconsistent.
[00:23:06.900] Yeah, I just want to say that there's some truth in what he's saying, in the sense that you effectively crowdsource a data model and the relationships between data elements that build out a Knowledge Graph. The way we managed that was that we did it individually, but also as one group across all the firms, the audit firms. We had nine audit firms working with us during that period, including all the large ones around the world.
[00:23:38.950] So some of that is agreed amongst the firms. But there's also an implication there: as you get to some higher-order pieces, in terms of the way the rules work, it becomes problematic, because you're actually going to make assumptions about connections across the graph. Right. So I think those two things are different. Take, I don't know, the concept that the headquarters of Unilever is on Victoria Embankment.
[00:24:10.320] You're connecting the CEO to Unilever, so you've made the assumption that the CEO is in the headquarters in Victoria. That is an assumption; it's probabilistic, it's not for sure. So I think you've got to try and separate those two things out, if that makes sense. But you can identify when you're doing that, so you can make it clear when you respond with the answer from the Knowledge Graph.
[00:24:40.480] A similar question came up in a previous talk as well. So Knowledge Graphs do give you the technical infrastructure to integrate knowledge, but that doesn't necessarily mean that your knowledge is going to be in agreement, that your nodes are going to be in agreement with each other. So this is where the tricky part is, actually.
[00:25:00.730] Yes. The human component is actually my answer to the question of whether artificial intelligence will take over the world: the human component is always essential. All of these technologies are built based on what humans want to achieve.
[00:25:15.760] So, yes, I'll just add to that: I think that's exactly right. What I would say, though, is if you just look at professional services, as Symeon said, 40 percent of the time spent in audit, tax and legal work is spent sorting out the data. So if we can sort out the data connections, et cetera, that's going to reduce a lot of the effort. But then, if you can start using the Knowledge Graph to kind of extrapolate out knowledge, that's even better.
[00:25:51.130] So, you know, we're still expecting humans to be involved. But let's look at what humans are doing now, how many mistakes they make, and the cost of those mistakes. Let's just take an example, right? We all know of massive failures in companies which should have been spotted, or could have been spotted, by audit. And if you think about what we're doing with our data models and Knowledge Graph, my position is that things like Wirecard or Patisserie Valerie, et cetera, would never have happened.
[00:26:26.380] Right, we'd have spotted them earlier on. So do we still need a human? Yes, we do. But bringing in, formalising and normalising data, putting it into a Knowledge Graph and going across structured and unstructured sources reduces the failures. It absolutely does. So we can produce a better quality result. The human is there to orchestrate rather than implement the technologies.
[00:26:51.460] I totally agree. Next question: I saw in the outline of your talk that part of what you want to do has an open-source flavour to it. So I guess that means your aim is to come up with models for professional services that you want to be adopted, by open-sourcing them. My question is, how far down that road are you so far?
[00:27:23.280] So some of the data models themselves are effectively a superset of data models which already exist.
[00:27:32.720] Right. So anybody who works in finance will know that there are certain kind of godfathers of data, if that makes sense. There's a guy called Eric Cohen who has done a lot of data models in finance, so we use a lot of the work that he's done in the past, and lots of other people have brought them together. So some of those were built from open-source data models being brought together, ratified and cleaned up, as it were.
[00:28:00.920] So the base data model will be open-sourced on our GitHub in the next couple of weeks, but it will also be published by Microsoft as part of their data model, which means that, while you don't have to use Azure, if you happen to be using Microsoft it'll be plug-and-play into the Power Platform, et cetera. And then, in terms of the way we build our Knowledge Graph, some of that will be visible and some won't be visible, because we've got to try and protect some of the IP which has come to us from the firms.
[00:28:38.030] But we're going to try and open source as much as we can because that's part of our our mission, if that were as it were. I can show you actually already answered something, which was going to be my next question. So what's what's your plan to drive adoption? But it sounds like you have some kind of partnership with Microsoft based on what we say.
[00:28:58.110] We do have a partnership with Microsoft. And we're also working with, you know, Imperial College on the university side, and with what was originally nine, now 13 audit firms, plus bodies like the ICAEW, etc. And all of those routes, not the academics, but everybody else, will just be using our platform. So you can imagine a big auditor: every company with more than, say, ten million dollars has to be audited, around the world.
[00:29:34.080] And the more audit firms decide to use our Knowledge Graph, the more they'll be putting our data platform into their audit clients. And then the platform can be used by the client, or the auditor, or other people, for doing services. As I said, we're doing a legal Knowledge Graph as well, and we will do some stuff around insurance, property, et cetera. And that means that corporates, big and small, can access the Knowledge Graphs, but they can also allow people external to their organization to access those Knowledge Graphs, who will therefore be able to provide digital services; a classic one being changing all the legal contracts from one form of measuring inflation to some other form, for example.
[00:30:24.300] So great start-ups in Greece or Germany or wherever can say: I want to provide a B2B AI service doing the following thing. They don't have to worry about trying to find the data in a client; that data can be provided to them.
[00:30:42.390] So, yeah, I think we have another question, which I guess is related to that. The question is from John, and he asks whether standards should be part of your Knowledge Graph. He mentions specifically GAAP, which I have to admit I don't know what it is, but it probably means something to you.
[00:31:02.820] So, with the data platform that we've got, there are certain things you've got to do to make sure of the data you pull out of the systems, both structured and unstructured. We normalize the data and we create lineage back to the source systems. By normalizing, we're basically cleaning up the data, but we have visibility of what's going on in the data. Then we go through various data quality checks, and then you can start putting in certain rules, right.
[00:31:33.870] So you could do the GAAP stuff, et cetera, in there. We're actually going to make that flexible by service. So if an audit firm, or one of the other firms, says "actually, I want to apply the following logical rules to the Knowledge Graph for the following audit rules", we could do it in the platform. But if they want to do it above the platform, in the layer that sits on top, they could do that as well.
[00:32:05.280] But what we've got to do is we've got we've got to pull the data out, make sure that it's accurate, complete. There's a certain list of five things you've got to get right in professional services in terms of the quality, accuracy and completeness of the data. So we'll get those bits done. We can do the logical rules if required. And those logic rules would vary depending on where abouts you are in the world on standards are following, or you could do it by the platform when you're doing the service itself.
[00:32:31.740] So a mixture of both is the answer. But we are working with, you know, the AICPA in the US and the ICAEW in the UK. We're also contributing to some of the new open standards around data access which are being defined in the US between XBRL and the AICPA. So we're kind of in all of that, if that makes sense. Well, it sounds like you have your fingers in many pies. Let's wrap up with one last question.
[00:33:07.120] And this one, I guess, is for Symeon. We have a question asking: why did you choose Neo4j? What was the rationale?
[00:33:16.270] Actually, we investigated a couple of tools, and as with most tools, it's about what you want to achieve at the moment and what suits you best based on what you're trying to build. For this particular Audit Knowledge Graph, Neo4j was more capable, let's say, of building what we want. However, we are not restricted; we are open to using all the available technologies.
[00:33:48.130] And and it's fair to say that we also looked at we looked at we we did a proper formal to you about the ten top 10 and scoring purposes. Right. And what we wanted out of them, etc.. As Symeon said, we're not committed to one or the other. We we did way like Gracken as well, actually, but yeah, for various reasons.
[00:34:15.630] We have another Knowledge Graph, for property, built in Grakn. So we're not committing; however, for this purpose, for the Audit Knowledge Graph, Neo4j fits better for the visualization and the graphics. OK, thanks. It's been an interesting presentation, and thank you for answering quite a few questions, actually. And I think there are actually a few more, so we'll get those to you offline and hopefully we can get the answers to those as well.
[00:34:53.370] So thank you for presenting, and thanks to everyone who was here with us, joining and asking questions and hopefully getting something out of this presentation. Thanks to our sponsors, Franz and EnterpriseWeb. That wraps up this meetup; we'll see you at the next one. Keep an eye out for the recordings and the podcast and the answers to the questions and all the follow-up. Thanks, and see you soon. Bye, everyone.