Posts

BERT

BERT is a neural network developed by Jacob Devlin et al. (Google) in 2018. It significantly improves the performance of neural nets on natural language processing tasks compared to most other network types, including the previous leader, Recurrent Neural Network (RNN) architectures. BERT addresses RNN issues such as handling long sequences of text, scalability and parallelism. It does so by building on a special type of architecture called the Transformer. Transformers apply positional encoding and attention to build outputs. Positional encoding embeds word order information into the data itself. Attention determines the relationship between every word in the input and establishes how it relates to each word in the output. This is learned from data by seeing many examples. BERT stands for: Bidirectional - which means it uses left/right context (i.e. the whole input, not just preceding or following words) when dealing with a word Encoder Representa
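As a rough illustration of the attention step described above, here is a minimal pure-Python sketch of scaled dot-product attention. It is a simplification, not BERT's actual implementation: it uses a single head, no learned projections, and the names attention, queries, keys and values are chosen for this example only.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # queries, keys, values are lists of equal-length vectors (lists of floats).
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs
```

When all keys are identical, every position gets the same weight, so the output is simply the average of the value vectors.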

Sentiment Analysis with the HuggingFace Transformers

HuggingFace Transformers is one of the most advanced and easy-to-use collections of libraries for applying various ML models. Simply pick the pre-trained model most applicable to your domain and get results right away! If we wanted to carry out text classification, and more specifically sentiment analysis, with HuggingFace, it would be a three-step process: (1) pre-process the text to generate tokens that the model can work with; (2) feed the token IDs into the model to obtain the activations; (3) determine the sentiment by converting the activations into probabilities using a softmax function and then picking the max value via argmax. Here is how it might look in Python:
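A minimal sketch of the three steps, under some assumptions: the checkpoint name distilbert-base-uncased-finetuned-sst-2-english is one publicly available sentiment model picked for illustration, not necessarily the one the original post used, and the helper names softmax, pick_label and classify are hypothetical. The heavy imports live inside classify so that the softmax/argmax step can be tried on its own.

```python
import math

def softmax(logits):
    # Convert raw activations into probabilities that sum to 1.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_label(logits, labels):
    # Step 3: softmax, then argmax over the probabilities.
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

def classify(text, model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    # Assumed checkpoint; downloads the model on first use.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)

    # Step 1: turn the text into token IDs.
    inputs = tokenizer(text, return_tensors="pt")

    # Step 2: feed the token IDs into the model to get the activations (logits).
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    labels = [model.config.id2label[i] for i in range(len(logits))]
    return pick_label(logits, labels)
```

Calling classify("I love this!") would return a (label, probability) pair; the exact values depend on the chosen checkpoint.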

Transaction Isolation Levels

In the database world, when the same data is accessed by different users concurrently, ACID properties ensure that we can isolate access to the data. ACID stands for atomicity, consistency, isolation and durability. Atomicity guarantees that all operations either complete or fail, so that there are no partially completed operations. Consistency ensures that each transaction leaves the database content in a usable state, so that the next transaction gets a version of the data it can work with. Isolation provides a way for multiple transactions to execute concurrently without being aware of other concurrent transactions. The last property is durability, which ensures that the results of successful transactions are persisted even in the event of system failures. Atomicity, consistency and durability are taken care of by a database "transparently", but isolation is something we can control with SQL. Let's find out what it means exactly. When two users work with the same d
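To make the isolation property a bit more concrete, here is a small sketch using Python's built-in sqlite3 module. SQLite, the accounts table and the function names are assumptions made for this illustration only; other databases expose isolation differently, typically through configurable isolation levels. The sketch shows that a change made inside an open transaction is not visible to a second connection until it is committed.

```python
import os
import sqlite3
import tempfile

def count_rows(conn):
    # fetchall() exhausts the cursor so the read lock is released right away.
    return conn.execute("SELECT COUNT(*) FROM accounts").fetchall()[0][0]

def demo():
    # Two connections to the same file play the role of two concurrent users.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)

    writer.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    writer.commit()

    # The sqlite3 module opens a transaction implicitly before the INSERT.
    writer.execute("INSERT INTO accounts (balance) VALUES (100)")
    seen_before_commit = count_rows(reader)  # uncommitted row is not visible

    writer.commit()
    seen_after_commit = count_rows(reader)   # committed row is now visible

    writer.close()
    reader.close()
    return seen_before_commit, seen_after_commit
```

The pair returned by demo() holds the row count the reader sees before and after the writer commits.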

Fallacies of distributed computing - RICH LAST

I've always loved how enterprise application platforms such as Spring, Java EE, and, to some extent, PeopleSoft simplify many aspects of application development. The key focus of enterprise development is building the business logic, and it's great when your platform hides the complexities of dealing with things like concurrency and thread-safety. Spring does that via component scopes and dependency injection; Java EE deals with it through Enterprise Java Beans, which can manage concurrency and isolation for the developer, among other things. PeopleSoft tries to accommodate that by bridging WebLogic with Tuxedo (which stands for Transactions for Unix, Extended for Distributed Operations, an AT&T technology from the 80s). As we move into distributed space, beyond a single server or a cluster, we lose a lot of the convenience of a monolithic enterprise platform. Things are somewhat different, and what was taken for granted needs to be explicitly taken c

PASSME MUsTeR

While there are plenty of non-functional requirements (take a look at ISO 25010, for example), six of them are the key ones that we need to prioritize when designing solutions. A great mnemonic to remember those is PASSME. I first encountered it at jfdeclerq.biz and it has served me very well. PASSME stands for Performance, Availability, Scalability, Security, Maintainability, Extensibility. Prioritizing those NFRs makes sense. Very few users are happy to use slow software, and even fewer are happy when it's inaccessible, especially if it starts failing as demand grows. We want our data and services to be secure. Lastly, with the initial development costs comprising only 20-40% of the total cost of the software over its life-cycle, it is critical to provide a well-written and extensible solution. PASSME can be extended with another acronym, MUsTeR: Manageability, Usability, Testability and Reliability. Collectively these ten NFRs should be good enough to cover most of the use cases, but usually the

Building an ML pipeline with ElasticSearch - Part 2

In part one of this tutorial, we were able to successfully push course information such as ID, title and description from PeopleSoft into ElasticSearch and then retrieve it with a Jupyter notebook. In this part, let's examine how we can process unstructured course information to extract keywords. This can be used in a variety of use cases. For example, we can build a simple word cloud, or perhaps, use it as a start for creating digital badges for a blockchain solution.

Building an ML pipeline with ElasticSearch - Part 1

ElasticSearch (ES) makes it easier than ever to explore PeopleSoft data. Prior to ES, the two ways to access the data were integration or direct SQL access. Both came with downsides. Integration is always hard - it requires developing an App Engine or Integration Broker service. SQL access poses security risks, as it requires a direct connection to the underlying database. ElasticSearch is a great solution to this problem.

Global state management with Redux

Redux is a great library for global state management. While some might say that global state is a bad thing, it is not possible to move away from global data entirely, especially for items such as authentication and authorization information, site settings and preferences, and so on.

React.js + PeopleSoft = Love?

After taking a look at Kibana dashboards, I was curious how PeopleSoft managed to pull them back into the portal. The expectations were high: some kind of fancy built-in function or Integration Broker wizardry, but it turned out to be plain old iframes. This is a great way to marry a modern front-end framework with PeopleSoft. Can it be used for building a simple React + PS proof-of-concept (PoC)?

Stitching PeopleSoft and SharePoint

We often need to integrate legacy solutions with more modern counterparts. This gets particularly tricky for systems like PeopleSoft, as they live in their own tech bubbles, which are usually decades old despite the vendors' attempts to maintain them and keep them current. This integration is not the easiest journey.