ALL ARTICLES FOR Artificial intelligence

Some say artificial intelligence (AI) will be the next big thing after the internet: a tool enabling new industries and improving the lives of ordinary people. Others think AI is the greatest threat to society as we know it. This article will try to explain why both parties are correct.



As humans, we have a visual system that allows us to see (extract and understand) shapes, colours and contours. So why do we see every image as a different image? How do we know, for example, that a box in an image is, in reality, a box? And how do we know what a plane is or what a bird is?


We all have that one uncle in the family who knows the answer to every question. Even if that answer is often wrong. Let’s call him Uncle Bob. From politics to science, he confidently shares his opinion on every possible topic. Did you know that we only use 10% of our brain? Or that chimpanzees use sign language to communicate? Talking to Uncle Bob often turns into a game of “fact or fiction” – trying to guess whether he is actually right or just making stuff up. I don’t know about you, but for me assuming the latter usually feels like the safest bet.



A practical use for object detection based on Convolutional Neural Networks is in devices which can support people with impaired vision. An embedded device which runs object-detection models can make everyday life easier for users with such a disability, for example by detecting any nearby obstructions.


Embedded Technology Enablers


However, so far we have seen only limited use of embedded or “wearable” devices to deploy AI in direct support of users. This is largely due to the resource limitations of embedded systems, the most significant of which are computing power and energy consumption.


Steady progress continues to be made in embedded device technology, especially in its most important element, miniaturisation. The current state of the art is a three-nanometre process for MOSFET (metal-oxide-semiconductor field-effect transistor) devices. Smaller devices allow for shorter signal propagation times, and therefore higher clock frequencies. The development of multi-core devices allows concurrent processing, which means that applications can run more quickly. The energy efficiency of devices has increased, and substantial improvements have been made in the energy density of modern Li-Ion and Li-Polymer batteries. Combining all these factors makes it feasible to run computationally intensive tasks, such as machine learning model inference, on modern embedded hardware.


As a result, AI-based embedded technology is now widely used to process, predict and visualise medical data in real time. An increasing number of devices have been FDA-approved. However, many more applications are not on the FDA regulatory pathway, including AI applications that aid operational efficiency or provide patients with some form of support. Several thousand such devices are in use today.


Support for the Visually Impaired


Digica has developed an AI-based object-detection system which runs on a portable embedded device and is intended to assist the blind and partially sighted. The embedded device is integrated with a depth-reading camera which is mounted on the user’s body.

The system detects obstacles using the depth camera and relays information to the user by a haptic (vibration) controller and a Bluetooth earpiece. For the initial prototype, we selected a Raspberry Pi 4 as the embedded device.

The application passes each captured frame from the camera to a segmenter and object detectors. The initial segmentation stage recognises large, static surfaces such as roads or sidewalks.
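The per-frame flow described above might be structured as follows. The three stage functions are placeholders standing in for the real segmentation and detection models, so only the control flow is meaningful:

```python
# Sketch of the per-frame pipeline: a segmenter for large static
# surfaces, then detectors for dynamic objects and crosswalks.
# The stubbed return values are illustrative only.

def segment_surfaces(frame):
    # stands in for a semantic-segmentation model
    return [{"label": "sidewalk"}]

def detect_dynamic(frame):
    # stands in for an object-detection model (vehicles, people)
    return [{"label": "person"}]

def detect_crosswalks(frame):
    # stands in for the final crosswalk-detection stage
    return []

def process_frame(frame):
    """Run all stages and gather their outputs for later prioritisation."""
    results = []
    results += segment_surfaces(frame)
    results += detect_dynamic(frame)
    results += detect_crosswalks(frame)
    return results
```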


Example of detected segmented output


Note that the segmented output shown above is not displayed by the application because no display is connected to the output device.

The subsequent detector stage is used for detecting dynamic, moving objects, such as vehicles and people. A crosswalk detector is implemented as the final stage in the pipeline. All detected items are prioritised based on proximity and potential hazard before being sent to the user.


Example of localised detection output


The segmentation and detection stages operate on RGB video data. Distance information is also provided by the stereo-depth camera. This information is used to alert the user to the proximity of detected objects by relaying information via an earpiece and through haptic feedback. To simplify presentation to the user, detected objects are identified as being on the left, on the right or straight ahead.
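One simple way to derive the left/ahead/right cue is to bucket the horizontal position of each detection's bounding box. The one-third frame split below is an assumption for illustration; the actual system's thresholds are not published:

```python
# Map a detection's bounding-box centre to a coarse direction cue.

def direction_of(bbox_center_x: float, frame_width: int) -> str:
    """Bucket an object's horizontal position into left / ahead / right."""
    third = frame_width / 3
    if bbox_center_x < third:
        return "left"
    if bbox_center_x < 2 * third:
        return "ahead"
    return "right"
```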

Detected objects are prioritised according to proximity and danger to the user. For each prioritised detection a complete set of information is presented to the user. This set of information refers to the classified object (for example, a car), the object’s location relative to the camera and the distance to the object.
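A prioritisation step of this kind might combine distance with a per-class hazard weight. The classes, weights and scoring formula here are illustrative assumptions, not the deployed logic:

```python
# Hypothetical prioritisation: closer and more hazardous objects are
# announced first. Score grows as distance shrinks.

HAZARD_WEIGHT = {"car": 3.0, "bicycle": 2.0, "person": 1.5, "bench": 1.0}

def priority(detection: dict) -> float:
    """Higher score = announce sooner."""
    weight = HAZARD_WEIGHT.get(detection["label"], 1.0)
    return weight / max(detection["distance_m"], 0.1)  # avoid divide-by-zero

def announce_order(detections: list) -> list:
    """Sort detections so the most urgent is presented to the user first."""
    return sorted(detections, key=priority, reverse=True)
```

For example, a bench one metre away outranks a car four metres away, because proximity dominates at short range.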


Example of distance information for a prioritised object


The system uses TensorFlow and ONNX models for object detection. The target hardware is an ARM64-based Raspberry Pi, which means that the Arm NN SDK can be used to accelerate the development of AI features.


Significant advances in embedded technology have made it realistic to introduce Edge AI applications, such as the one described above. The technology is small, cheap and powerful enough to justify using it in mainstream development.

At Digica, our embedded software team works together with our AI experts to make such developments a reality.





Increasing yields has been a key goal for farmers since the dawn of agriculture. People have continually looked for ways to maximise food production from the land available to them. Until recently, land management techniques such as the use of fertilisers have been the primary tool for achieving this.


Challenges for Farmers


Whilst these techniques give a much improved chance of an increased yield, problems beyond the control of farmers have an enormous impact:


  1. Pests and weeds - “rogue” plants growing amongst the crops may hinder growth, and animals may destroy mature plants
  2. Weather - drought will prevent crops from flourishing, whilst heavy rain or prolonged periods of cold can be devastating for an entire season
  3. Human error - ramblers may trample on crops inadvertently, or farm workers may make mistakes
  4. Chance - sometimes it’s just the luck of the draw!


AI techniques can be used to reduce the element of randomness in farming. Identification of crop condition and the classification of likely causes of poor plant condition would allow remedial action to be taken earlier in the life cycle. This can also help prevent similar circumstances arising the following season.


Computer Vision to the Rescue


Computer vision is the most appropriate candidate technology for such systems. Images or video streams taken from fields could be fed into computer vision pipelines in order to detect features of interest.




A key issue in the development of computer vision systems is the availability of data; a potentially large number of images are required to train models. Ideal image datasets are often not available for public use; this is certainly the case in an agricultural context. Nor is the acquisition of such data a trivial exercise. Sample data is required over the entire life cycle of the plants - it takes many months for the plants to grow, and given the potential variation in environmental conditions, it could take years to gather a suitable dataset.


How Synthetic Data Can Help


The use of synthetic data offers a solution to this problem. The replication of nature synthetically poses a significant problem: the element of randomness. No two plants develop in the same way. The speed of growth, age, number and dimensions of plant features, and external factors such as sunlight, wind and precipitation all have an impact on the plant’s appearance.


Plant development can be modelled by the creation of L-systems for specific plants. These mathematical models can be implemented in tools such as Houdini. The Digica team used this approach to create randomised models of wheat plants.




The L-system we developed allowed many aspects of the wheat plants to be randomised, including height, stem segment length, and leaf location and orientation. The effects of gravity were applied randomly, and different textures were applied to modify plant colouration. The Houdini environment is scriptable using Python, which allowed us to easily generate a very large number of synthetic wheat plants and so model entire fields.
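As a concrete illustration of the approach, a minimal L-system can be sketched in a few lines of Python. The symbols, rules and probabilities below are invented for the example and are far simpler than a production Houdini wheat model:

```python
import random

# Toy stochastic L-system: each iteration rewrites every symbol using
# its rule. "A" is the growing apex, which sometimes adds a stem
# segment "F" with a bracketed leaf "[L]". Brackets pass through.

RULES = {
    "A": lambda: "A" if random.random() < 0.3 else "F[L]A",
    "F": lambda: "F",  # stem segment (terminal)
    "L": lambda: "L",  # leaf (terminal)
}

def rewrite(axiom: str, iterations: int, seed: int = 0) -> str:
    """Apply the stochastic rewriting rules repeatedly, reproducibly."""
    random.seed(seed)
    s = axiom
    for _ in range(iterations):
        s = "".join(RULES.get(c, lambda: c)() for c in s)
    return s
```

Changing the seed yields a different but structurally plausible plant each time, which is exactly the randomness needed for varied synthetic datasets.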


The synthetic data is now suitable for training computer vision models for the detection of healthy wheat, enabling applications such as:


  • filtering wheat from other plants
  • identifying damaged wheat
  • locating stunted and unhealthy wheat
  • calculating biomass
  • assessing the maturity of wheat


With the planet’s food needs projected to grow by 50% by 2050, radical solutions are required. AI systems will provide a solution to many of these problems; the use of synthetic data is fundamental to successful deployments.


Digica’s team includes experts in the generation and use of synthetic data; we have worked with it in a variety of applications since our inception 5 years ago. We never imagined that it could be used in such complex, rich environments as agriculture. It seems that there are no limits for the use of synthetic data in the Machine Learning process! 



The semiconductor industry stands as a driving force behind technological advancements, powering the devices that have become integral to modern life. As the demand for faster, smaller and more energy-efficient chips continues to grow, the industry faces new challenges in scaling down traditional manufacturing processes.

Application of Computer Vision in the Industrial Sector


Inventory management is a key process for all industrial companies, but the inventory process is both time-consuming and error-prone. Mistakes can be very costly: it is highly undesirable to store more raw materials or fully completed, ready-to-ship products than are required at any given time, while any shortfall in the components that make up a product may leave customer orders unfulfilled on time. In a warehouse which stores, for example, 10,000,000 items with an average value of $10, the loss of 0.1% of these items represents a cost of $100,000. The per-annum cost of such losses may run into millions of dollars. An automated object-counting system based on computer vision (CV) could speed up the process, reduce errors and lower costs.
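The shrinkage arithmetic above can be checked directly:

```python
# Reproduce the article's example: 0.1% loss on a 10,000,000-item
# inventory at an average value of $10 per item.

items = 10_000_000
avg_value = 10.0   # dollars per item
loss_rate = 0.001  # 0.1 %

lost_items = items * loss_rate
loss_cost = lost_items * avg_value
print(f"{int(lost_items):,} items lost, costing ${loss_cost:,.0f}")
```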


Why is Inventory Management so complex?


There are many complexities to the art of inventory management, including the following factors:

  • Range - the variety of stock keeping units (SKUs) to be tracked
  • Accessibility - objects may be placed on high shelves in warehouses, out of reach and perhaps out of direct sight of workers
  • Human error - objects may be miscounted or misrecorded in tracking systems
  • Time management - taking an inventory of SKUs at the optimal frequency


These problems can be solved using an automated object-counting system which is based on CV. For such a system to be genuinely useful, it must display a high degree of accuracy. An appropriately designed and trained CV application can then significantly reduce both the possibility of mistakes and the time taken to execute the process.


An Automated Object Counting System


Digica developed an object counting system based on CV that is both highly accurate and easily customisable. For example, the system is able to detect, classify and count objects by class when they are located on a pallet. The initial system was designed to count crates of bottles.
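The counting step itself can be sketched as a simple reduction over a detector's output. The detection format and the confidence threshold are assumptions for illustration:

```python
from collections import Counter

# Given detections from an object-detection model (label + confidence
# score), report a count per class, discarding weak detections.

def count_by_class(detections, min_confidence=0.5):
    """Count detections per label, ignoring low-confidence ones."""
    kept = (d["label"] for d in detections if d["score"] >= min_confidence)
    return Counter(kept)
```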


Example of detected crates when stacked on a pallet


A practical system deployed in a warehouse must be able to cope with a range of inconsistencies in the incoming data. It is unlikely that pallets are always placed in exactly the same locations or are always oriented in the same way. In the example above, all of the crates are detected in spite of the fact that the visible regions of the crates are not consistent. Crates are also recognised from both front and side views.


This system is clearly well suited for use with the CCTV systems which are typically installed in warehouse environments. However, the technology could be adapted to run on automated vehicles or drones, which are devices that often run an embedded operating system that is capable of running Machine Learning (ML) applications. This could lead to a fully automated inventory process in which humans are responsible only for controlling the work of the machines.


Note that this system does not need SKU-specific barcodes or QR codes, which simplifies the deployment of the system in existing warehouses. Therefore, existing processes do not require any modification, and it is not necessary to place objects so that any existing barcode is kept visible.


A Customisable System


This computer vision system is highly customisable. At its core is a pre-trained neural network which can be readily retrained to support a specific target environment. The possibilities are almost limitless! The system could be used for purposes such as:

  • Detecting and counting small objects, such as screws or nails on a conveyor belt
  • Detecting boxes on pallets during packing for the purposes of quality control prior to shipping
  • Aggregating information about certain objects in large physical areas, such as shipping ports for example, by carrying out an inventory on shipping containers


Integration with a wider range of systems is also possible. As the system provides real-time inventory data, it is possible to automatically make orders for resources for which stocks are running low. Integration with other ML systems could allow predictive ordering to optimise prices. Sensor-fusion techniques can also be easily applied, by combining a CCTV signal with IR cameras for certain objects that present variable temperature spectra. Such a system makes it possible to monitor objects, such as batteries, which are at risk of overheating.
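As a sketch of such an integration, live counts could be compared against per-SKU reorder points. The SKU names, thresholds and reorder rule below are invented for the example:

```python
# Hypothetical reorder trigger driven by real-time CV inventory counts.
# When stock for a SKU dips below its reorder point, order enough to
# top it back up to twice the reorder point.

REORDER_POINT = {"bottle-crate": 200, "pallet": 50}

def reorder_suggestions(live_counts: dict) -> dict:
    """Return SKU -> quantity to order, for SKUs below their threshold."""
    orders = {}
    for sku, threshold in REORDER_POINT.items():
        on_hand = live_counts.get(sku, 0)
        if on_hand < threshold:
            orders[sku] = 2 * threshold - on_hand
    return orders
```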


This system was trained using a combination of publicly-available and self-generated data. Whilst this works well in a demonstration environment, training on target environment data will give a higher level of accuracy. Such target environment data may not be available, but the problem lends itself well to the use of synthetic data for training purposes. Furthermore, such data can be easily integrated into the training pipeline.


The Digica team has completed a large range of projects which make use of computer vision. With the advent of Industry 4.0, the time has come to give to industries that rely on Inventory Management the technology upgrade that they need to stay competitive!



As smart home technology continues to evolve, our homes are becoming equipped with an increasing number of sensors, each capable of generating valuable data. From security systems and motion detectors to monitoring electricity and water usage, these sensors provide a wealth of information that can be integrated and analysed to enhance our daily lives. In this article, we'll explore the different types of sensors commonly found in smart homes and discuss the potential benefits of leveraging the data that they produce. We will make sure that your home is actually smart, not just annoying. Get ready to unlock the true potential of your smart home as we delve into the possibilities that lie within these inconspicuous devices and the data that they collect.


Review of Yoshua Bengio’s lecture at the Artificial General Intelligence 2021 Conference

At the 2021 Artificial General Intelligence Conference, the star keynote speaker was Yoshua Bengio. He has been one of the leading figures in deep learning with neural networks, for which he received the Turing Award in 2018.


Thanks to science-fiction movies and books, the general public views AI as a creature that knows everything and can solve any problem better than we humans can. In this vision, there is only one type of AI, which is General AI. Although work in the field of AI began in the middle of the twentieth century, we are still far from the AI envisaged by the general public.




A few months ago, OpenAI presented ChatGPT, and anyone can access this technology via OpenAI's website. Isn’t that amazing? Many people were eager to use it and are now asking ChatGPT all manner of questions, from simple definitions to more complex prompts about philosophical problems. Sometimes ChatGPT produces the correct answer, and sometimes it says that it does not know the answer. Of course, the best (and worst) responses have gone viral on the Internet. My favourite answers relate to logical puzzles, where there is still a lot of scope for improvement.


So how was it possible to create a model which can pass university-level tests? To answer this question, let’s step back in time. After the release of the paper entitled “Attention Is All You Need”, there was huge interest in transformer models. The original transformer architecture was a Neural Machine Translation (NMT) model, but it turned out that, with slight modifications, it could also handle text-generation and text-summarisation tasks. The concept of transformers then became the basis for Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT). Transformers are very common in Natural Language Processing, but they can also be used in computer vision. The main “magic” behind transformers is the self-attention mechanism, thanks to which a model can find the words (or regions of an image) to which it needs to pay attention, and contextual information can be passed from the encoder to the decoder. This means that information about each word remains available when interpreting the whole sentence.
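The core of that self-attention mechanism can be sketched in a few lines. In this toy version the queries, keys and values are all the raw token embeddings; real transformers add learned projections, multiple heads, masking and positional encodings:

```python
import numpy as np

# Toy scaled dot-product self-attention over a sequence of embeddings.

def self_attention(x: np.ndarray) -> np.ndarray:
    """x: (seq_len, d) token embeddings; returns attention-mixed values."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # similarity of every token pair
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ x                            # each token: weighted mix of all tokens
```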


GPT-3 is a third-generation model that is used for text completion. This model has 175 billion parameters and requires about 800GB of storage. In other words, it is massive. If you are thinking about training it on your own computer or server then, sadly, that is impossible. Training is estimated to have taken place on V100 cloud instances at a cost of around 4.6 million dollars. This sounds completely crazy, and only a sizable company could pay for the resources required. The model was trained on sources such as Common Crawl, WebText2, books and Wikipedia. OpenAI shared the model in 2020, and anyone can use it through the API.

GPT-3 made a huge impression on the public, but also scared them a bit. Some responses from the model do not meet appropriate social standards. On the other hand, GPT-3 is able to generate poetry and blocks of computer code. It can also generate a recipe for apple pie or summarise very complex scientific texts. Still, the model is not perfect, even though the amount of data on which it was trained is impressive. Farhad Manjoo, the New York Times author, described it as “a piece of software that is, at once, amazing, spooky, humbling and more than a little terrifying”. The cognitive scientist Noam Chomsky is also very sceptical about GPT-3 because it works even for impossible languages and, in his opinion, this tells us nothing about language or cognition.

Even though the model looks amazing as a tool, a lot of criticism is aimed at it. One major criticism relates to its environmental impact, especially the amount of power needed for training and the storage space required. There is also the problem of defining the rights to the resources used for training, and of plagiarism in texts generated by AI. This copyright problem is already well known for models that generate images, and no legal system in any country has yet solved it.
GPT-3 popularised “prompt engineering”, which is currently the only way for ordinary users to interact with the model. It is very simple to use whether or not you are familiar with programming: you just type in some text, and you get a response from the model.


Because GPT-3 was not designed to follow users' instructions, InstructGPT was created next. In addition, OpenAI wanted to create a model that was more truthful and less toxic. This model was trained with reinforcement learning from human feedback to align the language model better. The most important difference between InstructGPT and GPT-3 is that the former has 100 times fewer parameters than the latter. The authors of the paper entitled “Training language models to follow instructions with human feedback” describe the following main findings:

  • Labelers significantly prefer InstructGPT outputs over outputs from GPT-3
  • InstructGPT models show improvements in truthfulness over GPT-3
  • InstructGPT shows small improvements in toxicity over GPT-3, but no improvement in bias
  • InstructGPT models show promising generalization of instructions outside the RLHF fine-tuning distribution
  • InstructGPT still makes simple mistakes


GPT-3 was succeeded by InstructGPT, which is dedicated to following instructions. ChatGPT is a sibling model to InstructGPT.


Now we have come to the present day. The ChatGPT model is trained to interact conversationally, so that it can “answer questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests”. Sounds amazing, doesn’t it? It was trained in the same way as its sibling, InstructGPT, with the main difference being in the data-collection setup. It is a fine-tuned version of GPT-3.5 (trained on data from before Q4 2021). In comparison to its sibling, it can provide long and detailed responses.


After ChatGPT was released, the world went crazy. People started to ask ChatGPT about all manner of things, such as recipes for burgers, explanations of physical phenomena and existential problems. It turns out that ChatGPT can generate even long-form texts, such as school essays. The output from the model is very impressive, so people started to find it difficult to tell whether a text was generated by ChatGPT or by a human. This led to some fundamental questions about AI, especially about who holds the copyright in generated texts and about the effect of this model on the education system. I have even read stories that students use the model to cheat in tests and that, conversely, teachers have used it to generate questions for tests.




So now you may be wondering how to distinguish a human-written text from a text that is generated by AI. Fortunately, OpenAI has already created a classifier for that. Unfortunately, at this time, the classifier is far from perfect. According to OpenAI, it correctly identifies only 26% of AI-written texts (true positives) as “likely AI-written”. There is also a problem with correctly identifying human-written text, especially literature, which is sometimes incorrectly classified as AI-generated. In fact, we are only at the beginning of identifying AI-generated texts.


At this moment, humanity has access to an AI model that can generate texts, answer questions and even write pieces of literature. Of course, this model isn’t perfect. On the other hand, it can generate texts that humans cannot recognise as AI-generated. This raises questions about the future of jobs such as novelist, journalist and social media content creator. In my opinion, these jobs will be safe in the near future, and the tool will serve their holders well as a means of being more productive. Access to the model is a significant problem. Right now, the tool can be used by anyone, but there is no access to the actual model and its weights. If you are a Data Scientist, it is therefore not possible for you to fine-tune the model; in any case, you almost certainly do not have the computing resources to carry out fine-tuning. Without fine-tuning, there is always a risk that ChatGPT carries bias from the text on which it was trained. Another problem is the lack of any legal regulation of copyright, both in the generated text and in the data used for training.


What will the future look like?

This is a difficult question to answer. There is a strong chance that ChatGPT will be connected to the Bing search engine, and new versions of ChatGPT will be released over time. I personally hope that future versions of the model will have fewer parameters.







Artificial intelligence is one of the fastest-evolving science domains. It influences the route we take when we drive to work, how we use our phones and computers, what we buy when shopping and what we will see in cinemas in the near future. Artificial intelligence is having a remarkable impact on the development of the film industry, an area that has been growing almost as fast as AI itself in recent years. As with AI, the development of computers and the increase in computing capabilities has taken the industry to a whole new level. We have now reached a point where tools using machine learning models are part of the filmmaking process on a daily basis. This could herald the next revolution in the field. One thing that most movies today cannot do without is special effects. On the one hand, advanced special effects are expected by the viewer; on the other hand, they often consume a large percentage of a film's budget. The question must therefore be asked: in this age of ubiquitous automation, is it not possible to use artificial intelligence to create special effects more quickly and cheaply? Or could we completely replace CGI (computer-generated imagery) artists with AI systems? Let’s start from the beginning.



The history of CGI begins about 50 years ago, in the 1970s. Of course, this is not the beginning of special effects itself: various tricks had been used since the 1920s to achieve the desired illusion on the big screen. As the special effects of that time were based mainly on costumes and on systems that often used advanced engineering, they were called engineering effects. The most famous examples of such effects are Alien (1979) and The Thing (1982).


Alien (1979)

The first movie to make use of CGI was Westworld (1973), followed by Star Wars: Episode IV (1977) and Tron (1982). Despite the simplicity of the CGI effects used in these films, they were undoubtedly a breakthrough in the film industry. These movies showed a completely new path for filmmakers, which has been explored extensively ever since. The following years saw films such as Indiana Jones and the Last Crusade (1989), Terminator 2: Judgment Day (1991), Independence Day (1996), The Matrix (1999), The Lord of the Rings (2001) and King Kong (2005). Each of these films brought new, more advanced solutions, which made special effects an increasingly important part of the film. In 2009, the premiere of Avatar took the importance of special effects to a whole new level: the special effects used in the film were the main marketing value of the whole piece. At this point, we have reached a time when a movie can be generated almost entirely through the use of CGI and is distinguished from animation only by the use of actors and possibly some residual choreography. A great example of this type of film is the Hobbit series, which was shot almost entirely on a green screen.






But why have special effects progressed so quickly? Well, they are not the only thing to have developed over the last 50 years. An area which has an incredible impact on the lives of everyone on our planet, and which is also one of the fastest-developing areas in the world today, is artificial intelligence. At the moment we are witnessing great interest in generative models, which are capable of producing previously unseen, original content. The potential of such models is enormous and, when used correctly, they can facilitate the work of both artists and computer graphic designers. It is AI that has brought special effects to such a high level today. We must therefore ask ourselves whether artificial intelligence is capable of reaching a level at which it can replace CGI artists and automatically generate special effects for films.


We have to start with the fact that, no matter how impressive the results of generative models are at the moment, a human being does not have full control over what the model generates. The model is not taught to interpret human ideas: it is not able to understand what the director has in mind, or to propose other solutions or interesting ideas. For a model to fully replace a human CGI artist, it would need to interpret human ideas at a level of abstraction equal to a human's. It would need to understand not only the command given to it, but also the idea behind that command and the way of thinking of the human giving it. Such a level of interpretation of human ideas is still the domain of science-fiction movies.


However, this doesn't mean that artificial intelligence is completely useless when it comes to generating special effects. On the contrary, it can be said that in recent years it is artificial intelligence and its development that have driven the development of software for film editing and for the generation of special effects. This can be seen in the number of functions in this type of software that are based on machine learning models. From the technical side, however, machine learning models are advanced tools that require a trained specialist to operate. What is more, the specialist does not even need to be aware that he or she is using artificial intelligence models. It is enough to know the function in the video-editing software, understand its parameters and be able to use it correctly. The fact that there is an artificial intelligence model running underneath does not really matter to the user, because it is the effect that counts. And the effects are good and fast.


In addition, the current level of sophistication of machine learning models in image processing allows a task to be completed with a single click, a task which several years ago may have taken several days or even required professional equipment. For the producer, of course, this means faster production, which translates into lower costs for both employees and equipment. A good example of this is the Motion Capture system, which has been very popular among filmmakers for several years now. It allows human or animal movements to be read and then transferred to computer-generated characters, so that their movements are more realistic. In February of this year, scientists from ETH Zurich and the Max Planck Institute published a paper describing a method that makes it possible to generate accurate 3D avatars from just a 2D film [1]. The results are astounding, and there is no denying that this type of model has huge potential to replace the Motion Capture system. Considering the size and the cost of renting and operating a Motion Capture system, and comparing it to the cost of recording an actor performing a specific sequence of movements, the savings for film producers could be enormous, without loss of quality.







So the answer to the title question seems simple and hardly surprising: AI will not replace CGI artists, at least for now. However, there is no doubt that machine learning has an enormous impact on the filmmaking process, making it faster and cheaper. The evolution of special effects and film-editing software, as well as the example of Vid2Avatar cited above, shows that, just as the introduction of computers began a revolution in filmmaking, so the use of machine learning models begins a whole new era in the filmmaking process.





How can we help you?


To find out more about Digica, or to discuss how we may be of service to you, please get in touch.

Contact us