ALL ARTICLES FOR Machine learning

 

Increasing yields has been a key goal for farmers since the dawn of agriculture. People have continually looked for ways to maximise food production from the land available to them. Until recently, land management techniques such as the use of fertilisers have been the primary tool for achieving this.

 

Challenges for Farmers

 

Whilst these techniques greatly improve the chance of an increased yield, problems beyond the control of farmers can still have an enormous impact:

 

  1. Parasites - “rogue” plants growing amongst the crops may hinder growth; animals may destroy mature plants
  2. Weather - drought will prevent crops from flourishing, whilst heavy rain or prolonged periods of cold can be devastating for an entire season
  3. Human error - ramblers may trample on crops inadvertently, or farm workers may make mistakes
  4. Chance - sometimes it’s just the luck of the draw!

 

AI techniques can be used to reduce the element of randomness in farming. Identification of crop condition and the classification of likely causes of poor plant condition would allow remedial action to be taken earlier in the life cycle. This can also help prevent similar circumstances arising the following season.

 

Computer Vision to the Rescue

 

Computer vision is the most appropriate candidate technology for such systems. Images or video streams taken from fields could be fed into computer vision pipelines in order to detect features of interest.

 

          

 

A key issue in the development of computer vision systems is the availability of data; a potentially large number of images are required to train models. Ideal image datasets are often not available for public use; this is certainly the case in an agricultural context. Nor is the acquisition of such data a trivial exercise. Sample data is required over the entire life cycle of the plants - it takes many months for the plants to grow, and given the potential variation in environmental conditions, it could take years to gather a suitable dataset.

 

How Synthetic Data Can Help

 

The use of synthetic data offers a solution to this problem. The replication of nature synthetically poses a significant problem: the element of randomness. No two plants develop in the same way. The speed of growth, age, number and dimensions of plant features, and external factors such as sunlight, wind and precipitation all have an impact on the plant’s appearance.

 

Plant development can be modelled by the creation of L-systems for specific plants. These mathematical models can be implemented in tools such as Houdini. The Digica team used this approach to create randomised models of wheat plants.

 

                    

 

The L-system we developed allowed many aspects of the wheat plants to be randomised, including height, stem segment length, and leaf location and orientation. The effects of gravity were applied randomly, and different textures were applied to vary plant colouration. The Houdini environment is scriptable using Python, which allows us to generate a very large number of synthetic wheat plants and so model entire fields.
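
To make the idea concrete, here is a minimal sketch of a stochastic L-system in plain Python. It is not the wheat model we built in Houdini; the axiom, symbols, rules and probabilities below are illustrative assumptions, chosen only to show how random rule selection produces a different plant structure on every run.

```python
import random

# Minimal stochastic L-system sketch: symbols are rewritten in parallel on each
# iteration, and some rules are chosen at random to mimic natural variation.
# The axiom, rules and probabilities are illustrative, not the actual wheat model.

AXIOM = "A"          # A = growing apex, I = stem internode, L = leaf
RULES = {
    # Each symbol maps to a list of (probability, replacement) pairs.
    "A": [(0.6, "I[L]A"),   # keep growing and sprout a leaf
          (0.3, "IA"),      # keep growing without a leaf
          (0.1, "I")],      # stop growing (a stunted plant)
    "I": [(1.0, "I")],      # internodes stay as they are
    "L": [(1.0, "L")],
}

def rewrite(symbol: str) -> str:
    """Pick a replacement for one symbol according to its rule probabilities."""
    options = RULES.get(symbol, [(1.0, symbol)])
    r, acc = random.random(), 0.0
    for prob, replacement in options:
        acc += prob
        if r <= acc:
            return replacement
    return options[-1][1]

def grow(axiom: str, iterations: int) -> str:
    """Apply the stochastic rules in parallel for a number of iterations."""
    state = axiom
    for _ in range(iterations):
        state = "".join(rewrite(s) for s in state)
    return state

if __name__ == "__main__":
    # Every run produces a slightly different "plant" string, which a tool
    # such as Houdini could then interpret geometrically.
    for plant in range(3):
        print(f"plant {plant}: {grow(AXIOM, 5)}")
```

Each output string can then be interpreted geometrically (for example, with a turtle-style interpreter or a Houdini network), so that height and the number and placement of leaves vary naturally from plant to plant.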

 

The synthetic data is now suitable for training computer vision models for the detection of healthy wheat, enabling applications such as:

 

  • filtering wheat from other plants
  • identifying damaged wheat
  • locating stunted and unhealthy wheat
  • calculating biomass
  • assessing maturity of wheat
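
As a rough illustration of that training step, the sketch below trains a small image classifier on rendered wheat images using PyTorch and torchvision. The folder name synthetic_wheat/train and the idea of one sub-folder per class (for example healthy and damaged) are assumptions made for the example, not a description of our actual pipeline.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Illustrative sketch of training a classifier on rendered wheat images.
# "synthetic_wheat/train" is a hypothetical directory with one sub-folder per
# class (e.g. "healthy", "damaged") so that ImageFolder can label the images.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("synthetic_wheat/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# A small off-the-shelf backbone, trained from scratch, is enough for a first experiment.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(5):
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch}: loss {running_loss / len(train_loader):.4f}")
```

In practice, a model trained purely on synthetic renders would still be evaluated, and usually fine-tuned, on a smaller set of real field images to bridge the gap between rendered and real data.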

 

With the planet’s food needs projected to grow by 50% by 2050, radical solutions are required. AI systems will provide a solution to many of these problems; the use of synthetic data is fundamental to successful deployments.

 

Digica’s team includes experts in the generation and use of synthetic data; we have worked with it in a variety of applications since our inception 5 years ago. We never imagined that it could be used in environments as complex and rich as agriculture. It seems that there are no limits to the use of synthetic data in the Machine Learning process!

 

Data is all around us, and we don't even see it.

 

 

Data Scientists usually work on projects related to well-known topics in Data Science and Machine Learning, for example, projects that rely on Computer Vision, Natural Language Processing (NLP) and Preventive Maintenance. However, at Digica we're working on a few projects that do not really focus on processing visual data, text or numbers. In fact, these unusual projects focus on types of data that are flowing around us all the time, but which nevertheless remain invisible because we cannot see them.

 

1. WiFi


WiFi technology fills the space around us with radio waves, and these waves can convey more information than you might think. Having just a WiFi router and some mobile devices in a room is enough for us to detect what is happening in that room. The key idea is that movement distorts the waves in a way that we can detect - for example, when someone raises a hand.
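
As a toy illustration of that idea, the sketch below flags movement from a channel-amplitude time series. The data here is simulated with NumPy, and the window size and threshold are made-up values; a real setup would read channel state information from compatible WiFi hardware.

```python
import numpy as np

# Crude sketch of the idea behind WiFi sensing: movement in the room changes
# the received signal, so a rise in the short-term variation of the channel
# amplitude can serve as a simple "something moved" detector.
# `csi_amplitude` stands in for real channel measurements; here it is simulated,
# so the numbers are purely illustrative.

rng = np.random.default_rng(0)
quiet = rng.normal(1.0, 0.01, 500)    # empty room: amplitude is stable
moving = rng.normal(1.0, 0.1, 500)    # someone moves: amplitude fluctuates
csi_amplitude = np.concatenate([quiet, moving])

WINDOW = 50        # samples per analysis window
THRESHOLD = 0.03   # spread treated as "movement" (tuned per setup)

for start in range(0, len(csi_amplitude) - WINDOW, WINDOW):
    window = csi_amplitude[start:start + WINDOW]
    movement = window.std() > THRESHOLD
    print(f"samples {start:4d}-{start + WINDOW:4d}: "
          f"{'movement detected' if movement else 'quiet'}")
```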

 

The nature of WiFi itself makes this technique pretty easy to set up. Firstly, as mentioned above, we don't need any extra instruments or tools, such as cameras; a router and some mobile devices are enough. And secondly, the technique can even work through walls, which means we can use it throughout a whole house without thinking about cables or adding extra equipment to each room.


Some articles have already been published on human gesture recognition using this type of wave, for example, this article.

In that and other articles, you can read about how the algorithm can generate a pretty detailed picture. For example, the algorithm can recognize a person's limbs one by one, and then construct a 3D skeleton of that person. In this way, it is possible to reproduce many elements of a person's position and gestures. It's actually a really cool effect - as long as a stranger is not looking at someone else's data, which would be quite creepy!

 

2. Microwaves

 


 

I'm sure that you have used microwaves to heat up a meal or cook food from scratch. And you may also be familiar with the idea of medical breast imaging. However, you might not know that those two topics use the same technology, but in different ways. 

 

It turns out that, when microwaves are directed at breast tissue, the waves reflected back from healthy tissue look different from those reflected from malignant tissue. "So what," you may say, "we already have mammography for that." Yes, but mammograms involve a higher exposure to radiation. And it is really difficult to distinguish healthy tissue from malignant tissue in mammogram images of dense breasts, as described in this link. Microwaves were first studied in 1886, but, as you can see, they are now being put to new uses, such as showing up malignant tissue in a way that is completely non-invasive and harmless to people.

 

By the way, microwaves are also perfect for weather forecasting. This is because water droplets scatter microwaves, and using this concept helps us to recognize clouds in the sky!

 

3. CO2

 

Last but not least, we have carbon dioxide. This chemical compound is actually a great carrier of information. Did you know that CO2 can very accurately indicate the number of people in a room? It does make sense, because we generate CO2 all the time simply by breathing. However, it's not at all obvious that this allows us to estimate the number of people in a given room with around 88% accuracy!

 

Once this approach is set up, we can seamlessly detect, for example, that a room is unoccupied, and save money by switching off all the electronics in that room. So this can be a great add-on to any smart home or office.

 

You might think that the simplest way to find out whether a given room is unoccupied is to employ hardware specifically for this purpose, such as cameras and RFID tags. However, such a high-tech approach entails additional costs and, most importantly these days, carries the risk of breaching people's privacy. On the other hand, as described above, the data is already there; it just needs to be found and utilised to achieve the required result.

 

In the simplest case, we just read the levels of CO2 gas in a room and plot those levels against time. Sometimes, for this task, we can also track the temperature of the room, as in this experiment. However, note that temperature data is often already available, for example, in air-conditioning systems. We only need to read the existing data, and then analyse that data correctly in order to provide the insight that is required in the particular project.
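
The sketch below shows the simplest version of this idea: read CO2 levels over time, smooth them, and flag the room as occupied when the level stays above a threshold. The day of sensor readings is simulated here, and the baseline, offsets and threshold are illustrative values only, so they would need tuning against a real sensor.

```python
import numpy as np

# Minimal sketch of occupancy detection from CO2 readings. In a real setup the
# values would come from a CO2 sensor logged over time; here one day of
# readings (one per minute) is simulated, so the numbers are illustrative only.

rng = np.random.default_rng(1)
minutes = np.arange(24 * 60)
baseline = 420                                          # background ppm level
occupied = (minutes >= 9 * 60) & (minutes < 17 * 60)    # people present 09:00-17:00
co2_ppm = baseline + occupied * 350 + rng.normal(0, 20, minutes.size)

THRESHOLD_PPM = 600   # above this we assume the room is in use

# Smooth over 15 minutes to ignore short spikes, then flag occupancy.
kernel = np.ones(15) / 15
smoothed = np.convolve(co2_ppm, kernel, mode="same")
room_in_use = smoothed > THRESHOLD_PPM

for hour in range(24):
    block = room_in_use[hour * 60:(hour + 1) * 60]
    print(f"{hour:02d}:00  {'occupied' if block.mean() > 0.5 else 'empty'}")
```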



There are many, many more types of data that are invisible to the human eye but offer an amazing playground for Data Scientists. For example, there are radio waves, a type of electromagnetic wave with wavelengths longer than microwaves. There are infrared waves, whose wavelengths are shorter than microwaves and which are great for thermal imaging. And then there are sound waves, which we can use for echolocation (like bats). These waves were the first ones that came to mind, but I'm sure there are many other sources of invisible data that can be re-used for the purposes of Data Science.

 

 

Since it’s 2021, it’s probably no surprise to you that heart rate can be measured using different gadgets like smartphones or smartwatches.

 

For some reason, it is quite natural for people to argue with each other all the time. Wives argue with husbands. Children argue with their parents. Facebook users argue with other Facebook users. United fans argue with City fans. And it goes without saying that … Data Scientists argue with other Data Scientists!

 

 

Nowadays, no one needs to be convinced of the power and usefulness of deep neural networks. AI solutions based on neural networks have revolutionised almost every area of technology, business, medicine, science and military applications. After the breakthrough win of Geoffrey Hinton's group in the ImageNet competition in 2012, neural networks became the most popular machine learning algorithm. Since then, 21st-century technology has come to rely increasingly on AI applications. We encounter AI solutions at almost every step of our daily lives - in cutting-edge technologies, entertainment systems, business solutions, protective systems, the medical domain and many more areas. In many of these areas, AI solutions work in a self-sufficient way, with little or no human supervision.

 

According to some sources, over 40% of all Internet traffic is made up of bot traffic, and we know that malicious bots account for a significant proportion of it. This article describes a number of strategies (Machine Learning, user authentication using simple input devices, and behavioural biometrics) which you can use to distinguish automatically between human users and bots.

 

 

Quantum Machine Learning - hot or not?

 

Image generated with Midjourney API, “Quantum Computing”

 

 

The blogs so far have been based on facts and knowledge backed by rigorous theory, empirical experience and evidence in the form of practical solutions. Today's article is of a slightly different nature: it looks at a field that is still in development, still in its empirical infancy, and still far from generating practical real-life solutions. We are talking about quantum computing (QC) - a field with which the technological world associates computational salvation on the one hand, while on the other it fears an irreversible change to some of the systems on which our technological civilisation operates. It is worth quoting at this point the words of the great Polish visionary and science-fiction writer Stanisław Lem:

The risk can be of any magnitude, but the very fact of its existence implies the possibility of success

~ Stanisław Lem, The Magellanic Cloud

 

From a theoretical perspective, quantum computing offers the computational power to break, in a short time, the ciphers that secure our money in the bank today, but there is still uncertainty over whether the barrier between theory and practice will prove too high, in terms of physical possibilities, over the coming years. I believe, however, that it is worth turning our gaze to the future, even an uncertain one. After all, even 30 years ago few would have believed in solutions that today we take for granted, perhaps even as boring everyday life.

Known areas where quantum computing could lead to a revolution include the design of new drugs and materials, the improvement of artificial intelligence, and optimisation tasks such as fleet management for taxis, trucks and ships. As quantum computers mature, research on algorithms dedicated to exploiting their power is moving from a niche that only a few people examined theoretically to an active, larger-scale area of research. More practically relevant application areas are expected in the future.

In today's blog we will focus on a particular field where QC is likely to find an application - quantum machine learning, an area that combines quantum computing with classical machine learning, which is what we at Digica like most :)

 

The basis of quantum computing - the qubit as a carrier of information

 

Where does the computational power of quantum computing lie? A qubit, the quantum unit of information, has an unusual feature derived from the laws of quantum mechanics: it can be not just in state 0 or in state 1, but partly in 0 and partly in 1 at the same time - it is in a superposition of the two states. Similarly, eight qubits can be in all states from 0 to 255 at once. The implications of this are momentous. A classical byte holds exactly one of those 256 values, and a classical processor works through them one value at a time. A qubit register, which is in effect a kind of probability cloud assigning a weight to each state, allows all of these states to be processed simultaneously. So we are dealing with parallel processing, which in modern electronics would correspond to the use of multiple processors.
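
To make the state-space picture tangible, here is a tiny NumPy sketch - a classical simulation for illustration only, not real quantum hardware - showing that one qubit in superposition is described by two amplitudes, while eight qubits need 2^8 = 256 of them.

```python
import numpy as np

# One qubit is a 2-element vector of amplitudes; n qubits need 2**n amplitudes.

ket0 = np.array([1.0, 0.0])
hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# A Hadamard gate puts a single qubit into an equal superposition of 0 and 1.
one_qubit = hadamard @ ket0
print(one_qubit)              # [0.707..., 0.707...]

# Eight qubits in superposition: the state vector holds 2**8 = 256 amplitudes,
# one for every value a classical byte could take.
state = one_qubit
for _ in range(7):
    state = np.kron(state, one_qubit)
print(state.size)             # 256
print(state[:4])              # each amplitude is 1/sqrt(256) = 0.0625

# Measuring collapses this to a single 8-bit value, with probability |amplitude|**2.
probabilities = np.abs(state) ** 2
print(probabilities.sum())    # 1.0
```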

 


Comparison of bit and qubit states.

[https://blog.sintef.com/digital-en/diving-deep-into-quantum-computing/]

 

The performance of quantum computers does not depend on any clock - there is none here at all. Performance is determined by the number of qubits, and each additional qubit doubles the size of the state space the machine can work on at once. A single act of reading could then yield information that a classical computer would need centuries to process. The juxtaposition is striking: in order to achieve performance exceeding the best modern supercomputer, it is “enough” to construct a device consisting of around 1 million qubits (equivalent to 2^1,000,000 classical bits).

 


IBM Q Quantum Computer; Photo by LarsPlougmann [https://newrycorp.com/insights/blog/technology-readiness-of-quantum-computing/]

 

When discussing the possibilities of quantum computing, we must also take into account the existing problems the technology faces in the empirical field. Quantum computers are exceedingly difficult to engineer, build and program. As a result, they are crippled by errors in the form of noise, faults and loss of quantum coherence, which is crucial to their operation and yet falls apart before any nontrivial program has a chance to run to completion. This loss of coherence (called decoherence), caused by vibrations, temperature fluctuations, electromagnetic waves and other interactions with the outside environment, ultimately destroys the exotic quantum properties of the computer. Given the current pervasiveness of decoherence and other errors, contemporary quantum computers are unlikely to return correct answers for programs of even modest execution time. While competing technologies and competing architectures are attacking these problems, no existing hardware platform can maintain coherence and provide the robust error correction required for large-scale computation. A breakthrough is probably several years away.

 

Quantum Machine Learning

 

Machine learning (ML) is a set of algorithms and statistical models that can extract information hidden in data. By learning a model from a dataset, one can make predictions on previously unseen data taken from the same probability distribution. For several decades, machine learning research has focused on models that can provide theoretical guarantees of their performance. But in recent years, heuristics-based methods have dominated, in part because of the abundance of data and computational resources. Deep learning is one such heuristic method that has been very successful.

 

With the increasing development of deep machine learning, there has been a parallel, significant increase in interest in the ever-expanding field of quantum computing. Quantum computing involves the design and use of quantum systems to perform specific computations, where quantum theory can be viewed as a generalisation of probability theory that introduces behaviours such as superposition and quantum entanglement. Such behaviours are difficult to simulate on classical computers, so one area of research that has attracted growing attention is the design of machine learning algorithms that rely on quantum properties to accelerate their performance.

 

The ability to perform fast linear algebra on a state space that grows exponentially with the number of qubits has become a key feature motivating the application of quantum computers to machine learning. These quantum-accelerated, linear-algebra-based techniques can be considered the first generation of quantum machine learning (QML) algorithms, addressing a wide range of applications in both supervised and unsupervised learning, including principal component analysis, support vector machines, k-means clustering and recommender systems (https://arxiv.org/pdf/2003.02989.pdf). The main deficiency of these algorithms is that they require careful data preparation, which amounts to embedding classical data in quantum states. Not only does such a process scale poorly, it can also strip the data of the specific structure that classical algorithms exploit, which calls the practicality of the quantum acceleration into question.

 

Quantum Deep Learning

 

When we talk about quantum computers, we usually mean fault-tolerant devices. They will be able to run Shor's algorithm for factorization (https://arxiv.org/abs/quant-ph/9508027), as well as all the other algorithms that have been developed over the years. However, power comes at a price: in order to solve a factorization problem that is unfeasible for a classical computer, we will need many qubits. This overhead is needed for error correction, since most quantum algorithms we know are extremely sensitive to noise. Even so, programs running on devices larger than 50 qubits quickly become extremely difficult to simulate on classical computers. This opens up the possibility that devices of this size could be used to perform the first demonstration of a quantum computer doing something that is unfeasible for a classical computer. It will probably be a highly abstract task and useless for any practical purpose, but it will be proof-of-principle nonetheless. It would be a stage when we know that devices can do things that classical computers can't, but they won't be big enough to provide fault-tolerant implementations of familiar algorithms. John Preskill coined the term "Noisy Intermediate-Scale Quantum" (https://arxiv.org/abs/1801.00862) to describe this stage. Noisy because we don't have enough qubits for error correction, so we will have to directly exploit imperfect qubits in the physical layer. And "Intermediate-Scale" because of the small (but not too small) number of qubits.

 

By analogy with the way machine learning evolved into deep learning as new computational capabilities emerged, the theoretical availability of Noisy Intermediate-Scale Quantum (NISQ) processors has given rise to a second generation of QML based on heuristic methods that can be studied empirically, thanks to the increased computational capabilities of quantum systems. These are algorithms using parameterised quantum transformations called parameterised quantum circuits (PQCs) or quantum neural networks (QNNs). As in classical deep learning, the parameters of PQCs/QNNs are optimised against a cost function, using black-box optimisation heuristics or gradient-based methods, to learn representations of the training data. Quantum processors in the near term will still be quite small and noisy, so distinguishing and generalising quantum data will not be possible using quantum processors alone; NISQ processors will have to work with classical co-processors to become effective.
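
To give a feel for that hybrid loop, here is a deliberately tiny sketch in plain NumPy: a single-qubit "circuit" with one trainable rotation angle, whose expectation value is fed to a classical gradient-descent loop via the parameter-shift rule. It is a classical simulation for illustration only; real PQCs/QNNs use many qubits and gates and run on quantum hardware or dedicated simulators.

```python
import numpy as np

# Toy parameterised quantum circuit (PQC): a quantum circuit produces an
# expectation value, and a classical optimiser updates the circuit parameter.

def ry(theta: float) -> np.ndarray:
    """Single-qubit rotation around the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

Z = np.array([[1.0, 0.0], [0.0, -1.0]])
ket0 = np.array([1.0, 0.0])

def expectation(theta: float) -> float:
    """Run the circuit |0> -> RY(theta) and measure <Z> (equals cos(theta))."""
    state = ry(theta) @ ket0
    return float(state @ Z @ state)

def grad(theta: float, target: float = -1.0) -> float:
    """Gradient of (<Z> - target)**2, with d<Z>/dtheta from the parameter-shift rule."""
    shift = np.pi / 2
    d_expectation = 0.5 * (expectation(theta + shift) - expectation(theta - shift))
    return 2.0 * (expectation(theta) - target) * d_expectation

# Classical optimiser (plain gradient descent) driving the "quantum" part:
# the circuit learns to rotate |0> so that <Z> approaches the target of -1.
theta, lr = 0.1, 0.4
for step in range(100):
    theta -= lr * grad(theta)
print(f"optimised theta = {theta:.3f}, <Z> = {expectation(theta):.3f}")
```

The same pattern (quantum circuit in the inner loop, classical optimiser in the outer loop) is what the hybrid classical-quantum pipeline in the figure below describes, just at a much larger scale.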

 


Abstract pipeline for inference and training of a hybrid classical-quantum model

[https://arxiv.org/pdf/2003.02989v2.pdf]

 

 

Conclusions

Let's answer the question in the title of the article: is Quantum Machine Learning hot or not? The possibilities opened up by the laws of quantum mechanics when applied to computing are extremely enticing and promising. In the context of machine learning, the prospect of accelerating many algorithms of both classical machine learning and deep learning instils excitement and seems to offset some of the pains that existing classical solutions encounter. Unfortunately, things that are possible in theory sometimes face a technological barrier, which is especially true for quantum computing. Nevertheless, in my opinion Quantum Machine Learning is hot, and although it may seem like a distant prospect, that does not detract from its advantages and the many solutions it promises. As is often the case in life, let time tell.

 

 

 

 

 

 

The main focus in machine learning projects is to optimize metrics like accuracy, precision and recall. We put effort into hyper-parameter tuning or designing good data pre-processing. But what if these efforts don't seem to work?

If I were to point out the most common mistake made by rookie Data Scientists, it would be focusing on the model rather than on the data.
