Automation & Intelligence in Network Operations

With over 5 billion mobile subscribers, the telecom industry has transformed our world. It connects us, entertains us, informs us and inspires us.

Operators have been at the epicenter of this change but today are facing increasing competition from internet and OTT players. Operators are playing catch-up and need to transform themselves operationally if they are to compete against these highly successful digital companies.

Networks today are predominantly managed by people. Every day, engineers monitor thousands of alarms in the network operations center and create trouble tickets to resolve the underlying problems. This operations model - based on siloed operations software, a high proportion of manual and repetitive tasks, and vertical organizational structures - severely restricts operators’ ability to realize the benefits of digital transformation. To improve cost efficiency and quality, they need to explore new and innovative operating models and take key lessons from the internet world in order to become more agile.

As Frank Qing, Chief Wireless Architect for Canada’s TELUS, puts it, “We are using 21st century 4G networks, but network O&M is somehow stuck in the 18th century. Machine manufacturing has become automated but the telco industry is still using manual labor.”

The Need for Automation & Intelligence

The current operating model for operators is characterized by highly unmanageable workflow processes, siloed operations and high operational costs. Most of the work is done manually, with a high mean time to resolve faults, which leads to poor service quality and an inferior customer experience.

Operations support software is built on rigid, closed software architectures. These architectures are deployed in domain-based operational silos that create a scattered IT estate, which stretches software change cycles beyond what operators can control and increases the time to market for new services.

The operations engineers are trained to use the software systems to perform their daily tasks. They do not necessarily have the skills to enhance the software to suit changing needs. They may also be constrained by functional boundaries in the software that do not allow customization. Operations staff lose motivation because of the mundane and repetitive nature of manual operational processes, which causes high levels of employee attrition.

The operations organization is tiered and bureaucratic. For example, there are usually three tiers of customer care and network operations, often reflecting the siloed software and processes, with high levels of manual handoffs between tiers.

These characteristics of the current operational model are the key drivers for automation and intelligence. Through automation and intelligence operators can overhaul their operations to achieve business, service and operational agility.

However, transformation is not easy. According to McKinsey, over 80% of transformations fail or do not deliver their intended value, for many reasons. Here the telecom world should learn from the OTT players and understand what makes them so agile and successful.

Learning from OTT Players

Internet providers, and OTT players in particular, have disrupted and are transforming almost every industry, forcing well-entrenched companies to change.

Amazon disrupted the computer industry by commoditizing storage and computing through its Amazon Web Services (AWS) business, providing on-demand cloud-based infrastructure and software-as-a-service (SaaS) solutions. Uber and Airbnb made a similar impact on the transportation and hotel industries.

These companies owe a large share of their success to the way they run their operations and demonstrate a few common characteristics that are foundational to their digital operations such as:

  • highly automated operational processes
  • cloud infrastructure
  • use of DevOps principles for service design and delivery
  • use of microservices-based software architectures
  • application programming interfaces (APIs)

Google applies the “Site Reliability Engineering” paradigm, using DevOps to build software products with an operations mindset while automating repetitive and recurring tasks to reduce manual errors. Uber migrated from a monolithic operational software architecture to a flexible and scalable microservices-based architecture for rapid, reliable and independent software releases across regions. Netflix relies on software-driven automated operations underpinned by cloud-based service infrastructure and DevOps processes.

Automation – Road to Autonomous Operations

While the goal is autonomous operations, the evolution will be gradual and will only be possible through an incremental approach to automation. As part of the process transformation, operators should constantly pursue opportunities to automate. They should work on the principle that everything that can be automated should be automated. Figure 1 below illustrates the evolution to automated autonomous operations.

Fig 1: Evolution to automated autonomous operations

Repetitive manual processes are either written down in manuals or held as the private knowledge of operations staff, or both. Despite detailed instructions and experienced operations people, manual processes are prone to errors. The risk of performing inaccurate analysis or making an incorrect configuration change is high, leading to service disruption, lost revenue and customer churn. It is therefore vital that tasks are completed accurately and consistently every time.

Manual processes are well suited to software automation, and a component-based software engineering approach enables the identification of repeatable manual tasks at the most granular level. The first tasks to automate are these simple and recurring manual processes, and the goal should be to package the software routines as reusable components so they can be automatically triggered and executed based on data-driven decision points and rules.
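
As a rough illustration of reusable components triggered by data-driven rules, the Python sketch below pairs each rule with a small routine and runs whichever routines match an incoming alarm event. The alarm name, event fields, restart limit and component functions are all hypothetical, chosen only to show the pattern; a real deployment would call actual operations systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]

# Hypothetical reusable components: each wraps one granular, repeatable task.
def restart_service(event: Event) -> str:
    return f"restarted {event['element']}"

def open_ticket(event: Event) -> str:
    return f"ticket opened for {event['element']}"

# Hypothetical decision points evaluated against the event data.
def can_auto_restart(event: Event) -> bool:
    return event["alarm"] == "PROCESS_DOWN" and event["restarts_today"] < 3

def needs_escalation(event: Event) -> bool:
    return event["alarm"] == "PROCESS_DOWN" and event["restarts_today"] >= 3

@dataclass
class Rule:
    name: str
    condition: Callable[[Event], bool]  # data-driven decision point
    action: Callable[[Event], str]      # reusable component to execute

# Illustrative rule set; the limit of 3 restarts is an assumption.
RULES: List[Rule] = [
    Rule("auto-restart", can_auto_restart, restart_service),
    Rule("escalate", needs_escalation, open_ticket),
]

def handle(event: Event) -> List[str]:
    """Run every component whose rule matches the incoming event."""
    return [rule.action(event) for rule in RULES if rule.condition(event)]

if __name__ == "__main__":
    print(handle({"element": "cell-042", "alarm": "PROCESS_DOWN", "restarts_today": 0}))
    print(handle({"element": "cell-042", "alarm": "PROCESS_DOWN", "restarts_today": 3}))
```

Keeping conditions and actions as separate small functions means a new rule can reuse an existing component without changing it.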

Intelligence – Road to Predictive Operations

A mobile network is designed and managed by engineers who rely heavily on their extensive knowledge of the network topology and of customers’ mobility and usage patterns. As these topologies become more complex and denser, usage patterns will become less predictable and harder for engineers to compute. To overcome this, we need to leverage data: not just operational data but data from all areas of the network. This data can be fed into models to derive deep and actionable insights to further optimize operations.

To do this, operators must first implement a single unified database with the ability to record, process and aggregate every data point originating from the infrastructure and from the network and IT application layers, such as log files, network counters, transaction data and network telemetry data. Analytics on this data provides insight into all aspects of operations and builds intelligence that allows the operations platform to learn from its environment and make better decisions when presented with the same operational context in the future, much like the human brain. This is where machine learning’s pivotal role in operations automation becomes clear.
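
One way to picture a unified store is a single record shape that every source is mapped into before aggregation. The Python sketch below is a minimal in-memory illustration under that assumption; the DataPoint fields, the two adapter functions and the input field names are hypothetical and stand in for whatever schema and database an operator actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class DataPoint:
    source: str   # e.g. "counter" or "telemetry"
    element: str  # network or IT element that emitted the data
    metric: str
    value: float

# Hypothetical adapters: each maps one source format onto the common record.
def from_counter(row: Dict) -> DataPoint:
    return DataPoint("counter", row["cell"], row["name"], float(row["count"]))

def from_telemetry(row: Dict) -> DataPoint:
    return DataPoint("telemetry", row["node"], row["kpi"], float(row["reading"]))

def aggregate(points: Iterable[DataPoint]) -> Dict[Tuple[str, str], float]:
    """Average each metric per element, regardless of which source produced it."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for point in points:
        buckets[(point.element, point.metric)].append(point.value)
    return {key: mean(values) for key, values in buckets.items()}

if __name__ == "__main__":
    points = [
        from_counter({"cell": "cell-1", "name": "dropped_calls", "count": "4"}),
        from_telemetry({"node": "cell-1", "kpi": "dropped_calls", "reading": 6.0}),
    ]
    print(aggregate(points))
```

Because every source lands in the same record, analytics can be written once against DataPoint rather than once per source system.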

Figure 2 below provides an illustrative framework of how operators can apply various analytics and machine learning techniques, such as supervised, unsupervised and reinforcement learning, to bolster operational impact.

Fig 2: Illustrative framework for applying analytics and machine learning techniques in operations

Using reams of historical operations data, supervised machine-learning algorithms can be trained to spot patterns (e.g. degrading network performance) and trigger remediation routines (e.g. supplement network capacity). Continuous calibration of the algorithms can increase the accuracy of pattern matching to the point where there is sufficient confidence to establish predictive operations. In a predictive operations context, the models predict network or service issues hours, days or even weeks in advance, allowing sufficient time to take remedial action.
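
To make the supervised idea concrete, the Python sketch below trains about the simplest possible model, a single learned threshold, on labelled historical samples and then uses it to decide whether to trigger a remediation routine. The utilization figures, labels and the remediate placeholder are invented for illustration; a production model would use far richer features and a proper machine-learning library.

```python
from typing import List, Tuple

Sample = Tuple[float, bool]  # (cell utilization in %, was performance degraded?)

def train_threshold(history: List[Sample]) -> float:
    """Pick the utilization threshold that misclassifies the fewest labelled samples."""
    candidates = sorted({utilization for utilization, _ in history})

    def errors(threshold: float) -> int:
        return sum((u >= threshold) != degraded for u, degraded in history)

    return min(candidates, key=errors)

def predict_degraded(threshold: float, utilization: float) -> bool:
    return utilization >= threshold

def remediate(cell: str) -> str:
    # Placeholder for a real remediation routine such as adding capacity.
    return f"capacity added to {cell}"

# Invented historical data, labelled by whether degradation was observed.
HISTORY: List[Sample] = [
    (35.0, False), (50.0, False), (62.0, False),
    (78.0, True), (85.0, True), (93.0, True),
]

if __name__ == "__main__":
    threshold = train_threshold(HISTORY)
    if predict_degraded(threshold, 90.0):
        print(remediate("cell-042"))
```

Retraining on fresh labelled data as it arrives is the continuous calibration step: the threshold moves as the network's behavior changes.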

On the other hand, unsupervised learning algorithms have no prior training on how to classify or label patterns; instead, they use grouping or clustering to organize data and uncover potential structures and patterns before predicting outcomes. In reinforcement learning, the algorithm takes an action, receives feedback on how good that decision was, and calibrates its next move accordingly. Of the three machine-learning paradigms, supervised learning is the most widely used, and it requires the skills of data scientists to set up and continuously calibrate the algorithms. All three techniques are expected to play a critical role in achieving the vision of full operations automation.
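
The act, receive feedback, recalibrate loop of reinforcement learning can be sketched with an epsilon-greedy bandit, which is a deliberately simplified stand-in for full reinforcement learning (it has no notion of state). In the Python sketch below, the action names and feedback scores are hypothetical, and the feedback is fixed rather than noisy so the behavior is easy to follow.

```python
import random
from typing import Callable, Dict, List

def run_feedback_loop(
    actions: List[str],
    feedback: Callable[[str], float],
    rounds: int = 200,
    epsilon: float = 0.1,
    seed: int = 0,
) -> Dict[str, float]:
    """Learn a value estimate per action from repeated feedback."""
    rng = random.Random(seed)
    value = {action: 0.0 for action in actions}
    count = {action: 0 for action in actions}
    for step in range(rounds):
        if step < len(actions):
            action = actions[step]                # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)          # explore occasionally
        else:
            action = max(actions, key=value.get)  # exploit the best estimate
        reward = feedback(action)
        count[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        value[action] += (reward - value[action]) / count[action]
    return value

# Hypothetical feedback: how well each remediation resolved the issue.
SCORES = {"reroute_traffic": 0.2, "add_capacity": 0.8, "restart_node": 0.5}

if __name__ == "__main__":
    learned = run_feedback_loop(list(SCORES), lambda action: SCORES[action])
    print(max(learned, key=learned.get))
```

The epsilon parameter controls how often the loop tries something other than its current best guess, which is what lets it recover if the feedback changes over time.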

Machine learning augments the analytics models with learning abilities, and provides the basic mechanisms for continuously enhancing the intelligence of the model. For example, applying machine-learning-based analytics models, even to partially automated processes, offers excellent opportunities to calibrate the models. Using supervised and reinforcement learning approaches, operations staff can tune the analytics models as they make decisions while executing the workflow.

As confidence grows in machine-learning-led automation, unsupervised machine learning models can be gradually introduced to work with automated workflows, taking communications service providers (CSPs) into the realm of AI-led operations. The self-learning and self-calibrating nature of unsupervised learning models means they constantly tune themselves to increase the accuracy of operational decisions.
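
The grouping or clustering that unsupervised models rely on can be illustrated with a basic k-means loop over unlabelled measurements. The Python sketch below works on one-dimensional data with a deterministic starting point; the latency values are invented, and real use would involve many dimensions and an established library.

```python
from statistics import mean
from typing import List

def cluster_1d(values: List[float], k: int = 2, iterations: int = 20) -> List[int]:
    """Group unlabelled values into k clusters (k >= 2) with a basic k-means loop."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centers[c] = mean(members)
    return labels

if __name__ == "__main__":
    # Invented latency samples in milliseconds, with no labels attached.
    latencies = [12.0, 14.0, 11.0, 95.0, 13.0, 102.0]
    print(cluster_1d(latencies))
```

No one tells the algorithm which samples are abnormal; the two groups emerge from the data, and an engineer or a downstream model can then decide what each group means.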

Conclusions & Recommendations

Our industry is at a critical juncture where the future success of operators will depend on their ability to transform into Digital Service Providers. To achieve this, operators must move away from siloed operations with high levels of repetitive manual processes to autonomous operations enabled by automation and intelligence. This must be supported by an operational workforce with the software skills to create and continuously automate operational processes, using an operations platform powered by unified monitoring, analytics and machine learning. The platform should be capable of supporting the existing physical infrastructure and services, and of continuously adapting as operators implement new strategic initiatives such as NFV, IoT and 5G.

We must learn from other industries, especially successful digital companies such as Google and Amazon, embracing ideas that have contributed to their success. Transforming operations cannot be achieved overnight. Operators need to identify and implement continuous, incremental automation and intelligence to achieve immediate benefits while supporting the broader transformation journey.

Asit Tandon is the Vice President of Operations at Hutchison 3 Indonesia.

Sponsored Article

Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices. With integrated solutions across four key domains – telecom networks, IT, smart devices, and cloud services – we are committed to bringing digital to every person, home and organization for a fully connected, intelligent world.

