Haridas S¹, Dr. A. Rama Prasath²
¹ Research Scholar, Hindustan Institute of Technology and Science, Chennai, India;
Assistant Professor, Dept. of Computer Science, Government First Grade College, Tumkur, Karnataka.
Email: harigoleson@gmail.com
² Assistant Professor (Selection Grade), Department of Computer Applications,
Hindustan Institute of Technology and Science, Chennai, India.
Email: mraprasath@gmail.com
Abstract
MANETs are inherently dynamic, and network performance declines as the network grows. Clustering mitigates this problem: it improves the scalability of wireless networks while decreasing network overhead. In a MANET, clustering separates a collection of mobile nodes into virtual logical groups based on specific criteria, which reduces processing complexity. Each cluster comprises a cluster head (CH), cluster members, and a cluster gateway, each of which plays a distinct role during data transfer. Node clustering has two stages: cluster head selection and cluster creation. Energy state, node degree, distance, trust level, and node mobility are considered when determining each candidate's score. The cluster head is chosen from among the nodes with the highest score to maintain the maximum cluster size and enhance cluster stability. In the present work, cluster head selection is implemented using a reward-optimized DQN-based algorithm (RoDQL). Appropriate rewards are selected using timely awareness of the surrounding environment. The algorithm develops a strategy for CH selection by continuously learning the network state from forwarded packets and packet feedback. In simulation, the RoDQL clustering algorithm kept the number of nodes in each cluster balanced. The cluster head can access each node's connection information, allowing malicious nodes involved in wormhole attacks to be discovered and isolated during routing. Lower network energy consumption and routing overhead indicate the effectiveness of the proposed algorithm.
Keywords: MANET, Clustering, DQN, Security, Energy.
1. Introduction
MANETs are inherently dynamic, and network performance declines as the network grows. Clustering mitigates this problem: it improves the scalability of wireless networks while decreasing network overhead, and it reduces processing complexity by grouping mobile nodes. Clustering offers a robust and resource-efficient network while resolving most of the problems of MANETs. We consider a cluster-based environment, which minimizes complexity and overhead in forwarding packets and control messages.
2. Related Work
Clustering is a crucial strategy for resolving various MANET issues. It increases network lifespan and scalability, and cluster-based routing improves network administration by reducing the number of nodes held in the routing table. Cluster heads, however, must handle the extra communication workload. As a result, CH energy is drained sooner, and the failure of CHs partitions the network, which shortens the network lifespan. Furthermore, node mobility is the primary cause of connection failure (Mehrkanoon et al., 2014).
The Stabilized Clustering technique enhances cluster formation stability while simultaneously increasing efficiency. The Moth Flame Optimization technique is used in this approach to determine the CH using the QoS standard. In this technique, the helper CH helps to prevent the CH from deteriorating, allowing it to run successfully (Wang and Qing, 2010).
The QoS-guided Dynamic Scheduling technique identifies the ideal locations for all workloads and schedules clusters. It is a lightweight technique that ensures QoS (Alowish et al., 2020). A QoS-oriented scheduling and auto-scaling technique is utilized to schedule jobs in the cluster. This strategy focuses on the critical QoS requirement: it predicts whether a job will be completed before its deadline and forecasts the appropriate resource configuration. However, this method is ineffective. The fuzzy-based clustered network increases complexity, since more fuzzy rules increase communication overhead, leading to uncertainty.
The routing problem is treated as a Markov decision process (MDP), which considers how to route packets over an ideal communication channel. Reinforcement learning (RL) methods have proven useful in learning appropriate MDP policies. Early routing studies relied on the Q-learning algorithm, the most widely used RL approach (T. Hu and Y. Fei, 2010). However, Q-learning converges slowly for large action spaces. This disadvantage was recently addressed with the introduction of the deep Q-network (DQN) method. R. Ding et al. (2019) use DQN for routing in a high-traffic network to minimize network congestion, whereas A. M. Koushik et al. (2019) create a DQN model to identify the ideal link between nodes. Both, however, need a central unit powerful enough to compute and manage the actions of every node. Deep reinforcement learning (DRL) emerges as a potential alternative for solving decision-making problems in this case. DRL, in contrast to traditional reinforcement learning methods, can solve practical problems with large-scale state and action spaces.
However, no effective scheme exists for intelligently selecting cluster heads in a dynamic network environment. Hence, an efficient clustering strategy is proposed for a large-scale MANET, reducing network energy consumption and increasing the attack detection rate.
The present work proposes an energy-efficient, lifetime-aware, adaptive, and environment-aware stable clustering scheme to address the abovementioned challenges. It uses the Rewards optimized Deep-Q-Learning (RoDQL) process to model dynamic cluster head selection.
3. Problem Statement
Providing security in a MANET is a major issue because of factors such as dynamic topology, communication latency, network scalability, and computationally heavy security algorithms. Authentication is the main process in MANET security; it verifies the credentials provided during registration. The number of users in a MANET environment is large, and it is not possible to manage them individually. Clustering of mobile nodes is therefore introduced, which significantly reduces the complexity of managing the mobile nodes. We consider a cluster-based environment, which minimizes complexity and overhead in forwarding packets and control messages. Environment-aware clustering is proposed, which uses rewards optimized deep-Q-learning (RoDQL) and considers energy status, number of neighbors, mobility, and distance. To overcome the challenges faced in routing message packets, the RoDQL model is proposed over a clustered MANET environment.
4. Comparison Study
This section describes the evaluation of the proposed blockchain-based security model in terms of several QoS metrics. The proposed model is compared with state-of-the-art works. In particular, we consider the following performance metrics: attack detection rate, false positive rate, end-to-end delay, packet delivery ratio, energy consumption, throughput, routing overhead ratio, and security strength. Table 1 shows the comparison of existing approaches.
Table 1. Drawbacks of Existing Approaches
| Existing work | Contributions | Drawbacks |
|---|---|---|
| E2SR [32] | (1) A hash-chain-dependent certificate authentication (HCCA) scheme is proposed for authentication. (2) Clusters are then formed, and dual cluster heads are elected for data transmission. (3) A secure route is established from source to destination via a worst-case particle swarm optimization algorithm. (4) Data packets are encrypted before transmission over the secured path by means of XOR RC6 encryption with fuzzy logic. | |
| Multi-Path [33] | (1) Clusters are formed using the Fuzzy Naïve Bayes algorithm. (2) Secure nodes are selected by hybrid optimization (BSO + WOA). (3) The optimal route is selected based on the fitness factors energy, trust, connectivity, and throughput. | |
5. General Model of Deep Q-learning
Reinforcement learning (RL) offers a paradigm in which a system learns to achieve a target in control problems from its own experience. RL approaches solve optimal control tasks by interacting with the surrounding environment. RL aims to maximize an agent's reward by performing a series of actions in response to a changing environment.
Figure 1: The collaboration between agent and environment in the DQL. The agent receives state S_i and reward R_i from the environment and returns action A_i.
In RL, an agent chooses actions depending on the current state of a system and the reinforcement it gets from the environment. Most RL methods are based on approximating value functions, which are functions of state-action pairs that assess how good it is to take an action in a particular state. A reinforcement learning process with the Markov property is known as a Markov Decision Process (MDP) (R.S. Sutton & A.G. Barto, 1998), which is critical for understanding the idea of RL. The properties of a given MDP are represented by a tuple (A, S, R, P), where A, S, R, and P are the sets of actions, states, rewards, and state transition probabilities, respectively. The agent samples the environment and uses the resulting experience to find the best action-value function q(s, a) for a particular state s.
DQN estimates Q(s, a) by combining a convolutional neural network (CNN) with Q-learning. Because the CNN outputs Q(s, a) for all actions when the current state s is input, it can tackle large-scale RL problems.
Meanwhile, deep Q-learning abandons the Q-table in favour of an experience replay pool that preserves each experience tuple e = (s, a, r, s'). The state is given as input, and the Q-values of all possible actions are given as output. The agent's behaviour is shaped by the reward function, which provides negative or positive reinforcement once the agent makes a decision, so the efficiency of learning depends on careful design of the reward function. The reward function indicates the quality of the current action decision and should be designed with the primary goal of intelligent network routing control in mind.
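The experience replay pool and the learning target it feeds can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the class name `ReplayBuffer`, the helper `td_target`, the discount factor `gamma`, and uniform sampling are assumptions chosen for the example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity pool of experience tuples e = (s, a, r, s_next)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest tuple when full.
        self.pool = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement (an assumption of this sketch).
        return rng.sample(list(self.pool), min(batch_size, len(self.pool)))

    def __len__(self):
        return len(self.pool)


def td_target(r, next_q_values, gamma=0.9):
    """Learning target y = r + gamma * max_a' Q(s', a')."""
    return r + gamma * max(next_q_values)
```

Sampling past tuples at random, instead of learning only from the latest transition, breaks the correlation between consecutive experiences, which is the usual motivation for replacing the Q-table update with a replay pool.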
6. Proposed methodology
Environment-aware clustering is proposed, which uses rewards optimized deep-Q-learning (RoDQL) and considers energy status, number of neighbors, mobility, and distance.
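As an illustration of how the four listed factors could be combined, the following minimal sketch ranks candidate nodes by a weighted score and picks the highest-scoring node as CH. It is not the RoDQL procedure itself: the weights, the normalisation of every input to [0, 1], and the names `ch_score` and `elect_cluster_head` are assumptions made for the example.

```python
# Hypothetical weights; the text does not state how the factors are weighted.
WEIGHTS = {"energy": 0.4, "neighbors": 0.3, "mobility": 0.2, "distance": 0.1}


def ch_score(node, weights=WEIGHTS):
    """Score one candidate; all inputs are assumed normalised to [0, 1].

    High residual energy and many neighbours raise the score,
    high mobility and large distance lower it.
    """
    return (weights["energy"] * node["energy"]
            + weights["neighbors"] * node["neighbors"]
            + weights["mobility"] * (1.0 - node["mobility"])
            + weights["distance"] * (1.0 - node["distance"]))


def elect_cluster_head(nodes):
    """Return the id of the candidate with the highest score."""
    return max(nodes, key=lambda node_id: ch_score(nodes[node_id]))


candidates = {
    "n1": {"energy": 0.9, "neighbors": 0.5, "mobility": 0.2, "distance": 0.3},
    "n2": {"energy": 0.4, "neighbors": 0.8, "mobility": 0.7, "distance": 0.1},
}
print(elect_cluster_head(candidates))
```

A fixed weighting such as this cannot adapt when the network changes, which is the gap a learned, reward-driven selection is meant to close.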
Clustering consists of two processes: cluster formation and cluster management. In cluster formation, the network is split into different clusters. In each cluster, one node is elected as the CH and the others are members of that CH. CHs are elected using several metrics. The prime motive of clustering is to use energy resources efficiently, to maintain and manage routing, and to handle location issues, thereby reducing communication and computational complexity. The two types of cluster maintenance are as follows:
• Inter-cluster maintenance: packet forwarding/routing across clusters using multiple CHs.
• Intra-cluster maintenance: packet forwarding/routing within a cluster.
• Distance: It is defined as the distance between two nearest nodes. Assume that $d_{ij}$ is the distance between node $i$ and node $j$. It is computed from the angular position information and the radius information of the nodes. It is expressed as: (1)
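One way to compute a distance from angular position and radius information is the law of cosines in polar coordinates. The sketch below assumes that both nodes are located by a radius and an angle (in radians) measured from a common reference point; this coordinate convention and the function name `polar_distance` are assumptions, and the expression is not claimed to be the exact form of Eq. (1).

```python
import math


def polar_distance(r_i, theta_i, r_j, theta_j):
    """Distance between two nodes given as (radius, angle) from a common origin."""
    squared = r_i ** 2 + r_j ** 2 - 2.0 * r_i * r_j * math.cos(theta_i - theta_j)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))
```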
• Node Mobility: It is defined by the node speed. Node mobility is computed because the network topology is dynamic; increased mobility causes issues such as link breakage and route failure and degrades network throughput. It is expressed as: = (2)
where the left-hand side is the speed of the nearest node, which is calculated for each node within coverage in the network, and the remaining terms are the coordinates of the node at times t and t-1.
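A node's speed can be estimated from its coordinates at two consecutive sampling instants. The following sketch assumes planar (x, y) coordinates and a fixed sampling interval `dt`; the function name `node_speed` and the interval are illustrative, and the expression is a plain displacement-over-time estimate rather than a reproduction of Eq. (2).

```python
import math


def node_speed(x_t, y_t, x_prev, y_prev, dt=1.0):
    """Average speed over one sampling interval: displacement divided by dt."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return math.hypot(x_t - x_prev, y_t - y_prev) / dt
```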
• Residual Energy Level: It is defined as the level of energy that a node retains after a certain process at time t, given the energy per bit/byte consumed by the node at that time. It is expressed as: = (3)
where the terms are the residual energy of the node, the power consumption of the node in the network, and the transmission power of the node. The energy consumption of a mobile node is therefore expressed as:
= (4)
where the terms are the energy consumption of the node, the power spent for transmission, the data size, the data rate, the power spent for reception, and the loss due to overhearing.
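The quantities named above (transmission power, reception power, data size, data rate, overhearing loss) can be combined in several ways. The sketch below shows one plausible reading, in which the radio is powered for the time needed to move the data and the overhearing loss is added on top. The combination, the units, and the names `energy_consumed` and `residual_energy` are assumptions and should not be read as Eq. (3) or Eq. (4).

```python
def energy_consumed(p_tx, p_rx, data_bits, rate_bps, overhear=0.0):
    """Energy in joules: (transmit + receive power) * airtime + overhearing loss."""
    if rate_bps <= 0:
        raise ValueError("rate_bps must be positive")
    airtime = data_bits / rate_bps  # seconds needed to move the data
    return (p_tx + p_rx) * airtime + overhear


def residual_energy(initial, consumed):
    """Energy left after an operation, never below zero."""
    return max(initial - consumed, 0.0)
```

For example, with 2 W transmit power, 1 W receive power, 1000 bits at 500 bit/s, and 0.5 J overhearing loss, the sketch yields 6.5 J consumed, leaving 3.5 J from an initial 10 J.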
Fig. 2. Flow of RoDQL algorithm
In this work, the number of neighbors is known as the node relative degree, which is computed by
(5)
where the node density is computed by:
(6)
The RoDQL algorithm follows three principles: (1) it utilizes a deep neural network to represent the policy, value functions, and model; (2) it optimizes the policy, end-to-end model, and value functions; and (3) it uses stochastic gradient descent. Fig. 2 depicts the RoDQL algorithm.
For each node $i$ in the network, the MDP model consists of the following elements:
$\langle S, A, P, R \rangle$ (7)
At a given time $t$, the state of node $i$ comprises its residual energy, trust value, distance, transmission distance, and delay. The reinforcement learning algorithm is described as follows:
• States $S$: For each node, the state is computed from, and changes with, the node's residual energy, trust value, distance, transmission distance, and delay. In this model, $S$ denotes the set of states available in the environment. A state transition results in one of the nodes becoming the next hop (the first relay node R1) for packet transmission from the source node.
• Actions $A$: This denotes the set of the agent's actions or behaviours in a given time period t. An action may change the current state to the next state. The set of actions is composed from the energy values of all nodes, from which each node can choose the next node. Thus the finite set of actions is as follows: = (8)
where the parameter in Eq. (8) is the step size. For any node, the possible actions are: (1) choose one of the nodes from the set of possible next nodes, or (2) terminate the data packets and do not route them.
• Transition Model $P$: This model depends on the action and the state transition. It defines the probability of moving from state $s$ to state $s'$ as a result of action $a$, and the state transition probability function is defined below:
$P(s' \mid s, a) = \Pr\{s_{t+1} = s' \mid s_t = s,\ a_t = a\}$ (9)
The selection probability of a particular forwarder node is based on the routing score of the neighbouring node.
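A simple way to turn neighbour routing scores into selection probabilities is to normalise them so that they sum to one. The sketch below assumes non-negative scores and proportional selection; the function name `selection_probabilities` and the uniform fallback for all-zero scores are choices made for the example, not details stated in this work.

```python
def selection_probabilities(scores):
    """Map each neighbour id to a probability proportional to its routing score."""
    if any(value < 0 for value in scores.values()):
        raise ValueError("routing scores must be non-negative")
    total = sum(scores.values())
    if total == 0:
        # No neighbour stands out, so fall back to a uniform choice.
        return {node: 1.0 / len(scores) for node in scores}
    return {node: value / total for node, value in scores.items()}
```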
• Reward function $R$: It is also known as the reinforcement function, and its purpose is to compute the immediate reward of an action. It represents the state transition from one state to another. It is computed as: (10)
In this stage, the routing policy maximizes the throughput of each node through the reward function. The routing policy $\pi$ maps a given state $s$ to the action $a$ that should be selected, and it is written as:
$\pi(s) = \arg\max_{a} Q(s, a)$ (11)
$\pi$ is determined using the action-value function $Q(s, a)$, which is the expected reward computed starting from state $s$ and action $a$.
The optimal policy is the policy whose value function is greater than or equal to that of any other policy for all states. The action value for the optimal policy is also known as $Q^*(s, a)$, and $\pi^*(s)$ is an optimal action, selecting the node with the largest probability score at every hop so that the reward is increased all the way to the destination.
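Choosing the action with the largest action value in a given state can be sketched in a few lines. The dictionary representation of the Q-values and the function name `greedy_policy` are assumptions made for the example.

```python
def greedy_policy(q_values):
    """Return the action with the largest Q-value for one state.

    q_values maps each candidate action (for example a next-hop node id)
    to its estimated action value.
    """
    if not q_values:
        raise ValueError("no actions available in this state")
    return max(q_values, key=q_values.get)


print(greedy_policy({"n3": 0.2, "n5": 0.7, "n8": 0.4}))
```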
If the guard node finds any attack pattern, it immediately isolates the malicious node and broadcasts this information throughout the network.
Algorithm 1: Reward optimized Deep Q-learning Algorithm
Input:
Output: p(s), cluster head
1   Set the capacity of replay buffer D to N
2   Set the action-value function Q with random weights θ;
3   Copy the original model parameters to build the target network Q' (θ' = θ)
4   for episode = 1, ..., M do
5       The initial state is s_1
6       for t = 1, ..., T do
7           Choose a random action a_t with probability ε, or select the existing best action with probability 1 − ε;
8           a_t = argmax_a Q(s_t, a; θ)
9           Complete a_t, observe the reward r_t and the next state s_{t+1}
10          Store (s_t, a_t, r_t, s_{t+1}) in D;
11          Arbitrarily sample a minibatch (s_j, a_j, r_j, s_{j+1}) from replay buffer D;
12          Calculate y_j = r_j + γ max_{a'} Q'(s_{j+1}, a'; θ')
13          Use stochastic gradient descent to minimize (y_j − Q(s_j, a_j; θ))²;
14          For every C steps, update the target parameters θ' = θ
15      end for
16  end for
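The control flow of Algorithm 1 can be exercised end to end on a toy problem. The sketch below is a simplified stand-in, not the simulation used in this work: a lookup table replaces the deep network so that the example has no dependencies, the two-state chain environment and every hyperparameter value are hypothetical, and the step comments refer to the rows of Algorithm 1.

```python
import random
from collections import deque

# Hypothetical toy chain: state 0 -> state 1 -> terminal.
# Action 1 moves forward (reward 1 on the last hop); action 0 ends the episode with reward 0.
N_STATES, N_ACTIONS, TERMINAL = 2, 2, None


def step(s, a):
    """Return (reward, next_state) for the toy chain."""
    if a == 0:
        return 0.0, TERMINAL
    return (1.0, TERMINAL) if s == N_STATES - 1 else (0.0, s + 1)


def train(episodes=2000, capacity=500, batch=8, gamma=0.9, lr=0.1,
          eps=0.5, sync_every=20, seed=0):
    rng = random.Random(seed)
    replay = deque(maxlen=capacity)                        # step 1
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]       # step 2 (table, not a network)
    q_target = [row[:] for row in q]                       # step 3
    steps = 0
    for _ in range(episodes):                              # step 4
        s = 0                                              # step 5
        while s is not TERMINAL:                           # step 6
            if rng.random() < eps:                         # step 7: explore
                a = rng.randrange(N_ACTIONS)
            else:                                          # step 8: exploit
                a = max(range(N_ACTIONS), key=lambda k: q[s][k])
            r, s_next = step(s, a)                         # step 9
            replay.append((s, a, r, s_next))               # step 10
            sample = rng.sample(list(replay), min(batch, len(replay)))  # step 11
            for sj, aj, rj, sj_next in sample:
                # step 12: bootstrap from the frozen target table
                y = rj if sj_next is TERMINAL else rj + gamma * max(q_target[sj_next])
                # step 13: one gradient step on the squared error for a table entry
                q[sj][aj] += lr * (y - q[sj][aj])
            steps += 1
            if steps % sync_every == 0:                    # step 14
                q_target = [row[:] for row in q]
            s = s_next
    return q
```

Calling `train()` returns the learned table; in this toy chain the forward action ends up with a higher value than the terminating action in both states, so the greedy policy walks the chain to its rewarded end.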
7. Results and Discussion
Cluster head selection is implemented using a reward-optimized DQN-based algorithm. Appropriate rewards are selected using timely awareness of the surrounding environment. The algorithm develops a strategy for CH selection by continuously learning the network state from forwarded packets and packet feedback. Because a guard node that performs node verification is deployed in the network, the cluster head can access each node's connection information, allowing malicious nodes involved in wormhole attacks to be discovered and isolated during routing.
7.1 Impact of Energy Consumption
Energy consumption is a QoS-based metric defined as the difference between the initial energy of a node and its residual energy after packet transmission or any other operation in the network. Energy is consumed by several processes, such as packet transmission, route requests, reception of reply messages, and waiting to sleep after packet acknowledgement. Fig. 3 shows the energy consumption against the number of malicious nodes. From the graphical analysis, it is observed that the proposed blockchain model consumes less energy than the E2SR and multi-path models. In particular, the proposed work consumes 5 J for 2 malicious nodes.
Fig. 3. Energy Consumption vs. # of Malicious Nodes
Fig. 4 shows the energy consumption in terms of simulation time. In our proposed work, a timer is used to track each node's current state, namely sleep, listen, or active. Energy consumption varies over time according to the network density and node mobility. The proposed work learns the environment and adaptively changes the rewards for the deep reinforcement learning (DRL) algorithm.
Fig. 4. Energy Consumption vs. Simulation Time
7.2 Impact of Throughput
In a MANET, throughput is defined as the total amount of data forwarded from the sender to the receiver node, that is, the data successfully transmitted over the communication link to the receiver. We compute the throughput with respect to the number of malicious nodes, as depicted in Fig. 5.
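Since the throughput is reported in Kbps, it can be obtained by dividing the delivered data by the observation time. The sketch below assumes that 1 Kbps equals 1000 bits per second and that the delivered bit count and duration are available from the simulation trace; the function name `throughput_kbps` is illustrative.

```python
def throughput_kbps(bits_delivered, duration_s):
    """Throughput in Kbps: bits received at the destination per second, divided by 1000."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return bits_delivered / duration_s / 1000.0
```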
Fig. 5. Throughput (Kbps) vs. # of Malicious Nodes
7.3 Impact of Routing Overhead Ratio
This metric is defined as the ratio of the number of packets generated for route selection to the total number of packets transmitted. Routing overhead refers to the number of routing packets forwarded during route discovery and maintenance.
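The ratio described above reduces to a single division. In the sketch below, the denominator is taken to be the total number of packets transmitted, which is one reading of the definition; the function name `routing_overhead_ratio` and the packet counts in the test are illustrative only.

```python
def routing_overhead_ratio(routing_packets, total_packets):
    """Share of transmitted packets that were generated for route selection."""
    if total_packets <= 0:
        raise ValueError("total_packets must be positive")
    return routing_packets / total_packets
```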
Fig. 6. Routing Overhead Ratio vs. # of Malicious Nodes
Forwarding these control messages introduces the routing overhead. The active route is determined by sending control messages to the neighbor nodes. In a low-mobility environment the routing overhead is small, but highly dynamic networks forward control messages frequently. Another cause of high routing overhead is the size of the packet header transmitted over a link. Fig. 6 shows the routing overhead ratio against the number of malicious nodes. From the result, it is observed that the proposed model obtains a small routing overhead because it monitors link stability. Further, the selected route transfers packets reliably. The network topology is controlled by the CH, which reduces the number of routing packets transmitted. For instance, when the number of malicious nodes is 2, the routing overhead of the proposed model is 0.85, whereas E2SR and multi-path yield 0.95 and 0.99, respectively.
In simulation, the RoDQL clustering algorithm kept the number of nodes in each cluster balanced. Lower network energy consumption and routing overhead indicate the effectiveness of the proposed algorithm. Compared with other traditional approaches, the RoDQL algorithm maintains higher residual energy, so the network operation cycle is longer.
8. References