## 04 Mar: Python experts only

Implement the attached paper using either Python or MATLAB.

My conditions are:

1- Do not use a ready-made framework or function call for the DQN from AI baselines or the MATLAB toolbox (do not do that). I want to build the DQN agent from scratch so I can understand it better.

2- Explain every line of code that you write so I can understand it clearly; the simpler the code, the better.

Please let me know if there is any article you do not have access to so I can provide it. Also, please let me know if you have any questions or inquiries as you go along with the work.

Citation: Amer, A.; Shaban, K.; Massoud, A. Demand Response in HEMSs Using DRL and the Impact of Its Various Configurations and Environmental Changes. Energies 2022, 15, 8235. https://doi.org/10.3390/en15218235

Academic Editor: Surender Reddy Salkuti

Received: 26 September 2022; Accepted: 28 October 2022; Published: 4 November 2022

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


Demand Response in HEMSs Using DRL and the Impact of Its Various Configurations and Environmental Changes

Aya Amer 1,*, Khaled Shaban 2 and Ahmed Massoud 1

1 Electrical Engineering Department, Qatar University, Doha 2713, Qatar
2 Computer Science and Engineering Department, Qatar University, Doha 2713, Qatar
* Correspondence: [email protected]

Abstract: With smart grid advances, enormous amounts of data are made available, enabling the training of machine learning algorithms such as deep reinforcement learning (DRL). Recent research has utilized DRL to obtain optimal solutions for complex real-time optimization problems, including demand response (DR), where traditional methods fail to meet time and complexity requirements. Although DRL has shown good performance for particular use cases, most studies do not report the impacts of various DRL settings. This paper studies the DRL performance when addressing DR in home energy management systems (HEMSs). The trade-offs of various DRL configurations and how they influence the performance of the HEMS are investigated. The main elements that affect the DRL model training are identified, including state-action pairs, reward function, and hyperparameters. Various representations of these elements are analyzed to characterize their impact. In addition, different environmental changes and scenarios are considered to analyze the model’s scalability and adaptability. The findings elucidate the adequacy of DRL to address HEMS challenges since, when appropriately configured, it successfully schedules from 73% to 98% of the appliances in different simulation scenarios and minimizes the electricity cost by 19% to 47%.

Keywords: deep learning; reinforcement learning; deep Q-networks; home energy management system; demand response

1. Introduction

To address complex challenges in power systems associated with the presence of distributed energy resources (DERs), wide application of power electronic devices, increasing number of price-responsive demand participants, and increasing connection of flexible load, e.g., electric vehicles (EV) and energy storage systems (ESS), recent studies have adopted artificial intelligence (AI) and machine learning (ML) methods as problem solvers [1]. AI can help overcome the aforementioned challenges by directly learning from data. With the spread of advanced smart meters and sensors, power system operators are producing massive amounts of data that can be employed to optimize the operation and planning of the power system. There has been increasing interest in autonomous AI-based solutions. The AI methods require little human interaction while improving themselves and becoming more resilient to risks that have not been seen before.

Recently, reinforcement learning (RL) and deep reinforcement learning (DRL) have become popular approaches to optimize and control the power system operation, including demand-side management [2], the electricity market [3], and operational control [4], among others. RL learns the optimal actions from data through continuous interactions with the environment, while the global optimum is unknown. It eliminates the dependency on accurate physical models by learning a surrogate model. It identifies what works better with a particular environment by assigning a numeric reward or penalty to the action taken after receiving feedback from the environment. In contrast to the performance of RL, the conventional and model-based DR approaches, such as mixed integer linear programming (MILP) [5,6], mixed integer non-linear programming (MINLP) [7], particle swarm optimization (PSO) [8], and Stackelberg PSO [9], require accurate mathematical models and parameters, the construction of which is challenging because of the increasing system complexities and uncertainties.

In demand response (DR), RL has shown effectiveness by optimizing the energy consumption for households via home energy management systems (HEMSs) [10]. The motivation behind applying DRL for DR arises mainly from the need to optimize a large number of variables in real time. The deployment of smart appliances in households is rapidly growing, increasing the number of variables that need to be optimized by the HEMS. In addition, the demand is highly fluctuating due to the penetration of EVs and renewable energy sources (RESs) in the residential sector [11]. Thus, new load scheduling plans must be processed in real time to satisfy the users’ needs and adapt to their lifestyles by utilizing their past experiences. RL has been proposed for HEMSs, demonstrating the potential to outperform other existing models. Initial studies focused on proof of concept, with research such as [12,13] advocating for its ability to achieve better performance than traditional optimization methods such as MILP [14], genetic algorithms [15], and PSO [16].

More recent studies focused on utilizing different learning algorithms for HEMS problems, including deep Q-networks (DQN) [17], double DQN [18], deep deterministic policy gradients [19], and a mixed DRL [20]. In [21], the authors proposed a multi-agent RL methodology to guarantee optimal and decentralized decision-making. To optimize their energy consumption, each agent corresponded to a household appliance type, such as fixed, time-shiftable, and controllable appliances. Additionally, RL was utilized to control heating, ventilation, and air conditioning (HVAC) loads in the absence of thermal modeling to reduce electricity cost [22,23]. Work in [24] introduces an RL-based HEMS model that optimizes energy usage considering DERs such as ESS and a rooftop PV system. Lastly, many studies have focused on obtaining an energy consumption plan for EVs [25,26]. However, most of these studies only look at improving their performance compared to other approaches without providing precise details, such as the different configurations of the HEMS agent based on an RL concept or hyperparameter tuning for a more efficient training process. Such design, execution, and implementation details can significantly influence HEMS performance. DRL algorithms are quite sensitive to their design choices, such as action and state spaces, and their hyperparameters, such as neural network size, learning and exploration rates, and others [27].

DRL adoption in real-world tasks is limited by challenges in reward design and safe learning. The literature lacks in-depth technical and quantitative descriptions and implementation details of DRL in HEMSs. Beyond the expert knowledge required in DR and HEMS, DRL-based HEMSs pose extra challenges. Hence, a performance analysis of these systems needs to be conducted to avoid bias and gain insight into the challenges and the trade-offs. Understanding the compromise between the best performance metrics and the limiting characteristics of interfacing with different types of household appliances, EV, and ESS models will facilitate the successful implementation of DRL in DR and HEMS. Further, there is a gap in the literature regarding the choice of reward function configuration in HEMSs, which is crucial for their successful deployment.

In this paper, we compare different reward functions for DRL-HEMS and test them using real-world data. In addition, we examine various configuration settings of DRL and their contributions to the interpretability of these algorithms in HEMS for robust performance. Further, we discuss the fundamental elements of DRL and the methods used to fine-tune the DRL-HEMS agents. We focus on DRL sensitivity to specific parameters to better understand their empirical performance in HEMS. The main contributions of this work are summarized as follows:

• A study of the relationship between training and deployment of DRL is presented. The implementation of the DRL algorithm is described in detail with different configurations regarding four aspects: environment, reward function, action space, and hyperparameters.


• We have considered a comprehensive view of how the agent performance depends on the scenario considered to facilitate real-world implementation. Various environments account for several scenarios in which the state-action pair dimensions are varied or the environment is made non-stationary by changing the user’s behavior.

• Extensive simulations are conducted to analyze the performance when the model hyperparameters are changed (e.g., learning rates and discount factor). This verifies the validity of having the model representation as an additional hyperparameter in applying DRL. To this end, we choose the DR problem in the context of HEMS as a use case and propose a DQN model to address it.

The remainder of this paper is structured as follows. The DR problem formulation with various household appliances is presented in Section 2. Section 3 presents the DRL framework, different configurations to solve the DR problem, and the DRL implementation process. Evaluation and analysis of the DRL performance results are discussed in Section 4. Concluding remarks, along with future work and limitations, are presented in Section 5.

2. Demand Response Problem Formulation

The advances in smart grid technologies enable power usage optimization for customers by scheduling their different loads to minimize the electricity cost considering various appliances and assets, as shown in Figure 1. The DR problem has been tackled in different studies; however, the flexible nature of the new smart appliances and the high-dimensionality issue add a layer of complexity to it. Thus, new algorithms and techniques, such as DRL, are proposed to address the problem. The total electricity cost is minimized by managing the operation of different categories of home appliances. The appliances’ technical constraints and user comfort limit the scheduling choices. Thus, the DR problem is defined according to the home appliances’ configuration and their effect on user comfort. The home appliances can be divided into three groups as follows:


Figure 1. HEMS interfacing with different types of household appliances and assets.

2.1. Shiftable Appliances

The working schedule of this appliance group can be changed, e.g., from a high-price time slot to another lower-price time slot, to minimize the total electricity cost. Examples of this type are washing machines (WMs) and dishwashers (DWs). The customer’s discomfort may be endured due to waiting for the appliance to begin working. Assume a time-shiftable


appliance requires an interval of $d_n$ to achieve one operation cycle. The time constraints of the $n$-th shiftable appliance are defined as:

$$t_{int,n} \le t_{start,n} \le t_{end,n} - d_n \quad (1)$$
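As a minimal illustration of constraint (1), a small hypothetical helper (the name `feasible_starts` and the hour-index convention are my assumptions, not the paper's) can enumerate the start slots that let an appliance finish inside its allowed window:

```python
def feasible_starts(t_init, t_end, duration):
    """List hourly start slots obeying t_init <= t_start <= t_end - duration.

    Hypothetical helper illustrating constraint (1); all times are hour indices.
    """
    return list(range(t_init, t_end - duration + 1))

# A dishwasher needing 2 h within the window [18, 23] may start at hours 18-21.
print(feasible_starts(18, 23, 2))  # → [18, 19, 20, 21]
```

A DRL agent never sees this list explicitly; actions violating (1) are typically masked out or penalized through the reward instead.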

2.2. Controllable, Also Known as Thermostatically Controlled, Appliances

This group of appliances includes air conditioners (ACs) and water heaters (WHs), among others, in which the temperature can be adjusted by the amount of electrical energy consumed. Their consumption can be adjusted between maximum and minimum values in response to the electricity price signal, as presented in (2). Regulating the consumption of these appliances reduces charges on the electricity bill. However, reduced consumption can affect the customer’s thermal comfort. The discomfort is defined based on the variation $(E^{max}_n - E_{n,t})$: when this deviation decreases, customer discomfort decreases, and vice versa.

$$E^{min}_n \le E_{n,t} \le E^{max}_n \quad (2)$$
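A sketch of how constraint (2) and the associated discomfort term might look in code (function names and units are assumptions for illustration, not the paper's):

```python
def clip_energy(e_requested, e_min, e_max):
    # Enforce constraint (2): E^min_n <= E_{n,t} <= E^max_n
    return max(e_min, min(e_max, e_requested))

def thermal_discomfort(e_max, e_t):
    # Discomfort grows with the curtailment (E^max_n - E_{n,t})
    return e_max - e_t

# An AC asked for 0.5 kWh with an allowed range of 1.0-3.0 kWh is clipped to 1.0.
e = clip_energy(0.5, 1.0, 3.0)
print(e, thermal_discomfort(3.0, e))  # → 1.0 2.0
```

In a DRL reward function this discomfort term would typically be weighted against the electricity cost term.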

2.3. Baseloads

These are appliances’ loads that cannot be reduced or shifted, and thus are regarded as a fixed demand for electricity. Examples of this type are cookers and laptops.

2.4. Other Assets

The HEMS controls the EVs and ESSs charging and discharging to optimize energy usage while sustaining certain operational constraints. The EV battery dynamics are modeled by:

$$SOE^{EV}_{n,t+1} = \begin{cases} SOE_t + \eta^{EV}_{ch}\, E^{EV}_{n,t}, & E^{EV}_{n,t} > 0 \\ SOE_t + \eta^{EV}_{dis}\, E^{EV}_{n,t}, & E^{EV}_{n,t} < 0 \end{cases} \quad (3)$$

$$-E^{EV/max}_{n,t} \le E^{EV}_{n,t} \le E^{EV/max}_{n,t}, \quad t \in [t_{a,n}, t_{b,n}] \quad (4)$$

$$E^{EV}_{n,t} = 0, \quad \text{otherwise} \quad (5)$$

$$SOE^{min} \le SOE_t \le SOE^{max} \quad (6)$$

The ESS charging/discharging actions are modeled the same as EV battery dynamics, as presented by Equations (3)–(6). However, the ESS is available at any time during the scheduling horizon t ∈ T.
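The state-of-energy dynamics of Equations (3)–(6) can be sketched as a single-step update. This is a simplified reading of the model (per-appliance indices dropped, variable names and default efficiencies/bounds mine, not the paper's):

```python
def soe_next(soe, e, eta_ch=0.95, eta_dis=0.95, soe_min=0.0, soe_max=10.0):
    """One-step SOE update following Eq. (3): e > 0 charges, e < 0 discharges.

    Efficiencies and bounds are illustrative defaults, not the paper's values.
    """
    if e > 0:                      # charging branch of Eq. (3)
        soe = soe + eta_ch * e
    elif e < 0:                    # discharging branch (e is negative)
        soe = soe + eta_dis * e
    if not (soe_min <= soe <= soe_max):
        raise ValueError("SOE bounds of Eq. (6) violated")
    return soe

print(soe_next(5.0, 2.0))   # charge 2 kWh at 95% efficiency (about 6.9)
print(soe_next(5.0, -2.0))  # discharge 2 kWh (about 3.1)
```

In an environment implementation, a bound violation would usually be handled by clipping the action or returning a penalty rather than raising, but the check makes constraints (4)–(6) explicit.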

3. DRL for Optimal Demand Response

The merit of a DR solution depends, in part, on its capability to adapt to the environment and the user preferences and to integrate the user feedback into the control loop. This section illustrates the Markov decision process (MDP), followed by the DRL setup for the DR problem with different element representations.

3.1. Deep Reinforcement Learning (DRL)

DRL combines RL with deep learning to address environments with a considerable number of states. DRL algorithms such as the deep Q-network (DQN) are effective in decision-making by utilizing deep neural networks as policy approximators. DRL shares the same basic concepts as RL, where agents determine the optimal possible actions to achieve their goals. Specifically, the agent and environment interact in a sequence of decision episodes divided into a series of time steps. In each episode, the agent chooses an action based on the environment’s state representation. Based on the selected action, the agent receives a reward from the environment and moves to the next state, as visualized in Figure 2.


Figure 2. Agent and environment interaction in RL.

Compared to traditional methods, RL algorithms can provide appropriate techniques for decision-making in terms of computational efficiency. The RL problem can be modeled with an MDP as a 5-tuple (S, A, T, R, γ), where S is a state space, A is an action space, T ∈ [0, 1] is a transition function, R is a reward function, and γ ∈ [0, 1) is a discount factor. The main aim of the RL agent is to learn the optimal policy that maximizes the expected average reward. In simple problems, the policy can be represented by a lookup table, a.k.a. Q-table, that maps all the environment states to actions. However, this type of policy is impractical in complex problems with large or continuous state and/or action spaces. DRL overcomes these challenges by replacing the Q-table with a deep neural network model that approximates the state-to-action mapping. A general architecture of the DRL agent interacting with its environment is illustrated in Figure 3.
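The lookup-table policy described above can be prototyped with a plain dictionary plus epsilon-greedy exploration. This is a generic sketch (not the paper's implementation); a DQN simply replaces the dictionary lookup with a neural-network forward pass over the state:

```python
import random

def epsilon_greedy(q_table, state, actions, epsilon):
    """With probability epsilon explore randomly; otherwise act greedily.

    q_table maps (state, action) -> estimated Q-value; unseen pairs default to 0.
    """
    if random.random() < epsilon:
        return random.choice(actions)       # exploration
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))  # exploitation

q = {(0, "on"): 1.0, (0, "off"): 0.2}
print(epsilon_greedy(q, 0, ["on", "off"], 0.0))  # greedy choice → on
```

The impracticality the paragraph mentions is visible here: the dictionary needs one entry per state-action pair, which explodes once states include continuous prices and PV readings.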


Figure 3. The general architecture of DRL.

The optimal value $Q^*(s, a)$ represents the maximum accumulative reward that can be achieved during the training. The recursive relation between the action-value function in two successive states $s_t$ and $s_{t+1}$ is designated as the Bellman equation:

$$Q^*(s_t, a_t) = r_{t+1} + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) \quad (7)$$


The Bellman equation is employed in numerous RL approaches to direct the estimates of the Q-values near the true values. At each iteration of the algorithm, the estimated Q-value $Q_t$ is updated by:

$$Q_{t+1}(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right] \quad (8)$$
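Update rule (8) in tabular form can be sketched as follows (dictionary-backed Q-table; the alpha and gamma values are illustrative defaults, not the paper's tuned settings):

```python
def q_update(q_table, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """Apply Eq. (8): Q(s,a) <- Q(s,a) + alpha*[r + gamma*max_a' Q(s',a') - Q(s,a)].

    q_table is a dict keyed by (state, action); missing entries default to 0.
    """
    best_next = max(q_table.get((s_next, a2), 0.0) for a2 in actions)
    q_old = q_table.get((s, a), 0.0)
    q_table[(s, a)] = q_old + alpha * (r + gamma * best_next - q_old)
    return q_table[(s, a)]

q = {}
print(q_update(q, "s0", 1, 1.0, "s1", [0, 1]))  # first update moves Q toward r → 0.1
```

In a DQN the same temporal-difference target $r_t + \gamma \max_{a'} Q(s', a')$ becomes the regression label for the network instead of an in-place table write.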

3.2. Different Configurations for DRL-Based DR

To solve the DR by DRL, an iterative decision-making method with a time step of 1 h is considered. An episode is defined as one complete day (T = 24 time steps). As presented in Figure 4, the DR problem is formulated based on forecasted data, appliances’ data, and user preferences. The DRL configuration for the electrical home appliances is defined below.


Figure 4. DRL-HEMS structure.

3.2.1. State Space Configuration

The state st at time t comprises the essential knowledge to assist the DRL agent in optimizing the loads. The state space includes the appliances’ operational data, ESS’s state of energy (SoE), PV generation, and the electricity price received from the utility. The time resolution to update the data is 1 h. In this paper, the state representation is kept the same throughout the presented work. The state st is given as:

$$s_t = \left( s_{1,t}, \ldots, s_{N,t}, \lambda_t, P^{PV}_t \right), \quad \forall t \quad (9)$$
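Concretely, the state of Eq. (9) is a flat vector. A hypothetical example with N = 3 appliances (the encoding and all values below are invented for illustration):

```python
# Per-appliance operational states at hour t (encoding is an assumption):
appliance_states = [1, 0, 2]   # e.g. WM on, DW off, AC at power level 2
price = 0.12                   # electricity price lambda_t ($/kWh, invented)
pv_power = 1.8                 # PV generation P^PV_t (kW, invented)

s_t = appliance_states + [price, pv_power]
print(s_t)  # → [1, 0, 2, 0.12, 1.8]
```

This flat vector is what the DQN takes as input, so adding appliances grows the input layer, one reason the paper studies how state dimensionality affects training.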

3.2.2. Action Space Configuration

The action selection for each appliance depends on the environment states. The HEMS agent performs the binary actions {1: ‘On’, 0: ‘Off’} to turn the shiftable appliances on or off. The controllable appliances’ actions are discretized into five different energy levels. Similarly, the ESS and EV actions are discretized with two charging and two discharging levels. The action set A, in each time step t determined by a neural network, comprises the ‘ON’ or ‘OFF’ actions of the time-shiftable appliances, the power levels of the controllable appliances, and the charging/discharging levels of the ESS and EV. The action set $a_t$ is given as:

$$a_t = \left( u_{t,n}, E_{t,n}, E^{EV}_{t,n}, E^{ESS}_{t,n} \right), \quad \forall t \quad (10)$$

where $u_{t,n}$ is a binary variable to control the shiftable appliance, $E_{t,n}$ is the energy consumption of the controllable appliance, $E^{EV}_{t,n}$ is the energy consumption of the EV, and $E^{ESS}_{t,n}$ is the energy consumption of the ESS.
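The joint action set of Eq. (10) can be enumerated as a Cartesian product of the per-device levels. The level values below are assumptions (the text specifies five controllable levels and two charge/two discharge levels; an idle level 0 is added here for the EV and ESS):

```python
import itertools

u_levels = [0, 1]                          # shiftable appliance: off / on
e_levels = [0.0, 0.5, 1.0, 1.5, 2.0]       # controllable appliance: 5 levels (kWh)
ev_levels = [-3.3, -1.65, 0.0, 1.65, 3.3]  # EV: 2 discharge, idle, 2 charge (kW)
ess_levels = [-2.0, -1.0, 0.0, 1.0, 2.0]   # ESS: same structure (kW)

# Each joint action a_t = (u, E, E^EV, E^ESS) is one tuple of this product.
action_space = list(itertools.product(u_levels, e_levels, ev_levels, ess_levels))
print(len(action_space))  # → 250
```

A DQN built over this space would have one output Q-value per joint action, which is why the discretization granularity directly drives the size of the network's output layer.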


3.2.3
