Copyright © 2022 Oh, Jung, Onen and Lee.

The demand response (DR) program is a promising way to improve the balance between supply and demand and thereby the economic efficiency of the overall system. This study focuses on the DR participation strategy from the perspective of aggregators, who offer appropriate DR programs to customers with flexible loads. DR aggregators participate in the electricity market according to customer behavior and must make decisions that increase the profits of both the aggregators and their customers. In the DR program model, each customer sends its demand reduction capability to a DR aggregator, which bids the aggregated demand reduction into the electricity market. DR aggregators not only determine the optimal incentive rate to offer customers but can also serve customers by formulating an optimal energy storage system (ESS) operation to reduce their demand. This study formulates the problem as a Markov decision process (MDP) and applies a reinforcement learning (RL) framework. In the RL framework, the DR aggregator and each customer are each assigned an agent; the agents interact with the environment and are trained to make optimal decisions. The proposed method was validated using actual industrial and commercial customer demand profiles and market price profiles from South Korea. Simulation results demonstrated that the proposed method could optimize decisions from the perspective of the DR aggregator.