High-fidelity quantum gate design is important for various quantum technologies, such as quantum computation and quantum communication. Numerous control policies for quantum gate design have been proposed that assume a dynamical model of the quantum system of interest. However, a quantum system is often highly sensitive to noise, and obtaining an accurate model of it can be difficult in many practical applications. Thus, a control policy based on a model of the quantum system may be impractical for quantum gate design. Moreover, quantum measurements collapse quantum states, which makes it challenging to obtain information through measurements during the control process. In this paper, we propose a novel training framework that uses deep reinforcement learning for model-free quantum control. The proposed framework relies only on the measurement at the end of the control process and can find the optimal control policy without access to quantum systems during the learning process. The effectiveness of the proposed technique is demonstrated numerically for model-free quantum gate design and quantum gate calibration using off-policy reinforcement learning algorithms.