XOR based clock gating & implementation:

Clock gating is way to save power in synchronous logic by temporarily shutting-off clocks in sequential logic. The clock gating logic could be based on functional behavior of sequential logic or could be purely based of detection of Traffic activity through the logic block. More details are specified in clock-gating section here : 

XOR clock gating : In order to understand XOR based clock gating, let us first understand the property of XOR logic. Here X and Y are inputs and Y is output of XOR gate.

X   Y  Output
0   0   0
0   1   1
1   0   1
1   1   0

XOR logic has a property that allows us to detect if 2 inputs are different i.e if 2 inputs are different the output is 1 otherwise its 0. Also, XOR allows the bits to be inverted if the bits are XORed with 1.

In case of XOR clock gating, the information(A) to be stored is inverted using XOR with 1s (Abar), then, the information to be stored is compared with current information(B) in the flops (again using XOR) and if more than 50% of the bits differ in terms of polarity, then the inverted information (Abar) is stored instead of information(A). This reduces number of bit-flips required for storage of information of A into B thereby saving switching power.

For E.g:
Consider a 10-bit Sequential logic (B[9:0]) storing a value on a valid and this value needs to be written and read (bout) every clock cycle.

logic [9:0] A;         // new info to be written
logic [9:0] Ainv;      // Xored info
logic [9:0] Abar;      // inverted info
logic [9:0] B;         // info to be Stored
logic [3:0] Ainv_cnt;  // Inverted count 
logic       save_inv; // Inversion indicator
logic       save_inv_ff; // Inversion indicator
logic       valid;    // incoming valid 
logic [9:0] b_out;     // info to be read out

assign Abar =  A ^ {10{1'b1}};
assign Ainv =  A ^ B;

// Count number of 1s
always_comb begin
  Ainv_cnt = '0;  
  for (int cnt = 0; cnt < 10; cnt++) begin
    Ainv_cnt += Ainv[cnt];
  end
end

// Detect if bit-flips are more than 50%
assign save_inv = (Ainv_cnt > 5) ? 1'b1 : 1'b0;

always_ff (@posdege clk or negedge reset) begin
 if(reset) 
   sav_inv_ff <= 1'b0;
 else
   save_inv_ff <= save_inv; 
end

// Store A or Abar to minimize switching
always_ff (@posdege of clk or negedge reset) begin
  if(reset) 
    b <= 10'b0;
  elsif(save_inv & Valid)
    b <= Abar;
  elsif (~save_inv & Valid)
    b <= A;
  else
    b <= b;
end
  
// On read-out, make sure to read correct stored info.
bout = save_inv_ff ? b ^ {10{1'b1}} : b;


Please note that XOR logic for inversion and adders gates for counting the bits increases the combinational gate count of the logic. Also an extra bit(save_inv_ff) is stored additionally. Therefore, any power savings here comes at a cost of Area increase. This Area increase will increase static power but will reduce dynamic or switching power of Flops. Therefore careful analysis is recommended before using this technique. 

In general, this technique is more suitable for highly correlated data. For E.g Media or Video type workloads.