diff --git a/docs/allapi.rst b/docs/allapi.rst
deleted file mode 100644
index ca48fa77c6..0000000000
--- a/docs/allapi.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-=======
-All API
-=======
-
-.. doxygenindex::
-
-
diff --git a/docs/api-library.rst b/docs/api-library.rst
new file mode 100644
index 0000000000..b9458a6772
--- /dev/null
+++ b/docs/api-library.rst
@@ -0,0 +1,11 @@
+.. meta::
+   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
+   :keywords: RCCL, ROCm, library, API
+
+.. _api-library:
+
+=============
+API library
+=============
+
+.. doxygenindex::
diff --git a/docs/index.rst b/docs/index.rst
index aacc95593b..a9960b2019 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -1,11 +1,28 @@
-****
-RCCL
-****
+.. meta::
+   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
+   :keywords: RCCL, ROCm, library, API
 
-The ROCm Collective Communication Library (RCCL) is a stand-alone library which provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs.
+.. _index:
 
-RCCL (pronounced “Rickel”) implements routines such as all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather, scatter, all-to-allv, and all-to-all as well as direct point-to-point (GPU-to-GPU) send and receive operations.
+===========================
+RCCL documentation
+===========================
 
-The provided collective communication routines are implemented using Ring and Tree algorithms. They are optimized to achieve high bandwidth and low latency by leveraging topology awareness, high-speed interconnects, RDMA based collectives. RCCL utilizes PCIe and xGMI high-speed interconnects for intra-node communication as well as InfiniBand, RoCE, and TCP/IP for inter-node communication.
+Welcome to the ROCm Collective Communication Library (RCCL) docs home page! To learn more, see :ref:`what-is-rccl`.
 
-RCCL supports an arbitrary number of GPUs installed in a single-node or multi-node platform. It can be easily integrated into either single- or multi-process (e.g., MPI) applications.
+Our documentation is structured as follows:
+
+
+.. grid:: 2
+  :gutter: 3
+
+  .. grid-item-card:: API reference
+
+    * :ref:`Library specification <library-specification>`
+    * :ref:`api-library`
+
+To contribute to the documentation, refer to
+`Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.
+
+Licensing information can be found on the
+`Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.
diff --git a/docs/api.rst b/docs/library-specification.rst
similarity index 84%
rename from docs/api.rst
rename to docs/library-specification.rst
index 0b4cdafed3..280c88ecb6 100644
--- a/docs/api.rst
+++ b/docs/library-specification.rst
@@ -1,14 +1,16 @@
-.. toctree::
-   :maxdepth: 4
-   :caption: Contents:
+.. meta::
+   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
+   :keywords: RCCL, ROCm, library, API
 
-===
-API
-===
+.. _library-specification:
 
-This section provides details of the library API
+============================
+RCCL library specification
+============================
 
-Communicator Functions
+This document provides details about the RCCL library API.
+
+Communicator functions
 ----------------------
 
 .. doxygenfunction:: ncclGetUniqueId
@@ -27,7 +29,7 @@ Communicator Functions
 
 .. doxygenfunction:: ncclCommUserRank
 
-Collective Communication Operations
+Collective communication operations
 -----------------------------------
 
 Collective communication operations must be called separately for each communicator in a communicator clique.
@@ -58,7 +60,7 @@ Since they may perform inter-CPU synchronization, each call has to be done from
 
 .. doxygenfunction:: ncclAllToAll
 
-Group Semantics
+Group semantics
 ---------------
 When managing multiple GPUs from a single thread, and since NCCL collective
 calls may perform inter-CPU synchronization, we need to "group" calls for
@@ -78,7 +80,7 @@ of ncclGroupStart/ncclGroupEnd.
 
 .. doxygenfunction:: ncclGroupEnd
 
-Library Functions
+Library functions
 -----------------
 
 .. doxygenfunction:: ncclGetVersion
@@ -108,7 +110,3 @@ This section provides all the enumerations used.
 .. doxygenenum:: ncclRedOp_t
 
 .. doxygenenum:: ncclDataType_t
-
-
-
-
diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
index 55cfb98019..ce1d262a08 100644
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -1,10 +1,15 @@
 root: index
 subtrees:
 - entries:
-  - file: api
-  - file: allapi
-  - file: attributions
+  - file: what-is-rccl
+- caption: API reference
+  entries:
+  - file: library-specification
+    title: Library specification
+  - file: api-library
+
 - caption: About
   entries:
   - file: license
+  - file: attributions
 
diff --git a/docs/what-is-rccl.rst b/docs/what-is-rccl.rst
new file mode 100644
index 0000000000..110b4651c8
--- /dev/null
+++ b/docs/what-is-rccl.rst
@@ -0,0 +1,16 @@
+.. meta::
+   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
+   :keywords: RCCL, ROCm, library, API
+
+.. _what-is-rccl:
+
+=====================
+What is RCCL?
+=====================
+
+RCCL (pronounced “Rickel”) is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs.
+It implements routines such as all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather, scatter, all-to-allv, and all-to-all, as well as direct point-to-point (GPU-to-GPU) send and receive operations.
+The provided collective communication routines are implemented using Ring and Tree algorithms. They are optimized to achieve high bandwidth and low latency by leveraging topology awareness, high-speed interconnects, and RDMA-based collectives.
+
+RCCL utilizes PCIe and xGMI high-speed interconnects for intra-node communication as well as InfiniBand, RoCE, and TCP/IP for inter-node communication.
+It supports an arbitrary number of GPUs installed in a single-node or multi-node platform and can be easily integrated into single- or multi-process (e.g., MPI) applications.