Граф коммитов

15 Коммитов

Автор SHA1 Сообщение Дата
Bill(Shuzhou) Liu 66e4e790c3 Add SSL mutual authentication support for rdci
The RDC API is changed to pass the certificates to the gRPC.

Add the support to add all GPUs in the host to a group. Also before
add a GPU to a group, the RDC API will verify that GPU exists or not.

Add the support to fetch the temperature metrics.

Change-Id: I5857ef03fede233d16e8b2836be120f33172da93
2020-08-17 14:07:25 -05:00
Chris Freehill 023de40df7 Add return value for Make RdcStandaloneHandler::error_handle
Quiets compile warning.

Change-Id: I5e7454f56e824e2304c790bac729cfa0fcf78603
2020-08-17 14:07:25 -05:00
Chris Freehill 47fdfa4c7e Add support for gRPC authenticated communications
Also, make a few namespace corrections and some minor refactoring.

Change-Id: Iedcaf6b43cb7576bc11dfefe980abd190c838831
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu 020f6939f7 SWDEV-209060 - Create the Skeleton RDC CLI and daemon
Create the skeleton implementation of rdc_client.so and rdci. Modify current rdcd to
integrate the RDC API service:

rdc.proto is changed to add a new RdcAPI service which defined the interfaces for the RDC API.

RdcStandaloneHandler.cpp is added to send the request using gRPC to the rdcd. It is built into
the rdc_client.so

rdci.cc, RdciDisCoverySubSystem.cc and RdciSubSystem.cc are added to implement skeleton rdci.
Currently, the discovery subsystem is supported.

rdc_api_service.cc is added to the server as a skeleton to implement the RdcAPI service. Currently,
only discovery API is implemented. Note: we disabled the rdc_rsmi_service, which will be removed
in the future. The original rdc_client.so is renamed to rdc_client_smi.so which should also be
removed in the future.

Add the instruction how to run the rdcd and rdci in the build folder in the README.md.

Change-Id: Id232f9f83787e5812d4a295dc8cf0daa7728b06c
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu 1ff1c7b617 SWDEV-223878 - Add cache manager and watch table to skeleton rdc_lib
Support cache manager and watch table in rdc_lib

RdcCacheManagerImpl.cc is added to implement cache of metrics. Currently, only
integer mertics are supported. The cache manager provids function to retrieve the
latest and history metrics from cache. It also provides interfaces to update and evict the cache.

RdcWatchTableImpl.cc is added to implement watch and unwatch fields. It uses the
field settings to control how frequently a field needs to be updated. We have a preliminarily
performance optimization for this class as it may be called very frequently.

RdcMetricsUpdaterImpl.cc is added to run the update at background thread when
RDC_OPERATION_MODE_AUTO is set.

After this code change, the rdcd/rdci should be able to implement basic discovery, group and dmon
function. The job management function is not implemented in the skeleton rdc_lib yet.

Change-Id: I26cff8c2ec85d1ad8e7df24c66b02f0060838d37
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu a5f063f8b3 Support discovery and group management in rdc_lib
The rdc.h is modified for new discovery and grouping APIs.

The RdcGroupSettingsImpl.cc is added to implement the GPU group and
the field group management.

The RdcMetricFetcherImpl.cc is added to fetch the metrics from
rocm_smi_lib. Currently, only support power, memory, GPU utilization,
temperature, GPU clock, total device and device name.

A new example field_value_example.cc is added to demo how to record
the fields and retrieve data from cache.

Change-Id: I57acfa048fe9b3d848e2d441e768b3a63ccae3f8
2020-08-17 14:07:25 -05:00
Bill(Shuzhou) Liu 85006053ed Create the rdc.h header file and librdc_bootstrap.so
The rdc.h is the only header file will be provided to the user.
The inital version only includes the data structure and function
required for the job stats example.

The example folder has one example demonstrated how to use the API
to collect the job summary stats.

The RdcBootStrap.cc will dynamically load different libraries when user
select either the standalone or embbed mode. We also created a
dummy RdcEmbeddedHandler.cc for librdc.so.

In order to run the example after build, it needs to specify the
LD_LIBRARY_PATH. Assume current folder is the build folder:
LD_LIBRARY_PATH=$PWD/rdc_libs $PWD/example/jobstats

The folder is structured in following ways:
example
include
    - rdc - rdc.h (the only header file exposed to the user)
    - rdc_libs
          - impl
rdc_libs
    - boostrap
         - src
    - rdc
         - src
    - rdc_client
         - src
    - rdc_server
         - src

Change-Id: Ia386ddf4cabcb2dc4fe82de6464ca0619cb3d959
2020-08-17 14:07:25 -05:00
Chris Freehill 5cc498c6aa Make rdcd run as user "rdc"
The rdc account will be created on installation if it does
not already exist. It will be a system account with no
home directory.

rdcd will be started as a systemd service, but change to
user "rdc". The rdc user will drop all priviliges except
CAP_DAC_OVERRIDE, permitted. This means the default mode
will have no special privileges, but have the ability to
gain write access (e.g., to sysfs) when needed.

rdc tests were being inadvertantly added to the
installation. This was adversely impacting the new
functionality, so it was corrected in this commit.

Also included are a few small formatting changes.

Change-Id: I9c6bb132fee28119fd3960594dfb97bd2e7b282a
2020-08-17 14:07:25 -05:00
Chris Freehill 4729c47866 Add read fan values and associated tests
Change-Id: I89322e93d5f3110adace15e5a576f00d4934be79
2020-08-17 14:07:25 -05:00
Chris Freehill 02c6d3fb4d Add use of namespaces
Change-Id: I962eb808b3b874d1c3bf4cb418bf36952f88e3e2
2020-08-17 14:07:25 -05:00
Chris Freehill ca4344f5fa Add Google test based tests.
Initial testing include an "id test", which really just a
template test at this point, and a temperature sensor test.

The google test code is included in this commit. It will
eventually be taken out and replaced with a pull from a google
external repo.

Change-Id: I591818a9c169f4654fc8d8f17cf648f227d72545
2020-08-17 14:06:56 -05:00
Chris Freehill dc6f6f3e9a Break srvs. into rsmi & admin srvs. Add VerifyConnection api.
Change-Id: I67567264c37e31f3409062a14e56eba4801cd944
2020-01-09 20:02:33 -06:00
Chris Freehill 5898345d17 Initial RDC commit
Includes server, client and example targets.

Change-Id: I30596fb0453af71d49b8390a8468a6d073200836
2020-01-09 17:57:29 -06:00
Chris Freehill 0de56e087a Initial commit
Change-Id: I30d87413f6771d1d9d67cd4b2d65ed788d275533
2020-01-09 17:57:19 -06:00
Robert Gregory c13375cd45 Initial empty repository 2019-11-26 10:10:15 -05:00