Data handling

This package provides methods for convenient handling of x-y values data (e.g., time-concentration profiles) and organizing these data as objects of the class XYData. These objects can be used for direct access to the data, creating figures such as time-values profiles or predicted-vs-observed plots, or calculating residuals between observed and simulated data. Data for creation of an XYData object can come from a simulation, an excel file (e.g. observed data), or from array created in R.

`XYData` overview

An XYData object describes the following properties of data:

x values
y values
Error of y values
Dimension of x and y values
Units the values are stored in (e.g., minutes or hours for the dimension Time).

An object of XYData can be created from scratch using pre-defined x- and y- (and, optionally, y-error) values. Each object of XYData must have a label which serves as a name for the particular data set.

xVals <- c(0, 1, 2, 3)
yVals <- c(4, 5, 6, 7)
yError <- c(0.1, 0, 0.1, 1)

# Create new object of type XYData
xyData <- XYData$new(xVals, yVals, yError = yError, label = "My XY Data")
print(xyData)
#> XYData: 
#>    label: My XY Data 
#>    x factor: 1 
#>    y factor: 1 
#>    x offset: 0 
#>    y offset: 0 
#>    Color: 
#>    Type: p 
#>    pch: 
#>    lty: 
#>    Data type: Unspecified 
#>    X dimension: Time 
#>    X unit: min 
#>    Y dimension: Dimensionless 
#>    Y unit:  
#>    Y error unit:

Following fields are available:

Access to the x, y, and y-error values of the object:

xyData$xValues
#> [1] 0 1 2 3
xyData$yValues
#> [1] 4 5 6 7
xyData$yError
#> [1] 0.1 0.0 0.1 1.0

Return the maximal and minimal values of x and y. For y values, the error (if specified) is added (subtracted for yMin) to/from the basal values!

xyData$xMax
#> [1] 3
xyData$xMin
#> [1] 0
xyData$yMax # Maximal value of 7 plus the error 1
#> [1] 8
xyData$yMin # Minimal value of 4 minus the error 0.1
#> [1] 3.9

x and y values have a Dimension and a Unit. The list of supported dimensions is provided in the enum Dimension. By default, x values have the dimension Time and the unit Minute.

xyData$xDimension
#> [1] "Time"
xyData$xUnit
#> [1] "min"
xyData$yDimension
#> [1] "Dimensionless"
xyData$yUnit
#> [1] ""
# List of all supported dimensions
length(ospDimensions)
#> [1] 91
ospDimensions[[1]]
#> [1] "Abundance per mass protein"

Setting the dimension will set the unit to the default unit of the dimension (even if the dimension did not actually change). For the list of default units, please refer to OSPS manual. The unit of the y error values is automatically set to the base unit of the dimension when changing dimensions.

xyData$yDimension
#> [1] "Dimensionless"
xyData$yUnit
#> [1] ""
xyData$yErrorUnit
#> [1] ""
# Change the dimension of y values
xyData$yDimension <- ospDimensions$`Concentration (molar)`
xyData$yUnit
#> [1] "µmol/l"
xyData$yErrorUnit
#> [1] "µmol/l"

Be aware that changing the unit of y values will not automatically change the unit of the error - so it is possible to have the y values in “nmol/l” and the error in “µmol/l”.

XYData also inherits several properties from the class Plotable which are described in the section Figure creation.

`XYData` as a `Plotable` object

The class XYData inherits properties from the class Plotable that are being used when creating figures with XYData (see also Figure creation). Among other properties, the class Plotable provides properties for transformation of x- and y-values. The properties xOffset and yOffset are added to x- and y-values, respectively, when processing (e.g., creating a figure) XYData. The properties xFactor and yFactor allow to scale the values by multiplying them by a certain factor. Keep in mind that the output of XYData$xValues and XYData$yValues is not affected by setting the offset and the factor, so it is always possible to access the untransformed values. In contrast, values of the fields xMax, xMin, yMax, and yMin are transformed before output.

xVals <- c(0, 1, 2, 3)
yVals <- c(4, 5, 6, 7)
yError <- c(0.1, 0, 0.1, 1)

xyData <- XYData$new(xVals, yVals, yError = yError, label = "My XY Data")
xyData$xValues
#> [1] 0 1 2 3
xyData$yValues
#> [1] 4 5 6 7

xyData$xMax
#> [1] 3
xyData$xMin
#> [1] 0
xyData$yMax # Maximal value of 7 plus the error 1
#> [1] 8
xyData$yMin # Minimal value of 4 minus the error 0.1
#> [1] 3.9

# Offset x values by -1 and y values by 2
xyData$xOffset <- -1
xyData$yOffset <- 2

# xValues and yValues are unaffected
xyData$xValues
#> [1] 0 1 2 3
xyData$yValues
#> [1] 4 5 6 7

xyData$xMax # Maximal value of 3 plus offset -1
#> [1] 2
xyData$xMin # Minimal value of 0 plus offset -1
#> [1] -1
xyData$yMax # Maximal value of 7 plus the error 1 plus offset 2
#> [1] 10
xyData$yMin # Minimal value of 4 minus the error 0.1 plus offset 2
#> [1] 5.9

# Set y factor to 0.5
xyData$yFactor <- 0.5

xyData$yMax # Maximal value of 7 plus the error 1 plus offset 2 multiplied by 0.5
#> [1] 5
xyData$yMin # Minimal value of 4 minus the error 0.1 plus offset 2 multiplied by 0.5
#> [1] 2.95

Loading data from excel as `XYData`

A special extension of XYData is XYData, a class designed for time-values data used in modeling and simulation activities with the OSPS. In addition to information stored by the XYData class, following meta data fields are available.

StudyId
PatientId
Organ
Compartment
Species
Gender
Molecule
MW (molecular weight in case the data describes concentrations or amounts of a chemical molecule)
GroupId

If the data set describes amounts or concentrations of a chemical compound, the field $MW should be set to the molecular weight (in g/mol) of the molecule to enable conversion between dimensions, as often used for data plotting.

Usually, XYData objects are created from an excel file by calling the method readOSPSTimeValues. The excel file must have the structure as defined in the Work Instruction “Modeling & Simulation Data Structure”. Additionally, for each Molecule imported, a sheet in the “IN VIVO/IN VITRO COMPOUND PROPERTIES” file (see the Work Instruction) must be present to retrieve the molecular weight of the imported molecule.

The method readOSPSTimeValues reads the contents of the excel file and generates a list of XYData objects according to the groupings of the data. The data are generally grouped by the sheets of the excel file and the specified columns.

To read the data using readOSPSTimeValues, an object of the class DataConfiguration must be created and passed to the method. DataConfiguration is a structure containing following information:

dataFolder: Path to the folder where the excel files are located
dataFile: Name of the excel file with time-values data. Must be located in dataFolder.
compoundPropertiesFile: Name of the excel file with compound properties. Must be located in dataFolder.
dataSheets: Name of the data sheet(s) with time-values data within the excel file. Multiple sheets can be processed.
columnsToSplitBy: A list containing the names of the columns by which the data will be split into separate data sets.
XValuesColumn: Index of the column that contains the x values.
YValuesColumn: Index of the column that contains the y values.
YErrorColumn: Index of the column that contains the error values of the y values.

The values of columnsToSplitBy, XValuesColumn, YValuesColumn, and YErrorColumn are set to default values but can be changed by the user.

dataConf <- DataConfiguration$new(
  dataFolder = file.path(getwd(), "..", "tests", "data"),
  dataFile = "CompiledDataSet.xlsx",
  compoundPropertiesFile = "Compound_Properties.xlsx",
  dataSheets = c("TestSheet_1")
)
print(dataConf)
#> DataConfiguration: 
#>    Data file directory: C:/Users/IndrajeetPatil/Documents/GitHub/esqlabsRLegacy/vignettes/../tests/data 
#>    Data file: CompiledDataSet.xlsx 
#>    Sheets: TestSheet_1 
#>    CompoundProperties file: Compound_Properties.xlsx 
#>    Columns to split by: Group Id Gender Patient Id Dose Route Molecule Organ Compartment 
#>    X values column: 10 
#>    Y values column: 11 
#>    Y error column: 12

dataConf$columnsToSplitBy
#> [1] "Group Id"    "Gender"      "Patient Id"  "Dose"        "Route"      
#> [6] "Molecule"    "Organ"       "Compartment"

Reading the file produces two data sets, as there are two different values for IndividualId contained in the data sheet. The method automatically recognizes and sets the dimensions and units of y and y values.

observedData <- readOSPSTimeValues(dataConfiguration = dataConf)
#> Warning in stringToNum(yVals): NAs introduced by coercion
names(observedData)
#> [1] "TestSheet_1"

observedData[[1]]
#> $Male
#> $Male$Ind1
#> $Male$Ind1$iv
#> $Male$Ind1$iv$Dapagliflozin
#> $Male$Ind1$iv$Dapagliflozin$PeripheralVenousBlood
#> $Male$Ind1$iv$Dapagliflozin$PeripheralVenousBlood$Plasma
#> XYData: 
#>    label: TestSheet_1.Male.Ind1.iv.Dapagliflozin.PeripheralVenousBlood.Plasma 
#>    x factor: 1 
#>    y factor: 1 
#>    x offset: 0 
#>    y offset: 0 
#>    Color: 
#>    Type: p 
#>    pch: 
#>    lty: 
#>    Data type: Observed 
#>    X dimension: Time 
#>    X unit: h 
#>    Y dimension: Concentration (mass) 
#>    Y unit: ng/ml 
#>    Y error unit: ng/ml 
#> 
#> 
#> 
#> 
#> 
#> 
#> $Female
#> $Female$Ind2
#> $Female$Ind2$iv
#> $Female$Ind2$iv$Dapagliflozin
#> $Female$Ind2$iv$Dapagliflozin$PeripheralVenousBlood
#> $Female$Ind2$iv$Dapagliflozin$PeripheralVenousBlood$Plasma
#> XYData: 
#>    label: TestSheet_1.Female.Ind2.iv.Dapagliflozin.PeripheralVenousBlood.Plasma 
#>    x factor: 1 
#>    y factor: 1 
#>    x offset: 0 
#>    y offset: 0 
#>    Color: 
#>    Type: p 
#>    pch: 
#>    lty: 
#>    Data type: Observed 
#>    X dimension: Time 
#>    X unit: h 
#>    Y dimension: Concentration (mass) 
#>    Y unit: ng/ml 
#>    Y error unit: ng/ml

Note that the dataType field is automatically set to “Observed”.

XYData overview

XYData as a Plotable object

Loading data from excel as XYData

`XYData` overview

`XYData` as a `Plotable` object

Loading data from excel as `XYData`