User Guide

Loading data into workspace

AnalytiXagility has been designed to handle the import and processing of numerous types of data from multiple sources into a secure, multi-tenanted environment.

  • Users can securely upload CSV data using the web interface and via the SFTP protocol, into any workspace they are a member of, where it is stored as data tables.
  • Mapping of CSV fields to data table columns takes place automatically, however it is also possible to provide definition files that specifically inform the platform as to how the data should be loaded.
  • Aridhia can also provide custom file transfer mechanisms (e.g. from genomics labs) where required.
  • Multiple users within one workspace can be granted permissions to upload data.
 

Uploading data using the web interface

drop down menu, grabbed on 17.01.17, using XAP version 1.19.1 2740-8759a, from https://edcvaatest04.aridhiatest.net/#/workspaces/1150Users can upload tabular data formatted as .csv files from the web interface. To do so users must navigate to the desired workspace they want to uploaded data into. add icon, grabbed on 17.01.17, using XAP version 1.19.1 2740-8759a, from https://edcvaatest04.aridhiatest.net/#/workspaces/1150
When in workspace, navigate to ‘Add’ menu and select ‘Upload Data’. In the ‘Upload Data’ panel browse and select the .csv file to upload and click ‘Upload File’.

, grabbed on 24.01.17, using XAP version 1.19.1 2740-8759a, from https://edcvaatest04.aridhiatest.net/#Users can import data as a new table or into an existing table.

The next menu allows the user to configure columns and change the name of the table.

Users can select the delimiter used to separate data fields and select whether to include column headers in the new table.

, grabbed on 24.01.17, using XAP version 1.19.1 2740-8759a, from https://edcvaatest04.aridhiatest.net/#Pressing ‘Import Data’ will upload data from your .csv file to the workspace. When upload is completed a new note will appear in the ‘Summary tab’ notifying if upload was completed successfully.
, grabbed on 24.01.17, using XAP version 1.19.1 2740-8759a, from https://edcvaatest04.aridhiatest.net/#

 

Uploading data using SFTP

This section provides guidance on how to use an FTP client to securely copy data from your machine into your AnalytiXagility workspace via SFTP.

Before you can upload a data file:

  • You must have FTP client software which supports the SFTP protocol and AES-256 cypher strength (most recent clients offer this support as standard). See examples in the Before you can upload data section below.
  • You must have the domain name of the AnalytiXagility SFTP server to upload data to. See more details in the Before you can upload data section below.
 

Before you can upload data

You must have SFTP client software in order to start uploading data to your workspace.

Several free or low-cost Windows SFTP clients have been used successfully, including WinSCP, FileZilla and the PuTTY PSFTP program. Other third-party SFTP clients are available and there is no barrier to using them at present.

  • You must have the network address of the AnalytiXagility SFTP server.
    • These will be supplied by email.
    • If you do not have this access to this SFTP account, you must ask the user who does to upload the data for you.
 

Connecting to the SFTP using WinSCP

SFTP using WinSCP, grabbed on 17.01.17, using XAP version 1.19.1 2740-8759a, from https://edcvaatest04.aridhiatest.net/#These details are supplied to the identified workspace Administrator by Aridhia’s service desk during the setup process. If you need to contact the service desk, please email servicedesk@aridhia.com. If you are not the workspace owner or Administrator, and do not have access to these credentials, they may provide them to you at their discretion, or they can upload the data for you.

As an example, this is how data may be uploaded using WinSCP, a popular SFTP client.

  • Get WinSCP client – https://winscp.net/download/winscp577setup.exe
  • Run winscp577setup.exe to install WinSCP.
  • Start WinSCP client.
  • Insert your SFTP credentials
    • host name: xxxxxxx@host.com
    • username: your AnalytiXagility username
    • password: your password
    • click “Login”
  • Select the files to upload from your local machine and drag them to the upload folder, this example uses the Iris dataset.
  • Drag and drop iris.csv to data folder.SFTP using WinSCP, grabbed on 17.01.17, using XAP version 1.19.1 2740-8759a, from https://edcvaatest04.aridhiatest.net/#
  • Once the data file has been copied in full to the data folder, the platform will automatically start to load the data into the workspace.
  • Once the platform has completed processing the data, the following files are created in the upload folder.
    • A log file reporting the success or failure of the upload, giving detailed information on the process of how data was loaded in case of the successful upload, or why the upload failed. The resulting file has a name in the following format:
      <filename>_<success/fail>_<year>
      <month>
      <day>_<hours>
      <mins>
      <secs>.txt

      For example:

      iris_success_20160306_121535.txt
      iris_fail_20160307_142743.txt

The time needed to upload and process the data is dependent on the size of the data file and the speed of the network connection. If your SFTP client supports it, enabling compression can reduce the transfer time.

 

Preparing CSV files for uploading with SFTP

To load data into the platform, you must supply CSV files containing the data you wish to upload. The platform will attempt to determine the following from the files:

  • the character encoding
  • the character used as the column delimiter
  • the names of the columns
  • the character used as the text-qualifier

There are limitations on what the platform can automatically determine, and sometimes you need to help the process along by supplying some of this information in a Table Definition file, as described below in Using Table Definition files with SFTP uploads. To get started, upload your CSV file and let the platform attempt to identify the contents. By modifying the contents of the Table Definition file returned by the platform, you can quickly define a correct Table Definition file.

The rest of this section describes:

  • the file name that should be used
  • the contents of the CSV file
 

The file name

For a CSV file to be successfully uploaded, the file name must:

  • start with an alphabetic character (a-z/A-Z)
  • only contain alphanumeric characters (a-z/A-Z/0-9)
  • have the extension .csv

Data from the CSV file is loaded into a table in your workspace. The name of the table is based on the name of the CSV file, with the following changes:

  • the leading and trailing spaces are removed from the name
  • the name is trimmed to a maximum of 60 characters

Alternatively, the name of the table can be specified in a Table Definition file supplied with the CSV file.

For best results, the table name and file name should:

  • contain only lowercase letters
  • not include any punctuation
  • not include any special characters
 

Contents of the csv file

The platform can automatically process CSV files with the following format characteristics:

  • The first row is a header row defining the column names.
  • Values are separated by one of the following characters:
    • comma (,)
    • colon (:)
    • pipe (|)
    • space ( )
    • tab ( )
  • The text-qualifier character either a double-quote (“) or a single-quote (‘).
  • Line endings are either LF or CRLF (and the line endings are consistent throughout the file).
  • Files are encoded using UTF-8 or ISO-8859-1.

If your file format does not match these characteristics, then you need to prepare and submit a Table Definition file along with the CSV file. Table Definition files allow detailed control over the format of your CSV files, see Using Table Definition files with SFTP uploads for more details.

 

Using table definition files with SFTP uploads

The platform attempts to automatically interpret CSV files, determining details like the character encoding, column delimiter, text-qualifier, the names of the table and columns from the data in the CSV file. Sometimes, the values for these properties fall outside the range that the platform can automatically determine and in such cases, the details need to be supplied in a Table Definition file sent along with the CSV file. Note that if a Table Definition file is to be supplied, it should be uploaded before the associated CSV file.

If supplied, the Table Definition file must have the same name as the CSV file, but with the extension “.xml”. For example, if you supply a data file called myresearchdata.csv, the corresponding Table Definition file must be called myresearchdata.xml.

 

The format of the table definition file

Table Definition files are XML documents containing the following elements:

<?xml version="1.0" encoding="utf-8"?>
<TableDefinition TableName='the filename or what the user specifies'Action='create or append'>
<Columns>
<Column Name="column-name" Type="##PostgreSQL data-type##" />...</Columns>
<Format 
Delimiter='column-delimiter'
TextQualifier='text-qualifier'
NullQualifier='replace these values with null'
Encoding='UTF-8'
Header='first line of csv is the header'
HeaderCase='change the column-header casing'
DateFormat='the format of all dates in the csv'
/>
<Success RemovefromUpload='delete source files on SFTP'/>
<Fail RemovefromUpload='delete source files on SFTP'/>
</TableDefinition>

Please note that:

  • A PostgreSQL data type definition cannot contain column constraints or default values.  See http://www.postgresql.org/docs/8.2/static/datatype.html for more details about PostgreSQL data types. Note that the platform is built on the Greenplum database, which in turn is based on PostgreSQL 8.2.
  • Where columns are supplied they must match the number of columns in the accompanying CSV file.
  • The file is case-sensitive, so ensure that all attributes and elements follow the casing set out in the templates outlined here.
  • A limited set of XML Named Character References are supported for the Table Definition file: T
 

The table definition element

Providing the table definition element is mandatory; any missing or blank attributes will be handled as follows:

TableName

The destination table for the accompanying data file that will be loaded into the platform. If not supplied or blank, use the file name.

Action

Specifies whether the data is loaded into a brand new table (create) or inserted at the end of an existing table (append). If not supplied or blank, “create” is assumed.

For create:

  • If a table of the same name already exists, it is renamed with a timestamp before the new table is created

For append:

  • If a table with the name does not exist, then a new one is created.
  • If data being appended is narrower than the destination table, then all missing values are set to null.
  • If data being appended is wider than the destination table, then the surplus columns are quietly omitted.
 

Data transformation during upload processes

When your .csv file is loaded to the platform it is stored in the AnalytiXagility platform as a table. Note, that when loading without supplying table definition file, all special characters [, . ? , “ $ % ^ ( ! #  )] in the file name will be transformed to underscore “_” and capital letters will become lowercase.

E.g. if you upload file Aridhia-1.csv table would be saved as aridhia_1 This also applies to table column headings

E.g. Column.1 would change to column_1

Here is the list of all changes which are made by the platform when uploading via SFTP.

  • Within the file name, the following special characters are replaced by an underscore (_):
    – caret (^)
    – full stop (.)
    – single quote (‘)
    – double quote (“)
    – space ( )
    – dash (-)
  • An empty value with no quotations is translated to NULL.
  • An empty value with quotations is translated to an empty string.
  • All columns in the table are nullable.
  • Missing fields at the end of rows are assumed to be NULL.
  • If a column name is not supplied, the column name is given a unique name: column1, column2
  • Column names:
    – Will be trimmed to remove any leading or trailing space characters.
    – Will be trimmed to 60 characters.
    – Will be made unique by appending a number to the name.
  • Within a column name, the following special characters are replaced by an underscore (_):
    – caret (^)
    – full stop ()
    – single quote (‘)
    – double quote (“)
    – space ( )
  • The platform does not support the data type “timestamp with timezone”. Any field of this type is treated as a “timestamp without time zone”.

We could otherwise use a Table Definition file to specify table names and column headers. Please refer to The Table Definition Element section above.

 

To contact support, send an email to the Aridhia service desk: servicedesk@aridhia.com.