CSV upload

Ingest csv data into data warehouse flows:

  1. Detect schema from csv file

  2. Register detected schema to data warehouse system

  3. Ingest data into registered schema

Detect schema from csv file

Parameter
Type
Description

csv_setting.delimiter

string

char to distinguish between fields

csv_setting.include_header

boolean

true means this csv contains column name line, else false

sample

string

first n lines of csv file, may or may not contain header line

Sample curl:

curl --request POST \
  --url http://[client].datainsider.co/api/ingestion/csv/detect \
  --header 'Content-Type: application/json' \
  --data '{
    "csv_setting": {
   	 "include_header": true,
   	 "delimiter": ","
    },
    "sample": "user_id,name,date_of_birth,last_payment\n1,Nguyen Van A,16/05/1997,99.90\n2,Nguyen Thi B,25/01/2000,100.0\n3,Tran Thi C,01/01/2021,25.5"
}'

Sample response:

{
  "schema": {
	"name": "",
	"db_name": "",
	"organization_id": 0,
	"display_name": "",
	"columns": [
  	{
    	"class_name": "int32",
    	"name": "user_id",
    	"display_name": "user_id",
    	"is_nullable": true
  	},
  	{
    	"class_name": "string",
    	"name": "name",
    	"display_name": "name",
    	"is_nullable": true
  	},
  	{
    	"class_name": "datetime",
    	"name": "date_of_birth",
    	"display_name": "date_of_birth",
    	"input_as_timestamp": false,
    	"input_formats": [
      	"dd/MM/yyyy"
    	],
    	"is_nullable": true
  	},
  	{
    	"class_name": "double",
    	"name": "last_payment",
    	"display_name": "last_payment",
    	"is_nullable": true
  	}
	],
	"primary_keys": [],
	"partition_by": [],
	"order_bys": [],
	"temporary": false
  },
  "csv_setting": {
	"include_header": true,
	"delimiter": ",",
	"add_batch_info": false
  },
  "records": [
	[
  	1,
  	"Nguyen Van A",
  	"1997-05-15T17:00:00.000+00:00",
  	99.9
	],
	[
  	2,
  	"Nguyen Thi B",
  	"2000-01-24T17:00:00.000+00:00",
  	100.0
	],
	[
  	3,
  	"Tran Thi C",
  	"2020-12-31T17:00:00.000+00:00",
  	25.5
	]
  ]
}

Register csv schema

Parameter
Type
Description

api_key

string

key that has csv ingestion permission

schema

json

JSON object return from detect schema api, need to provide this object with two additional field: db_name and name to denote destination of data to data warehouse

schema.db_name

string

name of database in data warehouse

schema.name

string

name of table in data warehouse

Sample curl:

curl --request POST \
  --url http://[client].datainsider.co/api/ingestion/csv/schema \
  --header 'Content-Type: application/json' \
  --data '{
    "api_key": "cccccccc-14a1-4eb1-8964-000000000000",
    "schema": {
   	 "organization_id": 0,
   	 "name": "sale_records_100",
   	 "db_name": "csv_ingestion",
   	 "display_name": "",
   	 "columns": [
   		 {
   			 "class_name": "string",
   			 "name": "user_id",
   			 "display_name": "user_id",
   			 "is_nullable": true
   		 },
   		 {
   			 "class_name": "string",
   			 "name": "name",
   			 "display_name": "name",
   			 "is_nullable": true
   		 },
   		 {
   			 "class_name": "datetime",
   			 "name": "date_of_birth",
   			 "display_name": "date_of_birth",
   			 "input_formats": [
   				 "dd/MM/yyyy"
   			 ],
   			 "is_nullable": true
   		 },
   		 {
   			 "class_name": "double",
   			 "name": "last_payment",
   			 "display_name": "last_payment",
   			 "is_nullable": true
   		 }
   	 ]
    }
}'

Sample response:

{
  "name": "sale_records_100",
  "db_name": "csv_ingestion",
  "organization_id": 0,
  "display_name": "",
  "columns": [
	{
  	"class_name": "string",
  	"name": "user_id",
  	"display_name": "user_id",
  	"is_nullable": true
	},
	{
  	"class_name": "string",
  	"name": "name",
  	"display_name": "name",
  	"is_nullable": true
	},
	{
  	"class_name": "datetime",
  	"name": "date_of_birth",
  	"display_name": "date_of_birth",
  	"input_as_timestamp": false,
  	"input_formats": [
    	"dd/MM/yyyy"
  	],
  	"is_nullable": true
	},
	{
  	"class_name": "double",
  	"name": "last_payment",
  	"display_name": "last_payment",
  	"is_nullable": true
	}
  ],
  "primary_keys": [],
  "partition_by": [],
  "order_bys": [],
  "temporary": false
}

Ingest data into warehouse

Parameter
Type
Description

api_key

string

key that has csv ingestion permission

db_name

string

name of destination database

tbl_name

string

name of destination table

csv_setting

json

setting of csv file

data

string

lines of csv file to be ingested. A batch is a string which contains csv lines separated by '\n' character, and does not contain header line. Recommended number of lines in a batch is 1000 lines and are called sequentially.

Sample curl:

curl --request POST \
  --url http://[client].datainsider.co/api/ingestion/csv \
  --header 'Content-Type: application/json' \
  --data '{
    "api_key": "cccccccc-14a1-4eb1-8964-000000000000",
    "db_name": "csv_ingestion",
    "tbl_name": "sale_records_100",
    "csv_setting": {
   	 "include_header": true,
   	 "delimiter": ","
    },
    "data": "1,Nguyen Van A,16/05/1997,99.90\n2,Nguyen Thi B,25/01/2000,100.0\n3,Tran Thi C,01/01/2021,25.5\n4,Nguyen Van D,26/06/2006,919.90\n5,Nguyen Thi E,05/10/2001,200.0\n6,Tran Thi F,31/12/2020,75.5"
}'

Sample response:

{
  "succeed": true,
  "row_inserted": 6
}

Last updated