Ingest csv data into data warehouse flows:
Detect schema from csv file
Register detected schema to data warehouse system
Ingest data into registered schema
Detect schema from csv file
Parameter Type Description char to distinguish between fields
csv_setting.include_header
true means this csv contains column name line, else false
first n lines of csv file, may or may not contain header line
Sample curl:
Copy curl --request POST \
--url http://[client].datainsider.co/api/ingestion/csv/detect \
--header 'Content-Type: application/json' \
--data '{
"csv_setting": {
"include_header": true,
"delimiter": ","
},
"sample": "user_id,name,date_of_birth,last_payment\n1,Nguyen Van A,16/05/1997,99.90\n2,Nguyen Thi B,25/01/2000,100.0\n3,Tran Thi C,01/01/2021,25.5"
}'
Sample response:
Copy {
"schema": {
"name": "",
"db_name": "",
"organization_id": 0,
"display_name": "",
"columns": [
{
"class_name": "int32",
"name": "user_id",
"display_name": "user_id",
"is_nullable": true
},
{
"class_name": "string",
"name": "name",
"display_name": "name",
"is_nullable": true
},
{
"class_name": "datetime",
"name": "date_of_birth",
"display_name": "date_of_birth",
"input_as_timestamp": false,
"input_formats": [
"dd/MM/yyyy"
],
"is_nullable": true
},
{
"class_name": "double",
"name": "last_payment",
"display_name": "last_payment",
"is_nullable": true
}
],
"primary_keys": [],
"partition_by": [],
"order_bys": [],
"temporary": false
},
"csv_setting": {
"include_header": true,
"delimiter": ",",
"add_batch_info": false
},
"records": [
[
1,
"Nguyen Van A",
"1997-05-15T17:00:00.000+00:00",
99.9
],
[
2,
"Nguyen Thi B",
"2000-01-24T17:00:00.000+00:00",
100.0
],
[
3,
"Tran Thi C",
"2020-12-31T17:00:00.000+00:00",
25.5
]
]
}
Register csv schema
Parameter Type Description key that has csv ingestion permission
JSON object return from detect schema api, need to provide this object with two additional field: db_name
and name to denote destination of data to data warehouse
name of database in data warehouse
name of table in data warehouse
Sample curl:
Copy curl --request POST \
--url http://[client].datainsider.co/api/ingestion/csv/schema \
--header 'Content-Type: application/json' \
--data '{
"api_key": "cccccccc-14a1-4eb1-8964-000000000000",
"schema": {
"organization_id": 0,
"name": "sale_records_100",
"db_name": "csv_ingestion",
"display_name": "",
"columns": [
{
"class_name": "string",
"name": "user_id",
"display_name": "user_id",
"is_nullable": true
},
{
"class_name": "string",
"name": "name",
"display_name": "name",
"is_nullable": true
},
{
"class_name": "datetime",
"name": "date_of_birth",
"display_name": "date_of_birth",
"input_formats": [
"dd/MM/yyyy"
],
"is_nullable": true
},
{
"class_name": "double",
"name": "last_payment",
"display_name": "last_payment",
"is_nullable": true
}
]
}
}'
Sample response:
Copy {
"name": "sale_records_100",
"db_name": "csv_ingestion",
"organization_id": 0,
"display_name": "",
"columns": [
{
"class_name": "string",
"name": "user_id",
"display_name": "user_id",
"is_nullable": true
},
{
"class_name": "string",
"name": "name",
"display_name": "name",
"is_nullable": true
},
{
"class_name": "datetime",
"name": "date_of_birth",
"display_name": "date_of_birth",
"input_as_timestamp": false,
"input_formats": [
"dd/MM/yyyy"
],
"is_nullable": true
},
{
"class_name": "double",
"name": "last_payment",
"display_name": "last_payment",
"is_nullable": true
}
],
"primary_keys": [],
"partition_by": [],
"order_bys": [],
"temporary": false
}
Ingest data into warehouse
Parameter Type Description key that has csv ingestion permission
name of destination database
name of destination table
lines of csv file to be ingested. A batch is a string which contains csv lines separated by '\n' character, and does not contain header line. Recommended number of lines in a batch is 1000 lines and are called sequentially.
Sample curl:
Copy curl --request POST \
--url http://[client].datainsider.co/api/ingestion/csv \
--header 'Content-Type: application/json' \
--data '{
"api_key": "cccccccc-14a1-4eb1-8964-000000000000",
"db_name": "csv_ingestion",
"tbl_name": "sale_records_100",
"csv_setting": {
"include_header": true,
"delimiter": ","
},
"data": "1,Nguyen Van A,16/05/1997,99.90\n2,Nguyen Thi B,25/01/2000,100.0\n3,Tran Thi C,01/01/2021,25.5\n4,Nguyen Van D,26/06/2006,919.90\n5,Nguyen Thi E,05/10/2001,200.0\n6,Tran Thi F,31/12/2020,75.5"
}'
Sample response:
Copy {
"succeed": true,
"row_inserted": 6
}