Ingest csv data into data warehouse flows:
Detect schema from csv file
Register detected schema to data warehouse system
Ingest data into registered schema
Detect schema from csv file
char to distinguish between fields
csv_setting.include_header
true means this csv contains column name line, else false
first n lines of csv file, may or may not contain header line
Sample curl:
Copy curl --request POST \
--url http://[client].datainsider.co/api/ingestion/csv/detect \
--header 'Content-Type: application/json' \
--data '{
"csv_setting": {
"include_header": true,
"delimiter": ","
},
"sample": "user_id,name,date_of_birth,last_payment\n1,Nguyen Van A,16/05/1997,99.90\n2,Nguyen Thi B,25/01/2000,100.0\n3,Tran Thi C,01/01/2021,25.5"
}'
Sample response:
Copy {
"schema": {
"name": "",
"db_name": "",
"organization_id": 0,
"display_name": "",
"columns": [
{
"class_name": "int32",
"name": "user_id",
"display_name": "user_id",
"is_nullable": true
},
{
"class_name": "string",
"name": "name",
"display_name": "name",
"is_nullable": true
},
{
"class_name": "datetime",
"name": "date_of_birth",
"display_name": "date_of_birth",
"input_as_timestamp": false,
"input_formats": [
"dd/MM/yyyy"
],
"is_nullable": true
},
{
"class_name": "double",
"name": "last_payment",
"display_name": "last_payment",
"is_nullable": true
}
],
"primary_keys": [],
"partition_by": [],
"order_bys": [],
"temporary": false
},
"csv_setting": {
"include_header": true,
"delimiter": ",",
"add_batch_info": false
},
"records": [
[
1,
"Nguyen Van A",
"1997-05-15T17:00:00.000+00:00",
99.9
],
[
2,
"Nguyen Thi B",
"2000-01-24T17:00:00.000+00:00",
100.0
],
[
3,
"Tran Thi C",
"2020-12-31T17:00:00.000+00:00",
25.5
]
]
}
Register csv schema
key that has csv ingestion permission
JSON object return from detect schema api, need to provide this object with two additional field: db_name
and name to denote destination of data to data warehouse
name of database in data warehouse
name of table in data warehouse
Sample curl:
Copy curl --request POST \
--url http://[client].datainsider.co/api/ingestion/csv/schema \
--header 'Content-Type: application/json' \
--data '{
"api_key": "cccccccc-14a1-4eb1-8964-000000000000",
"schema": {
"organization_id": 0,
"name": "sale_records_100",
"db_name": "csv_ingestion",
"display_name": "",
"columns": [
{
"class_name": "string",
"name": "user_id",
"display_name": "user_id",
"is_nullable": true
},
{
"class_name": "string",
"name": "name",
"display_name": "name",
"is_nullable": true
},
{
"class_name": "datetime",
"name": "date_of_birth",
"display_name": "date_of_birth",
"input_formats": [
"dd/MM/yyyy"
],
"is_nullable": true
},
{
"class_name": "double",
"name": "last_payment",
"display_name": "last_payment",
"is_nullable": true
}
]
}
}'
Sample response:
Copy {
"name": "sale_records_100",
"db_name": "csv_ingestion",
"organization_id": 0,
"display_name": "",
"columns": [
{
"class_name": "string",
"name": "user_id",
"display_name": "user_id",
"is_nullable": true
},
{
"class_name": "string",
"name": "name",
"display_name": "name",
"is_nullable": true
},
{
"class_name": "datetime",
"name": "date_of_birth",
"display_name": "date_of_birth",
"input_as_timestamp": false,
"input_formats": [
"dd/MM/yyyy"
],
"is_nullable": true
},
{
"class_name": "double",
"name": "last_payment",
"display_name": "last_payment",
"is_nullable": true
}
],
"primary_keys": [],
"partition_by": [],
"order_bys": [],
"temporary": false
}
Ingest data into warehouse
key that has csv ingestion permission
name of destination database
name of destination table
lines of csv file to be ingested. A batch is a string which contains csv lines separated by '\n' character, and does not contain header line. Recommended number of lines in a batch is 1000 lines and are called sequentially.
Sample curl:
Copy curl --request POST \
--url http://[client].datainsider.co/api/ingestion/csv \
--header 'Content-Type: application/json' \
--data '{
"api_key": "cccccccc-14a1-4eb1-8964-000000000000",
"db_name": "csv_ingestion",
"tbl_name": "sale_records_100",
"csv_setting": {
"include_header": true,
"delimiter": ","
},
"data": "1,Nguyen Van A,16/05/1997,99.90\n2,Nguyen Thi B,25/01/2000,100.0\n3,Tran Thi C,01/01/2021,25.5\n4,Nguyen Van D,26/06/2006,919.90\n5,Nguyen Thi E,05/10/2001,200.0\n6,Tran Thi F,31/12/2020,75.5"
}'
Sample response:
Copy {
"succeed": true,
"row_inserted": 6
}