Overview
The TWAICE Pull Stack is designed for batch data processing. It extracts data from your databases and APIs securely and efficiently, one batch at a time.
Key Value Propositions
1. Flexible Connectivity
Connection with Data Sources
On-Site, On-Premises, or Cloud: Whether your data source is hosted on-site, on-premises, or in another cloud environment, the TWAICE Pull Stack can connect to it and adapt to your existing infrastructure.
2. Efficient Data Retrieval
Batch Data Pull
Optimized Pull Intervals: Pull intervals are tuned to your infrastructure so that data is retrieved efficiently without overloading the network or the data source. The Pull Stack is not suitable for the stream data processing typical of bus systems.
3. Reliability and Continuity
Alerting on Data Failures
Avoiding Data Gaps: The system includes robust alerting mechanisms to notify stakeholders of any data failures, helping to avoid data gaps and ensure continuous data availability.
Automatic Backfilling
After Connection Issues: In case of connection issues, the Pull Stack supports automatic backfilling, ensuring that no data is lost and all historical data is captured once the connection is restored.
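The backfilling idea can be illustrated with a minimal sketch. It assumes that a backfill splits the gap between the last successful pull and the moment the connection is restored into consecutive batch windows. The function name backfill_windows and the 15-minute interval are hypothetical and do not describe the actual TWAICE implementation.

```python
from datetime import datetime, timedelta, timezone


def backfill_windows(last_success, now, interval):
    """Split the gap between last_success and now into consecutive windows.

    Hypothetical helper: each (start, end) tuple is one batch to re-request.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    windows = []
    start = last_success
    while start < now:
        # The final window is shortened so it never extends past 'now'.
        end = min(start + interval, now)
        windows.append((start, end))
        start = end
    return windows


# Example: the connection was down for 50 minutes, batches cover 15 minutes.
last_ok = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
restored = datetime(2024, 1, 1, 0, 50, tzinfo=timezone.utc)
for window_start, window_end in backfill_windows(last_ok, restored, timedelta(minutes=15)):
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Requesting the gap in bounded windows rather than in one large query keeps each pull the same size as a regular batch, which is consistent with the goal of not overloading the data source.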
4. Security and Compliance
Static IP Addresses
For Network Security: To enhance security, static IP addresses are provided for integration with your network. This facilitates easier and more secure network configurations and whitelisting.
Site-to-Site VPN Support
Additional VPN: If further security measures are required, a Site-to-Site VPN can be set up separately at additional cost. The VPN encrypts data transmission between your data source and the TWAICE Pull Stack.
5. Designed for Batch Processing
Optimized for Batches
Efficient Handling: The Pull Stack is specifically designed for handling batches of data, making it ideal for periodic data extraction and processing tasks. This design ensures optimal performance and reliability in batch processing environments.
System Requirements
Python Library: A Python library (version 3.12 or higher) is required for TWAICE to connect and retrieve data from your data source.
Data Source Accessibility: The Pull Stack must be able to access your data source in batches.
Implementation Overview
Pull Stack setup and testing takes approximately 2 weeks once the necessary data connection details, credentials, and parameters have been provided to TWAICE.
Setup Pull Stack: TWAICE will generate and deploy a pull client to interface with your database or API. You must have a Python library (version 3.12 or higher) that is installed and configured.
Configure Site-to-Site VPN: If the database is on-premises, set up a Site-to-Site VPN to securely connect the Pull Stack to the local database.
Note: A Site-to-Site VPN incurs additional costs and requires additional scoping.
Pull Stack is deployed in AWS Cloud: TWAICE will provision the Pull Stack in an isolated VPC within the AWS Cloud.
Configure Security Settings (optional): TWAICE will apply any necessary security configurations, including IP whitelisting and other access controls.
Schedule Batch Data Retrieval: TWAICE creates the batch processing schedule to periodically pull data from your data source.
What do we need for configuration?
1. Credentials of the Datastore
REST Endpoint
If your data source is a REST endpoint, please provide:
URL: The endpoint URL where the data can be accessed.
Authentication Details:
API Key: If your endpoint uses API key-based authentication, provide the API key.
Token: If token-based authentication is used, provide the access token.
Username/Password: If basic authentication is used, provide the username and password.
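As a rough illustration of how the three authentication options differ on the wire, the sketch below builds the HTTP headers for each. The function name build_auth_headers, the X-API-Key header name, and the Bearer scheme for tokens are assumptions. Your endpoint may expect a different header or scheme, so please state the exact one when you share the credentials.

```python
import base64


def build_auth_headers(*, api_key=None, token=None, username=None,
                       password=None, api_key_header="X-API-Key"):
    """Return HTTP headers for one of the three authentication options.

    Hypothetical helper: the header name and the Bearer scheme are assumptions.
    """
    if api_key is not None:
        # API key authentication: the key travels in a custom header.
        return {api_key_header: api_key}
    if token is not None:
        # Token authentication: commonly sent as a Bearer token.
        return {"Authorization": f"Bearer {token}"}
    if username is not None and password is not None:
        # Basic authentication: base64 of "username:password".
        raw = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    raise ValueError("no credentials supplied")


print(build_auth_headers(username="user", password="pass"))
```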
Datastore
If your data source is a traditional datastore (e.g., SQL database, NoSQL database), please provide the following:
Connection String: The connection string or URL to access the datastore.
Authentication Details:
Username: The username with necessary permissions to access the datastore.
Password: The corresponding password for the provided username.
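Before sending a connection string, it can help to check that it contains the parts listed above. The following sketch assumes a URL-style connection string. The function name describe_connection_string, the host name, and the credentials are placeholders, and other formats such as key=value strings are not handled here.

```python
from urllib.parse import urlsplit


def describe_connection_string(conn):
    """Break a URL-style connection string into its parts.

    Hypothetical helper: the password itself is never returned, only
    whether one is present.
    """
    parts = urlsplit(conn)
    if not parts.scheme or not parts.hostname:
        raise ValueError(
            "expected a URL-style connection string such as "
            "scheme://user:password@host:port/database"
        )
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
        "username": parts.username,
        "has_password": parts.password is not None,
    }


# Placeholder values for illustration only.
example = "postgresql://reader:secret@db.example.com:5432/telemetry"
print(describe_connection_string(example))
```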
2. Additional Configuration Parameters
TWAICE requires a questionnaire to be completed before any configuration or setup work begins. The answers ensure that the pull client setup is aligned with your system capabilities and infrastructure.
Responses should be provided to your Customer Success contact or to support@twaice.com.
3. Data Format Requirements
To ensure compatibility with our Pull Stack, the data returned from your datastore or REST endpoint should be structured with the following fields:
Sensor Tag: A unique identifier for each sensor.
Relative Time: The time at which the data was recorded, relative to a specific reference point or epoch.
Sensor Value: The value recorded by the sensor.
Example Data Format
Here is an example of how the data should be formatted:
[
  {
    "sensor_tag": "temperature_sensor_01",
    "relative_time": 1622547800,
    "sensor_value": 22.5
  },
  {
    "sensor_tag": "humidity_sensor_02",
    "relative_time": 1622547860,
    "sensor_value": 55.2
  }
]
In this example:
sensor_tag: Identifies each sensor (e.g., temperature_sensor_01, humidity_sensor_02).
relative_time: The timestamp of the recorded data (e.g., 1622547800).
sensor_value: The value recorded by the sensor (e.g., 22.5, 55.2).
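To check a response against this format before onboarding, a small validation routine along the following lines may be useful. It is a sketch under stated assumptions: the type rules (a non-empty string for sensor_tag, numbers for relative_time and sensor_value) are inferred from the example above rather than from a formal schema, and the function name validate_records is hypothetical.

```python
import json
from numbers import Real

REQUIRED_FIELDS = ("sensor_tag", "relative_time", "sensor_value")


def validate_records(payload):
    """Parse a JSON payload and check that every record has the three fields.

    Assumption: type rules are inferred from the example data format.
    """
    records = json.loads(payload)
    if not isinstance(records, list):
        raise ValueError("payload must be a JSON array of records")
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {index} is not a JSON object")
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"record {index} is missing fields: {missing}")
        tag = record["sensor_tag"]
        if not isinstance(tag, str) or not tag:
            raise ValueError(f"record {index}: sensor_tag must be a non-empty string")
        for name in ("relative_time", "sensor_value"):
            value = record[name]
            # bool is a subclass of int in Python, so it is rejected explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                raise ValueError(f"record {index}: {name} must be a number")
    return records


sample = """[
  {"sensor_tag": "temperature_sensor_01", "relative_time": 1622547800, "sensor_value": 22.5},
  {"sensor_tag": "humidity_sensor_02", "relative_time": 1622547860, "sensor_value": 55.2}
]"""
print(len(validate_records(sample)), "records passed validation")
```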
