1 year ago
#72209
M. Phys
Performance tuning AWS Athena alternative query to using LIKE function
I'm querying 20GB of domain data with AWS Athena. I'm trying to get all IP values that match a domain. My query goes like this:
SELECT ip_value FROM domains
WHERE domain_name LIKE '%<some_domain>';
This is slow and scanning all records because of the wildcard at the start of the LIKE. I'm wondering what the best solution would be here?
I could try and do a substring operation to get the middle part of the domain (e.g. 'google' from 'www.google.com') but I'm not sure what the performance benefits would be here, or how simple this is with Athena.
One other solution I had was to create anither field in my data that would be the domain_name in reverse. I could then change the query to this:
SELECT ip_value FROM domains
WHERE reversed_domain_name LIKE '<some_domain>%';
Not sure if this is a good idea though...
amazon-web-services
query-optimization
sql-like
amazon-athena
presto
0 Answers
Your Answer