1 year ago
#77306
SidC
How to Consistently Return 2.5% of a Result Set using NEWID() and Table Variable?
I'm using the NEWID() approach to obtain a sample of records from my population. I've created a table variable to define my population and I need to consistently return 2.5% of records from it. The query is as follows:
DECLARE @FirstofLastMonth date = DATEADD(DD,1,EOMONTH(Getdate(),-2)),
@EndofLastMonth date = EOMONTH(Getdate(), -1)
DECLARE @Sample table (ID uniqueidentifier PRIMARY KEY, ClaimNO varchar(50), Company_ID
varchar(10), FinancialResp varchar(30), ProviderType varchar(50),
DateOfService date, ClaimType varchar(20), TotalBilled numeric(10,2), TotalPaid numeric(10,2) )
Insert into @Sample (ID, ClaimNo,Company_ID, FinancialResp, ProviderType, DateOfService,
ClaimType, TotalBilled, TotalPaid)
Select distinct NEWID() as ID, cd.ClaimNo, cd.Company_ID, cd.FinancialResp,
CASE
when cd.FinancialResp in ('KEYMA', 'SHP', 'SFKD') and lob.LOB_Network = '1' then 'Contracted'
when cd.FinancialResp in ('KEYMA', 'SHP', 'SFKD') and lob.LOB_Network = '0' then 'Non-Contracted'
when cd.FinancialResp = 'KDMA' and pu.UDF_Value = 'Y' then 'Contracted'
when cd.FinancialResp = 'KDMA' and pu.UDF_Value <> 'Y' then 'Non-Contracted'
else 'Missing'
end as ProviderType,
Format( Min(cd.FromDateSvc), 'MM/dd/yyyy') as DateOfService,
CASE
when substring(cd.ClaimNo, 9,1) = '8' then 'Institutional'
when substring(cd.ClaimNO, 9,1) = '9' then 'Professional'
end as Claim_Type,
sum(cd.Billed) as TotalBilled,
SUM(cd.Net) as TotalPaid
from Claim_Details cd
inner join Claim_Masters_V cm on cd.ClaimNo = cm.ClaimNo and cd.Company_ID = cm.Company_ID
inner join Prov_VendInfo pv on cm.Prov_KeyID = pv.Prov_KeyID and cm.Company_ID = pv.Company_ID
inner join Vend_Masters vm on pv.Vendor = vm.VendorID
and pv.Company_ID = vm.Environment_ID
inner join Prov_Master_V pm on cm.Prov_KeyID = pm.Prov_KeyID
and cm.Company_ID = pm.Company_ID
and pv.Default_YN = 1
inner join BM_Prov_HP_LineOfBuss lob on cm.Prov_KeyID = lob.Prov_KeyID
and cm.Company_ID = lob.Company_ID
and cm.HPCode = lob.HPCode
and pv.Vendor = lob.Vendor
and lob.LOB_Start <= GETDATE()
left join Prov_UDF pu on cm.Prov_KeyID = pu.Prov_KeyID
and cm.Company_ID = pu.Company_ID
and pu.UDFNO = 10
where cm.HPCode = 'KD'
and cd.Status = '9'
and cd.FromDateSvc between @FirstofLastMonth and @EndofLastMonth
group by cd.ClaimNo, cd.FinancialResp, cd.Company_ID, pu.UDF_Value, lob.LOB_Network
--Select Count(*)
--from @Sample --3,517 records
select ClaimNo, FinancialResp, ProviderType, DateOfService, ClaimType, TotalBilled, TotalPaid from @Sample
WHERE 0.025 >= CAST(CHECKSUM(NEWID(),ID) & 0x7fffffff AS float) / CAST (0x7fffffff AS int)
--We need to receive ~88 records each time above query is run.
When I run the above query, my results vary from between 80 and 100 records. The business case requires just 2.5% of qualifying records to be returned, no more and no less. How do I ensure that this is always the case, all things being equal?
UPDATE I've modified the approach a bit. I'm now loading all qualifying records into a table. I've refined the query which now returns the required 2.5% of the table population. However, each time I run the query, the results do not appear to be random samples in that the same RecordIDs are returned each time.
select TOP 2.5 PERCENT RecordID, ClaimNo, HPCode,
FinancialResponsibility, ProviderType, ProcessorType, DateOfService,
DatePaid, ClaimType, TotalBilled, TotalPaid
from [Z_Monthly_Quality_Review]
order by RecordId
The table definition is:
CREATE TABLE [dbo].[Z_Monthly_Quality_Review] (
[RecordId] UNIQUEIDENTIFIER DEFAULT (newid()) NOT NULL,
[ClaimNO] VARCHAR (50) NULL,
[Company_ID] VARCHAR (10) NULL,
[HPCode] VARCHAR (10) NULL,
[FinancialResponsibility] VARCHAR (30) NULL,
[ProviderType] VARCHAR (50) NULL,
[DateOfService] DATE NULL,
[DatePaid] DATE NULL,
[ClaimType] VARCHAR (50) NULL,
[TotalBilled] NUMERIC (11, 2) NULL,
[TotalPaid] NUMERIC (11, 2) NULL,
[ProcessorType] VARCHAR (100) NULL,
PRIMARY KEY CLUSTERED ([RecordId] ASC) WITH (FILLFACTOR = 90)
);
Should I modify the query or table structure in order to consistently return a 2.5% random sample (different RecordIDs each time its run) of the table data?
tsql
0 Answers
Your Answer