Using Javascript to create an acquisition funnel dataset

Using Javascript to create an acquisition funnel dataset

Photo by Wim Arys via Unsplash

I work as a data analyst on the growth team at Echo, where a key goal is to improve signup conversion rates. We use acquisition funnel datasets to get visibility on where users are dropping off in the sign up process. To manually write all of the SQL queries needed for this would be time consuming and difficult to manage, but using a combination of SQL and Javascript makes this process much easier.

I use a platform called Dataform to do this. Dataform provides a place to write and save the queries, and runs them on schedules which publish up to date data every hour to a table in our warehouse (BigQuery). This table can then be queried for analyses and to create visualisations.

This post aims to explain the steps to create this dataset.

Step 1: Define the funnel steps

Firstly, we need to set up a data structure that defines each stage of the funnel.

In a file called funneldata.js, a list is created that will contain one entry for each funnel. Each funnel has a name, and a list of steps. Each step has a name, and a list of tracks that indicate the user has reached that stage of the funnel. In our case, we use specific frontend tracks that we get from Segment. Some steps have multiple tracks because there are multiple ways a user could reach that step — for example on the landing step, users can either reach it through landing on the homepage on web, or installing the app on mobile.

const funnels = [
     {
        name: "signup_funnel",
        steps: [
            {
                name: "landing",
                events: [
                  "application_installed",
                  "home_page"
                ]
            },
            {
                name: "start_signup",
                events: [
                  "signup_started"
                ]
            },
						{
                name: "verify_email",
                events: [
                  "email_verified"
                ]
            },
            {
                name: "signup_complete",
                events: [
                    "signup_complete"
                ]
            }
        ]
    }
]

module.exports = funnels;

Step 2: Create one table per step

Next we create another file called funnel_steps.js. This creates one table per funnel step, which has one row per user, recording when they first reached that step of the funnel.

  1. First we reference the funneldata.js file that we created in the previous step.

    const funnels = require('./funneldata.js');
  2. Then we use the forEach Javascript function to loop through each funnel - we only have one for now, but we may add more later - and then loop through each step in the funnel.

    funnels.forEach(funnel => {
        funnel.steps.forEach((step, i) => {
  3. For each step, we use the publish and config functions (Dataform-specific functions) to create a table in our data warehouse with a unique name (the funnel name & step name combined).

    publish(`${funnel.name}_${step.name}`).config({
                type: "table",
                schema: "funnel_data"
                }).query( 
    
                ctx => {
  4. We add a variable to the SQL query which means for every step after the first, the where clause will have an additional filter on the user id being in the previous step. This ensures that the funnel is sequential, i.e. users have to have hit the first step to be recorded in the second step, and so on. The ${ctx.ref( )} syntax is another Dataform-specific feature, which ensures that the previous step's table is published before the current table, to ensure data is up to date.

    let where = "" 
    if (i > 0) {
      let previousStep = `${funnel.name}_${funnel.steps[i-1].name}`
      where = `and user_id in (SELECT user_id from ${ctx.ref(previousStep)})` 
      }
  5. Lastly we write the SQL query, plugging in the variables. The bulk of the query is hard-coded as it’s the same for each funnel step. The variables are the parts that can change for each step - the step name, step number, the frontend tracks for the step that we defined in funneldata.js, and the optional additional part of the where clause.

    return `
                SELECT * EXCEPT(row_number) FROM (
                  SELECT
                    user_id, 
                    "${step.name}" as step_name,
                    ${i} as step_number,
                    timestamp as step_started_at,
                    row_number() OVER (PARTITION BY user_id ORDER BY timestamp asc) 
                    as row_number
                  FROM ${ctx.ref("segment")}
                  where lower(event) in ("${step.events.join('", "')}")
                  ${where}
                  GROUP BY 1,2,3,4
                  )
                where row_number = 1
            `
                }
            );
        });
    })

This is the SQL that the Javascript generates for the first step:

SELECT * EXCEPT(row_number) FROM (
  SELECT
    user_id, 
    "landing" as step_name,
    0 as step_number,
    timestamp as step_started_at,
    row_number() OVER (PARTITION BY user_id ORDER BY timestamp asc) 
    as row_number
  FROM `api.segment`
  WHERE lower(event) in ("application_installed", "home_page")
  GROUP BY 1,2,3,4
  )
where row_number = 1

This is an example of the SQL generated by the Javascript for the subsequent steps:

SELECT * EXCEPT(row_number) FROM (
	SELECT
	  user_id, 
	  "start_signup" as step_name,
	  1 as step_number,
	  timestamp as step_started_at,
	  row_number() OVER (PARTITION BY user_id ORDER BY timestamp asc) as row_number
	FROM `api.segment`
	where lower(event) in ("signup_started")
		and user_id in (SELECT user_id from `funnel_data.signup_funnel_landing`) 
	GROUP BY 1,2,3,4
	)
where row_number = 1

Step 3: Combine the data into one funnel table

Finally, we combine the data from each of the step tables into one table that summarises the funnel - this is the one we query for analyses. We do this in a file called funnel_combine.js.

Again, we start by referencing the funneldata.js file. Then we use the forEach function to loop through each funnel (we only have one for now). A select * statement for each step is created, and they're joined with union all. The publish and config functions are again used to create a table in the warehouse.

const funnels = require('./funneldata.js');

funnels.forEach(funnel => {
    publish(`${funnel.name}`, {
        type: "table",
        schema: "funnel_test"
    })
    .query(ctx => 
      funnel.steps
      .map(step => `select * from ${ctx.ref(`${funnel.name}_${step.name}`)}`)
      .join(`\n union all \n`)
    );
})

This is the SQL that the Javascript generates:

  select * from `funnels.signup_funnel_landing`
	  UNION ALL
  select * from `funnels.signup_funnel_start_signup`
	  UNION ALL
  select * from `funnels.signup_funnel_verify_email`
	  UNION ALL
  select * from `funnels.signup_funnel_signup_complete`

Running the Javascript in the final file publishes a table that looks like this to the data warehouse:

This dataset can be used to quickly get insights on activity during sign up, such as what % of users that start the funnel get to each step, and how long users spend on each step. Future funnels can be created easily too — all that needs to be done is to define the steps in the funneldata.js file, and to run the funnel_combine.js script. Anyone that wants to create a funnel dataset can do so now — no prior SQL experience is needed.

Photo by Wim Arys on Unsplash.

Any questions or feedback? Want to hear about new posts?
Follow us on @BuildingLloydsD
© 2024 Metabolic Healthcare Ltdlloydsdirect.co.uk