How to Create a Rough Cut from Interview Footage: Step-by-Step Guide
Jacinto Salz · CEO & Co-Founder · April 21, 2026
To create a rough cut from interview footage, follow a five-phase process: transcribe and review (1-2 hours), build a paper edit (30-60 minutes), pull selects (2-3 hours), assemble the narrative structure (2-4 hours), and do a pacing pass (1-2 hours). The total timeline for a typical corporate project with 2-3 hours of raw footage is 8-12 hours. AI tools can compress the first three phases to under 30 minutes, cutting the total to 3-5 hours.
I have been building rough cuts from interview footage for over a decade. This guide documents the process I use on every project, whether it is a 90-second testimonial or a 30-minute documentary short. The steps scale to any project size. The editorial logic stays the same.
Phase 1: Transcribe and Review the Raw Material
Before you open your NLE, you need to know what you have. This phase is about building a mental map of the interview content, not making edit decisions yet.
Start by running transcription on all interview footage. In 2026, built-in NLE transcription (Premiere Pro's speech-to-text, DaVinci Resolve's transcription panel) or standalone tools like Otter.ai produce usable transcripts in minutes. The accuracy is not perfect, but it does not need to be. You are using the transcript as a navigation tool, not a final deliverable.
Read the full transcript for each interview subject. As you read, highlight moments that jump out: strong quotes, emotional shifts, unexpected insights, anything that makes you pause. Do not worry about structure yet. You are collecting raw material.
While reading, note the timecodes of highlighted moments. Most transcription tools link text to timecodes, making this automatic. If you are working with a plain text transcript, note approximate timestamps as you go.
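If you are working from a plain text transcript, collecting the highlighted moments can be scripted. The sketch below assumes a hypothetical format where each line starts with a `[HH:MM:SS]` stamp and you mark a highlight by typing a `*` in front of it while reading. Neither the format nor the function name comes from any particular transcription tool, so adapt the pattern to whatever your tool exports.

```python
import re

# Hypothetical plain-text format: "[HH:MM:SS] spoken text", with a leading "*"
# added by hand to mark a highlight while reading.
LINE_RE = re.compile(r"^(\*?)\s*\[(\d{2}):(\d{2}):(\d{2})\]\s*(.+)$")


def collect_highlights(transcript_text):
    """Return (seconds, timestamp, text) for every starred transcript line."""
    highlights = []
    for raw_line in transcript_text.splitlines():
        match = LINE_RE.match(raw_line.strip())
        if match is None:
            continue  # skip blank lines and anything without a timestamp
        star, hh, mm, ss, text = match.groups()
        if star:
            seconds = int(hh) * 3600 + int(mm) * 60 + int(ss)
            highlights.append((seconds, f"{hh}:{mm}:{ss}", text))
    return highlights


# Made-up sample transcript, for illustration only.
sample = """\
[00:01:10] We started in a garage.
* [00:04:32] The day we almost shut down changed everything.
[00:05:00] Um, so, yeah.
*[00:12:05] I still get emotional about it.
"""

for seconds, stamp, text in collect_highlights(sample):
    print(stamp, text)
```

The result is a short list of stamped quotes that can be pasted straight into a paper edit.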
This phase typically takes 1-2 hours for a project with 2-3 hours of raw footage. If you skip transcription and go straight to scrubbing footage in the timeline, expect this phase to take 4-6 hours instead.
Phase 2: Build a Paper Edit
A paper edit is a written plan for your rough cut. It is the most underused tool in professional editing, and it is the single biggest time-saver available to you without any technology investment.
Open a document (Google Docs, Word, even a text file) and write a simple outline of the story you want to tell. For most interview projects, the structure follows a predictable pattern: introduce the subject or topic, develop the central theme through specific examples or anecdotes, build to an emotional or informational peak, and resolve with a forward-looking statement or reflection.
Under each section of your outline, paste the specific quotes from your transcript highlights that fit that section. Include the timecodes. You are essentially writing the script of your rough cut using the subject's own words.
This step forces you to make editorial decisions before you commit time to the timeline. Moving text in a document takes seconds. Moving clips in a timeline takes minutes. The paper edit lets you try three or four structural approaches in the time it would take to build one timeline.
For a standard corporate interview project, a paper edit takes 30-60 minutes. For a multi-subject documentary, it might take 2-3 hours. Either way, the time invested here compresses the assembly phase dramatically.
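A paper edit is just an ordered list of sections, each holding quotes with timecodes, so it can also be kept as a small data structure. The following is a minimal sketch, not a prescribed tool: the `Quote` and `Section` classes, the section titles, and the sample quotes are all hypothetical. It shows why restructuring on paper is cheap: trying a different structure is a matter of reordering a list.

```python
from dataclasses import dataclass, field


@dataclass
class Quote:
    speaker: str
    timecode: str  # where the quote starts in the raw footage
    text: str


@dataclass
class Section:
    title: str
    quotes: list = field(default_factory=list)


def render_paper_edit(sections):
    """Render the sections, in order, as a numbered plain-text outline."""
    lines = []
    for number, section in enumerate(sections, start=1):
        lines.append(f"{number}. {section.title}")
        for q in section.quotes:
            lines.append(f'   [{q.timecode}] {q.speaker}: "{q.text}"')
    return "\n".join(lines)


# Made-up sample content following the four-part pattern described above.
sections = [
    Section("Introduction", [Quote("Founder", "00:01:10", "We started in a garage.")]),
    Section("Development", [Quote("Founder", "00:07:45", "Our first client took a chance on us.")]),
    Section("Peak", [Quote("Founder", "00:04:32", "The day we almost shut down changed everything.")]),
    Section("Resolution", [Quote("Founder", "00:12:05", "I still get emotional about it.")]),
]

print(render_paper_edit(sections))

# An alternative structure that opens on the peak: only the list order changes.
alternative = [sections[2], sections[0], sections[1], sections[3]]
print(render_paper_edit(alternative))
```

A plain document does the same job. The point of the sketch is only that the structure lives in the order of the sections, not in the timeline.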
Phase 3: Pull Selects
With your paper edit complete, you know exactly which moments you need from the raw footage. Open your NLE and navigate directly to those timecodes.
Create a selects sequence (or timeline, or project, depending on your NLE terminology). Pull each clip referenced in your paper edit into this sequence, roughly in the order your paper edit specifies. Do not trim precisely yet. Give yourself handles (extra footage at the head and tail of each clip) so you have room to adjust during assembly.
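The handle arithmetic can be sketched in a few lines. This example assumes non-drop-frame timecode at a fixed frame rate, a source clip that starts at 00:00:00:00, and a two-second handle. All of those are illustrative choices rather than recommendations, and the function names are hypothetical.

```python
def timecode_to_frames(tc, fps=25):
    """Convert non-drop-frame 'HH:MM:SS:FF' timecode to a frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def frames_to_timecode(frames, fps=25):
    """Convert a frame count back to non-drop-frame 'HH:MM:SS:FF'."""
    total_seconds, ff = divmod(frames, fps)
    total_minutes, ss = divmod(total_seconds, 60)
    hh, mm = divmod(total_minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def add_handles(tc_in, tc_out, clip_duration_frames, handle_seconds=2, fps=25):
    """Widen an in/out pair by a handle on each side, clamped to the clip."""
    handle = handle_seconds * fps
    start = max(0, timecode_to_frames(tc_in, fps) - handle)
    end = min(clip_duration_frames, timecode_to_frames(tc_out, fps) + handle)
    return frames_to_timecode(start, fps), frames_to_timecode(end, fps)


# A quote marked at 00:04:32:00 to 00:04:47:10 inside a one-hour clip at 25 fps.
print(add_handles("00:04:32:00", "00:04:47:10", clip_duration_frames=90000))
```

Clamping matters for quotes near the very start or end of a recording, where a full handle would run past the available footage.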
As you pull selects, watch each clip rather than just jumping to the timecode and grabbing it. The paper edit told you what was said. Now you need to confirm how it was said. Sometimes a quote that reads beautifully in the transcript sounds flat on camera. Other times a quote you almost cut from the paper edit turns out to have delivery that elevates it. This is where editorial judgment meets the footage.
If the take you marked in the paper edit does not work on camera, scrub forward and backward in the raw footage to find a better version. Subjects often express the same idea multiple times, and the delivery varies. Your job is to find the version where the delivery serves the story.
For a project with 2-3 hours of raw interview footage, pulling selects based on a paper edit takes 2-3 hours. Without a paper edit, this phase takes 4-6 hours because you are making selection decisions and structural decisions simultaneously.
Phase 4: Assemble the Narrative Structure
With your selects sequence built, assembly becomes a process of arrangement rather than discovery. You have all your ingredients. Now you cook.
Work through your paper edit section by section. Drag clips from your selects into a new rough cut sequence in the order your paper edit specifies. At this stage, focus on narrative flow, not precision. Leave gaps where B-roll will go. Do not fine-tune audio transitions. You are building the skeleton of the story.
The key editorial decisions in this phase are about transitions between sections. How does the story move from the introduction to the development? Is there a natural bridge in the subject's language, or do you need a visual transition (B-roll, title card, music shift)? Where does the energy need to build, and where does it need to breathe?
This is the phase where your editorial voice shapes the piece. Two editors with the same selects and the same paper edit will produce different rough cuts because their narrative instincts differ. That is not a bug. That is editing.
Assembly from a pre-built selects sequence typically takes 2-4 hours for a standard corporate project. The range depends on how many structural decisions you need to make and how closely your paper edit mapped to the actual footage.
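To make the skeleton idea concrete, here is a rough sketch of laying clips end to end in paper-edit order, with placeholder gaps where B-roll will go, and reading off the running time. The labels, durations, and the `lay_out_timeline` helper are hypothetical. An NLE does this for you as you drag clips, so the sketch is only a way to sanity-check length before you start.

```python
def lay_out_timeline(items):
    """Place (label, duration_seconds) items back to back.

    Returns the layout as (label, start, end) tuples plus the total running time.
    B-roll gaps are ordinary items, so they reserve time in the skeleton.
    """
    position = 0
    layout = []
    for label, duration in items:
        layout.append((label, position, position + duration))
        position += duration
    return layout, position


# Made-up durations in whole seconds, in paper-edit order.
rough_cut = [
    ("Intro quote", 12),
    ("GAP: B-roll", 4),
    ("Anecdote", 30),
    ("Closing reflection", 15),
]

layout, running_time = lay_out_timeline(rough_cut)
for label, start, end in layout:
    print(f"{start:>4}s to {end:>4}s  {label}")
print("Running time:", running_time, "seconds")
```

If the running time lands well over the target length, that is a signal to revisit the paper edit before moving on to the pacing pass.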
Phase 5: Pacing Pass
The first assembly will feel too long. It always does. The pacing pass is where you tighten.
Watch the entire rough cut from beginning to end without stopping. Note moments where your attention drifts, where a clip runs too long, where the energy sags between sections. These are your cut points.
Then go through systematically. Trim the heads and tails of clips. Remove any filler or repetition you missed during selects. Tighten transitions. The goal is not a final cut. The goal is a rough cut that holds attention from start to finish, even if the audio transitions are rough and the B-roll has not been placed.
A thorough pacing pass takes 1-2 hours. After this phase, you have a rough cut ready for review.
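The per-phase estimates quoted in the five phases above can be added up directly. The sketch below does that arithmetic, and also models the AI-assisted case by replacing the first three phases with a single step of at most 30 minutes. Treating "under 30 minutes" as a range of 0 to 0.5 hours is an assumption made for the calculation.

```python
# Per-phase estimates in hours (low, high), as quoted in the phases above.
PHASES = {
    "transcribe_and_review": (1.0, 2.0),
    "paper_edit": (0.5, 1.0),
    "pull_selects": (2.0, 3.0),
    "assemble": (2.0, 4.0),
    "pacing_pass": (1.0, 2.0),
}


def total_range(phases):
    """Sum the low and high ends of every phase estimate."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return low, high


manual_total = total_range(PHASES)

# AI-assisted case: phases 1-3 collapse into one step of under 30 minutes
# (modeled here as 0 to 0.5 hours, which is an assumption).
ai_phases = {
    "ai_transcribe_paper_edit_selects": (0.0, 0.5),
    "assemble": PHASES["assemble"],
    "pacing_pass": PHASES["pacing_pass"],
}
ai_total = total_range(ai_phases)

print("Manual:", manual_total)
print("AI-assisted:", ai_total)
```

The strict sums come out to 6.5-12 hours for the manual process and 3-6.5 hours with the first three phases compressed. Both bracket the 8-12 hour and 3-5 hour figures quoted in this guide for a typical project, which are estimates from experience rather than exact sums.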
Where AI Fits Into This Process
The five-phase process I just described is the manual approach. It works. It has worked for decades. And it takes 8-12 hours for a typical project.
AI tools can compress the timeline significantly, but they do so at different points in the process.
Transcription AI (built into Premiere Pro, DaVinci Resolve, or standalone tools) compresses Phase 1 from hours to minutes. This is the most widely adopted AI application in editing today.
Transcript-based rough cut tools like Eddie AI and Descript can partially automate Phases 2 and 3 by searching for topics and generating initial assemblies. The output is a starting point that you reshape rather than building from scratch. As I wrote in our comparison of editing approaches, the limitation is that transcript-based selection misses delivery quality.
Prosodic analysis tools like Threadline Studio compress Phases 1 through 4 into a single step. The AI analyzes raw footage, evaluates speaker delivery quality (not just transcript content), and outputs a narrative-structured rough cut as an XML file for your NLE. You skip directly to Phase 5: the pacing pass and creative refinement. This is the approach we built because it mirrors how experienced editors actually think about footage, prioritizing delivery quality over keyword matching.
Regardless of which tools you use, understanding the manual process matters. AI generates better output when you understand what "good" looks like. The editor who has built hundreds of rough cuts manually will get more value from AI tools than the editor who skips straight to automation without learning the craft.
Frequently Asked Questions
How long does a rough cut take? A rough cut for a standard corporate interview project with 2-3 hours of raw footage takes 8-12 hours manually. AI-assisted workflows can reduce this to 3-5 hours. Fully AI-automated rough cuts (using prosodic analysis) can generate a structured first cut in under 30 minutes, though editorial refinement still takes 2-3 hours.
What is the difference between a rough cut and a fine cut? A rough cut contains all the narrative elements in the right order with approximate timing. A fine cut has precise pacing, polished audio transitions, B-roll placed, music integrated, and color correction applied. The rough cut is the story. The fine cut is the finished piece.
Should I transcribe before editing? Yes. Transcription lets you review 2-3 hours of content in 1-2 hours rather than scrubbing through footage for 4-6 hours. Even imperfect transcripts save significant time.
What is a paper edit? A paper edit is a written outline of your rough cut using specific quotes and timecodes from transcripts. It forces you to make structural decisions before committing to the timeline, saving hours of rearranging clips.
What NLE should I use for interview editing? Premiere Pro is one of the most widely used NLEs for corporate and documentary interview editing. DaVinci Resolve offers comparable editing features and is especially strong in color grading. Final Cut Pro is preferred by some editors for its magnetic timeline. All three support XML import from AI rough cut tools like Threadline Studio.
How do I choose the best soundbite when a subject says something twice? Listen for delivery quality, not just content. The take with more dynamic pitch variation, natural pacing, and genuine vocal energy will usually perform better on screen than the take with cleaner grammar. Read our guide on why the best take is never the cleanest transcript for more on this.
